NOT IN vs NOT EXISTS: A Silent SQL Mistake

One SQL mistake that can silently break your results:
👉 NOT IN vs NOT EXISTS
They look similar. They behave very differently.

Example:

SELECT *
FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM customers);

💡 Seems correct, right? But here's the problem 👇
👉 If the subquery returns even one NULL → the whole query returns nothing
❌ You get no rows, even when matching data exists

💡 Why? SQL cannot compare a value to NULL → the comparison evaluates to UNKNOWN, and rows whose condition is UNKNOWN are filtered out.

💡 Better approach:

SELECT *
FROM orders o
WHERE NOT EXISTS (
    SELECT 1
    FROM customers c
    WHERE o.customer_id = c.customer_id
);

👉 NOT EXISTS handles NULLs safely.

💡 Real-world impact:
• missing data in reports
• incorrect filtering
• hard-to-debug issues

Lesson:
👉 Similar-looking queries ≠ same behavior
👉 Always think about NULLs (a two-line demo follows below)

If you understand this, you're already ahead of many developers.

Follow for more practical SQL insights. 🙂

#SQL #DataEngineering #Learning #Analytics
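To see the NULL trap in isolation, here is a minimal sketch; the literal values are made up (PostgreSQL and SQL Server accept SELECT without FROM, MySQL may need FROM DUAL):

-- Returns one row: 1 matches nothing in the list
SELECT 1 AS x WHERE 1 NOT IN (2, 3);

-- Returns no rows: 1 <> NULL is UNKNOWN, so NOT IN can never be TRUE
SELECT 1 AS x WHERE 1 NOT IN (2, 3, NULL);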
Most SQL mistakes don't come from syntax. They come from not understanding the data.

One thing I learned early while working on customer datasets at scale: before writing ANY complex SQL, I always spend 5 to 10 minutes just understanding the data.

Here's my quick checklist (a runnable version follows below):
👉 SELECT COUNT(*) → How big is the dataset?
👉 SELECT COUNT(DISTINCT customer_id) → How many unique entities?
👉 GROUP BY key columns → Any unexpected duplicates?
👉 WHERE column IS NULL → Missing-data check
👉 LIMIT 10 → Sanity-check a few rows

It sounds basic, but this habit has saved me from:
• incorrect aggregations
• duplicate joins
• wrong business conclusions
• painfully long debugging sessions

In real-world analytics, wrong insights are worse than no insights.
Clean thinking > complex queries.

Curious: what's one SQL habit that saved you time?

#SQL #DataAnalytics #DataScienceTips #AnalyticsLife #DataEngineering #LearnSQL #WomenInTech #DataCareer
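The checklist in runnable form, a sketch in which sales and customer_id stand in for your table and key (LIMIT is MySQL/PostgreSQL syntax; SQL Server uses TOP):

-- 1. Size of the dataset
SELECT COUNT(*) FROM sales;

-- 2. Unique entities
SELECT COUNT(DISTINCT customer_id) FROM sales;

-- 3. Unexpected duplicates on the key
SELECT customer_id, COUNT(*) AS n
FROM sales
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY n DESC;

-- 4. Missing data
SELECT COUNT(*) FROM sales WHERE customer_id IS NULL;

-- 5. Eyeball a few rows
SELECT * FROM sales LIMIT 10;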
📊 𝐂𝐎𝐔𝐍𝐓 𝐢𝐧 𝐒𝐐𝐋: 𝐒𝐦𝐚𝐥𝐥 𝐃𝐞𝐭𝐚𝐢𝐥, 𝐁𝐢𝐠 𝐈𝐦𝐩𝐚𝐜𝐭

Counting seems like the easiest operation in SQL. But this is exactly where many analyses quietly go wrong.

𝐂𝐎𝐔𝐍𝐓(*) 𝐜𝐨𝐮𝐧𝐭𝐬 𝐚𝐥𝐥 𝐫𝐨𝐰𝐬.
𝐂𝐎𝐔𝐍𝐓(𝐜𝐨𝐥𝐮𝐦𝐧) 𝐜𝐨𝐮𝐧𝐭𝐬 𝐨𝐧𝐥𝐲 𝐧𝐨𝐧-𝐍𝐔𝐋𝐋 𝐯𝐚𝐥𝐮𝐞𝐬.

At first, the difference feels small. In real data, it's not.

💡 𝐖𝐡𝐚𝐭 𝐚𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐡𝐚𝐩𝐩𝐞𝐧𝐬?
In most datasets, missing values (NULLs) are common. When you use COUNT(column), SQL automatically ignores those NULLs.
• You're no longer counting rows.
• You're counting available values.
And that difference matters more than it seems.

⚠️ 𝐖𝐡𝐲 𝐭𝐡𝐢𝐬 𝐜𝐫𝐞𝐚𝐭𝐞𝐬 𝐩𝐫𝐨𝐛𝐥𝐞𝐦𝐬
• KPIs get undercounted
• Conversion rates become inaccurate
• Data completeness is misunderstood

𝐄𝐱𝐚𝐦𝐩𝐥𝐞: If 100 users exist but only 80 have a value in the column, COUNT(column) = 80.
👉 It may look like only 80 records exist, but that's not true. (A side-by-side query follows below.)

🚀 𝐖𝐡𝐚𝐭 𝐚 𝐠𝐨𝐨𝐝 𝐚𝐧𝐚𝐥𝐲𝐬𝐭 𝐝𝐨𝐞𝐬
• Understands the data before counting
• Checks for NULL values explicitly
• Chooses COUNT logic based on the problem

#SQL #DataAnalytics #DataAnalyst #LearningSQL #SQLConcepts #DataCleaning
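All three numbers in one query, a sketch where users and last_login are hypothetical names:

SELECT
    COUNT(*)                     AS total_rows,       -- every row
    COUNT(last_login)            AS rows_with_value,  -- non-NULL only
    COUNT(*) - COUNT(last_login) AS missing_values    -- the gap
FROM users;

If total_rows is 100 and rows_with_value is 80, the 20-row gap is exactly the undercount described above.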
I used to write SELECT * in almost every SQL query.

Honestly, I never thought much about it. It worked, so I just kept doing it. But after working with bigger datasets, I started noticing things getting slower and harder to manage.

That's when I realized: I was pulling far more data than I actually needed.

Now instead of:

SELECT * FROM sales;

I write only what's required:

SELECT customer_id, revenue FROM sales;

It may look like a small change, but it made a real difference:
✔ Queries run faster
✔ Results are easier to read
✔ Less unnecessary load

Still learning to write better SQL, step by step.

Curious: did you also start with SELECT *, or was it just me?

#SQL #DataAnalytics #LearningJourney #Performance
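One way to check the effect on your own tables rather than take it on faith (EXPLAIN as in PostgreSQL/MySQL; sales is a stand-in name):

EXPLAIN SELECT * FROM sales;
EXPLAIN SELECT customer_id, revenue FROM sales;

In PostgreSQL, for example, the estimated row width drops for the narrower query; exact output varies by engine, but comparing the two plans makes the saved data volume visible.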
🗓️ SQL Challenge Day #21: Immediate Food Delivery II
🔹 Calculate the percentage of immediate first orders! 🚚

🔹 Problem: Find the percentage of immediate orders among first orders:
✅ Immediate = order_date = customer_pref_delivery_date
✅ First order = earliest order per customer
✅ Round to 2 decimal places

🔹 Solution:

SELECT ROUND(AVG(CASE WHEN order_date = customer_pref_delivery_date
                      THEN 1 ELSE 0 END) * 100.0, 2) AS immediate_percentage
FROM delivery
WHERE (customer_id, order_date) IN (
    SELECT customer_id, MIN(order_date)
    FROM delivery
    GROUP BY customer_id
);

✅ Result: Accepted

💡 Key Takeaway: a MIN() subquery + AVG(CASE) is perfect for "first occurrence" problems! The subquery pins down each customer's first order, while AVG(CASE) elegantly calculates the percentage without separate counts. One caution: taking MIN(order_date) and MIN(customer_pref_delivery_date) independently can pair values from two different rows, so filter with the row-valued comparison above (MySQL-style) to keep each first order's dates together.

👇 Your turn: What's the most interesting real-world metric you've calculated using window functions or subqueries?

#SQL #LeetCode #DataEngineering #ProblemSolving #Coding #LearningInPublic #Database #DataAnalytics
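Since the closing question mentions window functions: the same "first occurrence" pattern with ROW_NUMBER(), a sketch against the same delivery table:

SELECT ROUND(AVG(CASE WHEN order_date = customer_pref_delivery_date
                      THEN 1 ELSE 0 END) * 100.0, 2) AS immediate_percentage
FROM (
    SELECT order_date,
           customer_pref_delivery_date,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_date) AS rn
    FROM delivery
) t
WHERE rn = 1;

Each customer's rows are numbered by order_date, and rn = 1 keeps the genuine first-order row with both of its dates intact.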
📌 This SQL query looks 100% correct… but returns ZERO rows

No error. No warning. Just empty results.

At first glance, nothing looked wrong 👇

SELECT CustomerID
FROM Customers
WHERE CustomerID NOT IN (
    SELECT CustomerID FROM Orders
);

Everything checked out:
✔ Data
✔ Logic
✔ Joins

Still… EMPTY result.

🔍 The hidden culprit => NULL

Just one NULL in the subquery can break everything.

What SQL actually does internally:
Subquery returns: (2, 3, NULL)
SQL interprets this as:
CustomerID <> 2 AND CustomerID <> 3 AND CustomerID <> NULL

Now here's the catch 👇
1 <> NULL → UNKNOWN
So the condition becomes:
TRUE AND TRUE AND UNKNOWN = UNKNOWN

And SQL behaves like this:
TRUE → keep
FALSE → remove
UNKNOWN → also remove

💥 Result → EMPTY DATASET, even when valid rows exist.

And that's how one NULL can silently invalidate your entire dataset.

✅ The safer approach → NOT EXISTS

SELECT c.CustomerID
FROM Customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM Orders o
    WHERE o.CustomerID = c.CustomerID
);

✔ Works row by row
✔ Stops early (better performance)
✔ NULL doesn't break the logic

🔥 Learning: NOT IN doesn't fail loudly… it fails silently.

💡 Rule to follow:
- Default → NOT EXISTS
- Use NOT IN only when you're 100% sure NO NULLs exist

#SQL #SQLServer #DataAnalytics
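For completeness, a third NULL-safe pattern: the LEFT JOIN anti-join (a sketch reusing the Customers/Orders tables above; not part of the original comparison):

SELECT c.CustomerID
FROM Customers c
LEFT JOIN Orders o
    ON o.CustomerID = c.CustomerID
WHERE o.CustomerID IS NULL;

-- Customers with no matching order keep NULL in the joined columns,
-- so the IS NULL filter returns exactly the customers without orders.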
Day 4 of posting about Data Analytics.

Stop writing the same SQL queries over and over. It's time to let Stored Procedures do the heavy lifting. ⚡

If you're still sending long, repetitive scripts to your database, you're missing out on one of the best ways to streamline your workflow.

Think of a Stored Procedure as a "saved recipe" for your data: write it once, call it whenever you need it.

Why should you care?
Speed: They are pre-compiled, meaning the database executes them faster.
Security: You can grant access to the procedure without exposing the raw tables.
Efficiency: One command (CALL MyProcedure) replaces lines and lines of code.
Consistency: Change the logic in one place, and it updates everywhere.

Whether you're identifying "High-Value Orders" or automating monthly reports, stored procedures turn manual tasks into a one-click process. (A sketch of the idea follows below.)

Below is an image showing a stored procedure I created today.

#DataAnalytics #DataScience #SQL #Buildinginpublic
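A minimal sketch of the "High-Value Orders" idea in MySQL syntax (procedure, table, and column names are all hypothetical, and CREATE PROCEDURE syntax differs across engines):

DELIMITER //
CREATE PROCEDURE GetHighValueOrders(IN min_total DECIMAL(10, 2))
BEGIN
    -- Same logic every time, threshold passed as a parameter
    SELECT order_id, customer_id, order_total
    FROM orders
    WHERE order_total >= min_total
    ORDER BY order_total DESC;
END //
DELIMITER ;

-- From then on, the report is one line:
CALL GetHighValueOrders(500.00);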
THIS QUERY LOOKS CORRECT. IT IS NOT.

Most people think this query is correct. It runs. It returns results. But the logic is completely broken.

Business problem: Find the latest product review for each customer based on their most recent completed order.

Tables involved:
orders: order_id, customer_id, order_date
payments: payment_id, order_id, status, payment_date
reviews: review_id, order_id, review_text, review_date

At first glance, the logic seems simple: join orders → payments → reviews and pick the latest order per customer.

But real data doesn't behave like that:
- One order can have multiple payments
- One order can have multiple reviews (edits / updates)
- Joins create duplicate rows
- "Latest" becomes ambiguous if not handled carefully

So even if your query runs, you might be picking the wrong review.

Think before answering:
Are you selecting the latest order?
Or the latest review?
Or a random row created by joins?

Fix the logic, not just the syntax. (One clean pattern is sketched below.)

Comment your answer. Repost if this made you think. Follow Harish Chatla for more real-world data problems. Subscribe to practice on our platform.

#DataRejected #SQL #DataEngineering #DataAnalytics #DataScience #LearnByDoing #TechCareers #Analytics #CodingPractice
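One way to make "latest" unambiguous, a sketch against the schema above (the 'completed' status value and the tie-breaking columns are assumptions):

WITH completed_orders AS (
    SELECT o.order_id, o.customer_id, o.order_date,
           ROW_NUMBER() OVER (PARTITION BY o.customer_id
                              ORDER BY o.order_date DESC, o.order_id DESC) AS rn
    FROM orders o
    WHERE EXISTS (
        SELECT 1 FROM payments p
        WHERE p.order_id = o.order_id AND p.status = 'completed'
    )
),
latest_reviews AS (
    SELECT r.order_id, r.review_text, r.review_date,
           ROW_NUMBER() OVER (PARTITION BY r.order_id
                              ORDER BY r.review_date DESC, r.review_id DESC) AS rn
    FROM reviews r
)
SELECT co.customer_id, lr.review_text, lr.review_date
FROM completed_orders co
JOIN latest_reviews lr
    ON lr.order_id = co.order_id AND lr.rn = 1
WHERE co.rn = 1;

EXISTS avoids the payment fan-out, and each ROW_NUMBER() pins exactly one row per group, so "latest order" and "latest review" are each decided explicitly instead of by join luck.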
🧠 SQL Challenge 4/100 (IN 🆚 EXISTS) 🚀

🚀 Find customers who never placed any order
👉 Return: customer_id, name

⚠️ Catch: there is a NULL in Orders.customer_id.
A very common approach will return 0 rows 😶

🔥 This looks like a basic question… but one NULL breaks most solutions.
If your query uses NOT IN, double-check it 👀

Do you know the correct way? Drop your answer 👇 (one safe pattern is sketched below, if you want to check yours afterwards)

#SQL #DataEngineering #LearnSQL #Analytics #TechCareers
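For readers checking their answer, a NULL-safe pattern (a sketch; the Customers/Orders table and column names follow the post):

SELECT c.customer_id, c.name
FROM Customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM Orders o
    WHERE o.customer_id = c.customer_id
);

-- NOT EXISTS tests each customer row individually,
-- so a NULL in Orders.customer_id cannot poison the whole result.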
𝗖𝗼𝗺𝗺𝗼𝗻 𝗦𝗤𝗟 𝗠𝗶𝘀𝘁𝗮𝗸𝗲𝘀 (𝗮𝗻𝗱 𝗛𝗼𝘄 𝘁𝗼 𝗙𝗶𝘅 𝗧𝗵𝗲𝗺)

Over time, I've seen a few SQL mistakes that can silently break logic or performance. Here are some common ones and how to avoid them (a few are shown in code after the list):

1. 𝗙𝗼𝗿𝗴𝗲𝘁𝘁𝗶𝗻𝗴 𝘁𝗵𝗲 𝗪𝗛𝗘𝗥𝗘 𝗖𝗹𝗮𝘂𝘀𝗲
Running 𝗗𝗘𝗟𝗘𝗧𝗘 or 𝗨𝗣𝗗𝗔𝗧𝗘 without a 𝗪𝗛𝗘𝗥𝗘 clause can wipe out entire tables. Always double-check your conditions and use transactions when working with critical data. One small miss can lead to massive data loss.

2. 𝗢𝘃𝗲𝗿𝘂𝘀𝗶𝗻𝗴 𝗦𝗘𝗟𝗘𝗖𝗧 *
Using 𝗦𝗘𝗟𝗘𝗖𝗧 * fetches unnecessary columns, slows down queries, and makes code less readable. Instead, select only the columns you need; it improves performance and keeps queries future-proof.

3. 𝗖𝗼𝗺𝗽𝗮𝗿𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗡𝗨𝗟𝗟 𝗜𝗻𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗹𝘆
𝗡𝗨𝗟𝗟 is not a value, so = 𝗡𝗨𝗟𝗟 won't work. Always use 𝗜𝗦 𝗡𝗨𝗟𝗟 or 𝗜𝗦 𝗡𝗢𝗧 𝗡𝗨𝗟𝗟. This ensures correct filtering and avoids unexpectedly empty results.

4. 𝗚𝗿𝗼𝘂𝗽𝗶𝗻𝗴 𝗜𝘀𝘀𝘂𝗲𝘀 𝗶𝗻 𝗦𝗘𝗟𝗘𝗖𝗧
Every non-aggregated column in your 𝗦𝗘𝗟𝗘𝗖𝗧 must appear in the 𝗚𝗥𝗢𝗨𝗣 𝗕𝗬. Ignoring this leads to errors or incorrect results. Follow SQL standards for clean and accurate aggregation.

5. 𝗜𝗻𝗰𝗼𝗿𝗿𝗲𝗰𝘁 𝗚𝗥𝗢𝗨𝗣 𝗕𝗬 𝗨𝘀𝗮𝗴𝗲
Grouping without proper structure can make your results confusing. Use meaningful groupings and ensure your query clearly reflects the business logic behind the data.

6. 𝗠𝗶𝘀𝘀𝗶𝗻𝗴 𝗣𝗮𝗿𝗲𝗻𝘁𝗵𝗲𝘀𝗲𝘀 𝗶𝗻 𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗟𝗼𝗴𝗶𝗰
When combining 𝗔𝗡𝗗 and 𝗢𝗥, operator precedence can change results. Always use parentheses to define logic explicitly; it improves readability and prevents logical bugs.

💡 𝗙𝗶𝗻𝗮𝗹 𝗧𝗵𝗼𝘂𝗴𝗵𝘁: Small SQL mistakes can lead to big data issues. Writing clean, intentional queries is just as important as getting the result.

If you've faced similar issues, I would love to hear your experiences 👇

Follow Aman Gambhir for more content like this.

#SQL #sqltips #sqlquery #query #sqlmistakes #optimization
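Mistakes 1, 3, and 6 in miniature (a sketch; all table and column names are made up):

-- 1. Guard destructive statements with a transaction
BEGIN;
DELETE FROM staging_events WHERE event_date < '2024-01-01';
-- Inspect the affected row count, then:
COMMIT;   -- or ROLLBACK;

-- 3. NULL needs IS NULL, not =
SELECT * FROM users WHERE last_login IS NULL;   -- correct
-- WHERE last_login = NULL would match nothing

-- 6. AND binds tighter than OR, so parenthesize
SELECT * FROM orders
WHERE (status = 'open' OR status = 'pending')
  AND amount > 100;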
For months, every "per customer" metric I needed started the same way: GROUP BY, then a self-join back to the table to get the rows I actually wanted.

Then a senior looked at my 40-line query and said: "You know PARTITION BY exists, right?"

That was the AHA. GROUP BY collapses rows. PARTITION BY keeps them.

The trap: you're asked "what's the average order value per customer, next to each order?" Your brain jumps to GROUP BY. But GROUP BY can't show the order AND the average in the same row, so you aggregate, join back, filter, and pray there are no duplicates. PARTITION BY does it in one line (sketch below).

A few things I wish I'd understood sooner:
→ GROUP BY answers "one row per group".
→ PARTITION BY answers "every row, with its group's detail attached".
→ If you're GROUP BY-ing and joining back to the same table, stop. That's a window function.

What's a SQL pattern you used to over-engineer before realizing there was a one-line version?

#DataAnalytics #SQL #WindowFunctions #DataAnalyst
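The one-line version of the example above (a sketch; orders, customer_id, and order_value are stand-in names):

SELECT order_id,
       customer_id,
       order_value,
       AVG(order_value) OVER (PARTITION BY customer_id) AS customer_avg_order_value
FROM orders;

-- Every order row survives, each carrying its customer's average:
-- no GROUP BY, no self-join, no duplicate risk.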