🚨 Common SQL Mistakes That CRASH Production (And How to Fix Them) 🚨

As a Data Analyst with 5+ years optimizing queries at scale, I've seen these SQL blunders cause prod failures, slow dashboards, and endless firefighting. Here are the top 7 that bite hardest – with fixes to bulletproof your code.

1. SELECT * Everywhere
Pulls unnecessary columns, bloating memory and breaking when schemas change.
✅ Fix: SELECT order_id, customer_name FROM orders; – explicit columns only.

2. Missing WHERE in UPDATE/DELETE
The ultimate prod killer – wipes entire tables accidentally.
✅ Fix: Always test with a SELECT first, then add the WHERE. Wrap critical changes in a transaction (BEGIN TRANSACTION;) so you can ROLLBACK before you COMMIT.

3. Functions on Indexed Columns
WHERE YEAR(order_date) = 2025 kills index usage and forces full scans.
✅ Fix: WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'

4. NOT IN with NULLs
If the subquery returns a NULL, the entire result vanishes silently.
✅ Fix: Use NOT EXISTS, or LEFT JOIN ... WHERE alias.col IS NULL.

5. No Indexes on JOIN/WHERE Columns
Fine in dev, crawls in prod with real data.
✅ Fix: Index foreign keys and frequent filters: CREATE INDEX idx_order_date ON orders(order_date);

6. Correlated Subqueries vs JOINs
Correlated subqueries run once per outer row – N+1 hell.
✅ Fix: Rewrite as JOINs for massive speedups.

7. DISTINCT Overuse
Masks duplicates but sorts/hashes everything, tanking performance.
✅ Fix: Address the root cause with a proper GROUP BY, or DISTINCT ON (Postgres).

Pro Tip: Always check the execution plan before shipping to prod.

What's your worst SQL war story? 👇

#SQL #DataEngineering #Database #PowerBI #DataAnalytics #TechTips
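A minimal, runnable sketch of mistake #4 (NOT IN with NULLs), using Python's built-in sqlite3 purely as a sandbox. The customers/blocked tables and their data are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE blocked (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO blocked VALUES (2), (NULL);  -- one NULL poisons NOT IN
""")

# NOT IN against a set containing NULL returns ZERO rows, because
# `id <> NULL` evaluates to UNKNOWN for every candidate row.
not_in_rows = conn.execute(
    "SELECT name FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM blocked)"
).fetchall()

# The fix: NOT EXISTS ignores the NULL and behaves as intended.
not_exists_rows = conn.execute(
    "SELECT name FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM blocked b WHERE b.customer_id = c.id) "
    "ORDER BY c.id"
).fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('Ada',), ('Edsger',)]
```

The same silent-empty-result behavior applies in Postgres, MySQL, and SQL Server; the NOT EXISTS rewrite is safe in all of them.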
-
LEAD and LAG are two of the most underused window functions in SQL and once you understand them, you will wonder how you ever wrote self-joins to access adjacent row values. LAG looks backward. LEAD looks forward. Together they give you the complete picture of how values change across a sequence without joins, without subqueries, and without the performance overhead that comes with them. Days until next event, forward-looking gap analysis, detecting upcoming threshold crossings — all clean, readable, single-query solutions with LEAD. If you are still writing self-joins to access the next row, it is time to make the switch. Read the full post here: https://lnkd.in/eU-6hqz6 #SQL #DataEngineering #DataAnalysis #Database #Analytics #WindowFunctions
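As a concrete sketch of the "days until next event" pattern the post mentions, here is LEAD run through Python's built-in sqlite3 (window functions need SQLite ≥ 3.25, bundled with any recent Python; the events table is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_date TEXT);
    INSERT INTO events VALUES ('2025-01-01'), ('2025-01-04'), ('2025-01-10');
""")

# LEAD pulls the NEXT row's value into the current row -- no self-join.
# julianday() turns the ISO dates into day numbers so we can subtract.
rows = conn.execute("""
    SELECT event_date,
           LEAD(event_date) OVER (ORDER BY event_date) AS next_event,
           CAST(julianday(LEAD(event_date) OVER (ORDER BY event_date))
                - julianday(event_date) AS INTEGER) AS days_until_next
    FROM events
""").fetchall()
print(rows)
# [('2025-01-01', '2025-01-04', 3),
#  ('2025-01-04', '2025-01-10', 6),
#  ('2025-01-10', None, None)]     <- no next row: LEAD returns NULL
```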
-
Before window functions, comparing a row to its predecessor in SQL required self-joins or correlated subqueries — complex, slow, and hard to maintain. The LAG function changed that completely. One function call gives you the previous row's value, the ability to partition by group, a configurable offset for looking back multiple rows, and a default value for when no previous row exists. Month-over-month growth, consecutive drop detection, time between events all solved cleanly with LAG. If you are still writing self-joins to access previous row data, it is time to make the switch. Read the full post here: https://lnkd.in/ee58uE4F #SQL #DataEngineering #DataAnalysis #Database #Analytics #WindowFunctions
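A small month-over-month sketch showing all the LAG features the post lists (partitioning, offset, and a default when no previous row exists), again via Python's sqlite3 sandbox with an invented monthly_sales table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_sales (region TEXT, month TEXT, revenue INTEGER);
    INSERT INTO monthly_sales VALUES
        ('east', '2025-01', 100), ('east', '2025-02', 130),
        ('west', '2025-01', 200), ('west', '2025-02', 180);
""")

# LAG(expr, offset, default): look back 1 row WITHIN each region,
# falling back to 0 for the first month of each partition.
rows = conn.execute("""
    SELECT region, month, revenue,
           LAG(revenue, 1, 0) OVER (
               PARTITION BY region ORDER BY month
           ) AS prev_revenue,
           revenue - LAG(revenue, 1, 0) OVER (
               PARTITION BY region ORDER BY month
           ) AS change
    FROM monthly_sales
    ORDER BY region, month
""").fetchall()
print(rows)
# [('east', '2025-01', 100,   0, 100),
#  ('east', '2025-02', 130, 100,  30),
#  ('west', '2025-01', 200,   0, 200),
#  ('west', '2025-02', 180, 200, -20)]
```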
-
Do you use GROUP BY to find duplicates in SQL? That's usually the first thing most of us learn, and it works well for detecting duplicates. But here's where we get stuck:
👉 What if you actually need to remove the duplicates?
👉 How do you identify which rows are the exact duplicates?

GROUP BY won't help much there, as it only gives counts, not row-level detail. To handle this properly, you need a way to work at the row level, and that's where ROW_NUMBER() with PARTITION BY becomes useful.

I've written a short 2-minute tech blog explaining this with a simple example. If you're learning SQL or working with real datasets, this might be useful 👇

#SQL #ROW_NUMBER #TechBlog
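Here is a runnable sketch of that ROW_NUMBER() + PARTITION BY dedup pattern, using Python's sqlite3 as a sandbox (the emails table and addresses are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emails (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO emails (address) VALUES
        ('a@x.com'), ('b@x.com'), ('a@x.com'), ('a@x.com');
""")

# Number each address's rows 1, 2, 3, ... -- everything with rn > 1
# is an exact duplicate at the ROW level, not just a count.
dupes = conn.execute("""
    SELECT id, address FROM (
        SELECT id, address,
               ROW_NUMBER() OVER (
                   PARTITION BY address ORDER BY id
               ) AS rn
        FROM emails
    ) WHERE rn > 1
    ORDER BY id
""").fetchall()

# Delete exactly those rows, keeping the first copy of each address.
conn.execute("""
    DELETE FROM emails WHERE id IN (
        SELECT id FROM (
            SELECT id, ROW_NUMBER() OVER (
                PARTITION BY address ORDER BY id
            ) AS rn FROM emails
        ) WHERE rn > 1
    )
""")
remaining = conn.execute("SELECT address FROM emails ORDER BY id").fetchall()
print(dupes)      # [(3, 'a@x.com'), (4, 'a@x.com')]
print(remaining)  # [('a@x.com',), ('b@x.com',)]
```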
-
(12-04-2026) From Data Entry to Data Analytics

It was a complete deep dive into the "Analytical Power" of SQL. I've moved past just retrieving rows and started performing complex calculations and data transformations. 📈

The goal today was Manipulation & Aggregation: taking raw, messy data and turning it into a structured report. Here's the toolkit I mastered today:

1. Organizing the Output (Ordering)
ORDER BY: Learned how to sort my results in ASC (Ascending) or DESC (Descending). It's simple, but essential for making data readable.

2. The Function Library (Transformation)
I explored the built-in functions that allow me to modify data on the fly:
String Functions: CONCAT, LOWER, UPPER, TRIM, SUBSTRING, REPLACE, LENGTH, and LEFT/RIGHT 🔠
Numeric Functions: ABS, ROUND, CEIL, FLOOR, POW, SQRT, and MOD 🔢

3. Data Summarization (Aggregates)
This is where the real power lies. I mastered the "Big 5" aggregate functions: COUNT(), SUM(), AVG(), MIN(), and MAX().

4. The Analytics Duo: GROUP BY & HAVING
This was the highlight of the day.
GROUP BY: I can now categorize data to see the "big picture" (e.g., total sales per city or average grade per class).
HAVING: I learned why we can't use WHERE with aggregate functions and mastered HAVING to filter my grouped data.

It's one thing to see 10,000 rows; it's another thing to summarize them into 5 meaningful insights in a single query.

#SQL #DataAnalytics #DataScience #MySQL #GroupBy #CodingLife #Day4 #RelationalDatabases
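The GROUP BY + HAVING duo above can be sketched in one runnable example via Python's sqlite3 (the sales table and cities are invented). It also shows *why* aggregates belong in HAVING: WHERE filters rows before grouping, HAVING filters the groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (city TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('Pune', 120), ('Pune', 80), ('Delhi', 300), ('Delhi', 50), ('Goa', 40);
""")

# Summarize 5 raw rows into per-city totals, keep only cities whose
# TOTAL exceeds 100 -- a condition on an aggregate, so it goes in HAVING.
rows = conn.execute("""
    SELECT city, COUNT(*) AS orders, SUM(amount) AS total
    FROM sales
    GROUP BY city
    HAVING SUM(amount) > 100
    ORDER BY total DESC
""").fetchall()
print(rows)  # [('Delhi', 2, 350), ('Pune', 2, 200)]  -- Goa (40) filtered out
```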
-
NULLs in SQL Joins: The Interview Question That Trips Everyone 😵💫

If you've ever written a SQL query that looked perfect but the output still felt off, this might be why. 👉 The culprit: NULL

🔍 The Problem
You write a clean join:
SELECT * FROM orders o LEFT JOIN users u ON o.user_id = u.id
Looks correct, right? But suddenly:
Some rows don't match
Data goes missing
Counts feel off
No errors. No warnings. Just wrong results.

💣 The Silent Killer: NULL
In SQL, NULL doesn't behave like a normal value.
NULL = NULL → UNKNOWN
NULL = 5 → UNKNOWN
👉 Not TRUE 👉 Not FALSE 👉 Just… ignored

🔗 Why this breaks joins
If your join key contains NULL (say o.user_id is NULL in the condition o.user_id = u.id):
👉 The condition becomes UNKNOWN
👉 The join FAILS silently
👉 You get NULLs on the right side 😬

Real Impact
Missing city names
Incomplete mappings
Wrong aggregations
Misleading dashboards
And the worst part? You don't even realize it's happening.

🧠 Key Learnings
✅ NULL ≠ NULL
✅ Use IS NULL, not = NULL
✅ NULL never matches in joins
✅ LEFT JOIN can hide issues
✅ Always validate join keys

💡 Pro Tip
Before trusting your data:
👉 Check NULLs in join columns
👉 Validate mapping coverage
👉 Use COALESCE() when needed

🚀 Final Thought
SQL doesn't fail loudly. It fails silently, and NULL is one of the biggest reasons why. If you're working with data, mastering NULL handling isn't optional; it's essential.

#SQL #DataAnalytics #DataEngineering #Analytics #LearningSQL
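A runnable demonstration of both claims, using the post's own orders/users join in Python's sqlite3 sandbox (the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, user_id INTEGER);
    CREATE TABLE users  (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, NULL);
    INSERT INTO users  VALUES (10, 'Ada'), (NULL, 'Ghost');
""")

# NULL = NULL is UNKNOWN (returned as NULL), so '=' never finds it...
assert conn.execute("SELECT NULL = NULL").fetchone()[0] is None
# ...only IS NULL does.
assert conn.execute("SELECT NULL IS NULL").fetchone()[0] == 1

# The NULL user_id matches nothing -- not even the NULL users.id --
# so the LEFT JOIN quietly pads the right side with NULLs.
rows = conn.execute("""
    SELECT o.id, u.name
    FROM orders o
    LEFT JOIN users u ON o.user_id = u.id
    ORDER BY o.id
""").fetchall()
print(rows)  # [(1, 'Ada'), (2, None)]  -- order 2 silently unmapped
```

With an INNER JOIN, order 2 would vanish entirely instead, which is exactly why counts "feel off" without any error.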
-
The Only SQL Cheat Sheet You'll Ever Need 🗄️

SQL is the backbone of data analytics, and mastering it means knowing more than just SELECT * FROM table. Here's a complete breakdown of every SQL concept category, from basics to advanced. Bookmark this. 🧵

⚙️ The Basics
Core clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
Operators: =, !=, <, >=, BETWEEN, IN, NOT

∑ Aggregate Functions
min(), max(), avg(), count(), median(), mode(), stddev()
Use with: GROUP BY, HAVING, DISTINCT

🔤 String Manipulation
concat(), replace(), reverse(), trim(), upper(), lower(), len(), str()
Pattern matching: LIKE, ILIKE, wildcards (%)

📅 Date Manipulation
day(), month(), year(), getdate(), date_add(), datediff(), date_trunc()
date_format(): format output precisely

🔗 Joins
INNER, LEFT, OUTER, SELF joins
ANTI JOIN: find non-matching rows
Join on multiple keys or on a condition

🧹 Cleaning & Transformation
cast(), coalesce(), ifnull(), iif()
CASE WHEN: conditional logic in queries
UNION, UNION ALL, INTERSECT, MINUS

🪟 Window Functions
Aggregates: sum(), count(), avg(), max(), min()
Ranking: row_number(), rank(), dense_rank()
Offset: lead(), lag()
All used with OVER(PARTITION BY ... ORDER BY ...)

🧠 Advanced SQL
CTEs: Common Table Expressions for readable, modular queries
Subqueries: correlated vs. uncorrelated; nested logic inside queries
UDFs: User Defined Functions to reuse custom logic
Data Modeling: structuring tables for performance and scalability

💡 The real SQL progression: Basics → Aggregates → Joins → Window Functions → CTEs & Advanced. Most analysts stop at Joins. Go further: Window Functions alone will set you apart in 90% of interviews.

Which SQL category do you use most in your day-to-day work? Drop it in the comments 👇 and save this post so you always have the reference handy!

#SQL #DataAnalytics #DataScience #DataEngineering #WindowFunctions #DatabaseManagement #TechCareer #LearnSQL #BigData #Analytics
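One item on the sheet that trips people up is the anti join, since most engines have no ANTI JOIN keyword; the idiom is a LEFT JOIN plus an IS NULL filter. A sketch in Python's sqlite3 sandbox (customers/orders tables invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO orders VALUES (1), (1), (3);
""")

# Anti join: LEFT JOIN, then keep only the unmatched left rows --
# here, customers who have never placed an order.
rows = conn.execute("""
    SELECT c.name
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    WHERE o.customer_id IS NULL
""").fetchall()
print(rows)  # [('Grace',)]
```

NOT EXISTS expresses the same anti-join semantics and is the safer default when the key column can contain NULLs.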
-
Common SQL Mistakes (and How to Fix Them)

Over time, I've seen a few SQL mistakes that can silently break logic or performance. Here are some common ones and how to avoid them:

1. Forgetting the WHERE Clause
Running DELETE or UPDATE without a WHERE clause can wipe out entire tables. Always double-check your conditions and use transactions when working with critical data. One small miss can lead to massive data loss.

2. Overusing SELECT *
Using SELECT * fetches unnecessary columns, slows down queries, and makes code less readable. Instead, select only the columns you need; it improves performance and keeps queries future-proof.

3. Comparing with NULL Incorrectly
NULL is not a value, so = NULL won't work. Always use IS NULL or IS NOT NULL. This ensures correct filtering and avoids unexpected empty results.

4. Grouping Issues in SELECT
Every non-aggregated column in your SELECT must be in the GROUP BY. Ignoring this leads to errors or incorrect results. Follow SQL standards for clean and accurate aggregation.

5. Incorrect GROUP BY Usage
Grouping without proper structure can make your results confusing. Use meaningful groupings and ensure your query clearly reflects the business logic behind the data.

6. Missing Parentheses in Complex Logic
When combining AND and OR, operator precedence can change results. Always use parentheses to define logic explicitly; it improves readability and prevents logical bugs.

💡 Final Thought: Small SQL mistakes can lead to big data issues. Writing clean, intentional queries is just as important as getting the result.

If you've faced similar issues, I would love to hear your experiences 👇

Follow Aman Gambhir for more content like this.

#SQL #sqltips #sqlquery #query #sqlmistakes #optimization
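Mistake #6 is easy to see in a runnable example. AND binds tighter than OR, so the same conditions with and without parentheses return different rows. A sketch in Python's sqlite3 sandbox (the tickets table is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tickets (id INTEGER, status TEXT, priority TEXT);
    INSERT INTO tickets VALUES
        (1, 'open',   'high'),
        (2, 'open',   'low'),
        (3, 'closed', 'high');
""")

# Intent: high-priority tickets, whether open or closed.
# Without parentheses this parses as:
#   status = 'open' OR (status = 'closed' AND priority = 'high')
# which wrongly includes the LOW-priority open ticket #2.
loose = conn.execute("""
    SELECT id FROM tickets
    WHERE status = 'open' OR status = 'closed' AND priority = 'high'
    ORDER BY id
""").fetchall()

# Parentheses make the intended logic explicit.
strict = conn.execute("""
    SELECT id FROM tickets
    WHERE (status = 'open' OR status = 'closed') AND priority = 'high'
    ORDER BY id
""").fetchall()

print(loose)   # [(1,), (2,), (3,)]  <- logical bug, no error raised
print(strict)  # [(1,), (3,)]
```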
-
A lot of you have been reaching out asking where to start with data and SQL. So I wrote this down.

Most people learn INNER JOIN, LEFT JOIN, maybe a SELF JOIN, and stop there. That's enough to pass an interview. It's not enough to build real pipelines.

The joins that actually matter in production? The ones that break your data when you get them wrong? Those deserve a deeper look.

I covered all of it in my latest blog: the full spectrum of SQL joins, when to use each one, and the edge cases nobody warns you about. Check out the blog link in the comment section.

If you're starting out or levelling up, this one's for you. Comment "JOINS" below and I'll drop practice questions your way.

And tell me, what do you want me to cover next? Window functions? Indexing? Query optimization? CTEs? Something else? I'm writing for you. ↓ Drop it in the comments.

#SQL #DataEngineering #DataCareer #Analytics
-
SQL is the one skill every data engineer needs, regardless of your stack. Here are 3 tricks I use constantly. 🔥

1. Window Functions
Instead of joining aggregated subqueries, use OVER() to calculate rankings, running totals, and moving averages without collapsing your rows.
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC)
This gives you the latest order per customer, in one clean query.

2. CTEs (Common Table Expressions)
Stop nesting subquery inside subquery. CTEs make your SQL readable AND debuggable.
WITH cleaned AS (SELECT * FROM raw WHERE status = 'active')
SELECT * FROM cleaned WHERE amount > 100

3. CASE WHEN for inline logic
Instead of multiple queries for different conditions, use CASE WHEN to categorize data in a single pass.
CASE WHEN revenue > 10000 THEN 'High'
     WHEN revenue > 5000 THEN 'Mid'
     ELSE 'Low' END AS tier

These three alone will make your queries faster, cleaner, and easier to maintain. Save this post for your next SQL interview! 💾

#SQL #DataEngineering #DataAnalysis #TechTips
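All three tricks compose naturally in a single query. Here is a runnable sketch via Python's sqlite3 (the raw_orders table, its rows, and the revenue tiers mirror the post's snippets but are otherwise invented): a CTE ranks orders per customer with ROW_NUMBER(), then CASE tiers the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (customer_id INTEGER, order_date TEXT, revenue INTEGER);
    INSERT INTO raw_orders VALUES
        (1, '2025-01-05', 12000),
        (1, '2025-02-01',  7000),
        (2, '2025-01-10',  3000);
""")

# CTE + window function + CASE in one pass:
# latest order per customer, tagged with a revenue tier.
rows = conn.execute("""
    WITH ranked AS (
        SELECT customer_id, order_date, revenue,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id ORDER BY order_date DESC
               ) AS rn
        FROM raw_orders
    )
    SELECT customer_id, order_date,
           CASE WHEN revenue > 10000 THEN 'High'
                WHEN revenue > 5000  THEN 'Mid'
                ELSE 'Low' END AS tier
    FROM ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()
print(rows)  # [(1, '2025-02-01', 'Mid'), (2, '2025-01-10', 'Low')]
```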
-
SQL CASE Statements: Conditional Logic Inside Your Query

Most analysts I know write three separate queries to segment data and then manually label each result in Excel. Stop doing that. ✋

CASE statements are the SQL version of if/else logic. They allow you to apply conditional categorization directly in your query: no code, no manual work, no exporting.

SQL:
SELECT customer_id, total_spent,
       CASE WHEN total_spent >= 500 THEN 'High Value'
            WHEN total_spent >= 100 THEN 'Mid Tier'
            ELSE 'Low Value' END AS customer_segment
FROM orders;

- Top-down evaluation: the query stops at the first match.
- ELSE: handles everything that doesn't meet your conditions.

Real-world use case: flagging delivery status

SQL:
SELECT order_id,
       CASE WHEN delivered_at IS NULL AND created_at < NOW() - INTERVAL '7 days' THEN 'Overdue'
            WHEN delivered_at IS NULL THEN 'Pending'
            ELSE 'Delivered' END AS delivery_status
FROM orders;

One query. Clean labels. Ready to aggregate or filter immediately.

Where can you use it? The power of CASE is that it works anywhere a column reference works:
- SELECT: create new label columns.
- WHERE: filter based on conditional logic.
- GROUP BY: group by derived categories.
- SUM(CASE WHEN ...): conditional aggregation (a game-changer for pivot-style tables).

The day I stopped exporting data to Excel to add "segment" columns was the day CASE statements finally clicked for me.

#SQL #DataAnalytics #DataScience #BerlinTech #Analytics #DailyTips
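The conditional-aggregation trick mentioned at the end deserves its own example, since it is the pattern that replaces most Excel pivot tables. A runnable sketch via Python's sqlite3 (the orders table here is invented and simplified to a delivered flag):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, region TEXT, delivered INTEGER);
    INSERT INTO orders VALUES
        (1, 'north', 1), (2, 'north', 0),
        (3, 'south', 1), (4, 'south', 1);
""")

# SUM(CASE WHEN ...) counts matching rows per group: each CASE emits
# 1 or 0 per row, and SUM adds them up -- a pivot in one query.
rows = conn.execute("""
    SELECT region,
           SUM(CASE WHEN delivered = 1 THEN 1 ELSE 0 END) AS delivered,
           SUM(CASE WHEN delivered = 0 THEN 1 ELSE 0 END) AS pending
    FROM orders
    GROUP BY region
    ORDER BY region
""").fetchall()
print(rows)  # [('north', 1, 1), ('south', 2, 0)]
```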