Divyanshi Garg’s Post

I reviewed 200 SQL submissions from data engineering candidates last year. 90% had the same problem — and it wasn't wrong answers. They were writing SQL to get results. Senior engineers write SQL their teammates can debug at 3am during an incident. That's the gap nobody talks about. These are the 7 patterns that make the difference: 𝟬𝟭 — 𝗪𝗶𝗻𝗱𝗼𝘄 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝘀 — stop writing subqueries that run once per row. SUM() OVER (PARTITION BY...) does it in one scan. 𝟬𝟮 — 𝗟𝗔𝗚 / 𝗟𝗘𝗔𝗗 — stop self-joining tables to compare rows. Two lines of window syntax replaces 12 lines of JOIN logic. 𝟬𝟯 — 𝗚𝗮𝗽𝘀 & 𝗜𝘀𝗹𝗮𝗻𝗱𝘀 — date minus ROW_NUMBER creates a constant for consecutive dates. This one pattern solves 80% of streak problems. 𝟬𝟰 — 𝗖𝗼𝗻𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗮𝗴𝗴𝗿𝗲𝗴𝗮𝘁𝗶𝗼𝗻 — COUNT(DISTINCT CASE WHEN channel='paid' THEN user_id END) gives you a full pivot in one scan, zero PIVOT syntax. 𝟬𝟱 — 𝗦𝗺𝗮𝗿𝘁 𝗱𝗲𝗱𝘂𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 — never SELECT DISTINCT in production. ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) encodes your business rule. 𝟬𝟲 — 𝗥𝗲𝗰𝘂𝗿𝘀𝗶𝘃𝗲 𝗖𝗧𝗘 — org trees, hierarchies, graph traversal. Always add WHERE depth < N. Without it, cyclic data crashes your job every time. 𝟬𝟳 — 𝗦𝗲𝘀𝘀𝗶𝗼𝗻𝗶𝘀𝗮𝘁𝗶𝗼𝗻 — LAG detects the inactivity gap. Cumulative SUM assigns the session ID. Two window functions. One scan. No self-join. The real insight: Every one of these replaces a slow, hard-to-read subquery or self-join with a single readable window function. 𝗧𝗵𝗮𝘁 𝗶𝘀 𝘄𝗵𝗮𝘁 𝘀𝗲𝗻𝗶𝗼𝗿𝘀 𝗿𝗲𝘃𝗶𝗲𝘄 𝗳𝗼𝗿. 𝗡𝗼𝘁 𝗰𝗼𝗿𝗿𝗲𝗰𝘁𝗻𝗲𝘀𝘀. 𝗥𝗲𝗮𝗱𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲. Save this image before your next SQL interview or code review. Which of these 7 do you still reach for last — and which one completely changed how you write SQL? Drop it in the comments 👇 #DataEngineering #SQL #DataEngineer #WindowFunctions #SQLInterview

  • graphical user interface, website

I agree with number 1. Number 2 is correct for simple comparisons, but for more complex comparisons a self join or even a cross apply are inevitable. Didn’t understand 3 and 4 - I’ll be happy for an example please. 5 - how does that de duplicate rows? It just shows the last updated row per group and it’s not necessarily the correct one. 6 - recursive CTEs are very important, since they perform better than WHILE loops, but are very difficult to write - requires in depth documentation for others to understand. 7 - didn’t understand as well. Would like an explanation for that one. Great post! Keep it up.

See more comments

To view or add a comment, sign in

Explore content categories