I've read a lot of SQL written by smart people. Most of it is unreadable. Not because the logic is wrong. But three months later, nobody can tell you why it does what it does. Including the person who wrote it. After 8 years of writing SQL daily, here's the system I use to keep queries readable, debuggable, and easy to hand off: 🔢 Step 0: Check the row count before writing anything Run SELECT SUM(1) FROM main_table and put the number in a comment at the top. This one habit has saved me from fan-out disasters more times than I can count. 🔗 Always use explicit JOINs Never connect tables through the WHERE clause. The join logic and the filter logic should live in separate, predictable places. And always default to LEFT JOIN — INNER JOIN silently drops rows, which silently corrupts your results. 🏷️ Meaningful aliases, not alphabet soup If you're aliasing tables a, b, c — your colleagues are not thanking you. Two or three descriptive characters (ai, sf, pc) is all you need. 🧱 One CTE, one job Break complex logic into named CTEs. Each one does exactly one thing. Structure them Source → Filtered → Aggregated → Final. You can read it like a story. 💬 Comment the why, not the what The code shows what's happening. Comments should explain why a decision was made — the business rules, the edge cases, the intentional exclusions. Readable SQL is a form of communication. It signals you're thinking about the person who comes after you, not just getting the right answer today. I wrote up the full breakdown with real code examples on the blog — link in the comments. #SQL #DataAnalytics #DataEngineering #BI #Analytics #BestPractices #DataAnalysis #DataFam
The only one I would quibble with is defaulting to a left join. I typically use an inner join unless I'm specifically wanting records from the first table regardless of there's a match in the second table. But in either case, it's important to understand your data model to know which join type to use, and be explicit.
Zachary Thurston I 100% agree with this. There are cases where the join SHOULD be an inner join, even by default. I've found 9x out of 10 the dimension table is missing a value AND and someone needs to get it updated. If you're doing your job as an analyst and validating the data, you'll know its off. Then you go an fix the source. If and only if you are unable to work with someone to update the source should you then use left join. Unless of course the model was designed for left join, the use that.
Unreadable SQL was not written by smart people. Thank you.
Love the Step 0 point. Setting expectations for how many rows should have be returned and context about it matters.
Organized and documented SQL code? Brilliant!!
https://tamasszabo.org/blog/how-i-structure-sql-queries/