Shift to Modular SQL for Readable Code

Stop writing SQL for the database engine. Start writing it for the human who has to maintain it (probably you). We’ve all inherited that query. You know the one: 1,000 lines of monolithic code, nested subqueries seven levels deep, and zero comments. It runs, but modifying it feels like playing Jenga with production data. The engine doesn't care about your messy code, but your team's agility does. The shift every Data Analyst needs to make is toward Modular SQL. Modular code is readable code. Readable code is enhanceable code. Here is the blueprint for SQL that survives schema changes and business logic updates: ✅ DO: 1. Use CTEs (Common Table Expressions) to break complex logic into isolated steps. 2. Select explicit columns, never SELECT * in production. 3. Leverage Window Functions over messy self-joins. 4. Comment on WHY the logic exists, not how it works. ❌ DON'T: 1. Nest subqueries deeper than three levels. (Convert them to CTEs!) 2. Use SELECT * (protect your query from table schema evolution). 3. Perform raw date manipulation in WHERE clauses (isolate it in a CTE). 4. Adopt modular SQL. Save future-you hours of debugging. Less firefighting = More analysis. Check out the cheat sheet below. What’s the worst SQL anti-pattern you've encountered in code review? Share your pain below. 👇 #SQL #DataAnalytics #DataEngineering #CodingBestPractices #Analytics #DataScience #CareerGrowth

  • text

I don't know that I've ever written a query for the database engine or another person tbh. I mean, it is a structured query language, isn't it? Maintain the structure and, surprise! It's written for the database engine and another person! Amazing! Seriously, it's STRUCTURED. If you understand the structure, and build it based on the structure (like there would be a way to do it otherwise).... sometimes I think I'm talking to a wall here. Please double check your work before posting (or committing code you've written because *red flag*) your 4th DON'T should actually be a DO, and you failed to mention using clear table aliases in your DOs despite it clearly being #3 on the cheat sheet. QA is sending that work back from where it came my friend. Now your whole project has fallen behind because you didn't bother reviewing your work! Anyway, aside from that the picture is solid on the benefits of the best practices it mentions.

In the era of AI/LLMs assistance, the 'why' in comments seems to matter far more than the 'what' — which is what we typically write for every block, regardless of the query/code language. Small note on the post — the first two DON'Ts are already called out within the DOs themselves, feels a bit redundant. Also, DON'T #4 reads opposite of post intent — likely a typo, but as written it appears ≈ don't adopt modular SQL!

Love this perspective Manpreet Singh! Even just correcting smaller queries, I've seen how quickly SQL can become a maze. Modular SQL and focusing on 'WHY' in comments are absolute game-changers for readability and long-term sanity. My biggest pain point is always queries that try to do way too much in one go (thats a nightmare) Thanks for sharing this blueprint!

Writing SQL for humans at the expense of the engine increases cost and slows reporting. It is possible to do both and optimize queries while also improving security, but not for those who can't be bothered to add code comments, learn to leverage DDL, or model data for efficiency.

Writing SQL for humans instead of just the engine is such a game changer, future you will thank you every time you revisit that query 😄 Manpreet Singh

Strong point, poorly structured SQL doesn’t just slow queries, it slows teams readability is a scalability problem, not just a style choice. Manpreet Singh

See more comments

To view or add a comment, sign in

Explore content categories