Database Performance Regrets and Schema Changes

I absolutely love this post. I've seen a lot of databases where it seemed the people designing them did not realize they were making performance decisions they would later regret. By the time people feel the pain of those decisions, the database and app have often been around so long that it is too late to convince them to change, because changing the database schema means app-layer changes they don't want to make. These folks will frequently say the recommended changes can't be done "because it would be too much work." Then those same people will ask, "Is there anything else we can do to make it perform better?" I understand the desire for some other answer that involves less work. But sometimes there are no other answers. A house sitting on a busted foundation can only be "prettied up" so much. #database

Database Performance Tuning Rule #8: Your schema is a performance decision you made.

What that means: Every data model decision has a query cost attached to it.

Storing status as a VARCHAR instead of a smallint:
→ Every index on that column is larger.
→ Every comparison takes more CPU.
→ Trivial individually. Expensive at 500M rows.

Normalizing a user profile into six tables:
→ Every profile page load needs six joins.
→ At 10,000 users/day: fine.
→ At 10,000 users/minute: those joins become the bottleneck.

Putting created_at as a nullable column:
→ Now your time-range queries need IS NOT NULL checks.
→ The planner estimates null proportion from statistics.
→ Bad statistics → wrong plan → wrong index.

Ask these questions at schema design time:
• What is the most frequent query this table will serve?
• What is the write rate at peak?
• How will this table look at 10x the current row count?
• Which columns will appear in WHERE clauses? Are they index-friendly types?
• Are there any nullable FK columns? (Each one is a potential lock incident waiting to happen in MySQL.)

Schema reviews are performance reviews. Most teams treat them as correctness reviews. Correctness is table stakes. Performance is what you'll be pinged about.

─────────────────────────────────────────────────
♻️ Repost to every engineer who designs tables.
Follow Haider Z.
📩 Real incidents, RCAs, SQL queries every week → https://lnkd.in/d3M5-pJA
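To make the VARCHAR-vs-smallint and nullable created_at points concrete, here is a minimal sketch in PostgreSQL-flavored SQL. The table and column names (orders_before, orders_after, status) are hypothetical illustrations, not from the original post:

  -- Sketch of the schema choices described above (hypothetical tables).
  -- "Before": wide status type, nullable timestamp.
  CREATE TABLE orders_before (
      id         bigint PRIMARY KEY,
      status     varchar(32),        -- string comparisons, larger index entries
      created_at timestamp NULL      -- range queries now need IS NOT NULL checks
  );

  -- "After": narrow status code, timestamp guaranteed present.
  CREATE TABLE orders_after (
      id         bigint PRIMARY KEY,
      status     smallint NOT NULL,  -- 2 bytes, cheap integer comparisons
      created_at timestamp NOT NULL DEFAULT now()
  );

  CREATE INDEX ON orders_after (created_at);

  -- The time-range query no longer needs a null check:
  SELECT count(*)
  FROM orders_after
  WHERE created_at >= now() - interval '1 day';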

“Putting created_at as a nullable column: → Now your time-range queries need IS NOT NULL checks. → The planner estimates null proportion from statistics. → Bad statistics → wrong plan → wrong index.”

Columns that should never be null should be NOT NULL for logical reasons, not performance reasons. But even so, this feels like a problem that occurs just as easily with NOT NULL columns (bad stats), and if the column contains no null data, why would the statistics imply that it does?
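On the statistics point: if you want to see what the planner actually believes about null proportions, PostgreSQL (assuming that is the engine in question) exposes it as null_frac in the pg_stats view. A quick sketch, reusing the hypothetical table from the example above:

  -- After ANALYZE, pg_stats shows the sampled null fraction per column.
  ANALYZE orders_after;

  SELECT attname, null_frac
  FROM pg_stats
  WHERE tablename = 'orders_after'
    AND attname = 'created_at';
  -- null_frac = 0 means the planner assumes no missing values; stale or
  -- unsampled statistics can make it guess wrong, which is the post's warning.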

I am dealing with a similar situation with a SaaS provider at the moment. I have streamlined their cloud and hardware architecture as much as possible to give them the strongest runway to address issues within their product. However, when the underlying storage is capable of nearly 100 GB per second at the disk level while the database layer is only achieving approximately 50 MB per second, it is difficult to avoid the conclusion that the bottleneck lies within the application or database design rather than the infrastructure.
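One way to sanity-check a gap like that is to measure the effective read rate of a full scan and compare it to what the storage is known to deliver. A sketch, assuming PostgreSQL and a hypothetical events table:

  -- BUFFERS reports pages read; pages * 8 kB / execution time gives the
  -- effective throughput of the scan (hypothetical table name).
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM events;
  -- If the disks can sustain ~100 GB/s but the scan lands near 50 MB/s,
  -- the limit is in the database or application layer, not the hardware.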
