A question for every dev who's ever designed a database table: Did you design it for 10,000 rows or 10 million? Because too many schemas are designed for the demo. For the MVP. For "let's just ship it and optimize later." And "later" arrives as a 3 AM P1 page, 18 months down the road, when the table that "works fine" has grown 1000x and suddenly nothing works fine. We call these the haunting patterns: schema decisions that feel harmless at small scale and become structural nightmares at large scale. Data Drop #6 covers the big three: → UUIDs as primary keys — Random values fragment your B-tree indexes. At 500M rows, your index is bloated, your writes scatter across random pages, and your cache hit ratio craters. Sequential IDs exist for a reason. → "Just make it nullable" — The path of least resistance at design time. The source of a thousand bugs at query time. NULL doesn't equal NULL. Your aggregations silently skip rows. Your joins produce unexpected results. Nullable should be a conscious choice, not a default. → The EAV trap — Entity-Attribute-Value: the schema pattern that says "I don't want to commit to a data model." Congratulations, you now have a key-value store with the performance of a relational database and the flexibility of neither. Design for the table size you're going to have. Not the one you have today. Data Drop #6. Day 6 of 23. #AprilDataDrops #PostgreSQL #DataDrop6 #SchemaDesign #Database #Performance #OpenSourceDB OpenSource DB | Lahari Giddi
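A minimal PostgreSQL sketch of the first fix, assuming a hypothetical orders table: keep an append-friendly sequential key for the B-tree and expose a UUID as a secondary unique column when you need opaque public identifiers. All names here are illustrative, not from the post.

-- Sequential PK keeps B-tree inserts append-only and cache-friendly;
-- the UUID stays available for public-facing lookups.
-- gen_random_uuid() is built in from PostgreSQL 13 (pgcrypto before that).
CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_id   uuid NOT NULL UNIQUE DEFAULT gen_random_uuid(),
    customer_id bigint NOT NULL,  -- NOT NULL as a conscious choice, not a default
    created_at  timestamptz NOT NULL DEFAULT now()
);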
How Do You Optimize a Query with Multiple Joins, Filters, and Pagination? 🤔 If you think pagination alone will optimize DB performance, you are 𝘄𝗿𝗼𝗻𝗴 — pagination is applied at the very end, after the joins and filters have already run. It does not really optimize the query; it only reduces network transfer. 📡 In this case, we need to carefully analyze the query and database structure. 🧠 If we join all tables first and then apply filters and pagination, here is what the database actually does: • First, it joins the tables and builds an intermediate result set in memory. 🗂️ • Then it applies filters. 🔎 • And at the end, it applies pagination. 📄 All the memory and CPU cost happens during the joins and filters. ⚙️ A GROUP BY clause requires even more processing power. 📊 If the dataset is too large, the database's CPU can 𝗲𝗮𝘀𝗶𝗹𝘆 𝘀𝗽𝗶𝗸𝗲. 📈 Now the question is: 𝗵𝗼𝘄 𝗱𝗼 𝘄𝗲 𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗲 𝗶𝘁? 🤔 If our DB design and query allow step-by-step filtering, like: • First apply filters on the user table • Then apply filters on another table • And so on... Then we can reduce the data being joined stage by stage using the 𝗪𝗜𝗧𝗛 clause, known as a 𝗖𝗼𝗺𝗺𝗼𝗻 𝗧𝗮𝗯𝗹𝗲 𝗘𝘅𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻 (𝗖𝗧𝗘). 🧩 CTEs keep the intermediate result sets small during joins, so filters apply faster. ⚡ Each later stage then joins only the already-filtered data, and pagination comes last. 📄 If a join is only needed for data viewing and not for filtering, we can also apply that 𝗷𝗼𝗶𝗻 𝗮𝗳𝘁𝗲𝗿 𝗽𝗮𝗴𝗶𝗻𝗮𝘁𝗶𝗼𝗻, when the dataset is already very small. This also helps optimize the query. 🚀 I have been using the WITH clause (CTE) in many of my large queries, and it has helped me a lot in improving query performance — see the sketch below. 💡 #realMoneyLearnings #Databases #SQL #MySQL #DatabasePerformance #QueryOptimization #BackendEngineering #SoftwareEngineering #SystemDesign #TechLearning #LearningInPublic
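A sketch of that staged-filtering pattern, with hypothetical users and orders tables (all names are assumptions):

-- Filter each table first, join the reduced sets, paginate last.
WITH active_users AS (
    SELECT id, name
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, total
    FROM orders
    WHERE created_at >= DATE '2024-01-01'
)
SELECT u.name, o.total
FROM active_users u
JOIN recent_orders o ON o.user_id = u.id
ORDER BY o.total DESC
LIMIT 20 OFFSET 0;

One hedge worth adding: PostgreSQL 12+ may inline CTEs back into the main query, so write WITH recent_orders AS MATERIALIZED (...) if you specifically want the staged behavior, and check the plan with EXPLAIN either way.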
Your Database Is Lying to You About Performance 🗄️ It works fine in staging. 50ms response times, clean query plans, zero complaints. Then you hit production. 3 million rows later - everything falls apart. Here's what nobody tells you about database optimization: 1. An index on the wrong column is worse than no index. Indexes cost you on every INSERT and UPDATE. If your query doesn't use it, you're paying the write penalty for nothing. Run EXPLAIN ANALYZE. Look at what's actually being scanned. 2. N+1 queries are silent killers. One endpoint. Looks innocent. Under the hood it fires 1 query to get users, then 1 query per user to get their orders. 200 users = 201 queries. Your ORM is very good at hiding this from you. 3. LIKE '%keyword%' ignores every index you have. Leading wildcard = full table scan. Every time. If you need full-text search - use full-text search (PostgreSQL's tsvector, Elasticsearch, whatever fits). Don't fight SQL with SQL. 4. Pagination with OFFSET doesn't scale. OFFSET 100000 LIMIT 20 doesn't skip 100000 rows - it reads them all and throws them away. Use keyset pagination. Cursor-based. Your DBAs will stop avoiding eye contact with you. 5. The query that runs in 10ms on 10k rows runs in 40 seconds on 10M. Linear doesn't stay linear. Test with production-scale data. Always. 6. Slow query log is your best friend you never talk to. Enable it. Most performance issues announce themselves long before they become incidents. The database is rarely the problem. The queries are. What's the most expensive query bug you've ever shipped to production? 👇 #Database #SQL #BackendEngineering #Performance #SoftwareEngineering
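A hedged sketch of point 4, assuming an orders table with an index on (created_at, id); the :last_* placeholders stand for values from the previous page:

-- OFFSET version: reads and discards 100,000 rows before returning 20.
SELECT id, created_at
FROM orders
ORDER BY created_at, id
LIMIT 20 OFFSET 100000;

-- Keyset version: seeks straight past the last row the client saw.
SELECT id, created_at
FROM orders
WHERE (created_at, id) > (:last_created_at, :last_id)
ORDER BY created_at, id
LIMIT 20;

The row-value comparison is valid in PostgreSQL and MySQL 8; verify with EXPLAIN that your engine actually turns it into an index range scan.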
I used to think indexes were the 𝗲𝗮𝘀𝗶𝗲𝘀𝘁 way to 𝗳𝗶𝘅 𝘀𝗹𝗼𝘄 𝗾𝘂𝗲𝗿𝗶𝗲𝘀. Query slow? 👉 Add an index. Another query slow? 👉 Add another index. For a while, it actually works. ⚡ Queries become faster. 📊 Dashboards load quickly. Everyone is happy. But something interesting starts happening later. 🐢 Writes begin to slow down. INSERT, UPDATE, and DELETE operations take longer than expected. And the reason is simple: Every time data changes, the database must also update every related index. So if a table has too many indexes, each write operation becomes heavier. ⚖️ That’s the 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳 many developers discover a bit late. Indexes are powerful, but creating them blindly can introduce new problems. Some common side effects: ✅ 𝗣𝗿𝗼𝘀 of adding indexes 🔎 Faster search and filtering (WHERE) 🔗 Faster joins between tables 📈 Better performance for sorting and grouping 🗂️ Large datasets become manageable ⚠️ 𝗖𝗼𝗻𝘀 of adding indexes 𝗯𝗹𝗶𝗻𝗱𝗹𝘆 🐌 Slower inserts, updates, and deletes 💾 Extra disk space for each index ⚙️ More work for the database to maintain them ❓ Some indexes may never even get used That’s why indexing is less about adding more, and 𝘮𝘰𝘳𝘦 𝘢𝘣𝘰𝘶𝘵 𝘢𝘥𝘥𝘪𝘯𝘨 𝘵𝘩𝘦 𝘳𝘪𝘨𝘩𝘵 𝘰𝘯𝘦𝘴. 𝘼 𝙜𝙤𝙤𝙙 𝙞𝙣𝙙𝙚𝙭 𝙪𝙨𝙪𝙖𝙡𝙡𝙮 𝙘𝙤𝙢𝙚𝙨 𝙛𝙧𝙤𝙢 𝙪𝙣𝙙𝙚𝙧𝙨𝙩𝙖𝙣𝙙𝙞𝙣𝙜 𝙝𝙤𝙬 𝙩𝙝𝙚 𝙙𝙖𝙩𝙖 𝙞𝙨 𝙖𝙘𝙩𝙪𝙖𝙡𝙡𝙮 𝙦𝙪𝙚𝙧𝙞𝙚𝙙. 🧠 Databases reward thoughtful design. Blind optimization rarely stays optimal for long. #realMoneyLearnings #Databases #MySQL #SQL #DatabasePerformance #BackendEngineering #SoftwareEngineering #SystemDesign #PerformanceOptimization #DatabaseIndexes #LearningInPublic
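A practical starting point for the "some indexes may never even get used" problem, in PostgreSQL, via the pg_stat_user_indexes statistics view:

-- Indexes never scanned since statistics were last reset.
-- idx_scan = 0 marks a candidate for dropping, not a verdict:
-- check replicas, rare batch jobs, and constraint-backing indexes first.
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;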
I was debugging a slow query. Turns out… the database was just taking the scenic route 🌳 Everything looked fine. Indexes? Present. Query? Clean. Still… latency said “not today.” Went one layer deeper. Not the query. Not the data. 👉 The tree underneath it. Databases like MySQL and MongoDB don’t magically fetch data. They walk a path. And if that path is messy… your query is basically sightseeing before doing real work. 💡 What actually fixed it: Not rewriting the query 10 times. Instead: Rebuilding / optimizing indexes (hello B-Trees 👀) Fixing index choice based on access patterns Dropping useless indexes that were bloating traversal Checking if data distribution was skewing the tree ⚙️ The shift: I stopped asking: ❌ “Is my query efficient?” and started asking: ✅ “Is my data structure forcing extra work?” Because in production: Bad query → obvious problem Bad tree → silent performance killer And yeah… Sometimes the fix isn’t clever code. It’s just admitting: 👉 your tree grew wild and needs pruning. Next time your query is slow, don’t just optimize SQL. Trim the tree—you might save milliseconds (and your weekend). #BackendEngineering #SystemDesign #PerformanceEngineering #Databases #DSA #SoftwareEngineering
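For the PostgreSQL case, a minimal sketch of "trimming the tree"; index and table names are hypothetical, and MySQL and MongoDB have their own equivalents:

-- First confirm which path the planner is actually walking.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;

-- Rebuild a bloated index without blocking writes (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

-- Drop indexes that only bloat traversal and slow down writes.
DROP INDEX IF EXISTS orders_legacy_status_idx;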
Why you can't "just" concurrently index partitioned tables in Postgres We’ve been working on partitioning some of our fastest-growing tables recently. Since these are high-traffic, write-heavy tables, we tried to add indexes concurrently to avoid downtime. That's when we hit a classic wall: ERROR: cannot create index on partitioned table "..." concurrently. The "Why": CREATE INDEX CONCURRENTLY is designed to avoid locking your table. But since a partitioned table is a virtual parent, Postgres can’t coordinate a "no-lock" build across multiple child tables at once from a single command. It defaults to an Access Exclusive Lock, which, in a production environment, is a dealbreaker for performance. The "Create-and-Link" Pattern we used instead: • The Shell: Run CREATE INDEX ... ON ONLY parent_table. This adds the metadata to the parent but keeps it Invalid at first. • The Background Build: Run CREATE INDEX CONCURRENTLY on each individual partition. This builds the index without blocking our production traffic. • The Link: Attach each partition index to the parent: ALTER INDEX parent_idx ATTACH PARTITION child_idx. The Result: Once the final partition is linked, the parent index automatically becomes Valid. We got zero downtime, and any future partitions will now inherit this index automatically. It's a bit more manual work, but it's a lifesaver for keeping a high-growth system responsive. #PostgreSQL #DatabaseEngineering #SystemDesign #Backend #Scalability #SDE
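The pattern in SQL, using hypothetical orders partitions (this is the sequence the PostgreSQL docs describe for indexing partitioned tables):

-- 1. The shell: metadata on the parent only; the index starts INVALID.
CREATE INDEX orders_created_at_idx ON ONLY orders (created_at);

-- 2. The background build: one concurrent build per partition.
CREATE INDEX CONCURRENTLY orders_2024_created_at_idx ON orders_2024 (created_at);
CREATE INDEX CONCURRENTLY orders_2025_created_at_idx ON orders_2025 (created_at);

-- 3. The link: attach each child index; once every partition is
-- attached, the parent index flips to VALID automatically.
ALTER INDEX orders_created_at_idx ATTACH PARTITION orders_2024_created_at_idx;
ALTER INDEX orders_created_at_idx ATTACH PARTITION orders_2025_created_at_idx;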
Most developers add database indexes expecting instant magic speed… …but many accidentally slow down their entire system instead. Here’s exactly how database indexing works under the hood — and why it’s a double-edged sword: Indexes are separate data structures that store a sorted map of your column values and point directly to the actual rows in the table. Instead of scanning every single row (a slow full table scan), the database can quickly jump to the right data — often in just a few steps. The Major Advantages: Lightning-fast reads: B-Tree indexes (the default in most databases) give O(log n) search time. They efficiently handle equality (=), range queries (>, <, BETWEEN), sorting, and JOINs. Specialized indexes unlock extra power: Hash indexes deliver true O(1) speed for exact matches, Bitmap indexes excel with low-cardinality data in analytics, and GiST/GIN handle full-text or spatial searches beautifully. Result: Queries that dragged for seconds now return in milliseconds, even on million-row tables. The Real Trade-Offs (Where It Hurts): Extra storage cost: Indexes can easily double or triple the size of your table. Slower writes: Every INSERT, UPDATE, or DELETE has to update all related indexes. This adds significant overhead and disk I/O, especially on high-write workloads. Maintenance burden: Choosing the wrong index type (like Hash for range queries) or creating too many indexes wastes space and can actually hurt performance. The smart approach: Focus indexes on columns frequently used in WHERE, ORDER BY, or JOIN conditions — especially on read-heavy tables. Regularly check which indexes are actually being used and drop the unused ones. Test changes carefully. Mastering this trade-off is what turns good backend systems into highly scalable ones. What’s your biggest indexing win — or the hardest lesson you learned about indexes? Drop it in the comments 👇 I read every single one. #DatabaseEngineering #SQL #PerformanceOptimization #BackendDevelopment #PostgreSQL #MySQL #DataEngineering #SystemDesign
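A small sketch of that "smart approach," assuming a hypothetical read-heavy orders table:

-- Index the columns the hot query filters and sorts on.
CREATE INDEX orders_customer_created_idx
    ON orders (customer_id, created_at DESC);

-- Then verify the planner actually uses it before declaring victory.
EXPLAIN ANALYZE
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 50;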
The most important engineering decision isn't which framework to use. It's your database schema. What I've learned building with PostgreSQL: 1. Design for the queries you'll actually run. Write your 10 most critical queries first. Then design the schema. 2. Normalise - but know when to stop. 3NF is great until you're doing 7-table JOINs for a simple dashboard. 3. Index what you filter and sort on. Every column in WHERE, ORDER BY, or JOIN on high-traffic tables needs an index. 4. Use UUIDs as primary keys for public-facing data. 5. Add created_at and updated_at to every table. Every. Single. Table. 6. Never delete data. Add a deleted_at column instead. Soft deletes are recoverable. What's the hardest schema decision you've had to make? Follow Sachin Shah - backend engineering decisions that matter in production. #PostgreSQL #BackendDevelopment #SystemDesign #SoftwareEngineering #BuildInPublic #SQL #TechTips
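Tips 4-6 in one hedged PostgreSQL sketch (table and column names are illustrative):

CREATE TABLE documents (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- opaque public-facing key
    title      text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now(),
    deleted_at timestamptz                                  -- NULL means the row is live
);

-- Soft delete: recoverable, auditable.
UPDATE documents SET deleted_at = now() WHERE id = '...';

-- Partial index so queries over live rows stay fast despite soft deletes.
CREATE INDEX documents_live_idx
    ON documents (created_at)
    WHERE deleted_at IS NULL;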
🚀 Database Indexing (Part 1): The Foundation of Fast Queries Before scaling systems with partitioning or distributed caching, the first step is Database Indexing. If your queries are slow, you’re likely missing the right indexes. 🔹 What is Database Indexing? Database Indexing is a technique used to improve query performance by creating a structure that allows faster data lookup. 👉 Like a book index — jump directly to the data instead of scanning everything. 🔹 How It Works Without Index ❌ ➡ Full Table Scan (O(n)) With Index ✅ ➡ Faster Lookup (O(log n)) 🔹 Types of Indexes 1️⃣ B-Tree Index (Most Common) Default index in most databases Supports: Equality (=) Range (>, <, BETWEEN) Sorting 2️⃣ Hash Index Best for exact match (=) Very fast lookup 👉 Limitation: ❌ No range queries ❌ No sorting 3️⃣ Composite Index Multiple columns Example: (user_id, created_at) 👉 Follows left-to-right rule 4️⃣ Unique Index Ensures no duplicate values Example: email, username 5️⃣ Full-Text Index Used for search functionality Example: product search, keyword search 🔹 Benefits ✅ Faster query execution ✅ Efficient searching ✅ Reduced full table scans ✅ Better performance for large datasets 💬 In Part 2, I’ll cover real-world problems, trade-offs, and best practices. #Database #BackendDevelopment #Java #SQL #Performance #Optimization
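A quick illustration of the left-to-right rule from point 3, with a hypothetical events table:

CREATE INDEX idx_user_created ON events (user_id, created_at);

-- Can use the index: the leading column is constrained.
SELECT * FROM events WHERE user_id = 7 AND created_at > '2024-01-01';
SELECT * FROM events WHERE user_id = 7;

-- Typically cannot use it efficiently: the leading column is skipped.
SELECT * FROM events WHERE created_at > '2024-01-01';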
Two things from last night's SQL session that I will not forget. First: a database is just a disk. All the complexity of schemas, queries, indexes and transactions is software built on top of a file sitting on your hard drive. That one realization made databases feel less magical and more understandable. Second: never DELETE without a WHERE clause. Ever. Without WHERE you are not deleting a row. You are deleting everything. One missing condition and your entire table is gone. No undo button. We also covered SQL vs NoSQL differences, creating and altering tables, primary keys, filtering with WHERE, AND/OR logic, pattern matching with LIKE and ILIKE, pagination with LIMIT and OFFSET, aggregations like COUNT, SUM, AVG, MIN, MAX, and multi-column aggregations. A lot in one session. Grateful to Hitesh Choudhary and Piyush Garg and the Chai Aur Code team for sessions that make you think. #SQL #Database #Backend #ChaiCode
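A habit that gives you the missing undo button, sketched for PostgreSQL (table name hypothetical):

-- Wrap risky deletes in a transaction.
BEGIN;
DELETE FROM customers WHERE id = 42;   -- never without a WHERE
SELECT COUNT(*) FROM customers;        -- sanity-check before it's permanent
COMMIT;                                -- or ROLLBACK; if the count looks wrong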
Many developers model everything as a table in PostgreSQL without knowing there's a feature that can make queries faster and simplify the database structure in certain cases: Composite Types. The idea is simple: you define a structured data type and use it as a column in any table. When it makes sense to use one: - The data has no identity of its own; it only exists alongside the parent record - You'll never query that data in isolation - You don't need history, auditing or traceability - It's a fixed, immutable value that just enriches the main row When it doesn't make sense and a 1:1 table is better: - The data can be referenced by other tables - You need history (the address changed and you want to keep the previous one) - It will grow over time and gain new columns - Other parts of the application need to access that data directly The main tradeoff: a composite type eliminates the JOIN and makes reads faster, but you give up flexibility. If requirements change and that data needs to become its own entity, the migration is a pain. *If you use Prisma, pay attention: it doesn't natively map composite types; it treats the column as `Unsupported`. To read the data, you need to unpack the fields directly in SQL via `$queryRaw`. The rule I use to decide: if the data is a value that describes something, make it a type. If the data is an entity with a life of its own, make it a table. Follow me for more technical posts like this. #PostgreSQL #SQL #Database #Backend #FullStack #DatabaseDesign #Prisma
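A minimal sketch of the pattern (type and table names are illustrative):

-- A value type: an address here has no identity of its own.
CREATE TYPE address AS (
    street text,
    city   text,
    zip    text
);

CREATE TABLE customers (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL,
    home address
);

-- Writing the composite value, then unpacking a field.
-- Note the parentheses around the column when reading a subfield.
INSERT INTO customers (name, home)
VALUES ('Ada', ROW('12 Main St', 'Springfield', '49001'));

SELECT name, (home).city FROM customers;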