Database Sharding & Partitioning Strategies for 1M+ QPS – What Actually Works in Production 📊

After tuning databases that crossed 1M+ queries per second, I realized one hard truth: "Just add more replicas" is a myth at real scale. You need smart sharding + partitioning designed from day one.

Here's the practical decision framework I use in production.

Sharding Strategies – When to Choose What:
Range Sharding → Perfect for time-series data, logs, or sequential IDs (e.g., orders by order_date)
Hash Sharding → Best for even distribution on high-cardinality keys like user_id, session_id, or tenant_id
Composite / Directory-based → When you need both flexibility and low-latency routing

PostgreSQL Declarative Partitioning (Still a Game-Changer in 2026):
PostgreSQL's native partitioning has matured beautifully. My go-to patterns:
Range Partitioning — Time-based data + easy archiving (monthly/weekly)
List Partitioning — Status, region, or category-based queries
Hash Partitioning — Massive tables needing even row distribution

My Real-World Checklist Before Sharding Anything:
1. Max out connection pooling, indexes, and query tuning first
2. Choose a shard key that covers 80%+ of your query patterns
3. Always plan for future re-sharding (it will happen)
4. Use native partitioning as long as possible — go to Citus or Vitess only when you need true horizontal distribution across nodes
5. Maintain a global lookup / routing table — never do blind hashing in the application layer

Pro Tip: Partition pruning is your best friend. Make sure your most frequent WHERE clauses include the partition key. (A minimal sketch follows below.)

Backend & Database engineers — what sharding or partitioning strategy actually saved (or broke) your system at scale? Drop your war stories below 👇 Let's exchange real architecture lessons!

#DatabaseOptimization #PostgreSQL #Sharding #Partitioning #HighScaleSystems #SystemDesign #BackendDevelopment #Citus #JavaBackend #SeniorDeveloper #SpringBoot
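A minimal sketch of the range-partitioning + pruning pattern described in the pro tip; the events table, its columns, and the monthly boundaries are illustrative assumptions, not taken from the post:

-- Monthly range partitioning on the column your queries filter by.
CREATE TABLE events (
    event_time timestamptz NOT NULL,
    user_id    bigint,
    payload    jsonb
) PARTITION BY RANGE (event_time);

CREATE TABLE events_2026_01 PARTITION OF events
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Pruning only activates when the partition key appears in the WHERE clause.
EXPLAIN SELECT count(*) FROM events
WHERE event_time >= '2026-02-01' AND event_time < '2026-02-08';
-- The plan should show only events_2026_02 being scanned.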
More Relevant Posts
-
🚀 SequelPG v0.11.1 is live

If you work with PostgreSQL every day, this will feel familiar:
You run a query
You tweak it
You come back to something from yesterday
You try to remember what actually worked

Most tools treat query history as just a log. I don't think that's enough.

In this release, I rebuilt Query History from scratch. Now it's something you actually use:
Quickly find past queries
Reuse them without rewriting
Debug faster with less context switching

I also refactored the Database Tools layer. You won't "see" most of it — but you'll feel it:
More consistency
Better performance
Stronger foundation for what's coming next

I'm not trying to add more features. I'm trying to reduce friction when working with data.

Full release notes: https://lnkd.in/dFmaV_xH

If you use PostgreSQL, I'd really value your feedback.

#PostgreSQL #DeveloperTools #IndieHacker #BuildInPublic #SwiftUI #DX
-
A PostgreSQL table with 500 million rows doesn't just slow down queries. It slows down everything: VACUUM takes hours, index builds lock the table, and deleting old data generates massive WAL and leaves bloat behind.

Table partitioning splits a large table into smaller physical pieces while keeping it as a single logical table. The query planner uses partition pruning to scan only the relevant partitions.

Here's what matters in practice:

1. The partition key must appear in your most common WHERE clauses. If 90% of your queries filter on event_timestamp, partition by timestamp. A partition key that queries don't filter on provides zero benefit and only adds planning overhead. Always verify with EXPLAIN before committing to a scheme; if pruning doesn't activate, the partitioning isn't helping.

2. Partition count matters more than people think. Monthly partitions for 5 years = 60 partitions (reasonable). Daily partitions for 5 years = 1,825 partitions (the planner slows down noticeably). The planner evaluates each partition during query planning. Keep it manageable, or use TimescaleDB, which is specifically optimized for high partition counts.

3. The biggest operational win is instant data removal. DROP TABLE on a partition takes milliseconds and generates essentially no WAL. Compare that to DELETE FROM events WHERE event_timestamp < '2024-02-01' on 100 million rows: that can take an hour, generates massive WAL, and leaves dead tuples for VACUUM to clean up. (Sketch below.)

The biggest gotcha: you can't convert an existing table to a partitioned table in place. You need to create a new partitioned table, migrate data in batches, and swap with a rename in a single transaction. For zero-downtime migrations, add a trigger or logical replication to capture writes during the migration.

Automate partition lifecycle with pg_partman or cron. A missing future partition causes INSERT failures. A forgotten old partition wastes storage.

Full guide with range, list, hash strategies, migration patterns, and pg_partman setup: https://lnkd.in/eGUj8zXC

#PostgreSQL #TablePartitioning #DatabasePerformance #DataEngineering #DevOps #SRE
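A minimal sketch of the "instant data removal" win from point 3; the events table and partition names are hypothetical, and the DETACH-first step is my addition (it keeps the drop from contending with concurrent queries):

-- Retiring a month of data from a range-partitioned table.
ALTER TABLE events DETACH PARTITION events_2024_01;
DROP TABLE events_2024_01;  -- milliseconds, no dead tuples, no VACUUM debt

-- The unpartitioned equivalent churns through every row instead:
-- DELETE FROM events WHERE event_timestamp < '2024-02-01';
-- (can run for an hour on 100M rows, writes heavy WAL,
--  and leaves bloat behind for VACUUM to clean up)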
-
Moving to a new database sounds easy, right? I thought the same — until I dove deep into the challenges and hidden pitfalls.

It's way harder than just copying files. You need to extract millions of rows without slowing down production. You need to make sure nothing gets lost or duplicated. And you need the whole thing to work seamlessly with a completely different database engine.

Researching the problem space and exploring YugabyteDB's solution has been both insightful and inspiring, showcasing a powerful approach to solving complex distributed database challenges.

Here's how Yugabyte 𝗩𝗼𝘆𝗮𝗴𝗲𝗿 solves this crucial issue — and here's the clever part: instead of reinventing the wheel, it leverages PostgreSQL's existing tools.

The architecture is simple: Voyager runs on its own machine (not on your database servers), connects over the network, and orchestrates the whole migration without risking your production system.

This "𝘂𝘀𝗲 𝘁𝗵𝗲 𝗿𝗶𝗴𝗵𝘁 𝘁𝗼𝗼𝗹 𝗳𝗼𝗿 𝗲𝗮𝗰𝗵 𝗷𝗼𝗯" philosophy is worth studying, regardless of whether you're moving databases or building distributed systems.

The result? A migration engine that's both 𝗽𝗼𝘄𝗲𝗿𝗳𝘂𝗹 𝗮𝗻𝗱 𝗽𝗿𝗮𝗴𝗺𝗮𝘁𝗶𝗰.

For a deeper understanding and more insights, check out my full article: https://lnkd.in/gEpbVA6B
-
🐘 10 Powerful Things You Can Do with PostgreSQL

PostgreSQL is more than just a relational database — it's a complete data platform. Here's how it can power modern applications 👇

🔹 1. Relational Database
Strong ACID compliance with tables, joins, and foreign keys.
🔹 2. Document Store
Store and query flexible JSON data using JSONB.
🔹 3. Time-Series Database
Efficient handling of time-based data with partitioning & extensions like TimescaleDB.
🔹 4. Graph Database
Run graph queries using recursive CTEs and extensions like Apache AGE.
🔹 5. Geospatial Database
Perform location-based queries with PostGIS.
🔹 6. Full-Text Search
Built-in search using tsvector and tsquery with ranking & stemming.
🔹 7. Message Queue
Use LISTEN/NOTIFY for lightweight event-driven systems.
🔹 8. Key-Value Store
Leverage JSONB, hstore, and GIN indexes for fast lookups.
🔹 9. Vector Database
Store embeddings and perform similarity search using pgvector.
🔹 10. Cron Scheduler
Automate jobs inside the DB using pg_cron.

💡 PostgreSQL isn't just a database — it's a multi-model powerhouse that can replace multiple tools in your architecture. (Two of these are sketched below.)

👉 If you're building backend systems, mastering PostgreSQL can significantly simplify your stack.

#PostgreSQL #Database #BackendDevelopment #SystemDesign #SoftwareEngineering
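A minimal sketch of items 2 and 6 from the list above; the docs table and its contents are hypothetical:

-- Document store: flexible JSONB with a GIN index for containment queries.
CREATE TABLE docs (id bigserial PRIMARY KEY, body jsonb);
CREATE INDEX ON docs USING gin (body);
SELECT id FROM docs WHERE body @> '{"status": "active"}';

-- Full-text search: tsvector + tsquery with ranking.
SELECT id,
       ts_rank(to_tsvector('english', body->>'text'),
               to_tsquery('english', 'postgres & partition')) AS rank
FROM docs
WHERE to_tsvector('english', body->>'text')
      @@ to_tsquery('english', 'postgres & partition')
ORDER BY rank DESC;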
-
Database Indexing: A High-Level Explanation

Your query worked fine yesterday. Today it is painfully slow.

At small scale, databases can scan an entire table and the cost is barely noticeable. As data grows, that sequential scan increasingly dominates execution time. This shift in access cost is the core problem indexing addresses.

An index is a separate data structure that helps the engine locate rows more efficiently. Like a book index, it allows the database to narrow the search space instead of examining every record. The engine maintains this structure and uses it to map searchable values to row locations.

B-tree indexes are the default in most relational systems. They keep keys sorted and are structured to maintain shallow depth, allowing lookups to scale logarithmically as datasets grow. Because ordering is preserved, they support range conditions and ORDER BY operations naturally.

Hash indexes trade ordering for faster equality lookups. They can be effective for exact matches but do not support ranges or sorting. For that reason, they are situational rather than general-purpose.

Clustered indexes store table data in index order, shaping how rows are physically organized. Only one clustered index can exist per table. Non-clustered indexes, by contrast, store keys and references back to the underlying rows. That additional lookup step can still be beneficial when it significantly reduces the amount of data scanned.

Composite indexes span multiple columns. Column order matters: the leftmost prefix rule determines which query patterns can take advantage of the structure. A well-designed composite index can often replace several single-column indexes. (A short sketch follows below.)

Indexes introduce trade-offs. They improve read efficiency but add write overhead. They consume storage and may require maintenance over time.

Index columns that are frequently filtered, joined, or sorted. Prefer high-cardinality columns where selectivity meaningfully reduces the search space. Index foreign keys to keep joins efficient. Avoid indexing tiny tables or low-cardinality flags. Be cautious with heavy indexing on write-intensive workloads such as logs or event streams. For wide text fields, consider partial or full-text indexing strategies.

Measure first. Add the index second. Verify the execution plan always.

#Database #DatabaseIndexing #SQL #PostgreSQL #MySQL #BackendDevelopment #SystemDesign #DevOps #DistributedSystems #Infrastructure #CloudEngineering
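A minimal sketch of the leftmost prefix rule from the composite-index paragraph; the orders table and its columns are hypothetical:

-- Composite index: column order decides which filters it can serve.
CREATE INDEX idx_orders_cust_date ON orders (customer_id, order_date);

-- Uses the index: the leading column is constrained.
EXPLAIN SELECT * FROM orders
WHERE customer_id = 42 AND order_date >= '2026-01-01';

-- Also uses the index: the leftmost prefix alone is enough.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Typically cannot use the index efficiently: skips the leading column.
EXPLAIN SELECT * FROM orders WHERE order_date >= '2026-01-01';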
-
Database migrations don't have to be a war of attrition.

We just published a detailed case study on migrating Microsoft's WideWorldImporters OLTP database from SQL Server to PostgreSQL 15 using AI agents.

The schema wasn't trivial: 26 sequences, 21 tables, 47 stored procedures, and all the edge cases that come with real enterprise workloads, including temporal tables, computed columns, and complex JSON update patterns.

Instead of manual conversion, we built a toolchain of Claude Code slash commands, each handling a discrete stage of the migration pipeline. The result was repeatability at a level manual migrations simply can't match. Every transformation decision was captured in companion audit files. Smoke tests validated each function within minutes against live PostgreSQL containers. And the pipeline scales to hundreds of objects without accumulating technical debt.

🔹 Dependency analysis and DDL conversion handled by purpose-built slash commands
⚡ 45 stored procedures translated from MSSQL OPENJSON patterns to PostgreSQL jsonb_to_record
🧠 Temporal tables, computed columns, and PostGIS types converted with documented semantic decisions
🚀 Container-based smoke tests validated every function before any code was committed
✅ Full audit trail: every decision captured in reviewable markdown files

#AgenticAI #EnterpriseArchitecture #PostgreSQL #DatabaseMigration

Tagging some brilliant minds in this space: Rajagopal Nair, Arvind Mehrotra, Dr. Anil Kumar P, Pradeep Chandran, Rashid Siddiqui, Ancy Paul
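A hedged illustration of the OPENJSON → jsonb_to_record translation the post mentions; this is the general shape of the pattern, not the case study's actual code, and the column names are made up:

-- SQL Server: shred a JSON array into rows with OPENJSON ... WITH.
-- SELECT o.OrderLineID, o.Quantity
-- FROM OPENJSON(@json)
--      WITH (OrderLineID int '$.orderLineId', Quantity int '$.quantity') AS o;

-- PostgreSQL equivalent: jsonb_to_recordset with an explicit column list.
SELECT o."orderLineId", o.quantity
FROM jsonb_to_recordset('[{"orderLineId": 1, "quantity": 5}]'::jsonb)
     AS o("orderLineId" int, quantity int);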
-
You don't need a separate vector database. You already have one — it's called PostgreSQL.

Enter pgvector — the open-source extension that turns your Postgres instance into a fully capable vector store, right alongside your relational data.

Here's why engineers are choosing it:

- Semantic search without the infrastructure tax
Store embeddings (OpenAI, Cohere, Sentence Transformers — your choice) and query by cosine similarity, L2 distance, or inner product. No new DB to spin up, no new ops burden.

- Hybrid queries in a single SQL statement
Filter by user_id, date range, AND semantic similarity — all in one query. Try doing that cleanly with a standalone vector DB. (Sketch below.)

- HNSW & IVFFlat indexes
pgvector ships with approximate nearest-neighbor (ANN) indexing. HNSW gives you fast, high-recall search at scale. IVFFlat is great for memory-constrained environments.

- Transactions, RBAC, backups — all inherited
Your vectors live in the same ACID-compliant, battle-tested system as the rest of your data. No sync jobs. No consistency nightmares.

- Real use cases shipping today:
→ RAG pipelines (retrieval-augmented generation)
→ Product recommendation engines
→ Duplicate detection & deduplication
→ Semantic document search

The catch? At very large scale (billions of vectors), dedicated vector DBs like Pinecone or Weaviate still have an edge. But for most production workloads, pgvector is more than enough, and the operational simplicity wins every time.

If you're already on Postgres, there's zero reason not to try it.

#PostgreSQL #pgvector #VectorDatabase #MachineLearning #RAG #AIEngineering #BackendEngineering #DatabaseEngineering #LLM #OpenSource
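A minimal sketch of the hybrid-query pattern described above; the documents table, its columns, and the 1536-dimension embedding size are illustrative assumptions:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    user_id    bigint NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    embedding  vector(1536)  -- hypothetical embedding dimension
);

-- HNSW index for fast approximate cosine-distance search.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Hybrid query: relational filters + semantic ranking in one statement.
-- :query_embedding stands in for your query's embedding parameter.
SELECT id
FROM documents
WHERE user_id = 42
  AND created_at >= now() - interval '30 days'
ORDER BY embedding <=> :query_embedding
LIMIT 10;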
-
Every Postgres queue built on SKIP LOCKED + DELETE eventually turns into a VACUUM problem.

You ship it, it works for a week, and then dead tuples start piling up. Index bloat. Autovacuum chasing its tail. The dashboard that was green last Tuesday is suddenly the reason you're on a call at 9pm.

Nikolay Samokhvalov just shipped PgQue, a resurrection of PgQ (the queue architecture Skype built for messaging hundreds of millions of users back in the day). Pure PL/pgSQL. One SQL file. pg_cron to tick. No sidecar daemon.

The trick is snapshot-based batching plus TRUNCATE-based table rotation instead of per-row deletes. Rotate partitions, truncate the old one, done. No bloat, because there are no dead tuples to clean up. The tradeoff is end-to-end delivery latency in the 1-2 second range, which for plenty of workloads is fine. (The classic pattern it replaces is sketched below.)

https://lnkd.in/gC6HTbfP

This is the kind of thing I love about the Postgres ecosystem. Someone looked at an architectural pattern that's been quietly battle-tested for a decade, noticed the zero-bloat property, and packaged it as SQL you can read in an afternoon. No new infrastructure. No vendored runtime. Just the database you already run.

What's your current queue setup looking like?

#PostgreSQL #DatabaseEngineering #Backend #OpenSource
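For reference, a minimal sketch of the classic SKIP LOCKED + DELETE consumer the post says eventually bloats; this is the generic pattern, not PgQue's code, and the jobs table is hypothetical:

-- Each worker claims one job without blocking on rows other workers hold.
WITH next_job AS (
    SELECT id
    FROM jobs
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
DELETE FROM jobs
WHERE id IN (SELECT id FROM next_job)
RETURNING id, payload;
-- Every consumed job is a DELETE, so every job leaves a dead tuple
-- for VACUUM: exactly the bloat PgQue's TRUNCATE rotation avoids.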
-
Pop quiz: what happens when Postgres runs out of transaction IDs?

A) Queries slow down
B) You get a warning in the logs
C) The entire database goes read-only and your application stops working

The answer is C. And it's not a theoretical edge case. It's a countdown that has ticked on every Postgres database that isn't vacuuming properly.

Welcome to Week 2 of April Data Drops, and it's our very own birthday gal Chandini kurada talking about VACUUM.

Here's the short version: Postgres uses a multi-version concurrency model. When you update or delete a row, the old version sticks around. These dead tuple versions pile up. VACUUM is the process that cleans them out.

Autovacuum does this automatically. In theory. In practice, autovacuum's defaults are tuned for politeness, not performance. On big, busy tables, it falls behind. Dead tuples stack up. Tables bloat. Performance degrades gradually — so gradually you don't notice until it's a crisis.

And if VACUUM falls far enough behind, Postgres starts running out of usable transaction IDs. When the counter gets close to wrapping around, Postgres does the only safe thing it can: it shuts down all writes. Completely.

We saw a SaaS company 48 hours away from this exact scenario. A 200GB table. Autovacuum falling behind for weeks. Nobody noticed until the warning showed up in the logs. 48 hours from total write shutdown on a production database.

Today's video:
→ How VACUUM and autovacuum actually work under the hood
→ Queries to check if your tables are falling behind right now (starter queries below)
→ Which autovacuum settings to tune (and what to set them to)
→ The wraparound doomsday clock and how to keep it far from midnight

VACUUM is not optional. It's oxygen.

#AprilDataDrops #PostgreSQL #DataDrop8 #VACUUM #Database #DevOps #OpenSourceDB
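A couple of standard catalog queries for the checks the post describes; this is my sketch of the usual approach, not the video's exact queries:

-- Dead tuples per table: is autovacuum keeping up?
SELECT relname,
       n_dead_tup,
       n_live_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;

-- The wraparound "doomsday clock": how old is each database's frozen XID?
-- autovacuum_freeze_max_age defaults to 200 million; values approaching
-- 2 billion mean Postgres is about to refuse writes.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;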
-
Shoutout to Chandini kurada from OpenSource DB for breaking down a critical topic that often gets ignored until it's too late.

Pop quiz: what happens when Postgres runs out of transaction IDs?

A) Queries slow down
B) You get a warning in the logs
C) The entire database goes read-only and your application stops working

The answer is C. And it's not theoretical — it's a ticking clock in every system where VACUUM isn't keeping up.

Most teams rely on autovacuum and assume it will handle everything. But in real-world workloads, it often falls behind. Dead tuples build up. Tables bloat. Performance drops — slowly, then suddenly. And if it falls too far behind? Postgres runs out of usable transaction IDs → and shuts down all writes. Completely.

In this Data Drop, Chandini covers:
→ How VACUUM and autovacuum work under the hood
→ How to check if your tables are falling behind
→ What to tune in autovacuum
→ How to avoid transaction ID wraparound

VACUUM is not optional. It's oxygen.

We are happy to share this #AprilDataDrops initiative of our supporting partner OpenSource DB — one PostgreSQL video, every day this April. Check all Data Drop videos here: https://lnkd.in/gqidmmQp

Aarti NR | Kalyani M | Keerthi Seetha | Praveena Sivasankar

#PostgresWomenIndia #AprilDataDrops #PostgreSQL #DataDrop8 #VACUUM #autovacuum #performance #WomenInTech