"Just use Postgres." Modern software engineering has become a subscription management simulator. I finally stopped the madness and consolidated most of my specialized infrastructure into a single source of truth: Postgres. Postgres has been in active development for three decades. It is basically the Skyrim of databases, a rock-solid foundation you can mod until it replaces your entire stack. The technical reality: NoSQL: JSONB + GIN indices give you document-store flexibility with ACID compliance. Search: TS_VECTOR handles full-text search. I am glad I am not the only one who realized Elasticsearch is usually an expensive layer of overkill. Vector DB: pgvector with HNSW indices solves the hybrid search problem natively. Message Queue: FOR UPDATE SKIP LOCKED creates reliable queues without adding a new service. Time Series: Partitioning + BRIN indices handle massive telemetry without the B-tree bloat. API Layer: Row-Level Security (RLS) can eliminate hundreds of lines of boilerplate middleware. The result is one connection string, one backup strategy, and zero distributed consistency headaches. Stop over-engineering for Google-scale problems you do not have yet. Pick the tool that has been battle-tested since the 90s and just start shipping.
Ditch the complexity, use Postgres
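Roughly, the queue pattern looks like this; the jobs table and column names are illustrative, not a drop-in schema:

```sql
-- Illustrative jobs table for a Postgres-backed queue.
CREATE TABLE IF NOT EXISTS jobs (
    id         bigserial   PRIMARY KEY,
    payload    jsonb       NOT NULL,
    status     text        NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Worker: atomically claim one pending job without blocking other workers.
BEGIN;

WITH next_job AS (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'running'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;

-- ... process the job, mark it done or failed, then COMMIT.
COMMIT;
```

Two competing workers never grab the same row: the second one simply skips the locked job and claims the next pending one.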
-
I've been heads-down building for a while and haven't shared much here. Changing that.

Here's something I keep running into: teams reach for new infrastructure when the real problem is their queries.

At a previous company, our API response times were climbing. The conversation started drifting toward caching layers, read replicas, maybe a new service. Before any of that, I spent a day with EXPLAIN ANALYZE and pg_stat_statements.

What I found:
→ A few joins were scanning full tables because indexes didn't match the actual query patterns in production
→ One N+1 had been there so long everyone assumed "that endpoint is just slow"
→ A couple of queries were sorting in Ruby that PostgreSQL could have sorted faster itself

Three changes. No new infrastructure. API response times dropped by over 60%.

The lesson I keep relearning: most performance problems aren't architecture problems. They're query problems. And query problems are cheap to fix if you measure before you redesign.

If your API feels slow, run EXPLAIN ANALYZE before you add a service. You might save yourself months. Everyone's talking about AI-powered observability. Meanwhile, EXPLAIN ANALYZE is free and tells you exactly what's wrong.

#postgresql #backendengineering #softwareengineering
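For anyone who hasn't used these two tools, the starting point is roughly this; the table names and query are placeholders, and pg_stat_statements needs the extension enabled first:

```sql
-- Show the actual execution plan, timing, row counts, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.email = 'someone@example.com';

-- Queries consuming the most total execution time across the whole server.
-- (Column names are for PostgreSQL 13+; older releases use total_time.)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```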
-
The observability conversation has matured a lot over the past few years. We talk about instrumenting applications, tracing microservices, monitoring Kubernetes clusters. OpenTelemetry pipelines. SLOs on APIs.

Databases are mostly absent from that conversation. Which is strange, because 83% of production incidents are root-caused in the database layer. Not the app, not the infra, but the database.

A query plan quietly changes after a deployment. A connection pool saturates at 3am. Replication lag builds up silently. Autovacuum gets blocked. A service account starts reading tables it never touched before. None of that shows up in your Kubernetes dashboard.

Database observability is a different discipline. It requires query-level signals, not just resource metrics. Audit trails, not just error logs. Traces that go deep enough to capture the actual SQL, the rows examined, the lock wait time. And when you add the security and compliance dimension (who accessed what, when, from where), it becomes something most teams haven't thought about at all.

I've been working on a reference guide (a compilation of my notes and my reading) that covers this end to end: PostgreSQL, MySQL, Redis, MongoDB, VictoriaMetrics, OTel instrumentation, security signals, SOC 2/GDPR/PCI-DSS mapping, and nine incident playbooks from slow queries to data exfiltration.

Link to the guide in the comments; all remarks are welcome (it's the V1).

Percona MariaDB PostgreSQL Global Development Group MongoDB

#observability #devops #sre
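A few of these signals are already queryable from Postgres's own statistics views; roughly like this, with purely illustrative output columns:

```sql
-- Replication lag on a primary, per standby, in bytes behind the current WAL position.
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- Sessions currently stuck waiting on locks.
SELECT pid, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- Tables whose dead-tuple counts suggest autovacuum is falling behind.
SELECT relname, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```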
-
I recently re-architected a core part of my system (CodeSM) while migrating from MongoDB to Postgres, and it exposed some flawed design decisions I had been ignoring.

Old architecture (submission flow):
1. User clicks Submit
2. Create submission entry in DB
3. Enqueue job (BullMQ + Redis)
4. Worker: downloads testcases from S3, spins up a Docker container, compiles & executes the code, stores the result in a job_results table

This worked well for SUBMIT, but I handled RUN very differently.

Previous RUN design (flawed). For the "Run Code" button:
❌ No DB entry
❌ Directly pushed the payload to the queue: { "code": "...", "language": "...", "problemId": "..." }
Why was this flawed? Workers had no persistent source of truth.

Problems:
- No debugging or traceability
- No retry capability
- Two separate execution pipelines (RUN vs SUBMIT)
- Harder to scale and maintain

New architecture (unified & scalable). I redesigned the system to treat RUN and SUBMIT the same way:
✅ Create a submission entry for both
✅ Add mode = RUN | SUBMIT
✅ Enqueue only the submissionId
The worker now fetches data from the DB, executes based on mode, and updates the submission state.

New challenge: RUN generates a large amount of temporary data.
Solution: mark RUN submissions as temporary, add a cleanup job (cron), and delete entries older than 1–6 hours.

Key takeaways:
- Don't create separate pipelines for similar workflows
- Persist minimal identifiers, not full payloads
- Design for retries and debugging from day one
- Temporary data still needs lifecycle management

This redesign made the system more consistent, easier to debug, and more scalable under load. Still iterating, but this was a big shift from "making it work" to designing for scale and reliability.

Sharing the system design diagram as well; would love to hear feedback.
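Roughly, the unified table and cleanup look like this in Postgres; the names, columns, and retention window are illustrative, not my exact schema:

```sql
-- One table for both RUN and SUBMIT executions.
CREATE TABLE IF NOT EXISTS submissions (
    id         bigserial   PRIMARY KEY,
    problem_id bigint      NOT NULL,
    user_id    bigint      NOT NULL,
    mode       text        NOT NULL CHECK (mode IN ('RUN', 'SUBMIT')),
    code       text        NOT NULL,
    language   text        NOT NULL,
    status     text        NOT NULL DEFAULT 'queued',
    result     jsonb,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- Periodic cleanup (triggered from cron): drop temporary RUN entries older than 6 hours.
DELETE FROM submissions
WHERE mode = 'RUN'
  AND created_at < now() - interval '6 hours';
```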
-
Our API response time jumped from 120ms → 600ms overnight. No code deployed. No infra change. No incidents reported. Just... slower.

Here’s how I debugged it in 40 minutes 👇

Step 1: Isolate the symptom
CloudWatch showed the spike started at 11:42 PM. But here’s the interesting part: P95 latency spiked while P50 stayed normal. That usually means large payloads, heavy queries, or edge-case traffic. Not a full-system slowdown.

Step 2: Eliminate the usual suspects
I checked the obvious first:
- Lambda cold starts? ❌ Warm instances were also slow
- DB connection pool? ❌ Only 42% utilized
- External APIs? ❌ Not in the request path
That narrowed it down to one likely culprit: ➡️ the database query itself.

Step 3: Inspect the query plan
Ran EXPLAIN ANALYZE on the main trade lookup query. Result: a sequential scan over 2.1M rows, an estimated cost of 48,000, and an index that was no longer being chosen. Why? As the table grew, PostgreSQL recalculated its cost estimates and changed the execution plan automatically. Silent. Invisible. Expensive.

Step 4: Fix it
Added a composite index on (user_id, created_at DESC). Immediately after, the query planner switched to an Index Scan and P95 dropped from 600ms → 89ms.

The real lesson: your system can break without deployments, because performance bugs often come from data growth, query planner decisions, traffic shape changes, and hidden thresholds. EXPLAIN ANALYZE isn’t just an optimization tool. It’s a production survival tool. And if you’re not tracking P95 latency, you’re blind to what power users are experiencing.

My takeaway: as systems scale, the code may stay the same, but behavior changes. That’s where engineering gets interesting.

Curious: what’s the sneakiest production bug you’ve debugged? Drop it in the comments 👇 (Real stories only; those are always the best lessons.) If this was useful, repost it so more engineers see it.

#PostgreSQL #BackendEngineering #NodeJS #SystemDesign #SoftwareEngineering #Debugging #AWS #Performance #DevOps
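In SQL terms, steps 3 and 4 looked roughly like this; the trades table and query are illustrative, and only the index columns come from the actual fix:

```sql
-- Step 3: check the plan for the hot lookup query (illustrative table and filter).
EXPLAIN ANALYZE
SELECT *
FROM trades
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 50;
-- Before the fix, the plan showed a sequential scan over the whole table.

-- Step 4: composite index matching both the filter and the sort order.
-- CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX CONCURRENTLY idx_trades_user_created
    ON trades (user_id, created_at DESC);
```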
-
🚀 𝐁𝐮𝐢𝐥𝐭 𝐚 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧-𝐒𝐭𝐲𝐥𝐞 𝐁𝐚𝐜𝐤𝐞𝐧𝐝 𝐢𝐧 𝐆𝐨 𝐰𝐢𝐭𝐡 𝐏𝐨𝐬𝐭𝐠𝐫𝐞𝐒𝐐𝐋 (𝐃𝐨𝐜𝐤𝐞𝐫𝐢𝐳𝐞𝐝)

Recently I moved from an in-memory store to a real DB and went beyond basic DB connectivity to build a near production-style backend service in Go.

🔧 𝐓𝐞𝐜𝐡 𝐒𝐭𝐚𝐜𝐤: Go (net/http, database/sql), PostgreSQL (running via Docker), REST API

🧱 𝐖𝐡𝐚𝐭 𝐈 𝐢𝐦𝐩𝐥𝐞𝐦𝐞𝐧𝐭𝐞𝐝:
🔹 Dockerized database: ran PostgreSQL via Docker
🔹 Clean architecture: structured the project as main → handler → store → database, separating HTTP logic from the database layer
🔹 Database layer: used INSERT ... RETURNING id for efficient writes; QueryRow for single-row queries, Query for multi-row queries
🔹 Production practices: context-aware DB calls (context.WithTimeout), connection pooling (SetMaxOpenConns, etc.), proper error handling (avoiding log.Fatal in business logic)
🔹 API endpoints: POST /products → create product, GET /products → fetch all products

💡 𝐊𝐞𝐲 𝐋𝐞𝐚𝐫𝐧𝐢𝐧𝐠𝐬: the difference between a driver and database/sql, why RETURNING matters in PostgreSQL, and how real backend services are structured.

📈 𝐖𝐡𝐚𝐭’𝐬 𝐧𝐞𝐱𝐭: transactions (for real-world scenarios like payments) and exploring pgx for high-performance database access.

This project helped me bridge the gap between “it works” and “it’s production-ready.”

#golang #postgresql #docker #backend #softwareengineering #learninginpublic
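On why RETURNING matters: the pattern is roughly this (the products table here is illustrative). It hands back the generated id in the same round trip as the insert, which in Go maps to db.QueryRowContext(...).Scan(&id) instead of an INSERT followed by a separate SELECT:

```sql
-- Illustrative products table.
CREATE TABLE IF NOT EXISTS products (
    id         bigserial     PRIMARY KEY,
    name       text          NOT NULL,
    price      numeric(10,2) NOT NULL,
    created_at timestamptz   NOT NULL DEFAULT now()
);

-- One statement: insert the row and get its generated id back.
INSERT INTO products (name, price)
VALUES ('keyboard', 49.99)
RETURNING id;
```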
-
In databases, “almost correct” is completely wrong. That's why 𝗧𝗿𝗮𝗻𝘀𝗮𝗰𝘁𝗶𝗼𝗻𝘀 matter.

This is exactly why databases like PostgreSQL take transactions so seriously: in real-world systems, 𝗽𝗮𝗿𝘁𝗶𝗮𝗹 𝘀𝘂𝗰𝗰𝗲𝘀𝘀 = 𝘁𝗼𝘁𝗮𝗹 𝗳𝗮𝗶𝗹𝘂𝗿𝗲.

So what is a transaction? A transaction is a group of operations that either completely succeed or completely fail. No in-between.

Example: transferring ₹1000 from A → B
1. Deduct from A
2. Add to B (failed)
Without a transaction → the data is inconsistent. With a transaction → everything is rolled back.

This is powered by the 𝗔𝗖𝗜𝗗 properties:
🔹 A – Atomicity (all or nothing): either the entire transaction happens, or none of it does.
🔹 C – Consistency (valid state always): the database always moves from one correct state to another.
🔹 I – Isolation (no interference): concurrent transactions don’t mess with each other.
🔹 D – Durability (permanent changes): once committed, data stays, even after crashes.

Why PostgreSQL stands out:
• Strong ACID compliance
• Reliable transaction handling
• Used in systems where data integrity is critical

Real insight: bugs can be fixed. UI can be redesigned. But 𝗰𝗼𝗿𝗿𝘂𝗽𝘁𝗲𝗱 𝗱𝗮𝘁𝗮? That's a nightmare. Transactions are not just a feature. They're your safety net.

Next time you write a query, don't just think “does it work?” Think: “what if it 𝗳𝗮𝗶𝗹𝘀 𝗵𝗮𝗹𝗳𝘄𝗮𝘆?”

#PostgreSQL #Database #RDBMS #ACID #Transactions #BackendDevelopment #SoftwareEngineering #SQL #DataIntegrity #Developers #CoreJava #SpringFramework #SpringBoot #Hibernate #ORM #MicroServices #aswintech
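The A → B transfer above, written as a single transaction; the accounts table and amounts are illustrative:

```sql
-- Illustrative accounts table:
-- CREATE TABLE accounts (id text PRIMARY KEY, balance numeric NOT NULL CHECK (balance >= 0));

BEGIN;

UPDATE accounts SET balance = balance - 1000 WHERE id = 'A';
UPDATE accounts SET balance = balance + 1000 WHERE id = 'B';

-- If either UPDATE fails (for example, the CHECK constraint rejects a negative
-- balance), issue ROLLBACK instead and neither change is applied.
COMMIT;
```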
-
I hit a wall scaling my system for CodeSM, and it forced me to rethink everything.

After running load tests, things started breaking in ways I couldn’t ignore:
- Concurrency issues under parallel requests
- Docker container startup bottlenecks
- Worker execution delays
- System slowdown as load increased

At first, I thought it was just infra tuning. It wasn’t. The real problem? My data layer wasn’t built for this scale.

I was using MongoDB, which worked fine early on. But as concurrency increased, query patterns and consistency became a bottleneck. So I made a decision: 👉 I’m moving the system to Postgres.

Not because “SQL is better”, but because:
- I need stronger consistency guarantees
- Better control over complex queries
- Predictable performance under concurrency

This isn’t just a database migration. It’s a shift in how I think about system design. Now the focus is clear: optimize queries, not just code; design for concurrency, not just features; fix bottlenecks at the architecture level.

I’m deliberately walking into these problems instead of avoiding them.
-
🚀 Rethinking Backend Responsibility with Row Level Security (RLS)

Recently, I explored how Row Level Security (RLS) in PostgreSQL can fundamentally change the way we design application backends.

Traditionally, access control is handled at the backend layer: APIs decide what a user can read or modify. But with RLS, this responsibility can be enforced directly at the database level. You can define fine-grained policies that control which rows a user is allowed to access. So even if a client communicates directly with the database, it doesn’t imply unrestricted access. 👉 The database itself becomes the gatekeeper.

This led to an interesting realization:
• Not every application needs a heavy backend for simple CRUD use cases
• Direct client → database interaction can be safe with properly defined policies
• Security is not about hiding the database, but about controlling access

That said, backend systems are still essential for:
• Rate limiting
• Caching
• Queues
• Complex business logic

So it’s not about eliminating the backend; it’s about choosing the right level of abstraction based on the problem. This perspective really helped me think more clearly about when to rely on backend logic and when to push responsibility into the database.

Learned this while following insights from Piyush Garg sir and exploring the official docs: 📖 https://lnkd.in/gF5kU4E9

Still exploring deeper into policy design and real-world applications of RLS.

#postgresql #backend #systemdesign #security #webdevelopment #developers
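A policy can be as small as this; the documents table and the current_setting-based user id are a common pattern, shown here purely as an illustration:

```sql
-- Illustrative table owned by individual users.
CREATE TABLE IF NOT EXISTS documents (
    id       bigserial PRIMARY KEY,
    owner_id text NOT NULL,
    body     text NOT NULL
);

-- Turn on RLS; without a matching policy, non-owner roles see no rows.
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

-- Each session sets its user id first, e.g. SET app.current_user_id = 'u_123';
-- the policy then restricts both reads and writes to rows owned by that user.
CREATE POLICY documents_owner_only ON documents
    USING (owner_id = current_setting('app.current_user_id'))
    WITH CHECK (owner_id = current_setting('app.current_user_id'));
```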
-
Today we're open sourcing the Xata platform, a cloud-native Postgres platform. Apache 2.0.

* Fast copy-on-write branching.
* Automatic scale-to-zero and wake-up on new connections.
* 100% vanilla Postgres. We run upstream Postgres, no modifications.
* Production grade: HA, failover/switchover, upgrades, PITR, IP filtering, etc.

I’m obviously biased, but I think this is the ideal product for the coding-agents era. Code is cheap now, but validating it with realistic data is not simple. Synthetic or seeded data is limited, and copying the data with pg_dump takes ages. Also, the more agents do dev work for you, the more important scale-to-zero becomes.

Blog post with technical details: https://lnkd.in/dZDethgK
GitHub repo: https://lnkd.in/dEiFqivZ
Stars on GitHub are appreciated!
-
Multi-master replication in Postgres sounds great until you hit the operational complexity. This is a solid attempt at abstracting that away and making it usable in real systems. Definitely worth checking out if you care about scaling beyond a single writer.
PostgreSQL has natively supported logical replication since version 10. But while those foundational primitives have been there for years, actually configuring a true, active-active multi-master setup has remained painfully manual. Once you choose to move beyond the single-writer bottleneck, you immediately encounter the challenge of full-mesh topology: you have to manually configure, sync, and handle conflict resolution across an increasingly fragile web of database nodes.

Reading the OpenAI blog released in January 2026 reinforced my thoughts on the complexity of multi-master systems and why many teams avoid them. Before that blog release, I had been tinkering with a simple multi-writer system on Docker Desktop from my personal computer. That experimentation led me to build pgconverge.

Pgconverge is an open-source CLI tool designed to automate multi-master logical replication in Postgres. It abstracts away the heavy lifting of node synchronisation and the dreaded n(n-1) complexity so you can focus on scaling your infrastructure, not writing custom replication scripts.

I have documented what I learned while building pgconverge in a 7-part series. I have released the first two articles and will be rolling out the remaining five over the coming days.

GitHub: https://lnkd.in/e74jx7hu
Why Multi-Master? The Problem with Single-Writer Databases: https://lnkd.in/esavkuhu
Inside Pgconverge: Navigating the N×(N-1) Complexity of Full Mesh Replication: https://lnkd.in/eSDNMfrp

You can also read OpenAI's blog on how they scaled a single-writer PostgreSQL database to power ChatGPT at massive scale: https://lnkd.in/ew2U58Ct

I would love for fellow infrastructure and backend engineers to break it, test it, and share feedback.

#PostgreSQL #DistributedSystems #DatabaseArchitecture #BackendEngineering #SystemDesign
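For context, the native primitives are publications and subscriptions, configured per direction; roughly like this with illustrative names, and repeated for every ordered pair of nodes in a full mesh, which is exactly the n(n-1) pain pgconverge automates:

```sql
-- Both nodes need wal_level = logical in postgresql.conf.

-- On node A (source): publish changes to the tables you want replicated.
CREATE PUBLICATION app_pub FOR TABLE users, orders;

-- On node B (target): subscribe to node A's publication.
CREATE SUBSCRIPTION sub_from_a
    CONNECTION 'host=node-a dbname=app user=replicator password=secret'
    PUBLICATION app_pub;

-- Active-active means also creating the reverse pair (publication on B,
-- subscription on A), plus handling conflicts and avoiding replication loops.
```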