Ruslan Mukhamadiarov’s Post

🔥 DISTINCT didn’t fix your query. It just hid the problem.

You saw duplicates → added DISTINCT → result “looks correct”. But the database still did the wrong work.

Here are 4 real cases where DISTINCT lies to you 👇

1️⃣ One-to-many JOIN (instead of EXISTS)

    SELECT o.*
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    WHERE oi.status = 'ACTIVE';

🤕 duplicates
🚑 DISTINCT
🔍 Problem: You multiplied rows. One order → many items → many rows.
✅ Fix: WHERE EXISTS (...)
👉 If you don’t need child data, don’t JOIN it.

2️⃣ Cartesian multiplication (multiple JOINs)

    FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    JOIN payments p ON p.order_id = o.id

3 items × 4 payments = 12 rows
🤕 data explosion
🚑 DISTINCT
🔍 Problem: You multiplied relationships.
✅ Fix: WHERE EXISTS (...) AND EXISTS (...), or split queries.
👉 Multiple one-to-many JOINs = red flag.

3️⃣ JOIN FETCH explosion (JPA)

    SELECT o FROM Order o
    JOIN FETCH o.items
    JOIN FETCH o.payments

🤕 looks fine in Java
🚑 DISTINCT
🔍 Problem: SQL still does 3×4 = 12 rows. Hibernate deduplicates objects, not SQL work.
✅ Fix: fetch one collection only; load the others separately or in batches.
👉 JOIN FETCH hides the explosion, it doesn’t remove it.

4️⃣ Wrong JOIN condition

    JOIN payments p ON p.user_id = u.id   (should be order_id)

🤕 random duplicates
🚑 DISTINCT
🔍 Problem: Wrong relationship → quasi-cartesian result.
✅ Fix: verify the FK, verify the cardinality.
👉 DISTINCT can’t fix wrong logic.

🧠 The pattern
In all cases, you created extra rows and then removed them. DISTINCT = post-processing, not a fix.

💬 Takeaway
If DISTINCT is fixing your result, your query is already wrong. Or simpler: You didn’t remove duplicates. You paid the database to create them and clean them up.

#backend #java #sql #databases #performance #systemdesign #softwareengineering #jpa
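Cases 1 and 2 can be reproduced in a few lines. The following is a minimal sketch in Python with an in-memory SQLite database; the table layout and sample rows are assumptions made up for illustration, and only the query shapes come from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_id INTEGER, status TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    -- order 1 has three ACTIVE items and four payments; order 2 has one CANCELLED item
    INSERT INTO order_items (order_id, status) VALUES
        (1, 'ACTIVE'), (1, 'ACTIVE'), (1, 'ACTIVE'), (2, 'CANCELLED');
    INSERT INTO payments (order_id) VALUES (1), (1), (1), (1);
""")

# Case 1: the JOIN returns one row per matching item, so order 1 appears three times.
join_rows = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    WHERE oi.status = 'ACTIVE'
""").fetchall()

# EXISTS only asks "is there at least one ACTIVE item?", so each order appears once.
exists_rows = conn.execute("""
    SELECT o.id FROM orders o
    WHERE EXISTS (SELECT 1 FROM order_items oi
                  WHERE oi.order_id = o.id AND oi.status = 'ACTIVE')
""").fetchall()

# Case 2: two one-to-many JOINs multiply each other (3 items x 4 payments).
exploded_rows = conn.execute("""
    SELECT o.id FROM orders o
    JOIN order_items oi ON oi.order_id = o.id
    JOIN payments p ON p.order_id = o.id
""").fetchall()

print(len(join_rows), len(exists_rows), len(exploded_rows))
```

With this sample data the JOIN yields three rows for a single order, EXISTS yields one, and the double JOIN yields twelve, which is the multiplication that DISTINCT would only paper over.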

5 Comments

Ghaith Estaif 2w

Thank you so much for the information. I actually faced one of these problems: I joined a child table, which led to many rows instead of one, and when I removed that table the problem was solved.


William Warne 2w

Great post. I’d add that a lot of these issues become far less likely when schema design is treated as a first-class concern from day one: correct PK/FK constraints, uniqueness rules, cardinality modeled properly, and indexes aligned to access patterns. Good constraints won’t replace query review though, and this is a great example of what to look out for.



More Relevant Posts

JOptimize

731 followers
3w Edited
OFFSET pagination is fine… until page 10,000

Your SQL query returns 50 rows. So why does it get slower every week? Because the database is not optimizing for the 50 rows you keep. It is paying for the rows you skip.

In a Java + Spring Boot API audit, I saw an admin endpoint paginating a large orders table like this:

    SELECT id, customer_id, status, created_at
    FROM orders
    ORDER BY created_at DESC
    LIMIT 50 OFFSET 500000;

Application side looked harmless:

    PageRequest.of(page, 50, Sort.by(Sort.Direction.DESC, "createdAt"));

At first, everything was fine. Then the table grew to around 12 million rows, and deep pagination became a real problem.

🚨 What happened
Early pages were fine (page 1: fast, page 10: still fine, page 100: acceptable). But for deep pages, performance degraded badly:
page 1 → ~40 ms
page 1000 → ~220 ms
page 10000 → ~1.8 s to 3 s
Even though the API still returned only 50 rows.

Why? Because with offset pagination, the database often has to:
- scan/index-walk through a huge number of rows
- discard the first N rows
- only then return the next 50

So OFFSET 500000 does not mean “jump instantly to row 500001”. It often means “walk past 500000 rows, then give me 50”.

💥 Real production impact
On this endpoint:
- p95 latency kept increasing as data grew
- DB CPU spiked during heavy admin usage
- replicas got unnecessary read pressure
- users felt the app was “randomly slow”
The worst part: it looked perfectly fine in dev with a small dataset.

❌ Typical Java code

    Page<Order> page = orderRepository.findAll(
        PageRequest.of(pageNumber, 50, Sort.by("createdAt").descending())
    );

This is convenient. But on very large tables, convenience gets expensive.

✅ Better approach: keyset pagination
Instead of asking for page 10000, ask for the next 50 rows after the last seen value. Example:

    SELECT id, customer_id, status, created_at
    FROM orders
    WHERE created_at < :lastCreatedAt
    ORDER BY created_at DESC
    LIMIT 50;

And if created_at is not unique, use a tie-breaker:

    SELECT id, customer_id, status, created_at
    FROM orders
    WHERE (created_at, id) < (:lastCreatedAt, :lastId)
    ORDER BY created_at DESC, id DESC
    LIMIT 50;

✅ Why it scales better
With a proper index like:

    INDEX (created_at DESC, id DESC)

the database can seek directly to the next position. That means:
- no huge skip cost
- more stable latency
- much better behavior on large datasets

⚠️ Important nuance
Offset pagination is not always wrong. It is often fine for:
- small tables
- internal tools with low volume
- shallow pagination
- cases where jumping directly to page 7 matters more than raw performance

But for millions of rows, infinite scroll, APIs under load, or deep historical navigation, keyset pagination usually wins.

🧠 Takeaway
OFFSET pagination does not fail when your query gets bigger. It fails when your data gets bigger.

Are you still paginating large tables with LIMIT ... OFFSET ... in production?

https://www.joptimize.io/

#JavaDev #SpringBoot #PostgreSQL #JavaPerformance #Backend
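To check that the tie-breaker version of keyset pagination walks the table in the same order as OFFSET, here is a small sketch in Python with SQLite. The 20-row table, the page size of 5, and the index name are assumptions chosen for illustration; it relies on SQLite's row-value comparison (available since 3.15).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)")
# Two orders share each timestamp, so the (created_at, id) tie-breaker matters.
conn.executemany(
    "INSERT INTO orders (id, created_at) VALUES (?, ?)",
    [(i, f"2024-01-{(i + 1) // 2:02d}") for i in range(1, 21)],
)
conn.execute("CREATE INDEX idx_orders_created_id ON orders (created_at DESC, id DESC)")

PAGE_SIZE = 5

def offset_page(page_number):
    """Classic LIMIT/OFFSET: the engine walks past every skipped row."""
    return conn.execute(
        "SELECT id, created_at FROM orders "
        "ORDER BY created_at DESC, id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_number * PAGE_SIZE),
    ).fetchall()

def keyset_page(last_seen=None):
    """Keyset: continue strictly after the last (created_at, id) already returned."""
    if last_seen is None:
        return conn.execute(
            "SELECT id, created_at FROM orders "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (PAGE_SIZE,),
        ).fetchall()
    last_id, last_created_at = last_seen
    return conn.execute(
        "SELECT id, created_at FROM orders "
        "WHERE (created_at, id) < (?, ?) "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (last_created_at, last_id, PAGE_SIZE),
    ).fetchall()

keyset_pages = []
cursor = None
while True:
    page = keyset_page(cursor)
    if not page:
        break
    keyset_pages.append(page)
    cursor = page[-1]  # the last row of this page becomes the next cursor

offset_pages = [offset_page(n) for n in range(len(keyset_pages))]
print(keyset_pages == offset_pages, len(keyset_pages))
```

The trade-off shows up in the function signatures: `keyset_page` needs the last row of the previous page, so it cannot jump straight to an arbitrary page number the way `offset_page` can.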
Yuvaraj V
4w
Day 21: 🧑‍💻 Database Sharding: Horizontal Partitioning to Scale Beyond One Server (Java + Spring Boot)

What Is Database Sharding?
Sharding is horizontal partitioning: splitting a large dataset across multiple databases (shards), each holding a subset of the data. A shard key determines which shard a record belongs to.

Without Sharding:
500M users → 1 database server
Full table scan → slow
Index too large for RAM → slow
Max: vertical scale (bigger machine = expensive)

With Sharding:
500M users → 4 shard databases
Shard 0: userId hash 0-24% → 125M rows
Shard 1: userId hash 25-49% → 125M rows
Shard 2: userId hash 50-74% → 125M rows
Shard 3: userId hash 75-99% → 125M rows
Each query hits ONE shard → fast ✅

🏗️ Four Sharding Strategies

1. Hash Sharding (Most Common)

    int shardIndex = Math.abs(userId.hashCode() % totalShards);
    // userId "u-123" → hash → shard 2
    // userId "u-456" → hash → shard 0
    // Even distribution, no hotspots

2. Range Sharding

    // userId < 1M → Shard 0
    // userId 1M-2M → Shard 1
    // userId > 2M → Shard 2
    // Good for time-series: Jan→Mar shard 0, Apr→Jun shard 1

3. Directory Sharding

    // Lookup table: userId → shardId
    Map<String, Integer> shardDirectory = cache.get("shard-map");
    int shardId = shardDirectory.get(userId);
    // Flexible but directory = single point of failure

4. Geo Sharding

    // India users → Shard ap-south-1
    // US users → Shard us-east-1
    // Europe users → Shard eu-west-1
    // Low latency for users in same region

Configuration: 4 Shard DataSources

    # application.yml
    spring:
      datasource:
        shard0:
          url: jdbc:postgresql://shard0-db:5432/users
          username: app
        shard1:
          url: jdbc:postgresql://shard1-db:5432/users
          username: app
        shard2:
          url: jdbc:postgresql://shard2-db:5432/users
          username: app
        shard3:
          url: jdbc:postgresql://shard3-db:5432/users
          username: app

Key Takeaways:
- Sharding = split data horizontally across multiple databases
- Shard key = the field used to decide which shard (choose carefully!)
- Hash sharding = most common: even distribution, no range queries
- Same key = same shard: always route the same userId to the same shard
- Cross-shard = expensive: avoid COUNT, JOIN, ORDER BY across shards
- Try first: table partitioning → read replicas → THEN sharding
- Sharding is hard to undo: choose the shard key once and commit

Rule: if a single table > 100M rows AND performance is suffering → consider sharding

#JavaInProduction #RealWorldJava #Java #SpringBoot #BackendDevelopment #ProductionIssues #DataStructures #DSA #SystemDesign #SoftwareEngineering #JavaDeveloper #Programming #Sharding #Database
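The hash-sharding rule and the "hard to undo" takeaway can be sketched as follows. This is an illustrative Python version, not the post's Java code: `shard_for` is a hypothetical helper, and SHA-256 is an assumption, picked because Python's built-in `hash()` for strings is randomized per process and so is unsuitable for routing.

```python
import hashlib
from collections import Counter

TOTAL_SHARDS = 4

def shard_for(user_id: str, total_shards: int = TOTAL_SHARDS) -> int:
    """Map a user id to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_shards

user_ids = [f"u-{n}" for n in range(10_000)]

# Same key, same shard: routing is deterministic.
same_shard_twice = shard_for("u-123") == shard_for("u-123")

# The keys spread roughly evenly over the four shards.
distribution = Counter(shard_for(uid) for uid in user_ids)

# Changing the shard count remaps most keys, which is why plain modulo
# hashing makes resharding expensive.
moved = sum(1 for uid in user_ids if shard_for(uid, 4) != shard_for(uid, 5))

print(same_shard_twice, dict(distribution), moved)
```

With plain modulo hashing, going from 4 to 5 shards relocates most of the keys. That is one reason to settle the shard key and shard count early, or to use a scheme designed for resizing such as consistent hashing.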
Kushal Unune
1w Edited
How SQL might lie to you.

The short version: SQL equality checks can ignore trailing spaces. Java doesn't. Your DB client may hide them. If you don't sanitize at system boundaries, dirty data will silently break your pipeline while tests pass.

The investigation trail:

Steps 1-2 - No errors, data "looked" perfect.
A batch service produced zero output with no exceptions, just a log: 0 records processed. The natural assumption was an empty input. But a direct query WHERE CATEGORY = 'Active' returned thousands of rows. Even a DISTINCT check looked clean. The database said: the data is here and it's fine.

Steps 3-4 - The code was fine. The test data wasn't.
Since data existed, I suspected the Java code. But the logic was a simple "Active".equals(j.getCategory()). So I seeded the integration tests, copying rows straight from the DB client's UI grid. Every test passed. What I only realized later: that was the contamination point. The DB client silently stripped trailing spaces on display, so I had handed the test suite pre-sanitized strings.

Step 5 - Inspecting the raw payload.
If the DB had data and the code worked, something in between was wrong. Checking the raw JSON payload revealed it: the value wasn't "Active", it was "Active ". Every field had trailing spaces. This was a VARCHAR column, and VARCHAR doesn't pad, so the spaces had been physically inserted by an upstream ETL.

Step 6 - Why did the DB hide this?
Per the ANSI SQL standard, when strings of unequal length are compared with = under a PAD SPACE collation, the shorter string is space-padded to match. So 'Active ' = 'Active' evaluates to TRUE. The database wasn't exactly lying; it was following the spec. But the spec is surprising, and the behavior varies by engine and collation: PostgreSQL, for example, does not pad VARCHAR comparisons.

Step 7 - The full delivery chain:
- DB stores dirty VARCHAR strings
- JDBC extracts the exact dirty string
- Spring JdbcTemplate maps it directly to the DTO
- Jackson serializes it verbatim
No layer sanitizes. But "Active".equals("Active ") returns FALSE. Every record was silently filtered out.

Key takeaways:
1️⃣ SQL and Java equals() speak different languages. SQL may space-pad for comparison; Java is strictly character-by-character.
2️⃣ Standard SQL diagnostics lie. On a padding engine, DISTINCT, =, and GROUP BY are all insensitive to trailing spaces. Detect dirty data with LENGTH(col) != LENGTH(RTRIM(col)).
3️⃣ DB clients can sanitize display output. UI-copied test data will pass against clean strings while prod fails against dirty ones.
4️⃣ Trim at the boundary, not in the logic. Configure Jackson or your RowMapper to trim at deserialization; don't scatter .trim() calls across business logic.

Over to you: What is the most frustrating "silent bug" you've chased down? Where do you enforce data sanitization: the DB, the API boundary, or business logic?

#Java #DB2 #SQL #JDBC #Spring #JdbcTemplate #H2 #BDD #Debugging #DataEngineering #BackendEngineering
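Takeaways 2 and 4 can be illustrated without a database. The sketch below is a Python stand-in for the Java RowMapper idea: `find_padded_ids` and `trim_row` are hypothetical helpers, and the rows are invented sample data shaped like what the post says JDBC returned.

```python
def find_padded_ids(rows):
    """Return ids of rows where any string value carries trailing spaces.

    Mirrors the SQL check LENGTH(col) != LENGTH(RTRIM(col)).
    """
    return [
        row["id"]
        for row in rows
        if any(isinstance(v, str) and v != v.rstrip(" ") for v in row.values())
    ]

def trim_row(row):
    """Boundary-level cleanup: strip trailing spaces from every string field once."""
    return {k: v.rstrip(" ") if isinstance(v, str) else v for k, v in row.items()}

# Rows as they arrive from the driver: an upstream ETL padded some values.
raw_rows = [
    {"id": 1, "category": "Active   "},
    {"id": 2, "category": "Active"},
    {"id": 3, "category": "Closed "},
]

# Business logic with a strict, character-by-character comparison.
naive_ids = [r["id"] for r in raw_rows if r["category"] == "Active"]

# The same logic, but rows are trimmed once where they enter the application.
clean_rows = [trim_row(r) for r in raw_rows]
clean_ids = [r["id"] for r in clean_rows if r["category"] == "Active"]

dirty_ids = find_padded_ids(raw_rows)
print(naive_ids, clean_ids, dirty_ids)
```

The comparison itself is unchanged in both runs; only the place where rows enter the application differs, which is the point of trimming at the boundary rather than inside the filter.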

Sarvesh Kumar Singh
1w
🚀 Lazy vs. Eager Loading in Hibernate with Spring Boot: Choosing the Right Strategy

Is your Spring Boot application's performance slowing down as your database grows? The real problem may not be SQL indexes or server memory... It could be how Hibernate fetches your entities. 👀

If you’re working with JPA/Hibernate, understanding Lazy vs. Eager Loading can save you from slow APIs, memory bloat, and production surprises. Let’s break it down. 👇

🔍 What is Fetching in Hibernate?
Fetching defines when related data is loaded from the database. For example: if you load an Author, should Hibernate also load all Books immediately? Or only when needed? That’s where Eager and Lazy loading come in.

🟢 Eager Loading = Fetch All at Once
✅ Benefits:
- Easy to use
- No extra queries later
- Good when related data is always required
⚠️ Risks:
- Fetches unnecessary data
- Higher memory usage
- Slower initial query performance

🔵 Lazy Loading = Fetch On Demand
With Lazy, related data is loaded only when accessed in code.
✅ Benefits:
- Better performance for large relationships
- Loads only required data
- Reduces unnecessary joins
⚠️ Risks:
- Can cause extra queries
- May throw LazyInitializationException if the session is closed
- Needs proper query design

💻 Example in Spring Boot

    @Entity
    public class Author {
        @Id @GeneratedValue
        private Long id;
        private String name;

        @OneToMany(mappedBy = "author", fetch = FetchType.LAZY)
        private List<Book> books;
    }

    @Entity
    public class Book {
        @Id @GeneratedValue
        private Long id;
        private String title;

        @ManyToOne(fetch = FetchType.EAGER)
        @JoinColumn(name = "author_id")
        private Author author;
    }

👉 What happens here?
Author.books will load only when accessed. Book.author loads immediately with Book.

💡 Pro Tip: The N+1 Select Problem
One of the biggest Hibernate performance killers is the N+1 Select Problem.
How Lazy Helps: Prevents loading unnecessary child data by default.
How Lazy Hurts: If you loop through entities carelessly, it triggers multiple queries.

Have you faced the dreaded LazyInitializationException in production? How did you solve it? 👇 Share your experience in the comments.

📩 Subscribe to Spring Boot Engineering Digest for practical backend engineering insights, performance tips, and real-world Spring Boot strategies.

From Spring Boot Engineering Digest, by Sarvesh Kumar Singh
Muhammed .
3d
Day 43-46/90 (4 Days progress) Why is a 1M-record database as fast as one with 10? The secret: Compound B-Tree Indexing. This week, I shifted from basic CRUD to Query Optimization and Database Performance. So what's B-Tree Indexing? In production staff modules, filtering multi-column mappings (user/status) creates massive overhead. Without a strategy, engines resort to Full Table Scans or Bitmap Heap Scans. At scale, this spikes Latency and kills Throughput, causing dashboard hangs under concurrency. Trade-offs: Indexes add Write Overhead (re-balancing trees on INSERT/DELETE), expand Storage Footprints, and consume Buffer Cache. Over-indexing risks inefficient Query Execution Plans where maintenance costs outweigh retrieval gains. Then why did I choose this? It's mathematically sound here. The module handles high GET volume for analytics with occasional POST operations. Prioritizing read speed ensures Scalability under peak loads. Implementations & Fixes: • Analytical Subqueries: Refactored logic via Subquery/OuterRef to kill the Cartesian Product trap. Offloading math to SQL achieved O(1) retrieval for multi-layer metrics. • Identity Mutation Protection: Enforced Data Immutability by overriding .update() to strip protected keys. Prevents unauthorized relationship shifts during PATCH requests. • Reactivation Guards: Used global validate() patterns to protect Soft-Delete integrity. Verifies parent entity (Dept/Qual) status before reviving child records. • Conditional Unique Constraints: Leveraged Django Q objects to solve the Soft-Delete Paradox. Allows re-using employee codes for new hires without history conflicts. • Ghost Parent Filtering: Used double-underscore (__) syntax to block 'orphan' mappings tied to inactive/deleted parents. • N+1 Query Resolution: Enforced select_related/prefetch_related across list views to minimize database round-trips. • API Standardization: Maintained strict pagination/filter consistency for predictable integration. 
Coding for 10 users is easy. Architecting for 10,000 with low Latency and high Data Integrity requires moving logic from Python to the SQL engine. How do you balance Index Maintenance vs. Read Speed in concurrent systems? PS: Completed Half of the 90 days! Thank you everyone for supporting this journey. 🚀 #python #django #backend #drf #90daysofcode #sql #database #performance #softwareengineering #BuildInPublic #QueryOptimization #Scalability
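The "Conditional Unique Constraints" item above is easiest to see at the SQL level. The post's project is Django, where a `UniqueConstraint` with a `condition=Q(...)` creates this kind of index; the sketch below uses plain Python with SQLite instead, and the `employee` table, its columns, and the index name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        code TEXT NOT NULL,
        is_deleted INTEGER NOT NULL DEFAULT 0
    );
    -- Uniqueness is enforced only among rows that are not soft-deleted.
    CREATE UNIQUE INDEX uq_employee_code_active
        ON employee (code) WHERE is_deleted = 0;
""")

conn.execute("INSERT INTO employee (code) VALUES ('E-100')")

# A second active row with the same code violates the partial unique index.
try:
    conn.execute("INSERT INTO employee (code) VALUES ('E-100')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# After a soft delete, the code can be reused for a new hire.
conn.execute("UPDATE employee SET is_deleted = 1 WHERE code = 'E-100'")
conn.execute("INSERT INTO employee (code) VALUES ('E-100')")

active_codes = conn.execute(
    "SELECT code FROM employee WHERE is_deleted = 0"
).fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(duplicate_rejected, active_codes, total_rows)
```

The history row stays in the table, but it no longer takes part in the uniqueness check, which is what lets the same employee code be assigned again.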
Sayir Dahdal
3d
Think your ORM just executes SQL? There’s a lot more happening behind the scenes.

When you use tools like Hibernate, you're not just sending queries to the database. You're interacting with a powerful mechanism called the Persistence Context. It acts as an intelligent bridge between your application and the database, and it brings several capabilities that many developers overlook 👇

🧠 1. Persistence Context
An in-memory context that manages all entities within a transaction.
→ Checked before any database interaction.

⚡ 2. First-Level Cache
Same transaction + same ID = no additional query.
→ Reduces latency and database load.

🔍 3. Change Tracking (Dirty Checking)
The ORM automatically detects changes to your entities.
→ No need to write manual UPDATE statements.

🔄 4. Identity Guarantee
The same database row maps to the same object instance within a transaction.
→ Prevents duplicates and inconsistencies.

💾 5. Auto Flush (Synchronization)
Changes are written to the database at the right time (before queries or commit).
→ Enables batching and efficient SQL execution.

⏳ 6. Entity Lifecycle
Transient → Managed → Detached → Removed
→ Understanding this helps avoid common pitfalls like lazy loading issues.

💡 Example (single transaction)

    User u1 = em.find(User.class, 1);    // SELECT → managed
    u1.setEmail("new@example.com");      // marked as dirty
    User u2 = em.find(User.class, 1);    // same instance, no query
    tx.commit();                         // flush → UPDATE

⚙️ What actually happens
- First find() → cache miss → SELECT → entity becomes managed
- Field update → tracked as dirty (no SQL yet)
- Second find() → cache hit → same object returned
- Commit → flush → a single UPDATE is issued. Note that Hibernate writes all mapped columns in that UPDATE by default; annotate the entity with @DynamicUpdate if only the changed columns should be written.

🚀 Why this matters
Without the Persistence Context, you'd be responsible for:
• caching
• change detection
• identity consistency
• SQL batching

The ORM handles all of this for you, but only if you understand how it works. Master it, and your data layer becomes:
→ predictable
→ performant
→ maintainable

#SoftwareArchitecture #ORM #Hibernate #JPA #EntityFramework #BackendDevelopment
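To make the identity map and dirty checking concrete, here is a toy sketch in Python. It is not how Hibernate is implemented: the `PersistenceContext` class, its snapshot-comparison strategy, and the dict standing in for the database are all assumptions made for illustration.

```python
class PersistenceContext:
    """Toy unit of work: an identity map plus snapshot-based dirty checking."""

    def __init__(self, table):
        self._table = table      # stand-in for the database: {id: row dict}
        self._managed = {}       # identity map: id -> managed entity
        self._snapshots = {}     # id -> copy of the state as it was loaded
        self.selects = 0         # number of simulated SELECTs
        self.updates = []        # simulated UPDATEs as (id, changed fields)

    def find(self, entity_id):
        # First-level cache: a managed entity is returned without a query.
        if entity_id in self._managed:
            return self._managed[entity_id]
        self.selects += 1
        entity = dict(self._table[entity_id])
        self._managed[entity_id] = entity
        self._snapshots[entity_id] = dict(entity)
        return entity

    def flush(self):
        # Dirty checking: compare each managed entity with its loaded snapshot.
        for entity_id, entity in self._managed.items():
            snapshot = self._snapshots[entity_id]
            changed = {k: v for k, v in entity.items() if snapshot[k] != v}
            if changed:
                self._table[entity_id].update(changed)
                self.updates.append((entity_id, changed))
                self._snapshots[entity_id] = dict(entity)


db = {1: {"id": 1, "email": "old@example.com", "name": "Ann"}}
ctx = PersistenceContext(db)

u1 = ctx.find(1)                  # simulated SELECT, entity becomes managed
u1["email"] = "new@example.com"   # no write yet, only in-memory state changes
u2 = ctx.find(1)                  # served from the identity map
ctx.flush()                       # one simulated UPDATE with the changed field

print(u1 is u2, ctx.selects, ctx.updates)
```

This toy flush writes only the changed field, which corresponds to Hibernate with @DynamicUpdate rather than to its default behavior of writing every mapped column.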
Pratham Mehta
2w Edited
What is Connection Pooling?

Before understanding what it is, let's first understand what the problem is.

Imagine you have 10 API services running horizontally behind a Load Balancer. Each time a user requests some data (maybe their profile, chat history, or feed), whichever API service handles that request opens a brand new connection to the database. Every single time, these steps happen:
1. TCP handshake (3-way handshake to open, 4-way exchange to tear down)
2. Authentication / authorization
3. Memory allocation on both client and server
4. Session setup

Now think about the scale. You have 10 API servers, each handling say 100 requests/sec. That's 1000 new connections being created every second, each one going through all 4 steps above before even touching your data.

This causes real problems:
- High latency: the user waits for connection setup before their query even runs (adds 20–100ms+ per request)
- Database gets overwhelmed: databases have a hard connection limit (PostgreSQL's max_connections defaults to 100), and 1000 new connections/sec will crush it
- Wasted resources: CPU and memory burned on setup/teardown, not actual queries
- Traffic spikes kill you: a sudden surge means thousands of simultaneous connection attempts, a thundering herd that can bring the DB down entirely

So the core problem is simple: opening a fresh DB connection per request is slow, expensive, and doesn't scale. This is exactly the problem Connection Pooling solves.

So What is Connection Pooling?
Instead of opening and closing a connection on every request, you create a pool of connections once at startup and reuse them. The pool keeps, say, 20 connections open and alive.
- At startup, the pool opens N connections to the DB and keeps them alive
- When a request needs the DB, it borrows a connection from the pool
- After the query, it returns the connection, which stays open and ready for the next request
- If all connections are busy, new requests wait in a queue (with a timeout)

Types of Connection Poolers
- In-process: built into the library/driver (e.g., HikariCP for Java, SQLAlchemy pool for Python). Lives inside your app.
- External/sidecar: a separate process that proxies DB connections (e.g., PgBouncer for PostgreSQL, ProxySQL for MySQL). Shared across multiple app instances.

Real-world Impact
Without pooling, a simple app might spend 50–90% of query time just on connection setup. With pooling, that overhead drops to near zero for most requests, and you can serve far more traffic with the same database resources. The most critical scenario is serverless or short-lived processes (Lambda functions, containers), where every invocation would otherwise create a fresh connection; an external pooler like PgBouncer becomes essential there.

NOTE: Connection pooling improves performance by reusing database connections and prevents exhausting PostgreSQL’s connection limit.

#systemdesign
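The borrow/return cycle described above fits in a few lines. This is a minimal single-process sketch in Python built on `queue.Queue` and in-memory SQLite connections. The `ConnectionPool` class is hypothetical, and real poolers such as HikariCP or PgBouncer also handle health checks, connection lifetimes, and sizing, none of which is modeled here.

```python
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(sqlite3.connect(database, check_same_thread=False))
            self.opened += 1

    def acquire(self, timeout=1.0):
        # Blocks while every connection is busy; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        # The connection stays open and goes back to the idle queue.
        self._idle.put(conn)


pool = ConnectionPool(size=2)

# 100 "requests" are served by the same two connections.
results = []
for _ in range(100):
    conn = pool.acquire()
    try:
        results.append(conn.execute("SELECT 1").fetchone()[0])
    finally:
        pool.release(conn)

# When all connections are borrowed, the next caller waits and then times out.
first = pool.acquire()
second = pool.acquire()
try:
    pool.acquire(timeout=0.05)
    exhausted = False
except queue.Empty:
    exhausted = True
pool.release(first)
pool.release(second)

print(pool.opened, len(results), exhausted)
```

The `try`/`finally` around each query matters in practice: a connection that is never released is lost to the pool, and enough of those leaks look exactly like the exhaustion case at the end of the sketch.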
GA4Dataform by Superform Labs

825 followers
2w
What is the config block in Dataform? The config block of a SQLX file is the instruction manual beyond the actual SQL query. It defines how and what should be materialized. Let's look at a few examples! The simplest config you will ever see is: config { type: "table" } This tells Dataform: "I want to materialize the output of this query as a table that is rebuilt every time this action runs." But we can get a bit crazy with JavaScript and a config block can also look like this: config { type: require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_events_custom === "incremental" ? "incremental" : "view", schema: dataform.projectConfig.vars.OUTPUTS_DATASET, tags:["module_ga4", "events"], description: "Custom lineage: intercept ga4_events before it flows into downstream tables. Edit to add/modify columns.", ...(require("includes/core/modules/ga4/helpers").helpers.getModuleConfig('ga4').CUSTOM_LINEAGE.ga4_events_custom === "incremental" ? { onSchemaChange: "EXTEND", bigquery: { partitionBy: "event_date", clusterBy: [...((require("includes/core/modules/ga4/helpers").helpers.getModuleConfig("ga4").CLUSTER_BY || {}).ga4_events || []).slice(0, 2), "event_name", "session_id"], labels: require("includes/core/helpers.js").helpers.storageLabels() } } : {}), ...require("includes/core/helpers.js").helpers.isModuleEnabled('ga4') } This tells a bit different story as you can imagine, but after all the compiled JavaScript mumbo jumbo it will read something like: "I want to materialize a date-partitioned (bigquery.partitionBy), clustered (bigquery.clusterBy) incremental table (type) in the outputs dataset (schema) with a defined table description (description) and labels (bigquery.labels). If there is a new field in the output query that didn't exist before, add it to the existing table's schema before running the incremental INSERT (onSchemaChange)". 
It may look complex in the config block, but in the config.js file, you just see: CUSTOM_LINEAGE: { ga4_events_custom: 'incremental', int_ga4_sessions_custom: 'view', ga4_sessions_custom: false } This is one of the key pillars that provides the foundation for GA4Dataform since it allows us to control how we want the pipeline to behave from a ~single place. To be fair, it is nothing groundbreaking. But when you build your project with this in mind, you will be surprised how much easier it makes maintaining your code as it grows. #GA4 #Dataform #BigQuery #DataEngineering #Analytics #GCP #GoogleCloud #MarketingAnalytics #GoogleAnalytics4

SHIVAM SINGH
1w
The N+1 Query Problem: A Silent Performance Killer

In one of my recent backend discussions, we revisited a classic issue that often goes unnoticed during development but can severely impact performance in production: the N+1 Query Problem.

What is the N+1 Problem?
It occurs when your application executes:
- 1 query to fetch a list of records (N items)
- then N additional queries to fetch related data for each record
Total = 1 + N queries

Example Scenario:
You fetch a list of 100 users, and for each user, you fetch their orders separately. That results in 101 database queries instead of just 1 or 2 optimized queries.

Why is it Dangerous?
1. Increased database load
2. Slower response time
3. Poor scalability under high traffic
4. Hard to detect in small datasets, but disastrous at scale

How to Overcome It?
1. Use Join Fetch (Eager Loading): fetch related entities in a single query using JOINs.
2. Batch Fetching: load related data in chunks instead of one-by-one queries.
3. Entity Graphs (JPA): define dynamically which relationships should be fetched together.
4. Use DTO Projections: fetch only required fields instead of entire objects.
5. Caching Strategy: leverage second-level cache to reduce repeated DB hits.
6. Monitor SQL Logs: always keep an eye on generated queries during development.

Pro Tip:
The N+1 problem is not a bug. It’s a design inefficiency. It often comes from default lazy loading behavior in ORMs like Hibernate.

Interview Insight:
A good engineer doesn’t just make code work. They make it scale efficiently.

#Java #SpringBoot #Hibernate #BackendDevelopment #PerformanceOptimization #Microservices #InterviewPrep
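The 101-versus-2 arithmetic from the example scenario can be checked directly. The sketch below uses Python with SQLite instead of an ORM; the `run` helper that counts statements and the sample data (100 users with two orders each) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
                 [(i, 10 * i) for i in range(1, 101) for _ in range(2)])

stats = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it, standing in for an ORM's SQL log."""
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()

# N+1: one query for the users, then one more query per user.
stats["queries"] = 0
n_plus_one = {}
for user_id, name in run("SELECT id, name FROM users"):
    rows = run("SELECT total FROM orders WHERE user_id = ? ORDER BY id", (user_id,))
    n_plus_one[name] = [total for (total,) in rows]
n_plus_one_queries = stats["queries"]

# Batched: one query for the users, one for all orders, grouped in memory.
stats["queries"] = 0
users = run("SELECT id, name FROM users")
orders_by_user = {}
for user_id, total in run("SELECT user_id, total FROM orders ORDER BY id"):
    orders_by_user.setdefault(user_id, []).append(total)
batched = {name: orders_by_user.get(user_id, []) for user_id, name in users}
batched_queries = stats["queries"]

print(n_plus_one_queries, batched_queries, n_plus_one == batched)
```

Both approaches build the same result. The batched version is roughly what batch fetching or a second "load all children" query does inside an ORM, while a JOIN FETCH would collapse it further into a single statement.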
Keerthika Selvam
3w Edited
I spent hours staring at this SQL query, confused 😅

    SELECT u.name
    FROM users u
    WHERE NOT EXISTS (
        SELECT 1
        FROM products p
        WHERE p.category = 'electronics'
          AND NOT EXISTS (
              SELECT 1
              FROM order_items oi
              JOIN orders o ON oi.order_id = o.id
              WHERE o.user_id = u.id
                AND oi.product_id = p.id
          )
    );

My first thought: "We want users who bought ALL electronics products, so why are we using NOT EXISTS?" That one question opened up everything.

Here's what I finally understood 👇

SQL does not have a "FOR ALL" keyword. You can't directly ask: "Did this user buy every electronics product?"
So you flip the question: "Is there any electronics product this user did NOT buy?"
Then negate it: "No such product exists" = user bought everything ✅

That's the power of Double NOT EXISTS. 3 levels work together:
→ Main query loops through every USER
→ Outer subquery loops through every ELECTRONICS PRODUCT
→ Inner subquery checks: did this user buy this product?

If even ONE product is missed → outer subquery finds it → user excluded ❌
If ZERO products are missed → outer subquery returns nothing → user included ✅

The rule I'll never forget:
EXISTS = "at least one" → partial match
NOT EXISTS + NOT EXISTS = "every single one" → complete match

SQL thinking is not English thinking. Sometimes you have to flip the question to get the answer.

Currently building my SQL + Java backend skills targeting product companies.

Has SQL ever made you flip your thinking completely? Drop it below 👇

#SQL #BackendDevelopment #Java #LearningInPublic #WomenInTech
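The double NOT EXISTS query can be run against a tiny dataset to see who survives. This is a sketch in Python with SQLite. The four tables follow the column names used in the query, while the sample users and products, and the counting variant used as a cross-check, are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO users VALUES (1, 'asha'), (2, 'ben'), (3, 'chen');
    INSERT INTO products VALUES (10, 'electronics'), (11, 'electronics'), (12, 'books');
    -- asha bought both electronics products, ben bought one, chen bought nothing
    INSERT INTO orders VALUES (100, 1), (101, 1), (102, 2);
    INSERT INTO order_items VALUES (100, 10), (101, 11), (102, 10), (102, 12);
""")

# "No electronics product exists that this user has not bought."
double_not_exists = conn.execute("""
    SELECT u.name FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM products p
        WHERE p.category = 'electronics'
          AND NOT EXISTS (
              SELECT 1 FROM order_items oi
              JOIN orders o ON oi.order_id = o.id
              WHERE o.user_id = u.id AND oi.product_id = p.id
          )
    )
""").fetchall()

# Cross-check: count the distinct electronics products each user bought.
counting = conn.execute("""
    SELECT u.name FROM users u
    JOIN orders o ON o.user_id = u.id
    JOIN order_items oi ON oi.order_id = o.id
    JOIN products p ON p.id = oi.product_id AND p.category = 'electronics'
    GROUP BY u.id, u.name
    HAVING COUNT(DISTINCT p.id) =
           (SELECT COUNT(*) FROM products WHERE category = 'electronics')
""").fetchall()

print(double_not_exists, counting)
```

The two forms agree on this data but differ in one edge case: if the products table contained no electronics at all, the double NOT EXISTS query would return every user (there is nothing they failed to buy), while the counting version would return none.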

