Cut a PostgreSQL job from 10–12 hours down to ~2 hours 🚀

This was a data loading + cleaning workflow, so the focus was on reducing unnecessary overhead and optimizing execution paths.

Key optimizations that worked:

🔹 Used UNLOGGED tables for staging
Instead of creating a logged table with type casting on existing columns, I switched to an UNLOGGED table and created new columns with proper casting (e.g., dates).
→ Reduced WAL overhead significantly for non-critical data.

🔹 Tuned session-level settings
Increased work_mem
Set synchronous_commit = off (only for this session)
→ Improved intermediate operations and write performance.

🔹 Optimized indexing strategy
Created indexes in the order of joins
Focused only on columns used in joins and filters
→ Avoided index bloat and improved query planning.

🔹 Avoided unnecessary indexing
Indexing everything is tempting—but selective indexing made a big difference in execution time.

Takeaway: Small, context-aware changes in Postgres can lead to massive performance gains—especially in ETL or staging workloads.

#PostgreSQL #SQL #DatabaseOptimization #QueryOptimization #DataEngineering #PerformanceTuning #DatabasePerformance #ETL #DataProcessing #SoftwareEngineering #TechTips #Programming #Engineering #Indexing #SQLPerformance #DataWorkflows #Scalability
Optimize PostgreSQL Job from 10-12 Hours to 2 Hours
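A minimal sketch of what that staging pattern can look like in practice (table, column, and setting values below are illustrative assumptions, not the actual job):

    -- Session-level tuning for a one-off load (values are examples; tune for your hardware)
    SET work_mem = '256MB';
    SET synchronous_commit = off;   -- a crash may lose the last few commits, but causes no corruption

    -- UNLOGGED staging table: skips WAL, loads much faster, but is truncated after a crash
    CREATE UNLOGGED TABLE staging_orders (
        order_id   bigint,
        raw_date   text,
        raw_amount text
    );

    -- Load the raw text first, then cast into properly typed new columns
    ALTER TABLE staging_orders
        ADD COLUMN order_date date,
        ADD COLUMN amount numeric(12,2);

    UPDATE staging_orders
       SET order_date = raw_date::date,
           amount     = raw_amount::numeric;

    -- Index only what the downstream joins and filters actually use, after loading
    CREATE INDEX ON staging_orders (order_id);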
Hot take: Most partitioned Postgres tables shouldn't be partitioned.

We know. Controversial. Let us explain.

Partitioning has become the default recommendation for any table over a certain size. "It's 100GB? Partition it." "Queries are slow? Partition it." "Scaling up? Partition it."

But partitioning isn't free. It adds planning overhead. It complicates migrations. It makes some queries faster and others slower. And if your partition key doesn't match your query patterns, you've just turned one table into dozens of tables that Postgres has to scan one by one. That's not optimization. That's self-inflicted complexity.

So here's the Data Drop #9 framework by Bhupathi Shameer — partition when:
✅ Your queries consistently filter by a predictable key (time range, tenant ID)
✅ You need to archive or drop old data without expensive DELETE operations
✅ Maintenance on the full table (VACUUM, REINDEX) is no longer manageable
✅ You can clearly articulate which partitions most queries will hit

Don't partition when:
❌ You're hoping it'll magically speed up queries you haven't profiled yet
❌ Your queries don't filter by the partition key
❌ Your table is large but your actual problem is missing indexes or bad query plans
❌ You're adding it because a blog post said "partition everything over 10GB"

The line between "this will save us" vs "this will haunt us" is thinner than most teams think.

#AprilDataDrops #PostgreSQL #DataDrop9 #Partitioning #Database #Performance #DataModeling #OpenSourceDB
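For reference, a rough sketch of the "good fit" case, where queries filter on a predictable time key and old data is dropped by detaching partitions (table names and ranges are illustrative assumptions):

    -- Range-partition on the column most queries filter by
    CREATE TABLE events (
        event_id   bigint,
        tenant_id  int,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2024_q1 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
    CREATE TABLE events_2024_q2 PARTITION OF events
        FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');

    -- Queries that filter on created_at prune down to a single partition
    SELECT count(*) FROM events
    WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';

    -- Retention becomes a cheap metadata operation instead of a massive DELETE
    ALTER TABLE events DETACH PARTITION events_2024_q1;
    DROP TABLE events_2024_q1;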
🔥 Ever spent hours debugging a pipeline because your total user count mysteriously dropped by 20%, only to find out a single line of SQL was to blame?

If you write SQL heavily in your ETL jobs, you’ve probably fallen into this trap: The Accidental Inner Join. It happens when you write a LEFT JOIN intending to keep all records from your primary table, but then you filter the joined table in the WHERE clause.

Here is what this looks like in a real production scenario: Imagine you are using Spark SQL to join a massive users table with a transactions table. You want a list of all users, plus the dates of any completed transactions.

* The Mistake (Filtering in WHERE):

%SQL
SELECT u.user_id, t.transaction_date
FROM users u
LEFT JOIN transactions t
  ON u.user_id = t.user_id
WHERE t.status = 'COMPLETED'

What happens? The LEFT JOIN correctly includes users with no transactions (leaving the status column as NULL). But the WHERE clause runs after the join, and NULL = 'COMPLETED' evaluates to NULL rather than true, so the filter discards every user who hasn't made a purchase. Your LEFT JOIN just silently became an INNER JOIN.

* The Fix (Filtering in ON):

%SQL
SELECT u.user_id, t.transaction_date
FROM users u
LEFT JOIN transactions t
  ON u.user_id = t.user_id
 AND t.status = 'COMPLETED'

What happens? The filter is applied during the join. You get all your users, and only the 'COMPLETED' transactions are matched to them.

Why this matters in real data engineering: Syntax errors are easy; your PySpark job will just fail and alert you. Logical errors are dangerous. The mistaken query is perfectly valid SQL, meaning your AWS Glue job will happily succeed, write the data to S3, and trigger downstream dependencies. You won't know there's a problem until a frantic product manager pings you saying the active user metrics are broken.

Data completeness is just as important as pipeline speed. Always double-check where your filters live.

#SQL #MYSQL #DataEngineering #PySpark #AWS #DeltaLake #BigData #ETL #DataPipelines #DataLake #ApacheSpark #CloudComputing #DataEngineeringLife #AnalyticsEngineering #DataPlatform #OpenData #TechCareers
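One cheap guardrail for this class of bug (a sketch, reusing the post's hypothetical users table and assuming the join result is written to a joined_output table): a LEFT JOIN should never shrink the driving side, so compare the counts before publishing.

    -- Alert if the join output contains fewer distinct users than the source
    SELECT
        (SELECT count(*) FROM users)                        AS users_in_source,
        (SELECT count(DISTINCT user_id) FROM joined_output) AS users_in_output;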
📊 Leveling Up My Database Skills with PostgreSQL!

Today, I worked on structuring and managing user data in PostgreSQL. Creating clean, well-organized tables is a foundational step toward building reliable applications and data-driven systems.

🔍 What this table represents:
* User profiles with name, email, age, and city
* Consistent formatting and data types
* A scalable structure ready for queries, filters, and analytics

Every dataset—no matter how small—is an opportunity to practice data modeling, enhance query performance, and strengthen backend skills.

💡 Small steps in SQL lead to big wins in development.

#PostgreSQL #SQL #DatabaseDesign #BackendDevelopment #DataEngineering #LearningJourney #TechSkills #Productivity
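A minimal sketch of what such a table could look like (column names and types are assumptions based on the fields mentioned in the post):

    CREATE TABLE users (
        user_id bigserial PRIMARY KEY,
        name    text NOT NULL,
        email   text NOT NULL UNIQUE,
        age     int  CHECK (age >= 0),
        city    text
    );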
Database migrations don't have to be a war of attrition.

We just published a detailed case study on migrating Microsoft's WideWorldImporters OLTP database from SQL Server to PostgreSQL 15 using AI agents. The schema wasn't trivial: 26 sequences, 21 tables, 47 stored procedures, and all the edge cases that come with real enterprise workloads, including temporal tables, computed columns, and complex JSON update patterns.

Instead of manual conversion, we built a toolchain of Claude Code slash commands, each handling a discrete stage of the migration pipeline. The result was repeatability at a level manual migrations simply can't match. Every transformation decision was captured in companion audit files. Smoke tests validated each function within minutes against live PostgreSQL containers. And the pipeline scales to hundreds of objects without accumulating technical debt.

🔹 Dependency analysis and DDL conversion handled by purpose-built slash commands
⚡ 45 stored procedures translated from MSSQL OPENJSON patterns to PostgreSQL jsonb_to_record
🧠 Temporal tables, computed columns, and PostGIS types converted with documented semantic decisions
🚀 Container-based smoke tests validated every function before any code was committed
✅ Full audit trail: every decision captured in reviewable markdown files

#AgenticAI #EnterpriseArchitecture #PostgreSQL #DatabaseMigration

Tagging some brilliant minds in this space: Rajagopal Nair Arvind Mehrotra Dr. Anil Kumar P Pradeep Chandran Rashid Siddiqui Ancy Paul
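For readers unfamiliar with the OPENJSON-to-jsonb_to_record translation, here is a simplified, hypothetical illustration of the shape of that change (not taken from the actual WideWorldImporters procedures):

    -- SQL Server: shred a JSON document into rows/columns with OPENJSON
    DECLARE @json nvarchar(max) = N'{"OrderID": 1, "Quantity": 5}';
    SELECT j.OrderID, j.Quantity
    FROM OPENJSON(@json)
         WITH (OrderID int '$.OrderID', Quantity int '$.Quantity') AS j;

    -- PostgreSQL: the equivalent shape with jsonb_to_record
    SELECT r.order_id, r.quantity
    FROM jsonb_to_record('{"order_id": 1, "quantity": 5}'::jsonb)
         AS r(order_id int, quantity int);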
A lot of beginners underestimate databases. But every developer should understand:
- SQL vs NoSQL
- Indexes
- Foreign keys
- JOINs
- Query performance

Database knowledge is repeatedly highlighted as a core skill alongside deployment, caching, and CI/CD because modern systems depend on good data modeling and efficient queries.

#Database #SQL #NoSQL #BackendDevelopment #DataEngineering
Is your query actually slow, or just long-running? This distinction is often misunderstood, but it is critical for effective tuning. In my recent blog post I explain how to identify and analyze query behavior correctly - https://lnkd.in/d4y_JJP4

Would love your thoughts and experiences!

#PostgreSQL #PerformanceTuning #DBA #SQL #TechBlog #Opensource
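A quick way to start separating the two (a generic sketch, not taken from the linked blog): first check how long statements have actually been running, then look at where the time goes inside the plan.

    -- How long have the currently active queries been running?
    SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC;

    -- Is the query doing slow work, or simply a lot of work? (orders is a hypothetical table)
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM orders WHERE created_at >= now() - interval '1 day';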
PostgreSQL is often treated as “just” a relational database. But the more interesting question is usually not SQL vs NoSQL. It is this: what consistency model and scaling model does the system actually need?

By understanding the tradeoffs:
🎯 what ACID really gives you
🎯 what BASE really means in practice
🎯 why read replicas are often the first compromise
🎯 why sharding is not replication
🎯 how internet-scale systems changed database architecture
🎯 why many teams eventually still wanted SQL, transactions, and mature tooling
🎯 and why PostgreSQL became such an interesting hybrid answer

If you work with PostgreSQL beyond basic CRUD, this presentation should give you a cleaner way to think about consistency, scaling, and architecture decisions.

#PostgreSQL #DatabaseArchitecture #SQL #NoSQL #ACID #BASE #DistributedSystems #Backend
Every performance boost comes with a durability bill.

In PostgreSQL, unlogged tables are a clear example of this: an unlogged table skips WAL logging. Writes are faster, but if Postgres crashes, the data is gone. Perfect for caches, ETL staging, or derived data. A terrible idea for anything you’d regret losing.

Good architecture isn’t about avoiding trade-offs. It’s about making them intentionally.

#PostgreSQL #SystemDesign #DatabaseInternals #PerformanceEngineering #SoftwareArchitecture #BackendEngineering #Scalability
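A small sketch of making that trade-off explicit (table name and columns are hypothetical):

    -- Fast, crash-unsafe storage for data that can be rebuilt at any time
    CREATE UNLOGGED TABLE session_cache (
        session_id text PRIMARY KEY,
        payload    jsonb,
        expires_at timestamptz
    );

    -- If the data later needs to survive a crash, flip the switch;
    -- this writes the table contents to WAL, so it is not free
    ALTER TABLE session_cache SET LOGGED;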
📢 Day 31 — ROLLUP: Hierarchical Aggregation

ROLLUP creates multiple levels of totals. It is useful for hierarchical summaries.

📌 Syntax
SELECT column1, column2, SUM(value)
FROM table
GROUP BY ROLLUP(column1, column2);

📌 Example
SELECT department, job_title, SUM(salary)
FROM employees
GROUP BY ROLLUP(department, job_title);

🛠 Practical Uses
✔️ Department totals
✔️ Subtotals in reports

#SQL #DataAnalytics #DataEngineering #Database #Programming #Tech #Developers #Learning #DataScience #DataAnalyst #MachineLearning #BigData #BusinessIntelligence #ETL #DataVisualization #DataWarehouse #CareerGrowth #SQLDeveloper #DatabaseDeveloper #DatabaseAdministrator #DataEngineer #BIDeveloper #SQLServer #PostgreSQL #MySQL #Oracle #Snowflake #BigQuery #SparkSQL #TechCommunity #ITProfessionals #ProfessionalGrowth #Networking #LinkedInLearningData
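The subtotal and grand-total rows come back with NULL in the rolled-up columns; if you want to label them explicitly, GROUPING() is one option (a small sketch building on the example above):

    SELECT
        CASE WHEN GROUPING(department) = 1 THEN 'ALL DEPARTMENTS' ELSE department END AS department,
        CASE WHEN GROUPING(job_title)  = 1 THEN 'ALL TITLES'      ELSE job_title  END AS job_title,
        SUM(salary) AS total_salary
    FROM employees
    GROUP BY ROLLUP(department, job_title);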
🚀 Advanced Database Design in PostgreSQL

📌 1. JSON / JSONB (Flexible Data Modeling)
PostgreSQL allows semi-structured data:
✔ JSON → text-based
✔ JSONB → binary, faster & indexable
Powerful features:
✔ ->, ->> → access fields
✔ @> → search inside JSON
✔ jsonb_set → update values
✔ json_agg, json_build_object → API-ready responses

📌 2. Transactions (ACID 🔐)
Ensure safe and reliable operations:
✔ BEGIN → start
✔ COMMIT → save
✔ ROLLBACK → undo

📌 3. Savepoints (Partial Rollback)
Control transactions like a pro:
✔ Create checkpoints
✔ Rollback specific steps only

📌 4. Partitioning (Handle Big Data ⚡)
Split large tables for better performance:
✔ LIST → specific values (e.g., class)
✔ RANGE → ranges (e.g., age, date)
✔ HASH → even distribution
✔ Composite → multi-level partition

📌 5. Scheduling with pg_cron
Automate database tasks:
✔ Cleanup old data
✔ Run periodic jobs
✔ Reduce manual work

📌 6. Migrations (Schema Versioning)
Treat DB like code:
✔ Track changes
✔ Safe deployments
✔ Team collaboration

📌 7. Schema Evolution (DDL Operations)
Modify structure safely:
✔ Rename table
✔ Rename column
✔ Add/remove fields

💬 Final Insight: Advanced DB design is about:
⚡ Scalability (partitioning)
⚡ Flexibility (JSONB)
⚡ Reliability (transactions)
⚡ Maintainability (migrations)

#PostgreSQL #DatabaseDesign #BackendDevelopment #SystemDesign #SQL #SoftwareEngineering
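A compact sketch touching a few of these features (table and key names are illustrative assumptions):

    -- JSONB: store, query, and update semi-structured data
    CREATE TABLE profiles (id bigserial PRIMARY KEY, data jsonb);
    INSERT INTO profiles (data) VALUES ('{"name": "Asha", "tags": ["pro"]}');

    SELECT data->>'name' AS name
    FROM profiles
    WHERE data @> '{"tags": ["pro"]}';          -- containment search, indexable with GIN

    UPDATE profiles
       SET data = jsonb_set(data, '{name}', '"Asha K"');

    -- Transaction with a savepoint for partial rollback
    BEGIN;
    UPDATE profiles SET data = jsonb_set(data, '{plan}', '"gold"');
    SAVEPOINT before_risky_step;
    UPDATE profiles SET data = jsonb_set(data, '{plan}', '"platinum"');
    ROLLBACK TO SAVEPOINT before_risky_step;    -- undo only the last step
    COMMIT;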