- Let us say data in one of your database table has grown very large (in GBs)
- Most queries hit only recent data
- Old data is: rarely queried, expensive to keep on high-cost storage or slowing down vacuum, indexes, backups
- So you are dealing with performance issues, memory issues etc. on your database
There can be many solutions depending on the requirements but here we are going to discuss one of the possible solution that is creating partitions in a way where hot data(queried most) is placed in current database instance as partition and cold data(queried less) in another database instance as partition.
Let us say we have orders(id, order_date, amount) table, where 95% of queries are for orders from year 2025 and the remaining 5% are for orders before 2025.
Let us say the order table size is 75GB(5GB for the orders of year 2025 + 70GB for the orders of year before 2025).
We decide to move all the orders data before the year 2025 to a new database instance(say cold database) so my current database instance gets rid of the problems defined in problem section above.
We will use two database instances of postgres in example and will refer those as hot and cold database instance.
We can achieve this using below steps:
- Create a table orders_before_2025 in cold database instance. CREATE TABLE orders_before_2025 (id bigint NOT NULL,order_date date NOT NULL, amount numeric(10,2));
- Enable fdw extension: CREATE EXTENSION postgres_fdw;
- Create a foreign server(connection with cold database instance): CREATE SERVER remote_pg FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host '<cold-database-connection-url>', dbname '<database-name-in-cold-database', port '<port-of-cold-database>');
- Allow one of the role of hot database to perform operations on cold database using credentials for the cold database: CREATE USER MAPPING FOR "postgres" SERVER remote_pg OPTIONS (user '<user-for-cold-database>', password '<password-to-cold-database>');
- Create partitioned table CREATE TABLE orders (id bigint NOT NULL, order_date date NOT NULL, amount numeric(10,2)) PARTITION BY RANGE (order_date);
- Create a foreign table(partition) to save data for orders before 2025 and map it to the table of cold database: CREATE FOREIGN TABLE orders_before_2025 (id BIGINT NOT NULL, order_date DATE NOT NULL, amount NUMERIC(10,2)) SERVER remote_pg OPTIONS (schema_name 'public', table_name 'orders_before_2025');
- Attach partition with orders table: ALTER TABLE orders ATTACH PARTITION orders_before_2025 FOR VALUES FROM (MINVALUE) TO ('2025-01-01');
- Create a partition to save records for 2025: CREATE TABLE orders_2025 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
- Run insert queries: INSERT INTO orders values(1, '2024-12-01', 100); INSERT INTO orders values(2, '2025-02-01', 500);
- Go to cold database and run query SELECT * FROM orders_before_2025 you will see one record with order id: 1.
- Go to hot database and run query SELECT * FROM orders you will see 2 records.
- You can also notice the query execution time different by running SELECT * FROM orders; and SELECT * FROM orders WHERE order_date>='2025-01-01'; The query with where clause will be fast as it only has to read from local database.
- No application code changes required.
- Query performance will be optimized.
- Users will still see the records in the result as they were seeing before partitioning when they search records either for 2025, before 2025 or before and including 2025 with same query.
- Probably a quick turnaround solution depending on your requirements and timelines.
Please keep sharing your thoughts.