I found a strange one on a client's Postgres DB today. 🐘 The disk was filling up, but the table sizes looked small. The numbers didn't add up. The culprit was pg_largeobject. The client used OID to store files. In Postgres, this data hides in a system table that standard commands often miss. The problem: It stays hidden: Most scripts skip system tables. It never deletes: When you delete a row, the file data stays forever. Vacuum skips it: Normal cleaning doesn't touch it. If your DB is huge but your tables look small, check your system catalogs. You might have orphaned data that needs a tool like vacuumlo to clean up! #PostgreSQL #Database #TechTips #DBA
100% yes large objects are super easy to miss! thanks for sharing. most of the time, i think it's better to store objects that are larger than 1GB somewhere external like object storage. data types like bytea/text work well for data under 1GB. but yes - many people are using postgres large objects - and it's really important to know about them
Keeping large objects in your database is usually a bad idea, regardless of this bug. Keep references (pointers) to the file instead - whether it's a blob in the cloud or a file in an on premise file server.
FYI, I added a section called "Large objects" long time ago to pg collector that provide information about Large objects . https://github.com/awslabs/pg-collector/blob/6425eec4ee7431dcf621d5c9cd2a1038e6c2223e/pg_collector.sql#L3002
Interesting
Yes, and another snag often hit is that LOs need to be regenerated during major version upgrade, needing a long time and high amounts of RAM. This could mean unexpectedly long downtimes and/or out of memory crashes. Not to mention the issues of OID exhaustion, and the long waits of finding an unused OID after the global counter wraps around. vacuumlo itself is very inefficient, putting all the OIDs in one temp table and doing huge joins against everything. Then it unlinks without any ordering, so it could remove many objects but not in the area of the current OID global counter which would not help to find a free OID soon. It could also be risky, since vacuumlo only considers OID and lo type columns, while users sometimes store the reference OIDs in columns of other types, which would make vacuumlo think the LO is orphaned and delete it when it is actually still in use. Like Jeremy wrote, it's better to either move to TOASTed data, or move it outside the DB, but move away from LOs if at all possible.