From PostgreSQL wiki

Jump to: navigation, search

The issues to consider during implementing in-database compression vis-à-vis filesystem-based compression are:

- What is the blocksize for compression? Will PostgreSQL blocksize be large enough, say, 256KB, to eliminate enough redundancy? Assume that data is not sorted, so entropy is relatively high.
- Will the blocks stay compressed or uncompressed in cache? That is, will they need to be frequently decompressed if accessed repeatedly?
- If blocks were to stay uncompressed in cache, how do we handle buffers larger than disk blocks? How about the row offsets within buffers?
- You cannot have an index to convert logical (uncompressed) block space to physical (compressed) block space because zlib is compressing multiple contiguous blocks, no? Or are you proposing (de)compressing single database blocks? This would mean that zlib cannot have a block size larger than 64KB?
- You cannot have such an index anyway because small changes (updates) to data will shift the entire "tail," and all subsequent offsets will need to change because of a minor update. This architecture will shut the doors on supporting any (even infrequent) DMLs in the future.
- If we do not support any DMLs, what you are essentially proposing is an archive table. For VLDB/DSS environments, where people use inherited (child) tables to range partition tables, filesystem-based compression is good enough then. The unit of (de)compression needs to be manageable so that the penalty is small.
Personal tools