Storing Archival Data

From PostgreSQL wiki

Revision as of 06:12, 26 November 2009 by Ringerc (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Archival data is data that will never be updated or deleted, stored in tables that will never have additional inserts. It is most commonly found in history tables, which are often partitioned, where a table holds some subset of data generated some time in the past. The following list is a checklist for storaging that data in the most efficient manner possible.

Contents

verify table has fillfactor of 100%

Since the data will never be modified again, we should use as few pages as possible. Note this is the default setting as of postgresql 8.3, but should be verified in case it has been modified.

modify storage parameters

This presumes your table has varlen columns to deal with. Anyone have strong recommendations for which storage setting to use? ISTM it is very data dependent, but might be good to list use cases

re-write the table

two methods for this, depending on if you want to cluster the data in your table.

vacuum full method

drop indexes
vacuum full table
create indexes with fillfactor 100%

cluster method

verify indexes have fillfactor of 100%
cluster table on index

vacuum freeze

vacuum freeze helps stave off any vacuuming in the future, reducing overhead on a number of levels. Anyone know if it also removes any pages for recent changes from the fsm?

move table/indexes to archive tablespace

if you have setup special storage for archival data, you can then move the tables and indexes over to that storage. again, this is pretty implementation dependent, but anyone have thoughts on archival storage systems they want to share?

Personal tools