Bulk Loading and Restores

From PostgreSQL wiki
Jump to navigationJump to search

The manual section Populating a Database covers bulk-loading and includes good guidance on the DDL side of things. In addition to that, you should follow the general guidelines from Tuning Your PostgreSQL Server, particularly those about checkpoint_segments.

One additional option you might consider for restores is using just synchronous commits (8.3+) or even changing fsync=off. The usual reason you can't do that, concerns about corruption, are less important if the database is just being populated for the first time. Note that if your data is really large, though, a crash in the middle of loading might cause a significant time loss. If you can't afford to start over again, playing with fsync is a bit dangerous.

Additional articles on this topic:

Another area to watch carefully is vacuuming. You probably want to turn autovacuum off during the load itself, as that will compete for resources. If you do that, once the load is finished a manual ANALYZE is needed to sort out table statistics. Also, you might consider running a regular VACUUM at that point in order to update all the Hint Bits; this can take a fairly long time to execute however.