Details of 2012-09-24 Update Release Data Corruption Issue
Description of the Problem
Versions 9.1 and 9.2 of PostgreSQL have a bug in flushing dirty blocks from memory ("checkpointing"), introduced accidentally as a side effect of performance optimizations and new features, mainly Unlogged Tables. This bug can cause certain types of data not to be written to disk if the database shuts down or restarts for any of the following reasons:
- PostgreSQL crash
- Server crash or power loss
- "immediate" shutdown (pg_ctl -m immediate)
- "kill -9" or Out-Of-Memory-Kill of the postmaster service
- database is a standby which was promoted to master
Under these circumstances, the database can suffer from recoverable data corruption. The nature of this corruption is such that it can produce wrong, but seemingly valid, answers to queries, so it is critical that users who may have been affected take steps to clean it up as soon as possible.
First, there is a low probability of corruption of BTree and GIN indexes. Shutting down cleanly will limit the further spread of this issue. It is very likely that, if corruption has occurred, it will be visible in the form of error messages when the index is used.
Second, there is a significant probability of corruption of relation visibility maps (approaching 100% on standbys). This affects 9.1 very differently from 9.2, however. On PostgreSQL 9.1 the worst consequence is some transient inefficiency and/or failure to recover free space during VACUUM. On PostgreSQL 9.2, we use the visibility map during index only scans and so these are likely to produce wrong answers.
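If you are unsure whether a particular query on 9.2 depends on the visibility map, one way to check (a sketch using a hypothetical table and index) is to look for an "Index Only Scan" node in its plan:
EXPLAIN SELECT customer_id FROM orders WHERE customer_id = 42;
Any plan containing an Index Only Scan consults the visibility map, so such queries may return wrong answers until the map has been repaired.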
The PostgreSQL Global Development Group apologizes for the inconvenience caused by these issues.
Steps for Users of PostgreSQL 9.1
If you are running 9.1, and suspect that you may be vulnerable to database corruption because your database has shut down unexpectedly or failed over during the last few months:
- Download new 9.1.6 packages
- Do a clean shutdown of PostgreSQL, using one of the following mechanisms:
- init script or service manager
- pg_ctl -m smart stop
- pg_ctl -m fast stop
- Install 9.1.6
- According to the 9.2 upgrade docs, it might be wise to set vacuum_freeze_table_age to 0 in postgresql.conf now, before restarting the database server, and to remove that entry after finishing this procedure; this is also the point at which vacuum_cost_delay can be set globally (an example configuration sketch appears at the end of this section)
- Restart the database system
- Gradually rebuild all of your BTree and GIN indexes (see below)
- Schedule a manual vacuum of the whole database during a convenient slow period (see below)
If you are planning to upgrade to PostgreSQL 9.2 using pg_upgrade, it is critical that you run the full-database VACUUM first.
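For reference, the temporary postgresql.conf entries mentioned in the steps above might look like this (the values shown are illustrative, not tuned recommendations for your system):
# temporary settings for the post-update full-database VACUUM; remove when finished
vacuum_freeze_table_age = 0
vacuum_cost_delay = 50ms
Remember to remove these entries, and reload or restart the server, once the VACUUM has completed.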
Steps for Users of PostgreSQL 9.2
If you are running 9.2.0, and suspect that you may be vulnerable to database corruption because your database has shut down unexpectedly or failed over during the last two weeks:
- Download new 9.2.1 packages
- Do a clean shutdown of PostgreSQL, using one of the following mechanisms:
- init script or service manager
- pg_ctl -m smart stop
- pg_ctl -m fast stop
- Install 9.2.1
- According to the 9.2 upgrade docs, it might be wise to set vacuum_freeze_table_age to 0 in postgresql.conf now, before restarting the database server, and to remove that entry after finishing this procedure; this is also the point at which vacuum_cost_delay can be set globally (see the example configuration sketch in the 9.1 section above)
- Restart the database system
- VACUUM all tables in your database immediately
- Gradually rebuild all of your BTree and GIN indexes (see below)
How to VACUUM All Tables
To correct corruption of the visibility map, users should run a VACUUM FREEZE, forcing a scan of all database blocks in order to reset the entire map. Since this effectively means scanning the entire database, it will generate considerable IO and take significant time to execute on large databases. One way to ameliorate the impact on concurrently running database load is to use a cost delay (in milliseconds) to spread out the vacuum:
SET vacuum_cost_delay = 50;
Interactive VACUUM
For each database, you should:
- log in to psql as the Postgres superuser
- set vacuum_cost_delay, if doing so
- run "VACUUM ( FREEZE, VERBOSE, ANALYZE );" (ANALYZE is optional)
This will produce a lot of output, allowing you to track progress of the full-database vacuum.
You can also VACUUM one table at a time instead of the entire database in a single pass, provided that you have some way to track which tables have and have not been vacuumed.
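As an illustration, after connecting to a database with psql as the postgres superuser (for example, "psql -U postgres mydatabase", where the database name is a placeholder), the session might look like:
SET vacuum_cost_delay = 50;
VACUUM ( FREEZE, VERBOSE, ANALYZE );
To vacuum a single table instead, name it explicitly, e.g. "VACUUM ( FREEZE, VERBOSE ) mytable;".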
vacuumdb
If you have multiple databases to vacuum, you may find it convenient to use the vacuumdb utility instead. The procedure is:
- set vacuum_cost_delay in postgresql.conf, if doing so (and reload database)
- run "vacuumdb -F -v -z -a" as the postgres superuser
Note that you may need to give vacuumdb additional parameters in order to connect with the database server. The -z (analyze) and -v (verbose) options are optional.
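For example, a run against a server that needs explicit connection options might look like this (host, port, and user name are placeholders):
vacuumdb -h db.example.com -p 5432 -U postgres -F -v -z -a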
Rebuild BTree/GIN Indexes
It is likely that any indexes which are corrupted because of the issues fixed in this update release will display error messages when accessed, and can be easily identified. However, it is possible (though unlikely) that a few indexes may be corrupted so that they return incorrect answers without errors.
The VACUUM FREEZE recommended above will correct some types of index corruption. However, users who have strong data integrity concerns, or feel they are especially at risk due to multiple crashes or failovers in their server history, should take the extra step of rebuilding indexes in order to eliminate any possible corruption.
Rebuilding an Individual Index
Whether as a precaution or because you have found an index corruption error, you can rebuild indexes one at a time. The simplest way is via REINDEX:
REINDEX TABLE <tablename>;
or for a single index:
REINDEX INDEX <indexname>;
You may want to increase the RAM available to REINDEX by raising maintenance_work_mem, up to 1/8 of your available RAM (to a maximum of 2GB). Note, however, that REINDEX takes a full table write lock and, depending on the size of the table, can take considerable time to run. In order to rebuild indexes while under concurrent database load, use CREATE INDEX CONCURRENTLY:
CREATE INDEX CONCURRENTLY <indexname>_tmp <index_definition>;
BEGIN;
DROP INDEX <indexname>;
ALTER INDEX <indexname>_tmp RENAME TO <indexname>;
END;
This locks the table only during the final drop and rename stage. It is, however, more complex.
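As a concrete sketch, suppose a hypothetical BTree index orders_customer_id_idx on orders (customer_id); the concurrent rebuild, combined with the session-level maintenance_work_mem increase described above, might look like:
SET maintenance_work_mem = '512MB';
CREATE INDEX CONCURRENTLY orders_customer_id_idx_tmp ON orders (customer_id);
BEGIN;
DROP INDEX orders_customer_id_idx;
ALTER INDEX orders_customer_id_idx_tmp RENAME TO orders_customer_id_idx;
END;
Note that indexes backing UNIQUE or PRIMARY KEY constraints cannot be dropped this way; rebuild those with REINDEX instead.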
Either approach will generate considerable IO while running on large tables.
Getting a List of Btree and GIN Indexes
Regardless of your approach to rebuilding your indexes, you may want to get a list of all BTree and GIN indexes in the database. BTree is the most common type of index, so this will include most of the indexes in your database. Since GiST indexes are not at risk from this issue, and can be quite large, you may want to omit them from rebuilding.
Use this query:
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE ( indexdef ILIKE '%USING btree%' OR indexdef ILIKE '%USING GIN%' )
AND schemaname <> 'pg_catalog'
ORDER BY tablename, indexname;
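If you prefer, you can also have the database generate the REINDEX statements for you; one possible approach (a sketch whose output you should review before running) is:
SELECT 'REINDEX INDEX ' || quote_ident(schemaname) || '.' || quote_ident(indexname) || ';'
FROM pg_indexes
WHERE ( indexdef ILIKE '%USING btree%' OR indexdef ILIKE '%USING gin%' )
AND schemaname <> 'pg_catalog'
ORDER BY tablename, indexname;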
Reindexing Everything
If you can afford the required downtime, and want to be absolutely certain that you've prevented all corruption, you can reindex every index in your database using the reindexdb utility. Note that this will cause GiST indexes to be rebuilt as well, even though they are not in danger of corruption.
Run the following as the postgres superuser to reindex one database:
reindexdb <databasename>
Or to reindex all databases:
reindexdb -a
Additional options may be required for reindexdb to connect to your database. Since reindexdb takes a full lock on each table in your installation, one at a time, this is best done during a downtime.
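For example, to reindex all databases on a server that needs explicit connection options (host, port, and user name are placeholders):
reindexdb -h db.example.com -p 5432 -U postgres -a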