March 20, 2014 PostgreSQL 9.3.4 Release Update Issue Explanations
There are two major user-facing issues fixed in the 9.3.4 update. As both issues have somewhat complex resolutions, they are explained here in the wiki. The issues below affect version 9.3 only.
Index Corruption during Recovery
After recovery replay of a SELECT FOR KEY SHARE concurrently with a HOT update, the updated-and-locked table records might be invisible to index scans.
- [HACKERS] Is anyone aware of data loss causing MultiXact bugs in 9.3.2?
- [HACKERS] Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?
This corruption happens when non-indexed values are updated in tables while certain kinds of locks are held by another concurrent session (or sometimes by the same session as result of earlier statements in the same transaction). The most common cause of this is when a record is inserted into a "child" table which has a foreign key reference to a "parent" table. This takes a lock which prevents the referenced row in the "parent" table from being deleted or having its referenced key updated; however the lock does not prevent updating any non-indexed columns. If such an update has already occurred (while the locking transaction executes), recovery replay of the locking operation will corrupt the state of the updated record in the "parent table".
When a record is corrupted in this way it will not appear at all in some queries that use an index lookup. It will appear to have been deleted entirely, in violation of any constraints which require it. It will however appear in any queries that do not use an index (i.e. they use sequential scan).
Recovery replay happens during:
- Crash recovery, if the database server goes down unexpectedly and comes back up
- Binary Replication, on the standby only
- Point-in-time-Recovery (PITR) data restore from backup
This means that if you are using binary replication, PITR, or have had a crash since upgrading to 9.3, you are at risk for this issue. The probability of hitting the problem depends heavily on the pattern of the workload.
- Code which inserts into a table with a foreign key and updates the record it references in the same transaction may reliably trigger the problem repeatedly.
- For simpler patterns which execute these operations in separate transactions, the issue will depend on timing and transaction rate.
Users who do not use foreign keys (or explicit SELECT FOR KEY SHARE locks) in their databases are not at risk.
This issue is a side effect of fixing chronic locking issues which had caused unnecessary lock blocking and deadlocks due to foreign key action in older versions of PostgreSQL.
If users have had this kind of data corruption, they will experience it in one of the following ways:
- Query results, such as counts, will differ according to whether or not the query used an index
- Unexpected unique constraint, primary key, or foreign key violations
- pg_dump/pg_restore failing due to key violations
- REINDEX failing unexpectedly
Running a REINDEX on each table should be sufficient to detect if the problem exists. However, this step may be too expensive for most users, as it requires locking each table as it runs.
First, apply the update release.
Any standby servers should be reinitialized with new base backup restores from the primary. The issue only affects the recovery code so the WAL transaction log data is correct and will replay correctly on the new restore using the updated Postgres.
If you believe that you have corruption caused by this issue on the master or only copy of the database, the only real cure is a pg_dump and pg_restore of the database. REINDEXing your indexes will not fix it, and neither will VACUUM FULL. It may be necessary to restore without constraints, fix the duplicated primary keys or missing referenced foreign keys, then recreate all the constraints and indexes.
For fixing the issue in place, operations which do a full sequential scan of the corrupted table for a rebuild should fix the problem, although this has not been tested. This would include doing ALTER TABLE with an expression, for example:
ALTER TABLE badtable ALTER COLUMN somecolumn TYPE coltype USING (COALESCE(somecolumn, NULL));
This forces a full rewrite of the table. Note, however, that the table will be exclusively locked while doing this.
Deletion of Old Database Statistics Files
This issue causes Postgres to retain statistics files for dropped databases indefinitely.
- [HACKERS] pg_stat_tmp files for dropped databases
In 9.3, we introduced per-database statistics files as a way of making statistics collection more parallel and thus more efficient. However, the code designed to delete a statistics file when its database was dropped is buggy, resulting in statistics files being retained indefinitely for databases which no longer exist.
However, statistics files are generally quite small. For that reason the users who care about this issue will have all of the following characteristics:
- Have DROPped lots of databases since upgrading to 9.3
- Have many tables per database
- Have a high transaction rate
These users may want to check if they have orphaned statistics files in order to reduce disk space usage.
Get a list of your current database OIDs by doing the following:
SELECT oid, datname FROM pg_database ORDER BY oid;
Now, find your stats directory. This will be in one of three places:
- in PGDATA/pg_stat_tmp if the server is running;
- in PGDATA/pg_stat if the server is shut down;
- in some other location if you have changed the setting stats_temp_directory
Then match the OIDs against the filenames of all of the db_NNNN.stat files in your stats directory. If you have stats files that don't match any existing database OID, then you've had the issue. Note that db_0.stat is a special file (don't remove it!).
Having gone through the above, delete the extra stats files. Done!