20140320UpdateIssues

From PostgreSQL wiki

Revision as of 00:48, 18 March 2014 by Aglio (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

March 20, 2014 PostgreSQL 9.3.4 Update Release Issue Explanations

DRAFT ONLY: NOT YET COMPLETE OR CHECKED FOR ACCURACY

There are two major user-facing issues fixed in the 9.3.4 update. As both issues have somewhat complex resolutions, they are explained here in the wiki. These issues affect version 9.3 only, as as such are patched in version 9.3 only.

Recovery Index Corruption Issue

References:

Causes

This corruption happens when non-indexed values are updated in tables while certain kinds of locks are held by another concurrent session, or sometimes in another statement in the same transaction. The most common cause of this is when non-indexed values are updated in both tables joined by a foreign key. When these updates are replayed through the recovery mechanism, the index updates for primary keys and other related indexes are lost, causing inconsistency between the index and the table data.

Recovery replay happens during:

  1. Crash recovery, if the database server goes down unexpectedly and comes back up
  2. Binary Replication, on the standby only
  3. Point-in-time-Recovery (PITR) data restore from backup

This means that if you are using binary replication, PITR, or have had a crash since upgrading to 9.3, you are at risk for this issue. Since this issue is timing-based, the probability of any particular user encountering it is low. Thinks which raise your chances of encountering this issue include:

  • very high write transaction rates
  • code which updates records on both sides of a foreign key in a single transaction
  • code which can concurrently update records on both sides of a foriegn key

Users who do not use foreign keys in their databases are not at risk.

This issue is a side effect of fixing chronic locking issues which had caused unnecessary lock blocking and deadlocks due to foreign key action in older versions of PostgreSQL.

Symptoms

If users have had this kind of data corruption, they will experience it in one of the following ways:

  • Query results, such as counts, will differ according to whether or not the query used an index
  • Unexpected unique constraint, primary key, or foreign key violations
  • pg_dump/pg_restore failing due to key violations
  • REINDEX failing undexpectedly

Fixes

First, apply the update release.

Second, if you believe that you may have this kind of corruption on replica standbys, you should take a new base backup of each and restart replication.

If you believe that you have corruption caused by this issue on the master or only copy of the database, the only real cure is a pg_dump and pg_restore of the database. REINDEXing your indexes will not fix it.

Deletion of Old Database Statistics Files

This issue causes Postgres to retain statistics files for dropped databases indefinitely.

Causes

In 9.3, we introduced per-database statistics files as a way of making statistics collection more parallel and thus more efficient. However, the code designed to delete a statistics file when its database was dropped is buggy, resulting in statistics files being retained indefinitely for databases which no longer exist.

However, statistics files are generally quite small. For that reason the users who care about this issue will have all of the following characteristics:

  1. Have DROPped lots of databases since upgrading to 9.3
  2. Have many tables/schema per database
  3. Have a high transaction rate

These users may want to check if the have orphaned statistics files in order to reduce disk space usage.

Symptoms/Detection

Get a list of your current database OIDs by doing the following:

  SELECT oid, datname from pg_database ORDER BY oid;

Now, find your stats directory. This will be in one of three places:

  • in PGDATA/pg_stat_tmp if the server is running;
  • in PGDATA/pg_stat if Postgres is shut down;
  • in some other location if you have changed the setting stats_temp_directory

Then match the oids against the oid filenames of all of the files in your stats directory. Note that db-0.stat is a special file. If you have stats file which don't match any existing database OID, then you've had the issue.

Fixing

Having gone through the above, delete the extra database files. Done!

Personal tools