From PostgreSQL wiki

Revision as of 16:31, 22 November 2013 by Myon (Talk | contribs)



DRAFT: descriptions below are not yet verified!

What is the November 2013 Replication Data Loss Issue?

This is an issue, discovered on Nov. 18, 2013, which can cause minor data corruption in a streaming replica.

Hackers mailing list discussion here

What are the symptoms of the issue?

The primary symptom of this corruption is rows that are present on the primary, but missing on the secondary; you may also experience duplicated records on the secondary compared to the primary. In general, the issue affects only a handful of rows.
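As an illustration of the two symptoms only, the sketch below compares a list of key values exported from the primary with the same list exported from the secondary. It is not a verified detection method for this issue. The function name `compare_keys` and the idea of exporting keys into Python lists are assumptions made for the example, and the comparison is only meaningful if both lists were taken while no writes were in flight.

```python
from collections import Counter


def compare_keys(primary_keys, replica_keys):
    """Compare key values exported from a primary and a replica.

    Returns two sorted lists:
      - keys present on the primary but absent on the replica
      - keys that occur more often on the replica than on the primary
    """
    primary = Counter(primary_keys)
    replica = Counter(replica_keys)

    # Rows that exist on the primary but not on the replica.
    missing_on_replica = sorted(k for k in primary if k not in replica)

    # Rows with more copies on the replica than on the primary.
    extra_on_replica = sorted(k for k, n in replica.items() if n > primary[k])

    return missing_on_replica, extra_on_replica


if __name__ == "__main__":
    # Hypothetical sample data: key 3 is missing and key 2 is duplicated.
    missing, extra = compare_keys([1, 2, 3, 4], [1, 2, 2, 4])
    print("missing on replica:", missing)
    print("extra on replica:", extra)
```

An empty result from such a comparison does not prove that a replica is free of corruption.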

Who is at risk for this issue?

Users who:

  • are on one of the following PostgreSQL versions: 9.3.0, 9.3.1, 9.2.5, 9.1.10, or 9.0.14. Earlier versions are not affected.
  • are using streaming replication
  • have taken a new base backup to a replica, or have restarted their replica several times, since updating to one of the above versions
  • have a heavy-write workload

Note that the workload which triggers this issue is still not precisely understood; it does not affect all users with heavy write workloads.
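The version criterion above can be checked mechanically. The following is a minimal sketch, assuming you have already obtained the server version string (for example from `SHOW server_version;`). The names `AFFECTED_VERSIONS` and `is_affected` are made up for this example, and the check covers only the version criterion, not the replication or workload criteria.

```python
# Releases listed as affected in this article; nothing else is assumed.
AFFECTED_VERSIONS = {"9.3.0", "9.3.1", "9.2.5", "9.1.10", "9.0.14"}


def is_affected(version: str) -> bool:
    """Return True if the version string names one of the listed releases."""
    return version.strip() in AFFECTED_VERSIONS


if __name__ == "__main__":
    for v in ("9.2.4", "9.2.5", "9.3.1"):
        status = "listed as affected" if is_affected(v) else "not listed as affected"
        print(v, status)
```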

When will this be fixed?

The PostgreSQL project will publish an update release in early December 2013 that fixes this issue. We strongly advise all users of replication to apply that update as soon as it is available.

What can I do to prevent this issue until then?

If you are currently using 9.2.4, 9.1.9 or 9.0.13, and use streaming replication, do not install the most recent update (9.2.5, 9.1.10 or 9.0.14). Instead, wait for the next update (9.2.6, 9.1.11 and 9.0.15) to come out.

Options for users who have already updated, or are running 9.3, include:

  • start your replica from a new base backup taken with write traffic halted on the master (i.e. during a downtime)
  • minimize the number of times you restart your replicas
  • if you are using 9.2.5, 9.1.10 or 9.0.14, downgrade your replication master to the prior update release (9.2.4, 9.1.9 or 9.0.13).

In any case, we recommend that all users who were running streaming replication under one of the affected versions recreate each of their replicas from a fresh base backup, either:

  • during a downtime,
  • after downgrading the master, or
  • after applying the update when it is released.

How can I verify whether I already have this corruption?

That is not yet determined; hackers are working on it.
