MVCC violations

From PostgreSQL wiki
Jump to navigationJump to search

In general, PostgreSQL aims to be strict with ACID transaction semantics. But there are some documented cases that, for performance or usability reasons, violate the MVCC protocol, and thus the atomicity, consistency or isolation properties of ACID transactions. Such instances are sometimes referred to as "MVCC violations". This page aims to illustrate when such situations occur.

In general, MVCC violations are detectable using:

  • REPEATABLE READ isolation level -- usually the easiest way to reliably reveal violations.
  • With the default READ COMMITTED level, violations are visible only rarely when winning a tight race condition, or when a 3rd session happens to synchronize the backends in a certain way.
  • Within STABLE PL/pgSQL functions, even when using the READ COMMITTED level, due to snapshot allocation optimizations with STABLE functions.

TRUNCATE TABLE

Using TRUNCATE is a common technique for quickly replacing the contents of a table with new contents from an external source. Due to performance tradeoffs, the table can temporarily appear empty to concurrent transactions.

To avoid this problem, use the DELETE FROM statement, which has a higher performance and I/O cost.

With REPEATABLE READ isolation level

This issue is the easiest do demonstrate in the REPEATABLE READ isolation level. Any reading transaction should see either "old data" or "new data" in the table, never an empty table.

Test case:

Preparation
CREATE TABLE a AS SELECT 'old data'::text;
Reading session Writing session
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT txid_current(); -- force snapshot allocation
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
COMMIT;
SELECT * FROM a;
 text 
------
(0 rows)

With READ COMMITTED (default) isolation

This issue is still present in the default READ COMMITTED isolation level, but more difficult to trigger reliably, because a fresh snapshot is allocated for each new query.

Thanks to Andrew Gierth for pointing out this scenario and Andres Freund for coming up with this test case.

Test case:

Preparation
CREATE TABLE sync(text text);
CREATE TABLE a AS SELECT 'old data'::text;
Synchronizing session Reading session Writing session
PREPARE prep AS
  SELECT * FROM sync UNION ALL SELECT * FROM a;
BEGIN;
LOCK TABLE sync;
EXECUTE prep;
-- blocks behind session #1
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
COMMIT;
ROLLBACK;
-- session unblocks after #1 rollback
 text 
------
(0 rows)

STABLE functions

MVCC violations are also easily visible to STABLE PL/pgSQL functions due to the way snapshots are allocated. In earlier versions (9.1 and earlier. Not sure about 9.2???), this also affected SQL language functions. Documentation for function snapshot allocation. Thanks to Marko Tiikkaja for the test case.

Test case:

Preparation
CREATE TABLE a AS SELECT 'old data'::text;
CREATE FUNCTION get_a()
  RETURNS SETOF text LANGUAGE plpgsql STABLE
  AS 'BEGIN RETURN QUERY SELECT text FROM a; END';
Writing session Reading session
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
SELECT * FROM get_a();
-- blocks behind session #1
COMMIT;
-- session unblocks after #1 commits
 get_a
-------
(0 rows)

COPY FREEZE

The FREEZE option of the COPY command was introduced in PostgreSQL version 9.3 to avoid expensive vacuum maintenance later on the restored table. In this case, the reading session should see an empty table, since its snapshot started before any rows were inserted (copied).

To avoid this problem, simply don't use the FREEZE option.

Test case:

Preparation
CREATE TABLE a (text text);
Reading session Writing session
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT txid_current(); -- force snapshot allocation
BEGIN;
TRUNCATE TABLE a;
COPY a FROM STDIN WITH (FREEZE);
phantom row
\.
COMMIT;
SELECT * FROM a;
    text     
-------------
 phantom row
(1 row)

UPDATE & SELECT FOR UPDATE

In the READ COMMITTED isolation level, any command that acquires row locks, when faced with concurrently modified rows, is able to "reach into the future" and see the latest committed row version outside of its own snapshot. Arguably it doesn't violate ACID properties because READ COMMITTED is allowed to see concurrently committed rows. But it goes against usual PostgreSQL MVCC behavior where a single statement cannot see any anomalies even in READ COMMITTED.

To avoid this, simply use a higher isolation level. Unlike previous examples, this one cannot happen in REPEATABLE READ and higher levels, instead it throws the error "could not serialize access due to concurrent update".

User-visible symptoms include:

  • Result rows violating ordering specified by ORDER BY clause because the ordering is applied in the original snapshot before acquiring row locks.
  • Result rows that appear to violate causality (and possibly FOREIGN KEY constraints?)
  • WHERE clause expressions evaulated twice for the same row -- once for the old copy and again for the new copy.
  • Result containing fewer rows because the new locked version no longer matches the WHERE clause. (Note: the inverse is not true: rows whose old version didn't match will never appear in the result set, even if their new version does).

Test case:

Preparation
CREATE TABLE a AS SELECT 'a'::text UNION ALL SELECT 'b';
Writing session Reading session
BEGIN;
UPDATE a SET text='zzz' WHERE text='a';
SELECT * FROM a ORDER BY text FOR UPDATE;
-- blocks behind session #1
COMMIT;
-- session unblocks after #1 commits
 text 
------
 zzz    -- notice rows are out of order
 b
(2 rows)

ALTER TABLE rewrite

When querying a table that has been rewritten by ALTER TABLE in another session, the table may appear temporarily empty to concurrent transactions. Again, this is easier to reproduce using the REPEATABLE READ isolation level. Thanks to Marko Tiikkaja for the test case.

Test case:

Preparation
CREATE TABLE a AS SELECT 'data'::text;
Writing session Reading session
BEGIN;
ALTER TABLE a
    ADD COLUMN b int DEFAULT 0;
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM a;
-- blocks behind session #1
COMMIT;
-- session unblocks after #1 commits
 text | b 
------+---
(0 rows)

Historic issues

  • In PostgreSQL 8.2 and earlier, the CLUSTER command could cause DELETEd/UPDATEd rows to disappear for older snapshots.