MVCC violations
In general, PostgreSQL aims to be strict with ACID transaction semantics. But there are some documented cases that, for performance or usability reasons, violate the MVCC protocol, and thus the atomicity, consistency or isolation properties of ACID transactions. Such instances are sometimes referred to as "MVCC violations". This page aims to illustrate when such situations occur.
In general, MVCC violations are detectable using:
- REPEATABLE READ isolation level -- usually the easiest way to reliably reveal violations.
- With the default READ COMMITTED level, violations are visible only rarely when winning a tight race condition, or when a 3rd session happens to synchronize the backends in a certain way.
- Within STABLE PL/pgSQL functions, even when using the READ COMMITTED level, due to snapshot allocation optimizations with STABLE functions.
TRUNCATE TABLE
Using TRUNCATE is a common technique for quickly replacing the contents of a table with new contents from an external source. Due to performance tradeoffs, the table can temporarily appear empty to concurrent transactions.
To avoid this problem, use the DELETE FROM
statement, which has a higher performance and I/O cost.
With REPEATABLE READ isolation level
This issue is the easiest do demonstrate in the REPEATABLE READ isolation level. Any reading transaction should see either "old data" or "new data" in the table, never an empty table.
Test case:
Preparation | |
---|---|
CREATE TABLE a AS SELECT 'old data'::text;
| |
Reading session | Writing session |
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT txid_current(); -- force snapshot allocation
|
|
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
COMMIT;
| |
SELECT * FROM a;
text
------
(0 rows)
|
With READ COMMITTED (default) isolation
This issue is still present in the default READ COMMITTED isolation level, but more difficult to trigger reliably, because a fresh snapshot is allocated for each new query.
Thanks to Andrew Gierth for pointing out this scenario and Andres Freund for coming up with this test case.
Test case:
Preparation | ||
---|---|---|
CREATE TABLE sync(text text);
CREATE TABLE a AS SELECT 'old data'::text;
| ||
Synchronizing session | Reading session | Writing session |
PREPARE prep AS
SELECT * FROM sync UNION ALL SELECT * FROM a;
|
||
BEGIN;
LOCK TABLE sync;
|
||
EXECUTE prep;
-- blocks behind session #1
|
||
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
COMMIT;
| ||
ROLLBACK;
|
-- session unblocks after #1 rollback
text
------
(0 rows)
|
STABLE functions
MVCC violations are also easily visible to STABLE PL/pgSQL functions due to the way snapshots are allocated. In earlier versions (9.1 and earlier. Not sure about 9.2???), this also affected SQL language functions. Documentation for function snapshot allocation. Thanks to Marko Tiikkaja for the test case.
Test case:
Preparation | |
---|---|
CREATE TABLE a AS SELECT 'old data'::text;
CREATE FUNCTION get_a()
RETURNS SETOF text LANGUAGE plpgsql STABLE
AS 'BEGIN RETURN QUERY SELECT text FROM a; END';
| |
Writing session | Reading session |
BEGIN;
TRUNCATE TABLE a;
INSERT INTO a VALUES ('new data');
|
|
SELECT * FROM get_a();
-- blocks behind session #1
| |
COMMIT;
|
|
-- session unblocks after #1 commits
get_a
-------
(0 rows)
|
COPY FREEZE
The FREEZE option of the COPY command was introduced in PostgreSQL version 9.3 to avoid expensive vacuum maintenance later on the restored table. In this case, the reading session should see an empty table, since its snapshot started before any rows were inserted (copied).
To avoid this problem, simply don't use the FREEZE option.
Test case:
Preparation | |
---|---|
CREATE TABLE a (text text);
| |
Reading session | Writing session |
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT txid_current(); -- force snapshot allocation
|
|
BEGIN;
TRUNCATE TABLE a;
COPY a FROM STDIN WITH (FREEZE);
phantom row
\.
COMMIT;
| |
SELECT * FROM a;
text
-------------
phantom row
(1 row)
|
UPDATE & SELECT FOR UPDATE
In the READ COMMITTED isolation level, any command that acquires row locks, when faced with concurrently modified rows, is able to "reach into the future" and see the latest committed row version outside of its own snapshot. Arguably it doesn't violate ACID properties because READ COMMITTED is allowed to see concurrently committed rows. But it goes against usual PostgreSQL MVCC behavior where a single statement cannot see any anomalies even in READ COMMITTED.
To avoid this, simply use a higher isolation level. Unlike previous examples, this one cannot happen in REPEATABLE READ and higher levels, instead it throws the error "could not serialize access due to concurrent update".
User-visible symptoms include:
- Result rows violating ordering specified by ORDER BY clause because the ordering is applied in the original snapshot before acquiring row locks.
- Result rows that appear to violate causality (and possibly FOREIGN KEY constraints?)
- WHERE clause expressions evaulated twice for the same row -- once for the old copy and again for the new copy.
- Result containing fewer rows because the new locked version no longer matches the WHERE clause. (Note: the inverse is not true: rows whose old version didn't match will never appear in the result set, even if their new version does).
Test case:
Preparation | |
---|---|
CREATE TABLE a AS SELECT 'a'::text UNION ALL SELECT 'b';
| |
Writing session | Reading session |
BEGIN;
UPDATE a SET text='zzz' WHERE text='a';
|
|
SELECT * FROM a ORDER BY text FOR UPDATE;
-- blocks behind session #1
| |
COMMIT;
|
|
-- session unblocks after #1 commits
text
------
zzz -- notice rows are out of order
b
(2 rows)
|
ALTER TABLE rewrite
When querying a table that has been rewritten by ALTER TABLE in another session, the table may appear temporarily empty to concurrent transactions. Again, this is easier to reproduce using the REPEATABLE READ isolation level. Thanks to Marko Tiikkaja for the test case.
Test case:
Preparation | |
---|---|
CREATE TABLE a AS SELECT 'data'::text;
| |
Writing session | Reading session |
BEGIN;
ALTER TABLE a
ADD COLUMN b int DEFAULT 0;
|
|
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM a;
-- blocks behind session #1
| |
COMMIT;
|
|
-- session unblocks after #1 commits
text | b
------+---
(0 rows)
|
Historic issues
- In PostgreSQL 8.2 and earlier, the CLUSTER command could cause DELETEd/UPDATEd rows to disappear for older snapshots.