Serializable and Snapshot Transaction Isolation Levels
Serializable transaction isolation is attractive for shops with active development by many programmers against a complex schema because it guarantees data integrity with very little staff time -- if a transaction can be shown to always do the right thing when it is run alone (before or after any other transaction), it will always do the right thing in any mix of concurrent serializable transactions. Where conflicts with other transactions would result in an inconsistent state within the database, or an inconsistent view of the data, a serializable transaction will block or roll back to prevent the anomaly. The SQL standard provides a specific SQLSTATE for errors generated when a transaction rolls back for this reason, so that transactions can be retried automatically.
PostgreSQL does not currently support a full serializable isolation level. A request for serializable transaction isolation actually provides snapshot isolation. This has well known anomalies which can allow data corruption or inconsistent views of the data during concurrent transactions; although these anomalies only occur when certain patterns of read-write dependencies exist within a set of concurrent transactions. Where these patterns exist, the anomalies can be prevented by introducing conflicts through explicitly programmed locks or otherwise unnecessary writes to the database. Snapshot isolation is popular because performance is better than serializable isolation and the integrity guarantees which it does provide allow anomalies to be avoided or managed with reasonable effort in many environments.
Serializable Isolation Implementation Strategies
Techniques for implementing full serializable isolation have been published and in use in many database products for decades. The primary technique which has been used is Strict 2 Phase Locking (S2PL), which operates by blocking writes against data which has been read by concurrent transactions and blocking any access (read or write) against data which has been written by concurrent transactions. A cycle in a graph of blocking indicates a deadlock, requiring a rollback. Blocking and deadlocks under S2PL in high contention workloads can be debilitating, crippling throughput and response time.
A new technique for implementing full serializable isolation in an MVCC database appears in the literature beginning in 2008. This technique, known as Serializable Snapshot Isolation (SSI) has many of the advantages of snapshot isolation. In particular, reads don't block anything and writes don't block reads. Essentially, it runs snapshot isolation but monitors the read-write conflicts between transactions to identify dangerous structures in the transaction graph which indicate that a set of concurrent transactions might produce an anomaly, and rolls back transactions to ensure that no anomalies occur. It will produce some false positives (where a transaction is rolled back even though there would not have been an anomaly), but will never let an anomaly occur. In the two known prototype implementations, performance for many workloads (even with the need to restart transactions which are rolled back) is very close to snapshot isolation and generally far better than an S2PL implementation.
Both S2PL and SSI require some form of predicate locking to handle situations where reads conflict with inserts or with updates which move data into the selected range. PostgreSQL doesn't currently have predicate locking, so it would need to be added to support full serializable transactions under either strategy. Practical implementations of predicate locking generally involve acquiring locks against data as it is accessed, using multiple granularities (tuple, page, table, etc.) with escalation as needed to keep the lock count to a number which can be tracked within RAM structures. Coarse granularities can cause some false positive indications of conflict. The number of false positives can be influenced by plan choice.
The Consolidated Court Automation Programs (CCAP) agency of the Wisconsin State Courts is considering assignment of resources to the implementation a full serializable mode within PostgreSQL, as the most cost-effective means to ensure data integrity in its databases. To gather more information to support making a decision on this, Kevin Grittner is reviewing literature and PostgreSQL source code to identify implementation details and effort, and asking the community who else would be interested in participating in the effort. Jeff Davis has offered to help explore and implement designs. Bruce Momjian has offered to help with design. Robert Haas has offered to review the initial patch for locking changes. Others have contributed to the discussion. A decision on allocation of CCAP resources to the effort is likely to be made in January or February of 2010.
In general, the approach being taken is to try for the fastest possible implementation of serializable isolation level which allows no anomalies, even though it will have many false positives and very poor performance, and then optimize until the rollback rate and overall performance are within a range which would allow practical application. No existing isolation level would be removed, since not everyone will want to pay the performance price for true serializable behavior. An important goal is that for those not using serializable transaction isolation, the patch doesn't cause performance regression.
While it would not be until the end of the Implementation list that it would make sense to commit code to the main development repository, a git repository and milestone patches would be provided to the community in hopes of soliciting feedback on direction. The hope would be that such feedback would be informed by at least a familiarity with this Wiki page, and preferably a working knowledge of the literature on SSI and related issues.
Source Code Management
A git branch will be set up and publicly viewable. Information on read access will be inserted to this subsection once it is established. Write access will be granted to hackers who express an interest in contributing.
- Create a GUC to control whether a request, on the current connection, for serializable transaction isolation provides the current snapshot isolation implementation or the under-development serializable behavior.
- This GUC should only be referenced when setting or querying transaction isolation level, not scattered throughout the implementation.
- At all subsequent milestones, testing should be done to confirm that there is no impact on existing isolation levels.
- Implement serializable isolation in the simplest possible way, just to identify all the points where locks must be acquired.
- Acquire ACCESS EXCLUSIVE locks at table granularity for any data read.
- Obtain these at those points in the code where information would be available to obtain finer-grained locks.
- Avoid breaks in modularity where possible, and keep close track of any, as they will need to be fixed in later refinement steps.
- Ensure that no serialization anomalies can appear with the new locks. Test with custom data type, operators, and index types.
- Transform the table level ACCESS EXCLUSIVE locks for reads to table level SIREAD locks.
- Use a new lock method for SIREAD locks.
- Don't worry about maintaining proper lock lifespan yet.
- Serializable behavior will be broken again at this step; expect serializable tests to fail, but confirm nothing else is broken.
- Implement correct SIREAD lock lifespan.
- Some minimal information on serializable transactions must be kept past commit.
- When a serializable transaction terminates, check whether it is the oldest; if it is, remove any committed serializable transactions which are no longer needed, along with their SIREAD locks.
- Ensure that write locks for serializable transactions are available when needed.
- This probably means keeping them in RAM, either in the same lock method as the SIREAD locks or a fourth lock method, in addition to the current disk-based techniques.
- Dodge the issue of granularity promotion for the time being. This means tests at this point must only impact a fairly small number of rows.
- Unlike SIREAD locks, these can always be cleaned up on transaction completion, whether COMMIT or ROLLBACK.
- Implement SSI, without worrying about optimizations.
- This puts us back to a point where we should not see snapshot isolation anomalies.
- We now have a working, but very fragile, implementation of SSI.
- Implement lock granularity promotion for write locks.
- This removes the limitation on how many rows can be affected by writes.
- We now have a working SSI implementation which is not fragile. Everything from here on out is optimization.
- Obtain SIREAD locks at finer granularity where possible, and promote as necessary to keep RAM usage under control.
- This puts us at a point where we're flirting with a production quality implementation. It would make sense to start benchmarking performance at this milestone.
- Not only should we be testing that there are no anomalies from here out, but we should have a suite of tests to show possibly-avoidable false positives, where we generate serialization errors and roll back transactions where we might be able to avoid that. This will allow us to measure progress on this front and spot any regressions.
- Work on community acceptance of patch.
- Identify and prioritize optimization opportunities. Work on these one at a time, with testing and benchmarks after each.
- Identify all user-visible changes with patch.
- Confirm final form, such as wording of messages, SQLSTATE values, etc.
- Modify documentation as needed.
- Expect more eyes on the code than in previous phases. Discuss issues and adjust code as needed.
For this development effort to succeed, it is absolutely necessary to have some client application which allows execution of test scripts with specific interleaving of statements run against multiple backends. It would be important to have some way to use this technique within the 'make check' process before the patch is committed. No such client has yet been identified, so it may need to be developed to support progress of the serializable isolation level patch.
Michael J. Cahill, Uwe Röhm, and Alan D. Fekete. 2008. Serializable isolation for snapshot databases. In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 729–738, New York, NY, USA. ACM.