v0.2 12 Dec 2007
Postgres supports the concept of freezing tuples, so they can live forever within the database without needing further writes. Currently there is no command that will guarantee that a table has been completely frozen. This makes it difficult to reliably write data files to WORM media for longer term archiving. (WORM means Write-Once, Read-Many). It's also a pain having to VACUUM a large table again just because a small number of rows need to be frozen.
So we need a DDL command that will ensure all tuples are frozen and then mark the table as read-only. Ideally, we would like to do this in a way that doesn't hold long full table locks, since we want the data to remain accessible at all times.
So... VACUUM FREEZE table SET READ ONLY;
would be my first thought, but I'm guessing everybody will press me towards supporting the more obvious
ALTER TABLE table SET READ ONLY;
This command will place a ShareLock (only) on the table, preventing anybody from writing to the table while we freeze it. The ShareLock is incompatible with any transaction that has written to the table, so when we acquire the lock all writers to the table will have completed. We then run the equivalent of a VACUUM FREEZE which will then be able to freeze *all* rows in one pass (rather than all except the most recent).
The freeze pass will progress until it sees a potentially invisible row, i.e. one that has been written by a recent TransactionId. At that point the ALTER TABLE will wait for that TransactionId to complete, before continuing to progress. We assume that there are few rows that will be potentially invisible, so we don't bother to update OldestXmin as we progress through the table. So a table recently updated by a long running transaction could cause the ALTER TABLE to wait for an extended period, much longer than the time for the table scan.
On completion of the freeze pass we will then update the pg_class entry to show that it is now read-only, so we will emulate the way VACUUM does this.
This form of the ALTER TABLE command will need to be mangled so it can only run outside of a transaction block and also so it takes only a ShareLock rather than an AccessExclusiveLock.
Reversing the process is simpler, since we only have to turn off the flag in pg_class:
ALTER TABLE table SET READ WRITE;
possibly able to do this without grabbing an AccessExclusiveLock, though that isn't an important part of this implementation.
Read-only tables never need VACUUMing, so we would be able to make autovacuum and explicit vacuum ignore them.
Read-only tables may not be written to, yet would still allow implicit or explicit INSERT, UPDATE and DELETE privileges to be held on the table. Attempts to write to the table will result in a specific "read only table cannot be modified" ERROR. This allows a table to be placed into read-only mode for long periods of time and flipped back to read-write if some updates are required. That is useful for various performance reasons, see later. We can't use the privilege mechanism to prevent writes since superusers bypass them. (Thoughts?)
Requests for tuple-level SHARE locks (e.g. FOR SHARE) against read-only tables will be ignored, since they are effectively already there. So we don't need to change the internals of the locking, nor edit the RI code to remove the call to SHARE lock referenced tables. Do this during post-parse analysis.
Tables can be copied to WORM media by using
ALTER TABLE table SET TABLESPACE tblspc;
This would also use a ShareLock rather than an AccessExclusiveLock, piggy-backing off the work mentioned above.
Running SET TABLESPACE and SET READ ONLY at the same time might sound like a good plan, but ISTM will require two fairly different code paths, so if we do it at all it will be a later addition.