PTRACK incremental backups

Annotation

Using this patch, PostgreSQL can track page changes on the fly. Each time a relation page is updated, the page is marked in a special PTRACK bitmap for this relation. As one page requires just one bit in the PTRACK fork, such bitmaps are quite small. Tracking adds some minor overhead to database server operation but speeds up incremental backups significantly. Another advantage of PTRACK is that it provides page-level granularity for incremental backups.

Incremental Backup Algorithms. Overview.

Incremental backups only store the data that has changed since the previous backup. This reduces the backup size and speeds up backup operations. There are several ways to implement them within the PostgreSQL architecture:

1. Use file modification time as a marker that the file has changed.

Pros: This approach is easy to implement and it works quite fast.
Cons: It is unsafe to rely on system clock and filesystem timestamps. Only file-level granularity.

2. Compute file checksums and compare them.

Pros: This approach seems safe and easy to implement.
Cons: We have to read all the files in the instance, so it won't speed up the backup that much. Only file-level granularity.

3. LSN-based mechanisms. Back up pages with LSN >= last backup LSN.

Pros: Easy to implement. Page-level granularity.
Cons: We have to read all the files in the instance, so it won't speed up the backup that much. Some commands do not update page LSNs and require some awkward workaround.

4. Scan all WAL files in the archive since the previous backup and collect information about changed pages.

Pros: Not that difficult to implement and reliable. Page-level granularity.
Cons: WAL archive must be available. It takes quite some time to read and parse WAL.

5. Track page changes on the fly. (PTRACK)

Pros: Backup is fast. Backup tool only needs to read changed blocks. Page-level granularity.
Cons: Not so easy to implement and especially hard to ensure that it works correctly in all scenarios. It implies some overhead to the database server operation.

6. Various combined approaches. For example, we can aggregate information about changed pages from WAL at each checkpoint. This also brings some difficulties, such as the need to merge lists of changes.
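
To make approach 3 concrete, here is a minimal sketch of an LSN-based page filter. It assumes the default 8kB block size and that every page begins with its LSN stored as two native-endian 32-bit integers (high half first), as in the standard PostgreSQL page header. The function names are illustrative only and do not belong to any existing backup tool. Note that the whole file still has to be read, which is exactly the drawback listed for this approach.

```python
import struct
from typing import List

BLCKSZ = 8192  # assumed default PostgreSQL block size


def page_lsn(page: bytes) -> int:
    """Return the LSN stored in the first 8 bytes of a page header."""
    # pd_lsn is two uint32 values (high, low) in native byte order.
    hi, lo = struct.unpack_from("=II", page, 0)
    return (hi << 32) | lo


def changed_blocks(data: bytes, last_backup_lsn: int) -> List[int]:
    """Block numbers whose page LSN is >= the LSN of the previous backup."""
    result = []
    for blkno in range(len(data) // BLCKSZ):
        page = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(page) >= last_backup_lsn:
            result.append(blkno)
    return result


def make_page(lsn: int) -> bytes:
    """Build a fake page that carries only an LSN (for demonstration)."""
    header = struct.pack("=II", lsn >> 32, lsn & 0xFFFFFFFF)
    return header + bytes(BLCKSZ - len(header))
```

For example, changed_blocks(make_page(100) + make_page(500) + make_page(300), 300) selects blocks 1 and 2.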

How to try PTRACK?

ptrack_10.1_v1.4.patch - https://gist.github.com/lubennikovaav/475a1ac3d394ea08231966a07fecdbde

ptrack_9.6.6_v1.4.patch - https://gist.github.com/lubennikovaav/6e99048ee20b7d697c41fb2cac648d30

Spoiler

Please consider this patch and README as a proof of concept. It can be improved in many ways, but in its current state PTRACK is a stable prototype, reviewed and tested well enough to uncover many non-trivial corner cases and subtle problems. Any discussion of change-tracking algorithms should take them into account. Feel free to share your concerns and point out any shortcomings of the idea or the implementation.

Since ptrack is basically just an API for use in backup tools, it is impossible to test the patch independently. It is currently integrated with our backup utility, pg_probackup, which you can find here: https://github.com/postgrespro/pg_probackup. Let me know if you find the documentation too complicated, and I'll write a brief how-to for ptrack backups.

Interface Routines (PostgreSQL Side)

ptrack_add_block() - Sets a bit to track a dirty page.
ptrack_add_block_redo() - Sets a bit to track a recovered page.
create_ptrack_init_file() - Creates PTRACK_INIT_FILE in the given database directory.

API (Backup utility side)

pg_ptrack_version() - Returns PTRACK version currently in use.
pg_ptrack_control_lsn() - Gets LSN from ptrack_control file.
pg_ptrack_clear() - Resets bits in all PTRACK files. This function must be called for each database in the cluster.
pg_ptrack_get_and_clear(Oid tablespace_oid, Oid table_oid) - Reads a PTRACK file for the given relation and resets it. Returns the PTRACK content as bytea. It is essential to receive and clear the map atomically in order to avoid losing PTRACK bits because of race conditions. (Imagine that your backup tool reads the map, then some blocks of the relation are updated and the ptrack bits are set, after that the backup tool cleans up the map and resets ptrack_clear_lsn. So, we may lose some of the updates). This function must be called for each database in the cluster.
pg_ptrack_init_get_and_clear(Oid db_oid, Oid tablespace_oid) - Checks whether PTRACK_INIT_FILE exists in the given database and deletes it. Returns true if the file was found. This function is analogous to pg_ptrack_get_and_clear(), but it handles directory-level changes (i.e. CREATE DATABASE, ALTER DATABASE SET TABLESPACE). This function must be called for each database in the cluster.
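
The race described for pg_ptrack_get_and_clear() can be illustrated with a toy model. The sketch below is not the patch's implementation: PtrackMap is a hypothetical in-memory stand-in for one relation's bitmap, and the "concurrent" update is simulated by simply calling add_block() between the two steps.

```python
import threading


class PtrackMap:
    """Toy in-memory stand-in for one relation's PTRACK bitmap."""

    def __init__(self):
        self._blocks = set()
        self._lock = threading.Lock()

    def add_block(self, blkno):
        with self._lock:
            self._blocks.add(blkno)

    def read(self):
        with self._lock:
            return set(self._blocks)

    def clear(self):
        with self._lock:
            self._blocks.clear()

    def get_and_clear(self):
        # Read and reset under a single lock acquisition, so no update
        # can slip in between the two steps.
        with self._lock:
            snapshot, self._blocks = self._blocks, set()
            return snapshot


def unsafe_backup():
    """Separate read and clear: the update to block 7 is lost."""
    m = PtrackMap()
    m.add_block(1)
    seen = m.read()
    m.add_block(7)   # update arrives between read and clear
    m.clear()        # wipes the bit for block 7 before anyone saw it
    return seen, m.read()


def safe_backup():
    """Atomic get-and-clear: the later update stays in the map."""
    m = PtrackMap()
    m.add_block(1)
    seen = m.get_and_clear()
    m.add_block(7)   # lands after the atomic step, so it is kept
    return seen, m.read()
```

In unsafe_backup() the backup sees only block 1 and the map ends up empty, so block 7 would never be backed up. In safe_backup() block 7 remains marked for the next incremental backup.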

Implementation Overview

PTRACK is stored as a relation fork

PTRACK is stored alongside the main relation data in a separate relation fork, named after the filenode number of the relation, plus a _ptrack suffix. For example, if the filenode of a relation is 12345, the PTRACK is stored in a file called 12345_ptrack, in the same directory as the main relation file. PTRACK is supported by all kinds of PostgreSQL persistent relations, namely tables and TOAST tables, all types of indexes, materialized views and sequences. It tracks changes of the main fork pages only.

PTRACK contains one bit per page

Thus, one PTRACK page can hold information about 65344 blocks of the relation. With the standard 8kB pages, this means one PTRACK page per ~500MB of relation data, which seems to be an acceptable overhead. False positives are not considered bugs, though we should try to reduce their number to keep the incremental backup small. Currently, there are quite a few cases where false positives can occur. One of them is SpGistUpdateMetaPage(), which uses a conditional lock on the buffer. False negatives are unacceptable and should be treated as critical bugs.
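
The numbers above can be checked with a short sketch. It assumes that a PTRACK page uses everything after a 24-byte standard page header as bitmap space (similar to how the visibility map is laid out). The helper names are illustrative and the bit order within a byte is an arbitrary choice, not necessarily the one used by the patch.

```python
BLCKSZ = 8192                           # default PostgreSQL block size
PAGE_HEADER_SIZE = 24                   # assumed size of the page header
MAP_BYTES = BLCKSZ - PAGE_HEADER_SIZE   # 8168 usable bytes per map page
BITS_PER_MAP_PAGE = MAP_BYTES * 8       # 65344 tracked blocks per map page


def locate(blkno):
    """Map a relation block number to (map page, byte in page, bit in byte)."""
    map_page, bit_index = divmod(blkno, BITS_PER_MAP_PAGE)
    byte, bit = divmod(bit_index, 8)
    return map_page, byte, bit


def set_bit(map_pages, blkno):
    """Mark a block as changed in a list of bytearray map pages."""
    map_page, byte, bit = locate(blkno)
    while len(map_pages) <= map_page:   # extend the map on demand
        map_pages.append(bytearray(MAP_BYTES))
    map_pages[map_page][byte] |= 1 << bit


def changed(map_pages):
    """Yield the block numbers whose bit is set, in ascending order."""
    for p, page in enumerate(map_pages):
        for byte, value in enumerate(page):
            for bit in range(8):
                if value & (1 << bit):
                    yield p * BITS_PER_MAP_PAGE + byte * 8 + bit


def coverage_mib():
    """Relation size covered by one map page, in MiB."""
    return BITS_PER_MAP_PAGE * BLCKSZ / 2**20
```

Under these assumptions one map page covers 510.5 MiB of relation data, which matches the "~500MB" estimate.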

A PTRACK bit is set each time we call MarkBufferDirty(buf)

The patch is pretty big and intrusive because there is no single function from which ptrack_add_block() can be called, so we have to add a lot of one-line function calls before START_CRIT_SECTION(). We might be able to pass relation information into MarkBufferDirty(), parse it there and call ptrack_add_block() when applicable. That way, we would not miss any buffer dirtied by the standard mechanism. However, since there are about two hundred calls of MarkBufferDirty(), such a refactoring would cause quite a lot of changes as well.

PTRACK also handles operations that do not go through Shared Buffers

Some functions (for reasons I do not fully understand) write data bypassing Buffer Management.

1. Some of these functions rely on the fact that the relation will be fsync'd before commit using smgrimmedsync():

creation of the _init fork (also, ambuildempty)
creation of the new btree index
data load via COPY FROM.
VACUUM FULL
CLUSTER

In these cases we use ptrack_add_block() to track changes.

2. Other functions simply copy the files, unaware of their content. Note that these operations do not update page LSNs, so any LSN-based incremental backup must know how to handle them:

ALTER DATABASE SET TABLESPACE - movedb()
CREATE DATABASE

To track these directory-level changes, PTRACK_INIT_FILE is created after copydir(). If we find PTRACK_INIT_FILE while performing an incremental backup, all "*_ptrack" files in this directory are ignored and the content of the directory is copied entirely.
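
On the backup tool side, the rule above amounts to a per-directory decision. Here is a minimal sketch of that decision. The actual name of PTRACK_INIT_FILE is not given in this README, so "ptrack_init" below is a placeholder, and plan_directory_backup() is a hypothetical helper rather than a pg_probackup function.

```python
PTRACK_INIT_FILE = "ptrack_init"   # placeholder name, for illustration only
PTRACK_SUFFIX = "_ptrack"


def plan_directory_backup(names):
    """Decide how to back up one database directory.

    names: file names found in the directory.
    Returns (mode, data_files, ptrack_files).
    """
    names = sorted(names)
    ptrack_files = [n for n in names if n.endswith(PTRACK_SUFFIX)]
    data_files = [n for n in names
                  if n != PTRACK_INIT_FILE and not n.endswith(PTRACK_SUFFIX)]
    if PTRACK_INIT_FILE in names:
        # The directory was produced by a file-level copy, so the bitmaps
        # cannot be trusted: ignore them and copy every data file in full.
        return "full", data_files, []
    # Otherwise the bitmaps tell us which pages of each file to read.
    return "incremental", data_files, ptrack_files
```

In a real tool the names would come from listing the database directory, and the check for the init file would go through pg_ptrack_init_get_and_clear() so that the marker is removed atomically.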

PTRACK does not track unlogged changes

Hint bit updates that go through MarkBufferDirtyHint() and other unlogged hint updates (e.g. SpGistUpdateMetaPage()) are not reflected in PTRACK.

PTRACK changes are replayed at the REDO stage of recovery

PTRACK itself is UNLOGGED, but it successfully survives crash and restore. To implement that, we call ptrack_add_block_redo() from each redo function that dirties the buffer. As in the regular case, I didn't manage to find a single function that can wrap these calls. One more specific thing about ptrack_add_block_redo() is that we do not have a Relation structure there, so we use RelFileNode and CreateFakeRelcacheEntry() instead. Personally, I find this solution pretty strange and ugly, but I have no idea how to do it better.

PTRACK works on replica

Since PTRACK bits are also set on WAL replay, a replica has its own PTRACK maps as well. PTRACK backups on the master and on a replica can be performed independently because PTRACK clear actions are neither logged nor streamed.

PTRACK can be enabled and disabled via GUC

The following GUC variable enables page tracking:

   ptrack_enable = on

It is defined in the SIGHUP GucContext, which means that the option can be set at postmaster startup or by changing the configuration file and sending the HUP signal to the postmaster or a backend process.

There is a control file "global/ptrack_control" that contains only one value -- *ptrack_enabled_lsn*. If the server started with ptrack_enable = off, *ptrack_enabled_lsn* is set to InvalidXLogRecPtr. Otherwise, it is set to the current LSN. After a ptrack_clear() call, the *ptrack_enabled_lsn* value is updated to ensure that we track all changes starting from this LSN. Based on the *ptrack_enabled_lsn* value, we can determine if it's legal to perform an incremental backup, or we have lost PTRACK mapping since the previous backup and must take a full backup before the next incremental PTRACK backup. If a backup failed or was interrupted, some relations can already have their PTRACK forks cleared, so the next incremental backup will be incomplete.
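
The legality check can be sketched as follows. The exact comparison a backup tool performs is not spelled out in this README, so the rule below (tracking must have started no later than the previous backup) is an assumption, and both function names are hypothetical. LSNs are parsed from the usual textual form of two hexadecimal numbers separated by a slash.

```python
INVALID_LSN = 0   # numeric value of InvalidXLogRecPtr


def parse_lsn(text):
    """Convert the textual 'hi/lo' hexadecimal LSN form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def ptrack_backup_allowed(ptrack_enabled_lsn, prev_backup_lsn):
    """Decide whether the PTRACK maps cover everything since the last backup."""
    if ptrack_enabled_lsn == INVALID_LSN:
        return False   # tracking is off, so the maps are meaningless
    # Assumed rule: if tracking (re)started after the previous backup,
    # changes made in between were never recorded and a full backup
    # is required first.
    return ptrack_enabled_lsn <= prev_backup_lsn
```

For example, with ptrack_enabled_lsn = 0/300 and a previous backup at 0/200, the check fails and a full backup has to be taken before the next PTRACK backup.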

PTRACK backup cannot be done after a failed incremental backup

A PTRACK incremental backup should fail if the previous incremental backup failed (regardless of whether it was a page or ptrack backup). Failed backups erase some or all PTRACK information, so the PTRACK mode cannot guarantee that all changes since the last full backup will be captured. In this situation, the user must take a full backup before the next incremental PTRACK backup. For the same reason, a PTRACK backup can only be exclusive.

PTRACK must never ever lose page changes

To ensure all page changes are preserved, we wrote a bunch of tests. They cover incremental PTRACK backup of the following objects:

tables, all types of indexes, sequences, materialized views
multiple segment files
newly created databases
tables moved to another tablespace
tables filled with data via COPY FROM
instances after a server crash and restore

Are there any other specific scenarios we must test?

Ideas for Further Development

We can store an extra bit for differential backups
We may be able to implement non-exclusive PTRACK backups

Related Threads

On markers of changed data http://www.postgresql-archive.org/On-markers-of-changed-data-td5986909.html
Hooks to track changed pages for backup purposes http://www.postgresql-archive.org/Hooks-to-track-changed-pages-for-backup-purposes-td5980862.html