PGCon 2021 Fun With WAL

Notes from PGCon 2021 Unconference session "Fun with WAL"

Fun with WAL

- There are some cool things we could do with the WAL

- Faster archive recovery

- Lazy or partial restore, only replay WAL when it's needed

- Push API to replace archive_command

Zenith architecture

                     WAL
   PostgreSQL  ---------------->     Page Server
   
                  GetPage@LSN
               ---------------->
               <----------------

- PostgreSQL streams the WAL to the page server using Streaming Replication.

- Page Server applies the WAL

- No (relation) data is stored in the PostgreSQL data directory

- smgr / md layer has been replaced with calls into the Page Server
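
A minimal sketch of what a GetPage@LSN request might carry, as a self-contained C program. The struct and function names here are illustrative assumptions, not Zenith's actual wire protocol:

    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;    /* a WAL location (LSN) */

    typedef struct GetPageRequest
    {
        uint32_t    spcnode;        /* tablespace OID */
        uint32_t    dbnode;         /* database OID */
        uint32_t    relnode;       /* relation OID */
        uint32_t    blkno;          /* block number within the relation */
        XLogRecPtr  lsn;            /* serve the page as of this LSN */
    } GetPageRequest;

    /* In a real smgr replacement, the read path would send a request
     * like this to the Page Server instead of reading local disk. */
    static void
    send_getpage_request(const GetPageRequest *req)
    {
        printf("GetPage@LSN: rel %u/%u/%u blk %u at %X/%X\n",
               req->spcnode, req->dbnode, req->relnode, req->blkno,
               (unsigned int) (req->lsn >> 32), (unsigned int) req->lsn);
    }

    int
    main(void)
    {
        GetPageRequest req = {1663, 13000, 16384, 0, 0x16B3748};
        send_getpage_request(&req);
        return 0;
    }

The LSN in the request tells the Page Server how far it must have applied the WAL before it can serve the page.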


Faster archive recovery

Perform WAL redo in two phases:

1. Scan the WAL, making note of which record applies to which page. Whenever you see a full-page image for a block, all the previous records for the same block can be immediately thrown away. (This can be made much more effective by writing an extra full-page image e.g. every 1000 updates on the same page.) See the sketch after this list.

2. After the first phase, WAL redo can be performed separately for each relation or block. That's a better, more sequential I/O pattern.
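
A minimal sketch of the first phase, assuming a simplified WAL stream where each record carries a block number, an LSN and a full-page-image flag; all names are invented for illustration:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_BLOCKS   4
    #define MAX_RECORDS 16

    typedef struct WalRef
    {
        uint64_t    lsn;
        bool        is_fpi;         /* full-page image? */
    } WalRef;

    /* Per-block list of WAL records that still must be replayed. */
    typedef struct BlockChain
    {
        int         nrecords;
        WalRef      records[MAX_RECORDS];
    } BlockChain;

    static BlockChain chains[MAX_BLOCKS];

    static void
    note_record(uint32_t blkno, uint64_t lsn, bool is_fpi)
    {
        BlockChain *chain = &chains[blkno];

        /* An FPI restores the whole page, so earlier records are dead. */
        if (is_fpi)
            chain->nrecords = 0;
        chain->records[chain->nrecords].lsn = lsn;
        chain->records[chain->nrecords].is_fpi = is_fpi;
        chain->nrecords++;
    }

    int
    main(void)
    {
        /* Simulated scan: two updates on block 0, then an FPI, then one more. */
        note_record(0, 100, false);
        note_record(0, 200, false);
        note_record(0, 300, true);  /* earlier records discarded here */
        note_record(0, 400, false);

        /* Phase 2 would replay only what survived, block by block. */
        for (int i = 0; i < chains[0].nrecords; i++)
            printf("replay LSN %lu\n", (unsigned long) chains[0].records[i].lsn);
        return 0;
    }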

Parallel WAL redo:

After the WAL has been split per relation, each relation can be restored in parallel.
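
A sketch of the idea with one POSIX thread per relation; the per-relation WAL streams are stubbed out and all names are invented:

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_RELATIONS 3

    static void *
    redo_relation(void *arg)
    {
        int rel = *(int *) arg;

        /* Real code would replay this relation's WAL stream sequentially. */
        printf("replaying WAL for relation %d\n", rel);
        return NULL;
    }

    int
    main(void)
    {
        pthread_t   workers[NUM_RELATIONS];
        int         relids[NUM_RELATIONS] = {16384, 16385, 16386};

        /* One worker per relation; the streams are independent after the split. */
        for (int i = 0; i < NUM_RELATIONS; i++)
            pthread_create(&workers[i], NULL, redo_relation, &relids[i]);
        for (int i = 0; i < NUM_RELATIONS; i++)
            pthread_join(workers[i], NULL);
        return 0;
    }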

Instant Recovery:

You can actually start up the cluster before WAL redo has finished. Whenever a page is accessed, replay all the WAL applicable to that page on demand.

See Caetano Sauer's PhD thesis: http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/publications/2017/PhD_Thesis_Caetano_Sauer.pdf
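
A sketch of on-demand replay at page-access time, assuming a per-page chain of pending WAL records was built in advance; everything below is a simplified stand-in for the real redo machinery:

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PENDING 8

    typedef struct Page { uint64_t applied_lsn; } Page;

    typedef struct PendingWal
    {
        int         count;
        uint64_t    lsns[MAX_PENDING];  /* records not yet applied */
    } PendingWal;

    static Page page;
    static PendingWal pending = {2, {100, 200}};

    /* Every page access goes through here; replay happens lazily. */
    static Page *
    read_page(void)
    {
        for (int i = 0; i < pending.count; i++)
        {
            /* Real code would call the rmgr redo routine per record. */
            page.applied_lsn = pending.lsns[i];
            printf("on-demand replay of LSN %lu\n",
                   (unsigned long) pending.lsns[i]);
        }
        pending.count = 0;          /* page is now up to date */
        return &page;
    }

    int
    main(void)
    {
        read_page();                /* first access triggers replay */
        read_page();                /* second access finds nothing to do */
        return 0;
    }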

Lazy Restore

Restoring a cluster from a base backup requires copying all the data from the backup and replaying all the WAL.

- If we split the WAL in the backup, per relation, it would be possible to restore only what's needed.

- With Lazy Restore, non-relational data, like clog, would be restored as usual.

- When a relation is accessed for the first time, it's fetched from the backup on demand, and the WAL applicable to that relation is replayed.
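
A sketch of that access path, assuming stub functions for fetching from the backup and replaying the per-relation WAL; all names are hypothetical:

    #include <stdbool.h>
    #include <stdio.h>

    static bool restored[3];        /* per-relation "already restored" flag */

    static void fetch_from_backup(int rel)    { printf("fetch rel %d from backup\n", rel); }
    static void replay_relation_wal(int rel)  { printf("replay WAL for rel %d\n", rel); }

    /* Called on first access to a relation; clog etc. was restored eagerly. */
    static void
    open_relation(int rel)
    {
        if (!restored[rel])
        {
            fetch_from_backup(rel);     /* copy base backup pages on demand */
            replay_relation_wal(rel);   /* apply only this relation's WAL */
            restored[rel] = true;
        }
    }

    int
    main(void)
    {
        open_relation(1);           /* triggers restore */
        open_relation(1);           /* already restored, no work */
        return 0;
    }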


Problems / annoyances

Some things are currently not included in WAL records:

- cmin/cmax

- speculative insertion tokens

These are not needed for crash recovery, but are needed by the primary server

With some WAL records, it's complicated to decipher which blocks are affected. For example:

- The visibility map and FSM updates are implicit with heap WAL records

- XLOG_SMGR_TRUNCATE truncates the heap, the FSM and VM in one operation

- pg_rewind suffers from these too
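
A sketch of why this complicates per-block accounting: a single truncate record has to be expanded into entries for three forks, even though only the main fork is named in the record. The record layout below is simplified; only the fork terminology follows PostgreSQL:

    #include <stdint.h>
    #include <stdio.h>

    typedef enum ForkNumber
    {
        MAIN_FORKNUM,
        FSM_FORKNUM,
        VISIBILITYMAP_FORKNUM
    } ForkNumber;

    typedef struct SmgrTruncateRecord
    {
        uint32_t    relnode;
        uint32_t    nblocks;        /* new length of the main fork */
    } SmgrTruncateRecord;

    /* A per-block WAL splitter must emit entries for all three forks. */
    static void
    expand_truncate(const SmgrTruncateRecord *rec)
    {
        static const char *forks[] = {"main", "fsm", "vm"};

        for (int f = MAIN_FORKNUM; f <= VISIBILITYMAP_FORKNUM; f++)
            printf("rel %u: truncate %s fork\n", rec->relnode, forks[f]);
    }

    int
    main(void)
    {
        SmgrTruncateRecord rec = {16384, 0};
        expand_truncate(&rec);
        return 0;
    }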


Synchronous replication woes

If a WAL record cannot be streamed out, we still write it to local disk.

There was discussion on this at last year's PGCon...


Push API for WAL

To replace archive_command, which invokes a shell command for every completed WAL segment.
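
A sketch of what a push-style hook could look like, as a plain C program; the hook name and callback signature are assumptions, not an existing PostgreSQL API:

    #include <stdbool.h>
    #include <stdio.h>

    /* Hypothetical callback type: return true once the segment is safely
     * archived, so the server knows it may recycle the file. */
    typedef bool (*WalPushCallback) (const char *segment_path);

    static WalPushCallback wal_push_hook = NULL;

    static bool
    push_segment(const char *segment_path)
    {
        if (wal_push_hook)
            return wal_push_hook(segment_path);
        return false;               /* no archiver registered */
    }

    /* Example consumer: an archive module could stream the bytes out. */
    static bool
    my_archiver(const char *segment_path)
    {
        printf("pushing %s to archive\n", segment_path);
        return true;
    }

    int
    main(void)
    {
        wal_push_hook = my_archiver;
        push_segment("pg_wal/000000010000000000000001");
        return 0;
    }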


Corrupt WAL and security

- We could use more sanity checks in WAL redo routines

- Can we guarantee that the WAL redo routines tolerate arbitrary corrupt WAL without crashing? (Currently we can't.)
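
A sketch of the kind of defensive check a redo routine could make before trusting lengths and offsets read from the WAL; the record layout is invented for illustration:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define BLCKSZ 8192

    typedef struct InsertRecord
    {
        uint16_t    offset;         /* where on the page to write */
        uint16_t    len;            /* how many payload bytes follow */
    } InsertRecord;

    /* Returns 0 on success, -1 if the record would overrun the page;
     * recovery could then report corruption instead of crashing. */
    static int
    redo_insert(char *page, const InsertRecord *rec, const char *payload)
    {
        /* Never trust lengths read from disk: check before memcpy. */
        if (rec->len > BLCKSZ || rec->offset > BLCKSZ - rec->len)
        {
            fprintf(stderr, "corrupt WAL record: offset %u len %u\n",
                    rec->offset, rec->len);
            return -1;
        }
        memcpy(page + rec->offset, payload, rec->len);
        return 0;
    }

    int
    main(void)
    {
        char        page[BLCKSZ] = {0};
        InsertRecord bad = {8000, 500};     /* overruns the 8192-byte page */

        return redo_insert(page, &bad, "x") == 0 ? 0 : 1;
    }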