Free Space Map Problems

From PostgreSQL wiki

Jump to: navigation, search



PostgreSQL relations (both tables and indexes) contain free-space maps (FSM), which provide hints about the location of free space within the relation. Corruption of a free-space map typically causes errors like "could not read block XXX: read only 0 of 8192 bytes" during insertions and updates, due to the system following such a hint to a block that doesn't actually exist in the file. This page discusses ways to detect and fix such problems.

Known bugs causing FSM corruption

PostgreSQL versions 9.6 (before 9.6.1), 9.5 (before 9.5.5), 9.4 (before 9.4.10), and 9.3 (before 9.3.15) contain a bug that causes failure to make adequate WAL log entries when a FSM is truncated as part of truncating its relation. A database crash-and-restart shortly after such an event can lead to corrupted FSMs. In a replication environment, standby servers will receive incorrect WAL data causing them to create corrupted FSMs locally, meaning that standbys are likely to have corrupted data even if the master is valid. On a standby, the corruption causes no harm as long as the server remains a standby (since it won't use the FSM), but problems will appear on a standby that is promoted to master.

In addition to these known bugs in PostgreSQL itself, operating-system bugs or hardware problems could result in corrupted FSM files.

Detection of broken FSM files

First it is necessary to detect which relations contain corrupt FSMs. This analysis method makes use of the extension pg_freespacemap, so it needs to be enabled first:

CREATE EXTENSION pg_freespacemap;

Then run the following query, as superuser, to look for corrupted FSMs within the current database:

SELECT oid::regclass AS relname,
       pg_relation_filepath(oid) || '_fsm' AS fsm
FROM pg_class,
     CAST(current_setting('block_size') AS BIGINT) AS bs
WHERE relkind IN ('r', 'i', 't', 'm') AND EXISTS
   generate_series(pg_relation_size(oid) / bs,
                   (pg_relation_size(oid, 'fsm') - 2*bs) / 2) AS blk
   WHERE pg_freespace(oid, blk) > 0);

If this returns an empty list, the current database contains no problems. If not, the file names returned in the 'fsm' column need to be deleted manually. See below for more details. Note that the file names are given relative to $PGDATA (the cluster data directory).

Be sure to repeat this test in each database of the cluster. In a replicated configuration, you'll need to check each standby individually, along with the master.

Cleanup of broken FSM files

With the list of corrupted FSM files in hand, stop the server (performing a clean shutdown, ie not "pg_ctl stop -m immediate"), then manually remove each invalid FSM file, then restart the server.

Afterwards, it's a good idea to run VACUUM on each table that had a corrupt FSM (either in the table proper, or an index), so as to rebuild corrected FSM data. But note that this can be costly for large tables. For tables that receive steady streams of updates, it can be expected that autovacuum will take care of rebuilding the FSM eventually.

In a replicated configuration, remove bad files from the standby servers first, then from the master, then do VACUUMs on the master to cause FSMs to be rebuilt.

Personal tools