Reindex concurrently

From PostgreSQL wiki

Jump to: navigation, search

Contents

Background

Goal

The goal is to provide an alternative to REINDEX with lower-level locks that would allow read and write operations on the parent table while the reindex is performed. REINDEX takes an exclusive lock on the parent table, REINDEX CONCURRENTLY can use a ShareUpdateExclusiveLock, similarly to what its parents of the type CREATE/DROP INDEX CONCURRENTLY do. Some external utilities, like pg_reorg or pg_repack, provide similar features except that they need to take an ExclusiveLock on parent relation when swapping indexes.

Synopsis

The query grammar just consists in the addition of the already-reserved keyword CONCURRENTLY.

REINDEX { TABLE | INDEX | DATABASE | SYSTEM } [ CONCURRENTLY ] ...

The following characteristics apply:

  • With REINDEX DATABASE CONCURRENTLY, shared catalogs and system catalogs are reindexed normally, while other relations are reindexed concurrently
  • REINDEX DATABASE CONCURRENTLY rebuilds table relations one by one
  • REINDEX SYSTEM CONCURRENTLY is not supported
  • Similarly to REINDEX, toast relations are
  • REINDEX TABLE CONCURRENTLY includes in index rebuild all the indexes of relation, including toast indexes
  • Toast relations have been modified to support multiple indexes, based on the assumption that each toast relation

Process flow

REINDEX CONCURRENTLY consist of the following internal steps, each step is done successively on all the indexes present for the reindex.

  • Select the list of indexes to rebuild, take session lock on it with
  • Create a concurrent index, exactly similar to the one rebuilt, and mark it as invalid and not ready
  • Build the concurrent index, mark it as ready, or able to receive inserts
  • Validate the concurrent index but do not mark it as valid (or selectable by planner for SELECT queries), to avoid doubling the number of indexes if a failure happens while the concurrent index is seen as ready by the other sessions as well.
  • Swap relfilenode of index and its concurrent index. Note that the concurrent index does not need to be marked as ready, as only its relfilenode it switched.
  • Drop the concurrent index with similar process as DROP INDEX CONCURRENTLY
  • Release session locks

During this whole process, DML and SELECT queries can be used on parent tables without consequences on the index rebuild. Process is longer, might consume more ressources but does not impact production server by design like what a normal REINDEX would do. Process needs cache invalidations at multiple necessary places like after the index swap

Issues pointed out

  • When checking for past snapshots at swap phase, deadlocks can occur with other relations using at least ShareUpdateExclusiveLock.
  • Some concurrency issues pointed here, author was not able to reproduce them though...

Commit fest status

  • The patch is registered in the commit fest of 2013/11 here.
  • Latest version, as of 2014/01/21, is available here
  • Patch has been through 57 versions, the author himself lost the count
  • With a little bit of more love...
Personal tools