Index-only scans

From PostgreSQL wiki

Jump to: navigation, search

Index-only scans, also called covering indexes, is a performance feature which allows an index to be used to satisfy a query without accessing the heap tuples. This is possible using the visibility map in cases where all tuples on a page are visible to all sessions.

TODO items include:

  • test speed improvement for single-row lookups
  • test speed improvement for scans of the entire index (this involves random I/O)
    • we can't scan the index in physical order like vacuum does
  • make the visibility map always accurate and crash-safe
  • modify the index API to allow index-only lookups
  • modify the optimizer to identify when index-only scans are useful
    • right now large index scans are rarely chosen
  • modify the executor to process index-only scans

Contents

Making the Visibility Map Crash-Safe

Currently, a heap page that has all-visible tuples is marked by vacuum as PD_ALL_VISIBLE and the visibility map (VM) bit is set. This is currently unlogged, and a crash could require these to be set again.

The complexity is that for index-only scans, the VM bit has meaning, and cannot be incorrectly set (though it can be incorrectly cleared because that would just result in additional heap access). If both PD_ALL_VISIBLE and the VM bit were to be set, and a crash resulted the VM bit being written to disk, but not the PD_ALL_VISIBLE bit, a later heap access that wrote a conditionally-visible row would not know to clear the VM bit, causing incorrect results for index-only scans.

The solution is to WAL log the VM set bit activity. This will cause full-page writes for the VM page, but this is much less than WAL-logging each heap page because a VM page represents many heap pages. This requires that the VM page not be written to disk until its VM-set WAL record is fsynced to disk. Also, during crash recovering, reading the VM-set WAL record would cause both the VM-set and heap PD_ALL_VISIBLE to be set.

It is still possible for PD_ALL_VISIBLE to be set and for the VM bit not to be set if the heap page was written to disk before the VM-set WAL record was fsync'ed to disk. In such cases, vacuum will clean this up, and it is also possible for index-only lookups to correct this as well.

Email Threads

Microfunding

The intention of this section is to add information that can help funding development of the feature. First, a list of people willing to fund part of the cost. Second, developers willing to take the task and hopefully the groups will meet somehow. No-one commits to anything by writing them on the list, except from being contacted.

Funders

  • Jesper Krogh <jk@novozymes.com>, Novozymes A/S, 2.000+€
  • Jim Nasby <jnasby@enovafinancial.com> / Enova Financial, $2,000+ USD

Developers

Personal tools