Support for Index-only scans for GIST GSoC 2014

From PostgreSQL wiki
Jump to navigationJump to search

Introduction

This project will allow to implement Index-only scans for some data types indexed with GIST. Currently GiST index don’t have index-only scan functionality. Index-only scans are a major performance feature added to Postgres 9.2. They allow certain types of queries to be satisfied just by retrieving data from indexes, and not from tables. This feature for B-trees (implemented since PostgreSQL-9.2) allows doing certain types of queries significantly faster. This is achieved by reduction in the amount of I/O necessary to satisfy queries. It will be useful to implement this feature for user defined data types that use GiST index.

Benefits to the PostgreSQL Community

Faster GiST index search would be actual for many PostgreSQL applications (for example some GIS systems).

Quantifiable results

Acceleration of GiST index search.

Details

The GiST is a balanced tree structure like a B-tree, containing <key, pointer> pairs. GiST key is a member of a user-defined class, and represents some property that is true of all data items reachable from the pointer associated with the key. The GiST provides a possibility to create custom data types with indexed access methods and extensible set of queries. There are seven methods that an index operator class for GiST must provide, and an eighth that is optional.

  • consistent
  • union
  • compress
  • decompress
  • penalty
  • picksplit
  • equal
  • distance (optional)

I’m going to create new optional method fetch() in addition to existing. Thus user can create a method of retrieving data from the index without loss. It will be used when performing search queries to speed data retrieving.

TODO

  • support for one-column indexes (Done)
  • support for multi-column indexes (Done)
  • implementing the fetch-function for the built-in opclasses
    • box (Done)
    • circle (Dropped)
    • point (Done)
  • Create new regression test (Done)
  • Clear elog comments to make code readable (Done)
  • Update documentation (Done)

Additional goals

  • implementing the fetch-function for the rest modules and opclasses
    • inet
    • range
    • btree_gist
  • update related documentation

Project Schedule

until May 31

  • create a public repository to the project ([[1]])
  • read what has already been discussed by the community about the project
  • learn about some PostgreSQL internals
  • first, approximate implementation for one-column index

June 1 - June 30 (I’ve got exams in this month so I haven’t much time to work on project)

  • test one-column index version
  • analyze the results of applying index-only scans for GIST
  • solve architecture questions with help of community:
    • changes in AM interface required for application multi-column Index-only scans
  • mentor review the work in progress

July 1 - July 31

  • implementation of the second prototype:
    • support for Index-only scans for multi-column GIST

August 1 - August 20

  • do the adjustments based on the community feedback
  • final mentor review

About the proponent

Anastasia Lubennikova

I participate in GSoC firstly, and before this competition didn't work with Open Source projects. This project is very interesting to me and I hope that my work will be useful for PostgreSQL community.