Bitmap Indexes

From PostgreSQL wiki
Jump to navigationJump to search

Implementation Overview

This page details the on-disk bitmap index access method developed for PostgreSQL.

History

While the on-disk bitmap index access method was originally developed for Greenplum's Bizgres system, it has been ported and altered to meet Postgres standards.

  • Bizgres Implementation
    • Ayush Parashar (Greenplum)
    • Jie Zhang (Greenplum)
    • Mark Kirkwood (Greenplum)
    • Gavin Sherry (Greenplum)
  • PostgreSQL Implementation
    • Gavin Sherry (Greenplum)
      • Initial port for 8.3
    • Dr. Gianni Ciolli and Gabriele Bartolini (2ndQuadrant Italia)
      • Bug fixes, refactoring, and general clean-up for 8.4
    • Jonah H. Harris (myYearbook.com) & Simon Riggs (2ndQuadrant)
      • Bug fixes, refactoring, optimization, and general clean-up for 8.5

TODO

  • Add TODO items... :-)

Concepts

General

Compared to tree-based indexing algorithms, on-disk bitmap indexes provide a substantial space and performance advantage for low-cardinality, read-mostly data. The reason for this advantage is due primarily to I/O and CPU savings.

In our implementation, a compressed bitmap vector is created for each distinct key value. These bitmap vectors can be used not only to locate all occurrences of a single key value within a table, but also multiple key values using bitwise operations.

first_name last_name gender
John Smith M
David Jones M
Michael Johnson M
Chris Lee M
Mike Brown M
Mark Williams M
Paul Rodriguez M
Daniel Garcia M
James Gonzalez M
Maria Lopez F
Sarah Martinez F
Laura Martin F
Robert Perez M
Lisa Miller F
Jennifer Taylor F
Andrea Thomas F
Steve Wilson M
Peter Davis M
Kevin Khan M
Jason Ali M
Jessica Singh F
Michelle Sanchez F
Karen Anderson F
Joe Hernandez M
Brian Chan M
Alex Ahmed M
Richard White M
Linda Wong F
Julie Thompson F
Anna Jackson F
Andrew Kumar M
Mary Moore F
Eric Gomez M
Tom Diaz M
Thomas Walker M
Martin James M
Scott Green M
Matt Robinson M
Jim Clark M
Ali Young M
Tony Scott M
Carlos Chen M
Jeff Hall M
Marco Wright M
Ryan Adams M
Bob Allen M
Dave Hill M
Jose Rossi M

Unlike the first_name and last_name attributes, gender has a very low cardinality of 2 (M or F). As such, it's a good candidate for using a bitmap index.

Distinct Key Values

Distinct key values are handled by the List of Values (LOV) component...

Distinct Key Bitmaps

As described above, a single bit vector is associated with each distinct key value...

The uncompressed bitmap for each LOV item (M and F) is as follows:

M = 11111111 10001000 11110001 11100010 11111111 11111111
F = 00000000 01110111 00001110 00011101 00000000 00000000

Compression

Our implementation is based on a hybrid run-length compression algorithm in which a bit vector is represented by two components, the header and content sections.

Header Section (Header Words)

The header section contains bits, each of which corresponds to a word in the content section. If a bit in the header section is 1, then the corresponding word in the content section is a compressed word; if the bit is 0, then the corresponding word is not a compressed word.

Content Section (Content Words)

For a compressed word in the content section, the first bit in this word indicates whether 1s or 0s are compressed. The rest of the bits represent the value of "<the number of bits>/<word size>".

Example

Consider the uncompressed bitmap vector for LOV item M:

11111111 10001000 11110001 11100010 11111111 11111111

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   00001
content section:  11111111 10001000 11110001 11100010 10000010

The first word is uncompressed.

The second word is uncompressed.

The fifth word is compressed and it's first bit is set to one. As such it compresses ones. As 10 evaluates to 2, this compressed word represents 16 bits of ones (2 * 8 = 16).

Consider the uncompressed bitmap vector for LOV item F:

00000000 01110111 00001110 00011101 00000000 00000000

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   00001
content section:  00000000 01110111 00001110 00011101 00000010

The first word is uncompressed.

The second word is uncompressed.

The fifth word is compressed and it's first bit is set to zero. As such it compresses zeros. As 10 evaluates to 2, this compressed word represents 16 bits of zeroes (2 * 8 = 16).

Sort-based Optimization

As bitmap indexes are often used in data warehousing systems, pre-sorting the values during the ETL stage can offer much better compression.

Example

Consider the uncompressed bitmap vector for LOV item M:

00000000 00000111 11111111 11111111 11111111 11111111

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   001
content section:  00000000 00000111 10000100

The first word is uncompressed.

The second word is uncompressed.

The third word is compressed and it's first bit is set to one. As such it compresses ones. As 100 evaluates to 4, this compressed word represents 32 bits of ones (4 * 8 = 32).

Consider the uncompressed bitmap vector for LOV item F:

11111111 11111000 00000000 00000000 00000000 00000000

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   001
content section:  11111111 11111000 00000100

The first word is uncompressed.

The second word is uncompressed.

The third word is compressed and it's first bit is set to zero. As such it compresses zeroes. As 100 evaluates to 4, this compressed word represents 32 bits of zeroes (3 * 8 = 24).

Operation under Various Use-Cases

The following sections describe the various processes bitmap indexes follow for different circumstances.

Index Creation

Tuple Insertion

Tuple Update

Tuple Deletion

Index Scan

VACUUM

Run-Time Behavior

Write Ahead Logging

Changes to bitmap indexes are WAL-logged and safe from crashes.

Locking

Handling of INSERT/UPDATE

VACUUM

Performance

Index Creation

Index Size

Index Performance

Benchmark Kit

Page Inspection

General

bm_page_headers()

Meta Pages

bm_metap()

LOV Pages

bm_lov_page_stats()

Bitmap Pages

bm_bmv_page_stats()

Page Layouts

The on-disk bitmap index implementation consists of several page layouts:

Meta Page

This page contains meta-data about an index, is always stored as page zero, and is referred to as BM_METAPAGE.

+----------------+---------------------------------+
| PageHeaderData | BMMetaPageData                  |
+----------------+---------------------------------+
|                ^ pd_lower                        |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
+--------------------------------------------------+
                               pd_upper/pd_special ^

List of Values (LOV) Page

List of Values (LOV) pages to store a list of distinct values for attribute(s) to be indexed, some meta-data related to their corresponding bitmap vectors, and the pointers to their bitmap vectors. For each distinct value, there is a BMLOVItemData associated with it.

An LOV page maintains an array of BMLOVItemData instances, called LOV items. To quickly find the LOV item for a given value, we create a heap to maintain all distinct key values along with the block and offset numbers for their LOV items in LOV pages. That is, there are total "<number_of_attributes> + 2" attributes in this new heap. Along with this heap, we also create a new b-tree index on this heap using attribute(s) as keys. In this way, for any given value, we search this b-tree to quickly locate the applicable block and offset number for the given value's corresponding LOV item.

The first LOV page is reserved for NULL key values and is known as BM_LOV_STARTPAGE.

+----------------+---------------------------------+
| PageHeaderData | linp1 linp2 linp3 ...           |
+-----------+----+---------------------------------+
| ... linpN |                                      |
+-----------+--------------------------------------+
|           ^ pd_lower                             |
|                                                  |
|             v pd_upper                           |
+-------------+------------------------------------+
|             | BMLOVItemN ...                     |
+-------------+------------------------------------+
|             ... BMLOVItem3 BMLOVItem2 BMLOVItem1 |
+--------------------------------------------------+
                               pd_upper/pd_special ^

Bitmap Vector Page

This page layout represents a compressed bitmap vector.

Each bitmap page stores two parts of information: header words and content words. Each bit in the header words is corresponding to a word in the content words. If a bit in the header words is 1, then its corresponding content word is a compressed word. Otherwise, it is a literal word.

If a content word is a fill word, it means that there is a sequence of 0 bits or 1 bits. The most significant bit in this content word represents the bits in this sequence are 0s or 1s. The rest of bits store the value of "the number of bits / BM_WORD_SIZE".

In this block, two data structures are in use: BMBitmapVectorPageData and BMPageOpaqueData. BMBitmapVectorPageData is what contains the two fixed-size arrays: header words and content words; both arrays are sized based on the compile-time Postgres block size definition. BMPageOpaqueData contains information relevant to that page, including the number of words used, the next page in the bitmap, and the last TID set in that page.

+----------------+---------------------------------+
| PageHeaderData | BMBitmapVectorPageData          |
+----------------+---------------------------------+
|                ^ pd_lower                        |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                               v pd_upper         |
+-------------------------------+------------------+
|                           ... | BMPageOpaqueData |
+-------------------------------+------------------+
                                ^ pd_special

Source Code Details

Files

As on-disk bitmap indexes are another access method, new files are laid out accordingly:

  • src/backend/access/bitmap/
    • bitmapattutil.c--Attribute-level List-of-Values (LOV) Handling
    • bitmap.c--Top-level access method interface functions
    • bitmapinsert.c--Bitmap Insertion
    • bitmappages.c--Bitmap-specific Page Handling
    • bitmapsearch.c--Bitmap Search/Iteration
    • bitmaputil.c--Generic Bitmap Utility Functions/VACUUM-handling
    • bitmapxlog.c--WAL-handling
  • src/include/access/
    • bitmap.h--On-disk Bitmap Access Method Header File

Naming Conventions

Functions

Instead of the underscore-prefix function naming used for the b-tree and hash access methods, bitmap index function naming is similar to the lowerCamelCase used in gin/gist with the prefix bm. Examples are as follows:

  • bmInitLOVPage
  • bmReadBuffer
  • bmWriteAndReleaseBuffer
  • bmSetTIDBit

Structures/Typedefs/Variables

C structures and typedef names follow UpperCamelCase naming. Structure members and variables follow lowerCamelCase naming unless a consistent name for that variable is already used within Postgres.

As an example, the structure member for content words would be, "contentWords", not content_words or contentwords.

Commentation

In the spirit of furthering PostgreSQL Source Code Documentation, all comments are doxygen-compatible. From bitmappages.c:

/**
 * Reads, locks, and returns a given relation's block as a buffer.
 *
 * @param[in]       reln        The relation whose block to read
 * @param[in]       blockNum    The block number to read
 * @param[in]       lockMode    The buffer content lock type to use
 *
 * @return  A locked buffer containing the requested block of the requested relation.
 */
Buffer
bmReadBuffer (Relation reln, BlockNumber blockNum, int lockMode)
{

Compile-Time Options

Bitmap Index Tracing

To aid in debugging and correctness verification, extensive tracing has been added to the bitmap index access method. If tracing is desired, it must be enabled at compile-time by defining (uncommenting) TRACE_BITMAP_INDEX in src/include/access/bitmap.h. When enabled, all relevant internal bitmap index information will be emitted at message levels DEBUG1-DEBUG5.

An example of tracing is as follows:

DEBUG:  bminsert (rel => "bitmapx1",
DEBUG:            ht_ctid => "(0,1)")
DEBUG:  bmPerformInsert (rel => "bitmapx1",
DEBUG:                   ht_ctid => "(0,1)")
DEBUG:  bmReadBuffer (reln => "bitmapx1",
DEBUG:                blockNum => 0,
DEBUG:                lockMode => 1)
DEBUG:  bmOpenLOVHeapAndIndex (metaPage->bm_lov_heapId => 66151,
DEBUG:                         metaPage->bm_lov_indexId => 66154,
DEBUG:                         lockMode => 3)
DEBUG:  bmInsertTuple (reln => "bitmapx1",
DEBUG:                 metaBuffer => 0,
DEBUG:                 tidBit => 1,
DEBUG:                 ht_ctid => "(0,1)",
DEBUG:                 lovHeap => "pg_bm_66150",
DEBUG:                 lovIndex => "pg_bm_66150_index",
DEBUG:                 use_wal => "TRUE")
...
DEBUG:  bmSetTIDBit (reln => "bitmapx1",
DEBUG:               lovBuffer => 1,
DEBUG:               lovOffset => 2,
DEBUG:               tidBit => 1,
DEBUG:               use_wal => "TRUE")

Catalog Changes

  • Bitmap indexes have been added as a resource manager for WAL (RM_BITMAP_ID)
  • Bitmap indexes have been added to pg_am, pg_amop, pg_proc, and friends.
  • pg_bitmapindex, a new namespace for containing database-wide bitmap index-specific data, has been created.