Implementation Overview

This page details the on-disk bitmap index access method developed for PostgreSQL.

History

While the on-disk bitmap index access method was originally developed for Greenplum's Bizgres system, it has been ported and altered to meet Postgres standards.

Bizgres Implementation
- Ayush Parashar (Greenplum)
- Jie Zhang (Greenplum)
- Mark Kirkwood (Greenplum)
- Gavin Sherry (Greenplum)
PostgreSQL Implementation
- Gavin Sherry (Greenplum)
  - Initial port for 8.3
- Dr. Gianni Ciolli and Gabriele Bartolini (2ndQuadrant Italia)
  - Bug fixes, refactoring, and general clean-up for 8.4
- Jonah H. Harris (myYearbook.com) & Simon Riggs (2ndQuadrant)
  - Bug fixes, refactoring, optimization, and general clean-up for 8.5

TODO

Add TODO items... :-)

Concepts

General

Compared to tree-based indexing algorithms, on-disk bitmap indexes provide a substantial space and performance advantage for low-cardinality, read-mostly data. The reason for this advantage is due primarily to I/O and CPU savings.

In our implementation, a compressed bitmap vector is created for each distinct key value. These bitmap vectors can be used not only to locate all occurrences of a single key value within a table, but also multiple key values using bitwise operations.

first_name	last_name	gender
John	Smith	M
David	Jones	M
Michael	Johnson	M
Chris	Lee	M
Mike	Brown	M
Mark	Williams	M
Paul	Rodriguez	M
Daniel	Garcia	M
James	Gonzalez	M
Maria	Lopez	F
Sarah	Martinez	F
Laura	Martin	F
Robert	Perez	M
Lisa	Miller	F
Jennifer	Taylor	F
Andrea	Thomas	F
Steve	Wilson	M
Peter	Davis	M
Kevin	Khan	M
Jason	Ali	M
Jessica	Singh	F
Michelle	Sanchez	F
Karen	Anderson	F
Joe	Hernandez	M
Brian	Chan	M
Alex	Ahmed	M
Richard	White	M
Linda	Wong	F
Julie	Thompson	F
Anna	Jackson	F
Andrew	Kumar	M
Mary	Moore	F
Eric	Gomez	M
Tom	Diaz	M
Thomas	Walker	M
Martin	James	M
Scott	Green	M
Matt	Robinson	M
Jim	Clark	M
Ali	Young	M
Tony	Scott	M
Carlos	Chen	M
Jeff	Hall	M
Marco	Wright	M
Ryan	Adams	M
Bob	Allen	M
Dave	Hill	M
Jose	Rossi	M

Unlike the first_name and last_name attributes, gender has a very low cardinality of 2 (M or F). As such, it's a good candidate for using a bitmap index.

Distinct Key Values

Distinct key values are handled by the List of Values (LOV) component...

Distinct Key Bitmaps

As described above, a single bit vector is associated with each distinct key value...

The uncompressed bitmap for each LOV item (M and F) is as follows:

M = 11111111 10001000 11110001 11100010 11111111 11111111
F = 00000000 01110111 00001110 00011101 00000000 00000000

Compression

Our implementation is based on a hybrid run-length compression algorithm in which a bit vector is represented by two components, the header and content sections.

Header Section (Header Words)

The header section contains bits, each of which corresponds to a word in the content section. If a bit in the header section is 1, then the corresponding word in the content section is a compressed word; if the bit is 0, then the corresponding word is not a compressed word.

Content Section (Content Words)

For a compressed word in the content section, the first bit in this word indicates whether 1s or 0s are compressed. The rest of the bits represent the value of "<the number of bits>/<word size>".

Example

Consider the uncompressed bitmap vector for LOV item M:

11111111 10001000 11110001 11100010 11111111 11111111

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   00001
content section:  11111111 10001000 11110001 11100010 10000010

The first word is uncompressed.

The second word is uncompressed.

The fifth word is compressed and it's first bit is set to one. As such it compresses ones. As 10 evaluates to 2, this compressed word represents 16 bits of ones (2 * 8 = 16).

Consider the uncompressed bitmap vector for LOV item F:

00000000 01110111 00001110 00011101 00000000 00000000

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   00001
content section:  00000000 01110111 00001110 00011101 00000010

The first word is uncompressed.

The second word is uncompressed.

The fifth word is compressed and it's first bit is set to zero. As such it compresses zeros. As 10 evaluates to 2, this compressed word represents 16 bits of zeroes (2 * 8 = 16).

Sort-based Optimization

As bitmap indexes are often used in data warehousing systems, pre-sorting the values during the ETL stage can offer much better compression.

Example

Consider the uncompressed bitmap vector for LOV item M:

00000000 00000111 11111111 11111111 11111111 11111111

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   001
content section:  00000000 00000111 10000100

The first word is uncompressed.

The second word is uncompressed.

The third word is compressed and it's first bit is set to one. As such it compresses ones. As 100 evaluates to 4, this compressed word represents 32 bits of ones (4 * 8 = 32).

Consider the uncompressed bitmap vector for LOV item F:

11111111 11111000 00000000 00000000 00000000 00000000

If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:

header section:   001
content section:  11111111 11111000 00000100

The first word is uncompressed.

The second word is uncompressed.

The third word is compressed and it's first bit is set to zero. As such it compresses zeroes. As 100 evaluates to 4, this compressed word represents 32 bits of zeroes (3 * 8 = 24).

Operation under Various Use-Cases

The following sections describe the various processes bitmap indexes follow for different circumstances.

Index Creation

Tuple Insertion

Tuple Update

Tuple Deletion

Index Scan

VACUUM

Run-Time Behavior

Write Ahead Logging

Changes to bitmap indexes are WAL-logged and safe from crashes.

Locking

Handling of INSERT/UPDATE

VACUUM

Performance

Index Creation

Index Size

Index Performance

Benchmark Kit

Page Inspection

General

bm_page_headers()

Meta Pages

bm_metap()

LOV Pages

bm_lov_page_stats()

Bitmap Pages

bm_bmv_page_stats()

Page Layouts

The on-disk bitmap index implementation consists of several page layouts:

Meta Page

This page contains meta-data about an index, is always stored as page zero, and is referred to as BM_METAPAGE.

+----------------+---------------------------------+
| PageHeaderData | BMMetaPageData                  |
+----------------+---------------------------------+
|                ^ pd_lower                        |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
+--------------------------------------------------+
                               pd_upper/pd_special ^

List of Values (LOV) Page

List of Values (LOV) pages to store a list of distinct values for attribute(s) to be indexed, some meta-data related to their corresponding bitmap vectors, and the pointers to their bitmap vectors. For each distinct value, there is a BMLOVItemData associated with it.

An LOV page maintains an array of BMLOVItemData instances, called LOV items. To quickly find the LOV item for a given value, we create a heap to maintain all distinct key values along with the block and offset numbers for their LOV items in LOV pages. That is, there are total "<number_of_attributes> + 2" attributes in this new heap. Along with this heap, we also create a new b-tree index on this heap using attribute(s) as keys. In this way, for any given value, we search this b-tree to quickly locate the applicable block and offset number for the given value's corresponding LOV item.

The first LOV page is reserved for NULL key values and is known as BM_LOV_STARTPAGE.

+----------------+---------------------------------+
| PageHeaderData | linp1 linp2 linp3 ...           |
+-----------+----+---------------------------------+
| ... linpN |                                      |
+-----------+--------------------------------------+
|           ^ pd_lower                             |
|                                                  |
|             v pd_upper                           |
+-------------+------------------------------------+
|             | BMLOVItemN ...                     |
+-------------+------------------------------------+
|             ... BMLOVItem3 BMLOVItem2 BMLOVItem1 |
+--------------------------------------------------+
                               pd_upper/pd_special ^

Bitmap Vector Page

This page layout represents a compressed bitmap vector.

Each bitmap page stores two parts of information: header words and content words. Each bit in the header words is corresponding to a word in the content words. If a bit in the header words is 1, then its corresponding content word is a compressed word. Otherwise, it is a literal word.

If a content word is a fill word, it means that there is a sequence of 0 bits or 1 bits. The most significant bit in this content word represents the bits in this sequence are 0s or 1s. The rest of bits store the value of "the number of bits / BM_WORD_SIZE".

In this block, two data structures are in use: BMBitmapVectorPageData and BMPageOpaqueData. BMBitmapVectorPageData is what contains the two fixed-size arrays: header words and content words; both arrays are sized based on the compile-time Postgres block size definition. BMPageOpaqueData contains information relevant to that page, including the number of words used, the next page in the bitmap, and the last TID set in that page.

+----------------+---------------------------------+
| PageHeaderData | BMBitmapVectorPageData          |
+----------------+---------------------------------+
|                ^ pd_lower                        |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|                               v pd_upper         |
+-------------------------------+------------------+
|                           ... | BMPageOpaqueData |
+-------------------------------+------------------+
                                ^ pd_special

Source Code Details

Files

As on-disk bitmap indexes are another access method, new files are laid out accordingly:

src/backend/access/bitmap/
- bitmapattutil.c--Attribute-level List-of-Values (LOV) Handling
- bitmap.c--Top-level access method interface functions
- bitmapinsert.c--Bitmap Insertion
- bitmappages.c--Bitmap-specific Page Handling
- bitmapsearch.c--Bitmap Search/Iteration
- bitmaputil.c--Generic Bitmap Utility Functions/VACUUM-handling
- bitmapxlog.c--WAL-handling
src/include/access/
- bitmap.h--On-disk Bitmap Access Method Header File

Naming Conventions

Functions

Instead of the underscore-prefix function naming used for the b-tree and hash access methods, bitmap index function naming is similar to the lowerCamelCase used in gin/gist with the prefix bm. Examples are as follows:

bmInitLOVPage
bmReadBuffer
bmWriteAndReleaseBuffer
bmSetTIDBit

Structures/Typedefs/Variables

C structures and typedef names follow UpperCamelCase naming. Structure members and variables follow lowerCamelCase naming unless a consistent name for that variable is already used within Postgres.

As an example, the structure member for content words would be, "contentWords", not content_words or contentwords.

Commentation

In the spirit of furthering PostgreSQL Source Code Documentation, all comments are doxygen-compatible. From bitmappages.c:

/**
 * Reads, locks, and returns a given relation's block as a buffer.
 *
 * @param[in]       reln        The relation whose block to read
 * @param[in]       blockNum    The block number to read
 * @param[in]       lockMode    The buffer content lock type to use
 *
 * @return  A locked buffer containing the requested block of the requested relation.
 */
Buffer
bmReadBuffer (Relation reln, BlockNumber blockNum, int lockMode)
{

Compile-Time Options

Bitmap Index Tracing

To aid in debugging and correctness verification, extensive tracing has been added to the bitmap index access method. If tracing is desired, it must be enabled at compile-time by defining (uncommenting) TRACE_BITMAP_INDEX in src/include/access/bitmap.h. When enabled, all relevant internal bitmap index information will be emitted at message levels DEBUG1-DEBUG5.

An example of tracing is as follows:

DEBUG:  bminsert (rel => "bitmapx1",
DEBUG:            ht_ctid => "(0,1)")
DEBUG:  bmPerformInsert (rel => "bitmapx1",
DEBUG:                   ht_ctid => "(0,1)")
DEBUG:  bmReadBuffer (reln => "bitmapx1",
DEBUG:                blockNum => 0,
DEBUG:                lockMode => 1)
DEBUG:  bmOpenLOVHeapAndIndex (metaPage->bm_lov_heapId => 66151,
DEBUG:                         metaPage->bm_lov_indexId => 66154,
DEBUG:                         lockMode => 3)
DEBUG:  bmInsertTuple (reln => "bitmapx1",
DEBUG:                 metaBuffer => 0,
DEBUG:                 tidBit => 1,
DEBUG:                 ht_ctid => "(0,1)",
DEBUG:                 lovHeap => "pg_bm_66150",
DEBUG:                 lovIndex => "pg_bm_66150_index",
DEBUG:                 use_wal => "TRUE")
...
DEBUG:  bmSetTIDBit (reln => "bitmapx1",
DEBUG:               lovBuffer => 1,
DEBUG:               lovOffset => 2,
DEBUG:               tidBit => 1,
DEBUG:               use_wal => "TRUE")

Catalog Changes

Bitmap indexes have been added as a resource manager for WAL (RM_BITMAP_ID)
Bitmap indexes have been added to pg_am, pg_amop, pg_proc, and friends.
pg_bitmapindex, a new namespace for containing database-wide bitmap index-specific data, has been created.

Bitmap Indexes

Implementation Overview

History

TODO

Concepts

General

Distinct Key Values

Distinct Key Bitmaps

Compression

Header Section (Header Words)

Content Section (Content Words)

Example

Sort-based Optimization

Example

Operation under Various Use-Cases

Index Creation

Tuple Insertion

Tuple Update

Tuple Deletion

Index Scan

VACUUM

Run-Time Behavior

Write Ahead Logging

Locking

Handling of INSERT/UPDATE

VACUUM

Performance

Index Creation

Index Size

Index Performance

Benchmark Kit

Page Inspection

General

Meta Pages

LOV Pages

Bitmap Pages

Page Layouts

Meta Page

List of Values (LOV) Page

Bitmap Vector Page

Source Code Details

Files

Naming Conventions

Functions

Structures/Typedefs/Variables

Commentation

Compile-Time Options

Bitmap Index Tracing

Catalog Changes

Navigation menu

Search