Bitmap Indexes
Implementation Overview
This page details the on-disk bitmap index access method developed for PostgreSQL.
History
While the on-disk bitmap index access method was originally developed for Greenplum's Bizgres system, it has been ported and altered to meet Postgres standards.
- Bizgres Implementation
- Ayush Parashar (Greenplum)
- Jie Zhang (Greenplum)
- Mark Kirkwood (Greenplum)
- Gavin Sherry (Greenplum)
- PostgreSQL Implementation
- Gavin Sherry (Greenplum)
- Initial port for 8.3
- Dr. Gianni Ciolli and Gabriele Bartolini (2ndQuadrant Italia)
- Bug fixes, refactoring, and general clean-up for 8.4
- Jonah H. Harris (myYearbook.com) & Simon Riggs (2ndQuadrant)
- Bug fixes, refactoring, optimization, and general clean-up for 8.5
- Gavin Sherry (Greenplum)
TODO
- Add TODO items... :-)
Concepts
General
Compared to tree-based indexing algorithms, on-disk bitmap indexes provide a substantial space and performance advantage for low-cardinality, read-mostly data. The reason for this advantage is due primarily to I/O and CPU savings.
In our implementation, a compressed bitmap vector is created for each distinct key value. These bitmap vectors can be used not only to locate all occurrences of a single key value within a table, but also multiple key values using bitwise operations.
first_name | last_name | gender |
---|---|---|
John | Smith | M |
David | Jones | M |
Michael | Johnson | M |
Chris | Lee | M |
Mike | Brown | M |
Mark | Williams | M |
Paul | Rodriguez | M |
Daniel | Garcia | M |
James | Gonzalez | M |
Maria | Lopez | F |
Sarah | Martinez | F |
Laura | Martin | F |
Robert | Perez | M |
Lisa | Miller | F |
Jennifer | Taylor | F |
Andrea | Thomas | F |
Steve | Wilson | M |
Peter | Davis | M |
Kevin | Khan | M |
Jason | Ali | M |
Jessica | Singh | F |
Michelle | Sanchez | F |
Karen | Anderson | F |
Joe | Hernandez | M |
Brian | Chan | M |
Alex | Ahmed | M |
Richard | White | M |
Linda | Wong | F |
Julie | Thompson | F |
Anna | Jackson | F |
Andrew | Kumar | M |
Mary | Moore | F |
Eric | Gomez | M |
Tom | Diaz | M |
Thomas | Walker | M |
Martin | James | M |
Scott | Green | M |
Matt | Robinson | M |
Jim | Clark | M |
Ali | Young | M |
Tony | Scott | M |
Carlos | Chen | M |
Jeff | Hall | M |
Marco | Wright | M |
Ryan | Adams | M |
Bob | Allen | M |
Dave | Hill | M |
Jose | Rossi | M |
Unlike the first_name and last_name attributes, gender has a very low cardinality of 2 (M or F). As such, it's a good candidate for using a bitmap index.
Distinct Key Values
Distinct key values are handled by the List of Values (LOV) component...
Distinct Key Bitmaps
As described above, a single bit vector is associated with each distinct key value...
The uncompressed bitmap for each LOV item (M and F) is as follows:
M = 11111111 10001000 11110001 11100010 11111111 11111111 F = 00000000 01110111 00001110 00011101 00000000 00000000
Compression
Our implementation is based on a hybrid run-length compression algorithm in which a bit vector is represented by two components, the header and content sections.
Header Section (Header Words)
The header section contains bits, each of which corresponds to a word in the content section. If a bit in the header section is 1, then the corresponding word in the content section is a compressed word; if the bit is 0, then the corresponding word is not a compressed word.
Content Section (Content Words)
For a compressed word in the content section, the first bit in this word indicates whether 1s or 0s are compressed. The rest of the bits represent the value of "<the number of bits>/<word size>".
Example
Consider the uncompressed bitmap vector for LOV item M:
11111111 10001000 11110001 11100010 11111111 11111111
If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:
header section: 00001 content section: 11111111 10001000 11110001 11100010 10000010
The first word is uncompressed.
The second word is uncompressed.
The fifth word is compressed and it's first bit is set to one. As such it compresses ones. As 10 evaluates to 2, this compressed word represents 16 bits of ones (2 * 8 = 16).
Consider the uncompressed bitmap vector for LOV item F:
00000000 01110111 00001110 00011101 00000000 00000000
If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:
header section: 00001 content section: 00000000 01110111 00001110 00011101 00000010
The first word is uncompressed.
The second word is uncompressed.
The fifth word is compressed and it's first bit is set to zero. As such it compresses zeros. As 10 evaluates to 2, this compressed word represents 16 bits of zeroes (2 * 8 = 16).
Sort-based Optimization
As bitmap indexes are often used in data warehousing systems, pre-sorting the values during the ETL stage can offer much better compression.
Example
Consider the uncompressed bitmap vector for LOV item M:
00000000 00000111 11111111 11111111 11111111 11111111
If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:
header section: 001 content section: 00000000 00000111 10000100
The first word is uncompressed.
The second word is uncompressed.
The third word is compressed and it's first bit is set to one. As such it compresses ones. As 100 evaluates to 4, this compressed word represents 32 bits of ones (4 * 8 = 32).
Consider the uncompressed bitmap vector for LOV item F:
11111111 11111000 00000000 00000000 00000000 00000000
If the size of a word is set to 8, then an HRL compressed form for this bitmap vector is as follows:
header section: 001 content section: 11111111 11111000 00000100
The first word is uncompressed.
The second word is uncompressed.
The third word is compressed and it's first bit is set to zero. As such it compresses zeroes. As 100 evaluates to 4, this compressed word represents 32 bits of zeroes (3 * 8 = 24).
Operation under Various Use-Cases
The following sections describe the various processes bitmap indexes follow for different circumstances.
Index Creation
Tuple Insertion
Tuple Update
Tuple Deletion
Index Scan
VACUUM
Run-Time Behavior
Write Ahead Logging
Changes to bitmap indexes are WAL-logged and safe from crashes.
Locking
Handling of INSERT/UPDATE
VACUUM
Performance
Index Creation
Index Size
Index Performance
Benchmark Kit
Page Inspection
General
bm_page_headers()
Meta Pages
bm_metap()
LOV Pages
bm_lov_page_stats()
Bitmap Pages
bm_bmv_page_stats()
Page Layouts
The on-disk bitmap index implementation consists of several page layouts:
Meta Page
This page contains meta-data about an index, is always stored as page zero, and is referred to as BM_METAPAGE.
+----------------+---------------------------------+ | PageHeaderData | BMMetaPageData | +----------------+---------------------------------+ | ^ pd_lower | | | | | | | | | | | | | | | | | +--------------------------------------------------+ pd_upper/pd_special ^
List of Values (LOV) Page
List of Values (LOV) pages to store a list of distinct values for attribute(s) to be indexed, some meta-data related to their corresponding bitmap vectors, and the pointers to their bitmap vectors. For each distinct value, there is a BMLOVItemData associated with it.
An LOV page maintains an array of BMLOVItemData instances, called LOV items. To quickly find the LOV item for a given value, we create a heap to maintain all distinct key values along with the block and offset numbers for their LOV items in LOV pages. That is, there are total "<number_of_attributes> + 2" attributes in this new heap. Along with this heap, we also create a new b-tree index on this heap using attribute(s) as keys. In this way, for any given value, we search this b-tree to quickly locate the applicable block and offset number for the given value's corresponding LOV item.
The first LOV page is reserved for NULL key values and is known as BM_LOV_STARTPAGE.
+----------------+---------------------------------+ | PageHeaderData | linp1 linp2 linp3 ... | +-----------+----+---------------------------------+ | ... linpN | | +-----------+--------------------------------------+ | ^ pd_lower | | | | v pd_upper | +-------------+------------------------------------+ | | BMLOVItemN ... | +-------------+------------------------------------+ | ... BMLOVItem3 BMLOVItem2 BMLOVItem1 | +--------------------------------------------------+ pd_upper/pd_special ^
Bitmap Vector Page
This page layout represents a compressed bitmap vector.
Each bitmap page stores two parts of information: header words and content words. Each bit in the header words is corresponding to a word in the content words. If a bit in the header words is 1, then its corresponding content word is a compressed word. Otherwise, it is a literal word.
If a content word is a fill word, it means that there is a sequence of 0 bits or 1 bits. The most significant bit in this content word represents the bits in this sequence are 0s or 1s. The rest of bits store the value of "the number of bits / BM_WORD_SIZE".
In this block, two data structures are in use: BMBitmapVectorPageData and BMPageOpaqueData. BMBitmapVectorPageData is what contains the two fixed-size arrays: header words and content words; both arrays are sized based on the compile-time Postgres block size definition. BMPageOpaqueData contains information relevant to that page, including the number of words used, the next page in the bitmap, and the last TID set in that page.
+----------------+---------------------------------+ | PageHeaderData | BMBitmapVectorPageData | +----------------+---------------------------------+ | ^ pd_lower | | | | | | | | | | | | v pd_upper | +-------------------------------+------------------+ | ... | BMPageOpaqueData | +-------------------------------+------------------+ ^ pd_special
Source Code Details
Files
As on-disk bitmap indexes are another access method, new files are laid out accordingly:
- src/backend/access/bitmap/
- bitmapattutil.c--Attribute-level List-of-Values (LOV) Handling
- bitmap.c--Top-level access method interface functions
- bitmapinsert.c--Bitmap Insertion
- bitmappages.c--Bitmap-specific Page Handling
- bitmapsearch.c--Bitmap Search/Iteration
- bitmaputil.c--Generic Bitmap Utility Functions/VACUUM-handling
- bitmapxlog.c--WAL-handling
- src/include/access/
- bitmap.h--On-disk Bitmap Access Method Header File
Naming Conventions
Functions
Instead of the underscore-prefix function naming used for the b-tree and hash access methods, bitmap index function naming is similar to the lowerCamelCase used in gin/gist with the prefix bm. Examples are as follows:
- bmInitLOVPage
- bmReadBuffer
- bmWriteAndReleaseBuffer
- bmSetTIDBit
Structures/Typedefs/Variables
C structures and typedef names follow UpperCamelCase naming. Structure members and variables follow lowerCamelCase naming unless a consistent name for that variable is already used within Postgres.
As an example, the structure member for content words would be, "contentWords", not content_words or contentwords.
Commentation
In the spirit of furthering PostgreSQL Source Code Documentation, all comments are doxygen-compatible. From bitmappages.c:
/** * Reads, locks, and returns a given relation's block as a buffer. * * @param[in] reln The relation whose block to read * @param[in] blockNum The block number to read * @param[in] lockMode The buffer content lock type to use * * @return A locked buffer containing the requested block of the requested relation. */ Buffer bmReadBuffer (Relation reln, BlockNumber blockNum, int lockMode) {
Compile-Time Options
Bitmap Index Tracing
To aid in debugging and correctness verification, extensive tracing has been added to the bitmap index access method. If tracing is desired, it must be enabled at compile-time by defining (uncommenting) TRACE_BITMAP_INDEX in src/include/access/bitmap.h. When enabled, all relevant internal bitmap index information will be emitted at message levels DEBUG1-DEBUG5.
An example of tracing is as follows:
DEBUG: bminsert (rel => "bitmapx1", DEBUG: ht_ctid => "(0,1)") DEBUG: bmPerformInsert (rel => "bitmapx1", DEBUG: ht_ctid => "(0,1)") DEBUG: bmReadBuffer (reln => "bitmapx1", DEBUG: blockNum => 0, DEBUG: lockMode => 1) DEBUG: bmOpenLOVHeapAndIndex (metaPage->bm_lov_heapId => 66151, DEBUG: metaPage->bm_lov_indexId => 66154, DEBUG: lockMode => 3) DEBUG: bmInsertTuple (reln => "bitmapx1", DEBUG: metaBuffer => 0, DEBUG: tidBit => 1, DEBUG: ht_ctid => "(0,1)", DEBUG: lovHeap => "pg_bm_66150", DEBUG: lovIndex => "pg_bm_66150_index", DEBUG: use_wal => "TRUE") ... DEBUG: bmSetTIDBit (reln => "bitmapx1", DEBUG: lovBuffer => 1, DEBUG: lovOffset => 2, DEBUG: tidBit => 1, DEBUG: use_wal => "TRUE")
Catalog Changes
- Bitmap indexes have been added as a resource manager for WAL (RM_BITMAP_ID)
- Bitmap indexes have been added to pg_am, pg_amop, pg_proc, and friends.
- pg_bitmapindex, a new namespace for containing database-wide bitmap index-specific data, has been created.