Gsoc08-hashindex

From PostgreSQL wiki
Jump to navigationJump to search

Proposal Of GSoC

Project Info

Improve hash index performance. This patch implements the item in the ToDo's list:

  • In hash indexes, consider storing the hash value with or instead of the key itself

Ideas

There's two basic ideas in the patch.

  • Store the hash value instead of real key in the bucket.

We can keep more tuples in a bucket and reduce the index size. It also means all hash indexscans become lossy and have to be rechecked at the heap.

  • Keep the contents of each index page ordered by hash value and use binary instead of linear search to find the matching item(s) during an indexscan.

Performance

Here is some results of hash index with patch vs. btree index. You can see Details About The Test for more information about the test.

  • Platform

Linux = 2.6.24-16-generic

Processor = Intel(R) Core(TM)2Duo CPU T7500@2.20GHz

Memory = 1GB

  • Workload

Table size - 39916800 unique words, 1529MB

Query - 2000 equality query

  • Result
index build time index size query time blocks read
hash 504650.100 ms 1024MB 5189.844 ms 39801
btree 844495.867 ms 1027MB 5656.045ms 41627

Other

Hash index without the patch is huge. Here are some simple results of index building on a smaller workload.

  • Workload

table Size - 3628800 unique words, 139MB

  • Result
index build time index size
btree 51961.123 ms 93MB
hash without patch 411069.264 ms 2048MB
hash with patch 36288.931 ms 128MB