Improve hash index performance. This patch implements the item in the ToDo's list:
- In hash indexes, consider storing the hash value with or instead of the key itself
There's two basic ideas in the patch.
- Store the hash value instead of real key in the bucket.
We can keep more tuples in a bucket and reduce the index size. It also means all hash indexscans become lossy and have to be rechecked at the heap.
- Keep the contents of each index page ordered by hash value and use binary instead of linear search to find the matching item(s) during an indexscan.
Here is some results of hash index with patch vs. btree index. You can see Details About The Test for more information about the test.
Linux = 2.6.24-16-generic
Processor = Intel(R) Core(TM)2Duo CPU T7500@2.20GHz
Memory = 1GB
Table size - 39916800 unique words, 1529MB
Query - 2000 equality query
|index||build time||index size||query time||blocks read|
|hash||504650.100 ms||1024MB||5189.844 ms||39801|
Hash index without the patch is huge. Here are some simple results of index building on a smaller workload.
table Size - 3628800 unique words, 139MB
|index||build time||index size|
|hash without patch||411069.264 ms||2048MB|
|hash with patch||36288.931 ms||128MB|