Gsoc08-hashindex
From PostgreSQL wiki
Jump to navigationJump to searchProject Info
Improve hash index performance. This patch implements the item in the ToDo's list:
- In hash indexes, consider storing the hash value with or instead of the key itself
Ideas
There's two basic ideas in the patch.
- Store the hash value instead of real key in the bucket.
We can keep more tuples in a bucket and reduce the index size. It also means all hash indexscans become lossy and have to be rechecked at the heap.
- Keep the contents of each index page ordered by hash value and use binary instead of linear search to find the matching item(s) during an indexscan.
Performance
Here is some results of hash index with patch vs. btree index. You can see Details About The Test for more information about the test.
- Platform
Linux = 2.6.24-16-generic
Processor = Intel(R) Core(TM)2Duo CPU T7500@2.20GHz
Memory = 1GB
- Workload
Table size - 39916800 unique words, 1529MB
Query - 2000 equality query
- Result
index | build time | index size | query time | blocks read |
hash | 504650.100 ms | 1024MB | 5189.844 ms | 39801 |
btree | 844495.867 ms | 1027MB | 5656.045ms | 41627 |
Other
Hash index without the patch is huge. Here are some simple results of index building on a smaller workload.
- Workload
table Size - 3628800 unique words, 139MB
- Result
index | build time | index size |
btree | 51961.123 ms | 93MB |
hash without patch | 411069.264 ms | 2048MB |
hash with patch | 36288.931 ms | 128MB |