Difference between revisions of "Gsoc08-hashindex"

From PostgreSQL wiki
Jump to: navigation, search
m (Performance)
m
Line 1: Line 1:
 
[[hash_proposal|Proposal Of GSoC]]
 
[[hash_proposal|Proposal Of GSoC]]
 +
== Project Info ==
 +
Improve hash index performance.
 +
Implement the item in the ToDo's list:
 +
* In hash indexes, consider storing the hash value with or instead of the key itself
 
== Ideas ==
 
== Ideas ==
 
There's two basic ideas in the patch.
 
There's two basic ideas in the patch.
Line 9: Line 13:
  
 
== Performance ==
 
== Performance ==
You can see [[hash_test | Details About The Test]] for more information.
+
Here is some results of hash index with patch  v.s. btree index.
 +
You can see [[hash_test | Details About The Test]] for more information about the test.
 
* Platform
 
* Platform
 
Linux = 2.6.24-16-generic
 
Linux = 2.6.24-16-generic
Line 17: Line 22:
 
Memory = 1GB
 
Memory = 1GB
 
* Workload
 
* Workload
2000 equality query
+
Table size - 39916800 unique words, 1529MB
 +
Query - 2000 equality query
 
* Result
 
* Result
 
{| border="1"
 
{| border="1"
Line 37: Line 43:
 
| 5656.045ms
 
| 5656.045ms
 
| 41627
 
| 41627
 +
|}
 +
== Other ==
 +
Hash index without the patch is huge. [http://archives.postgresql.org/pgsql-hackers/2008-07/msg01183.php Here] are some results of index building on a smaller workload.
 +
table Size - 3628800 unique words, 139MB
 +
{| border="1"
 +
| index
 +
| build time
 +
| index size
 +
|-
 +
| btree       
 +
| 51961.123 ms 
 +
| 93MB
 +
|-
 +
| hash without patch     
 +
| 411069.264 ms 
 +
| 2048MB
 +
|-
 +
| hash with patch 
 +
| 36288.931 ms 
 +
| 128MB
 
|}
 
|}

Revision as of 12:43, 17 August 2008

Proposal Of GSoC

Project Info

Improve hash index performance. Implement the item in the ToDo's list:

  • In hash indexes, consider storing the hash value with or instead of the key itself

Ideas

There's two basic ideas in the patch.

  • Store the hash value instead of real key in the bucket.

We can keep more tuples in a bucket and reduce the index size. It also means all hash indexscans become lossy and have to be rechecked at the heap.

  • Keep the contents of each index page ordered by hash value and use binary instead of linear search to find the matching item(s) during an indexscan.

Performance

Here is some results of hash index with patch v.s. btree index. You can see Details About The Test for more information about the test.

  • Platform

Linux = 2.6.24-16-generic

Processor = Intel(R) Core(TM)2Duo CPU T7500@2.20GHz

Memory = 1GB

  • Workload

Table size - 39916800 unique words, 1529MB Query - 2000 equality query

  • Result
index build time index size query time blocks read
hash 504650.100 ms 1024MB 5189.844 ms 39801
btree 844495.867 ms 1027MB 5656.045ms 41627

Other

Hash index without the patch is huge. Here are some results of index building on a smaller workload. table Size - 3628800 unique words, 139MB

index build time index size
btree 51961.123 ms 93MB
hash without patch 411069.264 ms 2048MB
hash with patch 36288.931 ms 128MB