Fast GiST index build GSoC 2011

From PostgreSQL wiki

Revision as of 14:00, 18 May 2012 by Boshomi (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Contents

Idea

Currently GiST index don't have any bulk load functionality. It have to create new index by entry insertion one by one. This makes new index creation relatively slow in comparison with btree and even GIN. There are various works in computer science about bulk operation on R-tree. Since GiST in basic is generalization of R-tree, some of them seems to be applicable to GiST. In [1] R-tree bulk operations techniques are presented. Despite some issues related with GiST distinction from R-tree, techniques of this paper seems to be applicable to GiST.

Details

Brief description of algorithm

Paper [1] present R-tree bulk operations techniques which are, in general, application of buffer tree [2] to R-tree.

The technique of the paper [1] can be very briefly described in following rules.

M = number of index keys fitting in RAM;

B = number of index keys in one page;

Additional buffers of M/(2*B) pages each is attached to all nodes of some levels. Levels are selected with step floor(log(M/4B, B))), leaf nodes don't contain buffers. I.e. nodes in levels i*floor(log(M/4B, B))), i = 1,2,3,... contain buffers (numbering is going from down to up, level 0 contain leaf nodes). When entry reaches node with buffer, it is placed into buffer. When buffer is overflowed it runs down into lower buffers or leaf pages. When split occurs in node with buffer, then this buffers splits into two buffers using penalty function.

Varlena keys

Since R-tree was cosidered in paper [1], key length is fixed. In GiST we can deal with varlena keys that can leads complexities. For example, page split can occurs during placing index entry into appropriate buffers. Another difficulty with varlena keys is that size of buffer and level step of buffers are calculated by the number of entries in page. When using varlena keys this number is not constant.

The first implementation will have very simple approach to deal with varlena keys. The greatest buffer size and minimal level step are achived when key size is minimal. Thereby, minimal key size is worst case. Since minimal varlena size is 4 bytes, we can use it in initial calculations.

Another possible applications

  • Bulk insert. Since access method interfae in PostgreSQL doesn't contain bulk insert function, bulk insert operations are performed by entries insertion one by one. And access method doesn't know how many entries is expected. This circumstance makes application of bulk insert technique difficult for PostgreSQL GiST. With current access method interface bulk insert can be implemented using technique like fastupdate in GIN. But let us note that such technique lead concurrent index searches to scan buffers. Since, buffer size is significant (the size of most lowest buffer is comparable with size of whole subtree), it can lead to significant slowdown. The finding of compromise between index build acceleration and concurrent search slowdown is a separate research, and it doesn't seem to be feasible to fit it in the GSoC.
  • Bulk delete. Current bulk delete operation in PostgreSQL GiST is fast, but it doesn't perform index restructuring. These can lead to worsening of index during bulk delete. The bulk delete technique in the paper [1] focused on acceleration of bulk delete procedure which perform index restructuring. The problem of application of this approach to GiST is that GiST implementation doesn't provide any explicit guarantees about storage utilization. If such guarantees exists then they are hidden in the picksplit access method. Application of bulk delete technique requires GiST access methos to know about storage utilization guarantees of implementation. In turn it require a GiST operator class refactoring, and it doesn't seem to be feasible to fit it in the GSoC.

Shedule

  • until 3 june

First, approximate implementation.

  • 4 june - 6 june

First benchmarking.

  • 7 june - 14 july

Work on index quality problem. Tests on various GiST applications and data distributions. Resolve issues which appear in testing. Code refactioring and comments.

  • 15 july - 31 july

Detail benchmarks on most wide range of datasets. Resolve remaining issues.

  • August 1 - August 16

Final refactoring, testing and commit.

TODO

  • Rerun some tests on lastest version of patch.

Testing results

Operator class Dataset Tuples count Avg. length Split method Build method Actual time CPU time Shared read Search time
point_ops Uniformly distributed random data 100M N/A New linear regular 17 h 39 m 27 m 35568002 1
fastbuild (1/366/on) 3 h 17 m 28 m 2539827 1.12
fastbuild (1/366/auto) 3 h 23 m 29 m 2548365 0.90
Double sorting regular 9 d 10 h 50 m 85705372 0.089
fastbuild (1/366/on) 4 h 9 m 28 m 2729786 0.094
fastbuild (1/366/auto) 4 h 11 m 28 m 2729116 0.089
point_ops Set of some USNOA2 stars (ordered) 100M N/A New linear regular 56 m 40 m 1987537 1
fastbuild (1/366/on) 1 h 22 m 50 m 1759204 0.48
fastbuild (1/366/auto) 1 h 27 m 50 m 1761244 0.46
Double sorting regular 45 m 42 m 1671106 0.37
fastbuild (1/366/on) 1 h 29 m 55 m 1803297 0.34
fastbuild (1/366/auto) 1 h 29 m 55 m 1801702 0.36
point_ops Set of some USNOA2 stars (shuffled) 100M N/A New linear regular 10 h 12 m 23 m 15650402 1
fastbuild (1/366/on) 3 h 20 m 30 m 2514365 1.48
fastbuild (1/366/auto) 3 h 16 m 30 m 2505735 0.89
Double sorting regular 8 d 20 h 50 m 85315498 0.072
fastbuild (1/366/on) 4 h 15 m 28 m 2722891 0.068
fastbuild (1/366/auto) 4 h 20 m 30 m 2725721 0.067

Columns description:

  • Operator class - the operator class used in index
  • Dataset - description of indexed dataset
  • Tuples count - number of indexed tuples
  • Avg. length - average length of value if applicable (used for represent length of strings and arrays)
  • Split method - picksplit implementation used during index build
  • Build method - method of index building: regular or fastbuild. For fastbuild levelstep, buffersize, buffering parameters are given in parentheses.
  • Actual time - actual time of index build on test machine
  • Shared read - number of page reads during index build measured by pg_stat_statement
  • CPU time - CPU usage time during index build
  • Search time - measure of index quality comparison. Defined as average quotient between number of page accesses during scan with fast built index to that of regularly built index on few test cases. The 1 value means that index fast build index is similar in search with regularly built index. If value is greater than 1 then fast built index is slower. If value is less than 1 them fast built index is faster.

Datasets

This section contains detail description of datasets used in section above.

  • Uniformly distributed random data

Data can be generated using following SQL command.

CREATE TABLE points AS (SELECT point(random(), random() FROM generate_series(1,10000000));

Second argument of generate_series may differs in dependence of required tuples count.

For reproducibility reasons dataset can be downloaded from http://89.235.136.212/uniform.txt.gz.

  • Geonames for US

This dataset can be get from http://download.geonames.org/export/dump/.

  • English dictionary

American english dictionary from wamerican package in Ubuntu Linux. Typically placed in /usr/share/dict/american-english.

  • DBLP paper titles

Original XML can be get from http://dblp.uni-trier.de/xml/. Simple PHP script for title extraction is below.

<?php
$fp = fopen("dblp.xml", "r");
while (!feof($fp))
{
	$s = fgets($fp);
	if (preg_match('/<title>(.*)<\\/title>/',$s,$matches))
	{
		$str = $matches[1];
		$str = str_replace(chr(0), ' ', $str);
		echo $str."\n";
	}
}
fclose($fp);
  • Set of some USNOA2 stars (ordered)

This dataset can be get from http://89.235.136.212/usnoa2.txt.gz.

  • Set of some USNOA2 stars (shuffled)

This dataset can be get from http://89.235.136.212/usnoa2_shuffled.txt.gz.

References

1. Lars Arge, Klaus Hinrichs, Jan Vahrenhold, Jeffrey Scott Vitter. Efficient Bulk Operations on Dynamic R-trees. ALENEX '99 Selected papers from the International Workshop on Algorithm Engineering and Experimentation

2. L. Arge. The Buffer Tree: A new technique for optimal I/O-algorithms. In S. G. Akl, F. Dehne, J.-R. Sack, and N. Santoro, editors, Algorithms and Data Structures, 4th International Work-shop, WADS '95, volume 955 of Lecture Notes in Computer Science, pages 334 345. Springer, 1993.

Personal tools