Support KNN for SP-GiST GSoC 2014
Intro
As of version 9.4 Postgres' space-partitioned gists lack the support of the k-nearest-neighbour search query though this set of data structures allow rather effective implementation of one. So the goal of the project is to introduce such support.
Outcome
Currently Postgres SP-GiST generalization includes three main datastructures: text trie, k-d tree and quad tree. Trie deals with sets of text strings and the most common use of both k-d and quad trees is the partition of points in the n-dimensional space. In terms of operating with point sets, the query logic is straightforward and allows to retrieve k points from it with the least distance to a given one over the whole set. The usecases of such a request for text datasets are less intuitive but still can prove to be usefull (example of such could be e.g. bioinformatics genome manipulation applications).
Goals
- (main) Implement KNN support for k-d and quad trees
- (additional) Implement KNN support for text tries
- (additional, under concideration) Implement tree-rebuild techniques to guarantee logarithmic complexity over all types of point distribution in a dataset. (as introduced here: [1])
Schedule
Pre- May 22
- dig into codebase to estimate the actual implementation
- discussions with community:
- practicability and usefullness of implementing the tree-rebuild technique
- (in case concidered usefull) - it's usage details - auto\manual, syntax, etc.
- any uncertainities with codebase in case they appear
- any additional goals
- practicability and usefullness of implementing the tree-rebuild technique
May 22 - June 23
- basic implementation of knn for quad tree so that it works inside the pgsql environment
June 23 - June 27
- submitting midterm meta
- discussion of the achived results
June 27 - August 18
- adjust quad tree implementation to act as a basement for the one of k-d tree
- based on quad tree implementation experience, implement knn for k-d tree
- implement knn for text tries (different from the two previous and much easier, given the actual experience of coding in pgsql environment)
- any adjustments that arised
August 18 - August 22
- preparation for final term and meta