Support KNN for SP-GiST GSoC 2014

From PostgreSQL wiki
Jump to navigationJump to search

Intro

As of version 9.4 Postgres' space-partitioned gists lack the support of the k-nearest-neighbour search query though this set of data structures allow rather effective implementation of one. So the goal of the project is to introduce such support.

Outcome

Currently Postgres SP-GiST generalization includes three main datastructures: text trie, k-d tree and quad tree. Trie deals with sets of text strings and the most common use of both k-d and quad trees is the partition of points in the n-dimensional space. In terms of operating with point sets, the query logic is straightforward and allows to retrieve k points from it with the least distance to a given one over the whole set. The usecases of such a request for text datasets are less intuitive but still can prove to be usefull (example of such could be e.g. bioinformatics genome manipulation applications).

Goals

  • (main) Implement KNN support for k-d and quad trees
  • (additional) Implement KNN support for text tries
  • (additional, under concideration) Implement tree-rebuild techniques to guarantee logarithmic complexity over all types of point distribution in a dataset. (as introduced here: [1])

Schedule

Pre- May 22

  • dig into codebase to estimate the actual implementation
  • discussions with community:
    • practicability and usefullness of implementing the tree-rebuild technique
      • (in case concidered usefull) - it's usage details - auto\manual, syntax, etc.
    • any uncertainities with codebase in case they appear
    • any additional goals

May 22 - June 23

  • basic implementation of knn for quad tree so that it works inside the pgsql environment

June 23 - June 27

  • submitting midterm meta
  • discussion of the achived results

June 27 - August 18

  • adjust quad tree implementation to act as a basement for the one of k-d tree
  • based on quad tree implementation experience, implement knn for k-d tree
  • implement knn for text tries (different from the two previous and much easier, given the actual experience of coding in pgsql environment)
  • any adjustments that arised

August 18 - August 22

  • preparation for final term and meta