GIN generalization

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(Part 1: New posting lists/trees storage)
(Part 2: Fast scan)
Line 23: Line 23:
  
 
== Part 2: Fast scan ==
 
== Part 2: Fast scan ==
 +
 +
This patch addresses problem of "frequest_term && rare_term". It implements algorithm which is able to skip parts of posting tree while scan. It introduces new interface method "preConsistent" which is used for preliminary checks and allows false positives on imput.
  
 
== Part 3: Ordering in index ==
 
== Part 3: Ordering in index ==
  
 
== Part 4: TSearch opclass revision ==
 
== Part 4: TSearch opclass revision ==

Revision as of 11:07, 25 April 2013

Contents

General

The initial purpose of GIN generalization was to improve in-core fulltext search so that it becomes comparable on search speed with stand-alone solutions like Sphinx, Solr etc. After some investigation it appears that GIN generalization could be also useful in following areas (at least):

  • Similarity search on arrays
  • Positioned n-grams (could provide better acceleration than just n-grams)
  • Search of regular expressions (inverse task: search regular expressions which matches given string)

Prototype

Prototype was demonstrated on PGConf.EU 2012 Full-text search in PostgreSQL in milliseconds (Oleg Bartunov, Alexander Korotkov). General conclusion is that we can even outperform Sphinx in search speed with new GIN.

Prototype is pretty big and tangled. So we decided to split it into several clear patches.

Part 1: New posting lists/trees storage

This patch introduces following:

  • Store additional information in posting lists/trees (store arbitrary Datum with each item pointer)
  • Corresponding interface changes
  • Posting lists/trees compression

Question to discuss: how to handle partial match with additional information?

Part 2: Fast scan

This patch addresses problem of "frequest_term && rare_term". It implements algorithm which is able to skip parts of posting tree while scan. It introduces new interface method "preConsistent" which is used for preliminary checks and allows false positives on imput.

Part 3: Ordering in index

Part 4: TSearch opclass revision

Personal tools