GIN generalization

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(Created page with "== General == The initial purpose of GIN generalization was to improve in-core fulltext search so that it becomes comparable on search speed with stand-alone solutions like Sphi…")
 
(Part 1: New posting lists/trees storage)
Line 13: Line 13:
  
 
== Part 1: New posting lists/trees storage ==
 
== Part 1: New posting lists/trees storage ==
 +
 +
This patch introduces following:
 +
 +
* Store additional information in posting lists/trees (store arbitrary Datum with each item pointer)
 +
* Corresponding interface changes
 +
* Posting lists/trees compression
 +
 +
'''Question to discuss: how to handle partial match with additional information?'''
  
 
== Part 2: Fast scan ==
 
== Part 2: Fast scan ==

Revision as of 11:05, 25 April 2013

Contents

General

The initial purpose of GIN generalization was to improve in-core fulltext search so that it becomes comparable on search speed with stand-alone solutions like Sphinx, Solr etc. After some investigation it appears that GIN generalization could be also useful in following areas (at least):

  • Similarity search on arrays
  • Positioned n-grams (could provide better acceleration than just n-grams)
  • Search of regular expressions (inverse task: search regular expressions which matches given string)

Prototype

Prototype was demonstrated on PGConf.EU 2012 Full-text search in PostgreSQL in milliseconds (Oleg Bartunov, Alexander Korotkov). General conclusion is that we can even outperform Sphinx in search speed with new GIN.

Prototype is pretty big and tangled. So we decided to split it into several clear patches.

Part 1: New posting lists/trees storage

This patch introduces following:

  • Store additional information in posting lists/trees (store arbitrary Datum with each item pointer)
  • Corresponding interface changes
  • Posting lists/trees compression

Question to discuss: how to handle partial match with additional information?

Part 2: Fast scan

Part 3: Ordering in index

Part 4: TSearch opclass revision

Personal tools