Pgcon2014unconferenceVODKAandJSQuery

From PostgreSQL wiki

Leaders: Alexander Korotkov, Teodor Sigaev

Updating item pointers in an index file

(Korotkov) Vodka maintains two index files, one for keys and one for heap item pointers (postings). Each index file can, in effect, be managed by a different access method. This is a generalization of GIN, which uses a btree for both keys and postings (albeit its own btree code, not src/backend/access/nbtree). The keys index file holds item pointers that refer into the postings index file. When the postings file's access method needs to move an index item, it must arrange to update every item pointer in the keys file that refers to it. To that end, he proposes a new "amupdateiptr" access method function. Calls to that function, when necessary, will arrive right after an amgettuple call.
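The calling sequence described above (the keys file is first positioned by an amgettuple-style lookup, then asked to rewrite the pointer it just returned) can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions, not Vodka code: `ItemPtr`, `KeysFile`, `keys_gettuple`, `keys_updateiptr` and `postings_moved` are hypothetical names, the "files" are plain arrays, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified item pointer: block number plus offset in block. */
typedef struct {
    unsigned block;
    unsigned offset;
} ItemPtr;

/* Hypothetical keys-file entry: a key and an item pointer into the postings file. */
typedef struct {
    int key;
    ItemPtr postings;
} KeyEntry;

#define MAX_KEYS 16

typedef struct {
    KeyEntry entries[MAX_KEYS];
    size_t nentries;
} KeysFile;

/* Scan state of the keys file; remembers the entry most recently returned. */
typedef struct {
    KeysFile *file;
    size_t pos;
    int positioned;
} KeysScan;

/* Stand-in for amgettuple on the keys file: find the next entry for key. */
static int keys_gettuple(KeysScan *scan, int key, ItemPtr *out)
{
    size_t start = scan->positioned ? scan->pos + 1 : 0;
    for (size_t i = start; i < scan->file->nentries; i++) {
        if (scan->file->entries[i].key == key) {
            scan->pos = i;
            scan->positioned = 1;
            *out = scan->file->entries[i].postings;
            return 1;
        }
    }
    scan->positioned = 0;
    return 0;
}

/* Stand-in for the proposed amupdateiptr: rewrite the pointer stored in the
 * entry the scan is currently positioned on. */
static void keys_updateiptr(KeysScan *scan, ItemPtr new_ptr)
{
    assert(scan->positioned); /* only valid right after a successful gettuple */
    scan->file->entries[scan->pos].postings = new_ptr;
}

/* Postings-file side: after moving the posting structure for key to new_ptr,
 * locate the referencing key entry and patch its pointer.
 * Returns 1 on success, 0 if no entry for key exists. */
static int postings_moved(KeysFile *keys, int key, ItemPtr new_ptr)
{
    KeysScan scan = { keys, 0, 0 };
    ItemPtr old_ptr;

    if (!keys_gettuple(&scan, key, &old_ptr))
        return 0;
    keys_updateiptr(&scan, new_ptr);
    return 1;
}
```

Tying the update to the scan position mirrors the note that amupdateiptr calls arrive right after amgettuple: the keys access method never has to search for the entry a second time.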

(Heikki Linnakangas) Vodka separates one GIN idea (extract keys from input values) from an independent GIN idea (store heap item pointers in some structure other than a flat layout with one entry per key occurrence). The latter idea is helpful by itself when you have many duplicate keys: instead of storing the key many times, store it once and point to a posting tree/list. (Korotkov) Indeed, the motivating application for Vodka was use of a GiST-managed R-tree for postings.
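To make the space argument concrete, the sketch below converts a flat layout (one key plus heap pointer per indexed value) into a layout where each distinct key is stored once alongside the range of its heap pointers. It is an illustration only: the type and function names (`FlatEntry`, `PostingEntry`, `build_postings`) are hypothetical, and the posting structure here is a simple contiguous list rather than GIN's posting trees or the GiST-managed R-tree mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified heap item pointer. */
typedef struct {
    unsigned block;
    unsigned offset;
} ItemPtr;

/* Flat layout: one (key, heap pointer) pair per indexed value. */
typedef struct {
    int key;
    ItemPtr heap;
} FlatEntry;

/* Posting layout: each distinct key stored once, with a slice of heap pointers. */
typedef struct {
    int key;
    size_t first;   /* index of the key's first heap pointer in the postings array */
    size_t count;   /* number of heap pointers for this key */
} PostingEntry;

/*
 * Convert flat entries (which must be sorted by key) into posting entries.
 * postings_out receives all heap pointers in input order.
 * Returns the number of distinct keys written to keys_out.
 */
static size_t build_postings(const FlatEntry *flat, size_t n,
                             PostingEntry *keys_out, ItemPtr *postings_out)
{
    size_t nkeys = 0;

    for (size_t i = 0; i < n; i++) {
        postings_out[i] = flat[i].heap;
        if (nkeys == 0 || keys_out[nkeys - 1].key != flat[i].key) {
            /* a new distinct key starts here; input must be sorted */
            assert(nkeys == 0 || keys_out[nkeys - 1].key < flat[i].key);
            keys_out[nkeys].key = flat[i].key;
            keys_out[nkeys].first = i;
            keys_out[nkeys].count = 0;
            nkeys++;
        }
        keys_out[nkeys - 1].count++;
    }
    return nkeys;
}
```

For example, four flat entries whose keys are 7, 7, 7, 9 collapse into two key entries, so key 7 is stored once instead of three times while all four heap pointers are kept.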

(Sigaev) When the access method of one index file calls amupdateiptr for another index file, the first access method is holding a buffer lock across that call. It would be better to avoid that, but how?

Tracking the two files of a Vodka index

(Korotkov) How should we track the existence/identity of the two index files? (Linnakangas) Let the access method of the keys file be in control. If the keys are in a btree, btree is responsible for tracking any posting structure. (Tom Lane) Posting tree representations are not a datatype-independence problem; the data is always item pointers. There's a fixed number of possible storage strategies, so allowing arbitrary access methods and operator classes for postings is needlessly general. Put a version number on disk in case we think of additional methods later, though. (Oleg Bartunov) We chose that approach for easy experimentation with different posting representations.
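Tom Lane's suggestion (a fixed set of posting storage strategies, with an on-disk version number to leave room for more later) might look roughly like the sketch below. Everything here is hypothetical: the metapage fields, `POSTING_FORMAT_VERSION`, and the two strategy codes are invented for illustration, loosely modeled on GIN's distinction between inline posting lists and separate posting trees.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical on-disk format version for the posting representation. */
#define POSTING_FORMAT_VERSION 1

/* Hypothetical fixed set of posting storage strategies; adding one would
 * mean bumping POSTING_FORMAT_VERSION. */
enum {
    POSTING_FLAT_LIST = 0,   /* item pointers stored inline next to the key */
    POSTING_TREE = 1         /* item pointers kept in a separate tree */
};

/* Hypothetical metapage fields recorded in the index file. */
typedef struct {
    uint32_t version;
    uint32_t strategy;
} PostingMeta;

/* Check a metapage read from disk; returns 1 if this code can interpret it. */
static int posting_meta_supported(const PostingMeta *meta)
{
    if (meta->version > POSTING_FORMAT_VERSION)
        return 0;            /* written by newer code: refuse rather than guess */

    switch (meta->strategy) {
    case POSTING_FLAT_LIST:
    case POSTING_TREE:
        return 1;
    default:
        return 0;            /* unknown strategy code */
    }
}
```

Because the data is always item pointers, a small closed switch like this replaces the generality of plugging in an arbitrary access method and operator class for postings.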

Getting OID of opclass/operator

(Korotkov) Vodka operator classes declare a "config" function that returns an opclass OID and an operator OID. Implementing the config function is awkward, because non-core objects do not have stable OIDs. It is not as simple as doing a name lookup the way a reg* cast does. A new syscache might be needed.

JSQuery

(Sigaev) Proposes a query language for searching in jsonb; see slides. An implementation of the proposal exists, but statistics and planner support are not done yet. (Andrew Dunstan, unknown speakers) Why is the query language itself not expressed in json? Why not use an already-standardized language, such as JPath? MongoDB has a query language. (Dunstan, Bartunov) The MongoDB query language is too verbose. (unknown speaker) People migrating from MongoDB would value compatibility. (Dunstan) Especially if the idea is to use this for something resembling schema validation, it would be better to follow an established standard.