Query Parsing

From PostgreSQL wiki
Revision as of 15:19, 25 July 2014 by Robe (talk | contribs) (first version)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Query Parsing

There are quite a few use cases where it'd be of great help if you could access the query tree Postgres uses internally for processing queries.

For example, when you need to do non-trivial operations on queries, like:

  • Filtering out queries containing specific tables
  • Reformatting & pretty-printing query strings
  • Modifying or removing parts of the query string, e.g. rename a column


Approaches

Using Postgres internals

Postgres already offers tooling to create a query tree from a given query string.

raw_parser takes a query statement as input and will return a parse tree. This parse tree can then be pretty-printed by using nodeToString.

An example can be seen here, this can be easily implemented in a Postgres extension as well as seen here.

[..example output..]


This approach isn't without caveats though, since the nodeToString code is used only for printing out planned query trees. These trees are rewritten to contain only contain nodes ultimately needed by the Executor. This means that all nodes not to expected in later stages of the query execution can't be printed, for example INSERT/UPDATE/DELETE - these are all aggregated in a ModifyStmt node.


pg_query

pg_query is a Ruby gem which can generate query trees in JSON representation. It accomplishes this by using a patched Postgres version which one hond adressess the shortcomings outlined above, on the other hand it provides a new function nodeToJSONString which turns a query tree into a JSON representation instead of the native Postgres format.

The Ruby bindings are just glue code around the Postgres code.

[..example output..]

More information can be found in this blog post


pgpool-II

pgpool-II is a middleware for Postgresql which in it's simplest use-case is an connection broker between client and database servers.

To be able to do advanced query routing it needs to understand the queries it gets sent form the query. To be able to do that, they forked the parser and implemented their own node outFuncs to be able to recreated a query string from a given query tree.

If you want to create your own as correct as possible Postgres query pretty-printer this might be a good starting point.


sqlparse

sqlparse is a Python module which implements a non-validating SQL parser suitable for parsing, splitting and formatting SQL statements. It provides the backbone for http://sqlformat.org/ - a SQL pretty printer.

Introspection into the queries is quite limited and since it's not specifically targed at Postgres it's capabilities are quite limited when it comes to drilling down into query trees.


TODO

To make life easier for people using query trees the following things need to be addressed in Postgres core .

  • Implement outFuncs for all node types and not only those seen in a post-planning state.
  • Ship alternate outFuncs suitable for generating JSON trees instead of the native Postgres tree format.
  • Support a placeholder character suitable for mimicking all constant expressions in query strings. This would be of great help when trying to parse normalized queries as produced by pg_stat_statements.