PostgresServerExtensionPoints

From PostgreSQL wiki

PostgreSQL server extension points

PostgreSQL is a very extensible and pluggable engine. This article seeks to list, categorize and explain the various ways the server can be extended.

It mainly covers extension points that are less well documented in the official documentation, so it focuses on the extension points usable by C-language extensions.

Most people know about the SQL-level customisation opportunities like custom aggregates so they won't be given much attention here.

Core docs

It's assumed that you've already read the core documentation and are thoroughly familiar with most of it, especially the "Extending SQL" section. Make sure you have reviewed all these chapters:

SQL-level extensibility

PostgreSQL offers tons of scope for extension without the need to write or compile C code, including:

  • User-defined functions in multiple different languages
  • User-defined operators
  • User-defined aggregates
  • User-defined composite types and domains
  • User-defined operator classes and operator families for existing index access methods
  • User-defined data types (except type input and output functions)

Most of this is very well documented and won't be covered in detail here.

C-level extensibility

Can't do it from SQL? Read on.

C Extensions (plugins)

A PostgreSQL extension can just be a SQL script with a control file. For the purposes of this document, though, the extensions of interest are those written in (usually) C. They're compiled into loadable modules. A loadable module is a regular shared library with some PostgreSQL metadata. It also follows conventions for any symbols it exposes: those symbols must have specific type signatures and behaviour.

C extensions can use almost all the same API as core PostgreSQL code.

See PG_MODULE_MAGIC, PGXS, C language functions, etc.

C implementations of SQL-callable functions

This is the most common extension point and it is very well known, so I won't go into detail here. The extension exposes a C-linkage symbol with the signature Datum funcname(PG_FUNCTION_ARGS) for the function. It uses the PG_FUNCTION_INFO_V1(funcname) macro to define an additional symbol that carries metadata about the function. It then registers the function in its extension script with:

CREATE FUNCTION ... LANGUAGE 'c'

to expose it to SQL callers.

Pre-defined dlsym extension points

PostgreSQL defines a few function signatures that extensions may (or must) define. Each must expose a specific symbol and accept a specific signature. The most obvious is void _PG_init(void). PostgreSQL calls it when it loads an extension library into a backend, or into the postmaster if the library is listed in `shared_preload_libraries`.

We try not to define too many of these as they're an inconvenient interface. The server must dlsym(...) them from the extension after dlopen(...)ing it so they're a bit clumsy.

Try to avoid adding these. It's better to use hooks, callbacks, etc, where possible, and then register them from _PG_init.

Rendezvous variables

Rendezvous variables are a PostgreSQL facility to allow extensions to connect with each other once they're loaded and share functionality. They use the find_rendezvous_variable(...) entrypoint.

Why rendezvous variables?

Extensions are compiled independently from each other. They generally don't want to rely on a specific extension load order and often cannot access the shared library of other extensions at compile-time. So they cannot generally pass other extension libraries as -l arguments to their linker at link-time. If they did it might confuse the other extension as it wouldn't get its _PG_init called at the right point in the extension lifecycle.

Use of extern symbols defined in other extensions will still create unresolved symbols to be resolved at dynamic link time. But extensions' symbols are not visible to the dynamic linker when it's resolving another extension's symbols and you'll get an unresolved symbol error at load-time. That's because PostgreSQL doesn't load extensions with RTLD_GLOBAL (for good reasons).

So the usual "call an extern function and let the dynamic linker sort it out" approach won't work.

Using rendezvous variables

To handle these linkage difficulties PostgreSQL exposes 'rendezvous variables' via the fmgr. See include/fmgr.h:

extern void **find_rendezvous_variable(const char *varName);

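find_rendezvous_variable returns the address of a named void pointer. If the name hasn't been looked up before, the pointer is created with a NULL value. One extension stores a pointer to a struct of extension-defined type there, and other extensions look up the same name to find it. The struct is usually full of callbacks that serve as the extension's C API.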

For a core usage example see plpgsql's plugin support in src/pl/plpgsql/src/pl_handler.c.
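Here is a self-contained model of the mechanism in plain C. model_find_rendezvous_variable is a hypothetical stand-in that only mimics the contract of find_rendezvous_variable: the same name always yields the same slot, and new slots start out NULL. It uses a fixed array instead of PostgreSQL's hash table. DemoApi and the extension_* functions are likewise invented for this sketch. The consumer caches the slot address at load time and dereferences it at call time, as plpgsql's plugin support does, so load order doesn't matter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for find_rendezvous_variable(): returns the address of
 * a named void * slot, creating it (set to NULL) on first lookup. This model
 * uses a fixed-size array and assumes names are string literals.
 */
#define MAX_RENDEZVOUS 16

static struct
{
    const char *name;
    void       *value;
} rendezvous_table[MAX_RENDEZVOUS];
static int rendezvous_count = 0;

static void **
model_find_rendezvous_variable(const char *varName)
{
    for (int i = 0; i < rendezvous_count; i++)
        if (strcmp(rendezvous_table[i].name, varName) == 0)
            return &rendezvous_table[i].value;

    assert(rendezvous_count < MAX_RENDEZVOUS);
    rendezvous_table[rendezvous_count].name = varName;
    rendezvous_table[rendezvous_count].value = NULL;
    return &rendezvous_table[rendezvous_count++].value;
}

/* Hypothetical C API that "extension A" offers to other extensions. */
typedef struct DemoApi
{
    int         (*add) (int a, int b);
} DemoApi;

static int
demo_add(int a, int b)
{
    return a + b;
}

static DemoApi demo_api = {demo_add};

/* Extension A, e.g. from its _PG_init: publish a pointer to its API struct. */
static void
extension_a_publish(void)
{
    void      **slot = model_find_rendezvous_variable("DemoApi");

    *slot = &demo_api;
}

/* Extension B caches the slot address at load time (the address is stable)... */
static void **b_api_slot = NULL;

static void
extension_b_init(void)
{
    b_api_slot = model_find_rendezvous_variable("DemoApi");
}

/* ...and dereferences it at call time. Returns 0 if A hasn't published yet. */
static int
extension_b_add(int a, int b, int *result)
{
    DemoApi    *api = (DemoApi *) *b_api_slot;

    if (api == NULL)
        return 0;
    *result = api->add(a, b);
    return 1;
}
```

If B loads first, its lookup creates the slot, and A fills it in later. Either order works because both sides only agree on the name.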

Hooks

A "hook" is a global variable of pointer-to-function type. Execution eventually reaches the relevant point in some core routine. If the variable is set at that point, PostgreSQL calls the hook function 'instead of' a standard postgres function. Extensions usually set hook variables from _PG_init, in a library loaded via shared_preload_libraries or session_preload_libraries. This lets them run new code before and/or after the existing core code.

If the hook variable was already set when an extension loads, the extension must save the previous value and call it from its own hook. If the variable wasn't set, the hook generally calls the original core PostgreSQL routine.
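The chaining rule can be checked with a self-contained model in plain C. demo_hook_type, demo_hook, standard_demo and the two "extensions" are hypothetical names for this sketch. The model follows the `HookName_hook_type HookName_hook` convention and the save-previous-then-chain pattern described above, with a trace buffer standing in for real work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type and variable, shaped like HookName_hook_type HookName_hook. */
typedef void (*demo_hook_type) (char *trace, size_t len);
static demo_hook_type demo_hook = NULL;

/* Append s to the trace buffer without overflowing it. */
static void
trace_append(char *trace, size_t len, const char *s)
{
    strncat(trace, s, len - strlen(trace) - 1);
}

/* Default behaviour of the core routine (cf. standard_ProcessUtility). */
static void
standard_demo(char *trace, size_t len)
{
    trace_append(trace, len, "core;");
}

/* Core call site: run the hook if one is installed, else the standard routine. */
static void
core_demo(char *trace, size_t len)
{
    if (demo_hook)
        demo_hook(trace, len);
    else
        standard_demo(trace, len);
}

/* Extension A: saves whatever hook was installed before it and chains to it. */
static demo_hook_type prev_hook_a = NULL;

static void
hook_a(char *trace, size_t len)
{
    trace_append(trace, len, "A-before;");
    if (prev_hook_a)
        prev_hook_a(trace, len);
    else
        standard_demo(trace, len);
    trace_append(trace, len, "A-after;");
}

static void
init_a(void)                    /* would be extension A's _PG_init */
{
    prev_hook_a = demo_hook;
    demo_hook = hook_a;
}

/* Extension B: the same pattern, loaded after A. */
static demo_hook_type prev_hook_b = NULL;

static void
hook_b(char *trace, size_t len)
{
    trace_append(trace, len, "B-before;");
    if (prev_hook_b)
        prev_hook_b(trace, len);
    else
        standard_demo(trace, len);
    trace_append(trace, len, "B-after;");
}

static void
init_b(void)                    /* would be extension B's _PG_init */
{
    prev_hook_b = demo_hook;
    demo_hook = hook_b;
}
```

Suppose A loads, then B, and core_demo() is called with an empty buffer. The buffer ends up holding "B-before;A-before;core;A-after;B-after;". The most recently loaded extension's hook runs outermost, and the standard routine still runs exactly once.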

See separate article on entry points for extending PostgreSQL for list of existing hooks.

An example is ProcessUtility_hook. It is used to intercept and wrap, or entirely suppress, utility commands. A utility command is any "non-plannable" SQL command: anything other than SELECT/INSERT/UPDATE/DELETE. A real example can be found in contrib/pg_stat_statements/pg_stat_statements.c, but a trivial demo is:

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"          /* superuser() */
#include "nodes/parsenodes.h"   /* TransactionStmt */
#include "tcop/utility.h"       /* ProcessUtility_hook, standard_ProcessUtility() */

PG_MODULE_MAGIC;

static ProcessUtility_hook_type next_ProcessUtility_hook = NULL;

/*
 * This signature matches PostgreSQL 10 to 12. PostgreSQL 13 replaced
 * completionTag with QueryCompletion *qc and 14 added readOnlyTree, so
 * check tcop/utility.h for the version you build against.
 */
static void
demo_ProcessUtility_hook(PlannedStmt *pstmt,
                         const char *queryString, ProcessUtilityContext context,
                         ParamListInfo params,
                         QueryEnvironment *queryEnv,
                         DestReceiver *dest, char *completionTag)
{
  Node *parsetree = pstmt->utilityStmt;

  /* Do something silly to show how the hook can work */
  if (IsA(parsetree, TransactionStmt))
  {
    TransactionStmt *stmt = (TransactionStmt *) parsetree;

    if (stmt->kind == TRANS_STMT_PREPARE && !superuser())
      ereport(ERROR,
              (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
               errmsg("MyDemoExtension prohibits non-superusers from using PREPARE TRANSACTION")));
  }

  /* Call the next hook if one is registered, otherwise the standard implementation */
  if (next_ProcessUtility_hook)
    next_ProcessUtility_hook(pstmt, queryString, context, params, queryEnv, dest, completionTag);
  else
    standard_ProcessUtility(pstmt, queryString, context, params, queryEnv, dest, completionTag);

  if (completionTag && completionTag[0] != '\0')
    ereport(LOG,
            (errmsg("MyDemoExtension allowed utility statement %s to run", completionTag)));
}

void
_PG_init(void)
{
  /* Remember any previously installed hook so it can be chained to */
  next_ProcessUtility_hook = ProcessUtility_hook;
  ProcessUtility_hook = demo_ProcessUtility_hook;
}

Existing hooks

To list all hooks that follow the convention of `HookName_hook_type HookName_hook` and are exposed as public API, run

git grep "PGDLLIMPORT .*_hook_type" src/include/

At time of writing these hooks were:


src/include/catalog/objectaccess.h:extern PGDLLIMPORT object_access_hook_type object_access_hook;
src/include/commands/explain.h:extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
src/include/commands/explain.h:extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
src/include/commands/user.h:extern PGDLLIMPORT check_password_hook_type check_password_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorFinish_hook_type ExecutorFinish_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorEnd_hook_type ExecutorEnd_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
src/include/fmgr.h:extern PGDLLIMPORT needs_fmgr_hook_type needs_fmgr_hook;
src/include/fmgr.h:extern PGDLLIMPORT fmgr_hook_type fmgr_hook;
src/include/libpq/auth.h:extern PGDLLIMPORT ClientAuthentication_hook_type ClientAuthentication_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT join_search_hook_type join_search_hook;
src/include/optimizer/plancat.h:extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
src/include/optimizer/planner.h:extern PGDLLIMPORT planner_hook_type planner_hook;
src/include/optimizer/planner.h:extern PGDLLIMPORT create_upper_paths_hook_type create_upper_paths_hook;
src/include/parser/analyze.h:extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
src/include/rewrite/rowsecurity.h:extern PGDLLIMPORT row_security_policy_hook_type row_security_policy_hook_permissive;
src/include/rewrite/rowsecurity.h:extern PGDLLIMPORT row_security_policy_hook_type row_security_policy_hook_restrictive;
src/include/storage/ipc.h:extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
src/include/tcop/utility.h:extern PGDLLIMPORT ProcessUtility_hook_type ProcessUtility_hook;
src/include/utils/elog.h:extern PGDLLIMPORT emit_log_hook_type emit_log_hook;
src/include/utils/lsyscache.h:extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
src/include/utils/selfuncs.h:extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
src/include/utils/selfuncs.h:extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;

Callbacks

PostgreSQL accepts callback functions in a wide variety of places. Function pointers can be passed to individual postgres API functions for immediate use or to store in created objects for later invocation. They're distinct from hooks mainly in that they're scoped to some object or function call, not chained off a global variable.

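For example, extension-defined GUCs can register a check hook and an assign hook. The check hook is called to validate a proposed new value before it is accepted. The assign hook is called when the new value is actually applied. See include/utils/guc.h: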

/*...*/
typedef bool (*GucStringCheckHook) (char **newval, void **extra, GucSource source);
/*...*/
typedef void (*GucStringAssignHook) (const char *newval, void *extra);
/*...*/
extern void DefineCustomStringVariable(const char *name,
                                       /*...*/
                                       GucStringCheckHook check_hook,
                                       GucStringAssignHook assign_hook,
                                       GucShowHook show_hook);
Another example is MemoryContext callbacks, where a callback can be registered to perform destructor-like actions via MemoryContextRegisterResetCallback(...).

Existing callbacks

Lifecycle callbacks

Extensions can use postmaster and backend lifecycle callbacks including

  • before_shmem_exit
  • on_proc_exit
  • on_shmem_exit

There are also transaction lifecycle callbacks:

  • RegisterXactCallback

Cache invalidation callbacks:

  • CacheRegisterRelcacheCallback
  • CacheRegisterSyscacheCallback

and many many more.

Most of these work more like overridable hooks, in that they're generally part of the process-wide state.

errcontext callbacks

Extensions can define their own errcontext callbacks. When log messages are prepared (via elog or ereport), these errcontext callbacks are called to annotate the message by appending lines to its CONTEXT field.

errcontext callbacks generally follow the call stack: new entries are pushed onto the errcontext callback stack on entry to a function and popped on exit. PostgreSQL's exception handling macros PG_TRY() and PG_CATCH() restore the errcontext stack automatically. There's no need to write a PG_CATCH() block just to restore the stack and then PG_RE_THROW().

See existing usage in core for examples.

Warning: failing to pop an errcontext callback can have very confusing results. The stack pointer will still point into stack memory that has since been reused. PostgreSQL will then treat some unpredictable value as the errcontext callback's function pointer. See this blog post for details.
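The push/pop discipline can be modelled without PostgreSQL. ModelErrorContextCallback mirrors the previous/callback/arg fields of PostgreSQL's ErrorContextCallback in utils/elog.h. model_error_context_stack, model_report_error and the row/table functions are hypothetical names for this sketch. The model fills a string buffer instead of raising an error, so it does not exercise the PG_TRY()/PG_CATCH() unwinding.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same shape as PostgreSQL's ErrorContextCallback. */
typedef struct ModelErrorContextCallback
{
    struct ModelErrorContextCallback *previous;
    void        (*callback) (void *arg);
    void       *arg;
} ModelErrorContextCallback;

static ModelErrorContextCallback *model_error_context_stack = NULL;

/* Stands in for the CONTEXT field of the error being reported. */
static char context_buf[256];

static void
model_errcontext(const char *msg)
{
    size_t      len = strlen(context_buf);

    snprintf(context_buf + len, sizeof(context_buf) - len, "%s%s",
             len > 0 ? "\n" : "", msg);
}

/* Walk the stack innermost-first, letting each entry annotate the report. */
static void
model_report_error(void)
{
    context_buf[0] = '\0';
    for (ModelErrorContextCallback *e = model_error_context_stack; e; e = e->previous)
        e->callback(e->arg);
}

static void
row_context_callback(void *arg)
{
    char        line[64];

    snprintf(line, sizeof(line), "processing row %d", *(int *) arg);
    model_errcontext(line);
}

static void
process_row(int rowno, int fail)
{
    ModelErrorContextCallback errcallback;

    /* Push on entry. */
    errcallback.callback = row_context_callback;
    errcallback.arg = &rowno;
    errcallback.previous = model_error_context_stack;
    model_error_context_stack = &errcallback;

    if (fail)
        model_report_error();

    /* Pop on exit; skipping this leaves a pointer into a dead stack frame. */
    model_error_context_stack = errcallback.previous;
}

static void
table_context_callback(void *arg)
{
    char        line[64];

    snprintf(line, sizeof(line), "processing table \"%s\"", (const char *) arg);
    model_errcontext(line);
}

static void
process_table(const char *name, int failing_row)
{
    ModelErrorContextCallback errcallback;

    errcallback.callback = table_context_callback;
    errcallback.arg = (void *) name;
    errcallback.previous = model_error_context_stack;
    model_error_context_stack = &errcallback;

    for (int row = 1; row <= 3; row++)
        process_row(row, row == failing_row);

    model_error_context_stack = errcallback.previous;
}
```

Calling process_table("t", 2) leaves two lines in context_buf: the row line first, then the table line, so the innermost context comes first. The stack is empty again afterwards.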


Abstract interfaces with function pointer implementations

In many places PostgreSQL follows the pseudo-OO C convention of defining an interface as a struct of function pointers, then calling methods of the interface via the function pointers.

Some other extension point is generally used to register these, such as a dlsym'd function, a callback, etc.

One of many examples is the logical decoding interface. PostgreSQL calls:

void _PG_output_plugin_init(OutputPluginCallbacks *cb)

when loading an extension library as an output plugin. This assigns extension-defined function pointers to members of the passed OutputPluginCallbacks struct, e.g.

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
    AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);

    cb->startup_cb = pg_decode_startup;
    cb->begin_cb = pg_decode_begin_txn;
    cb->change_cb = pg_decode_change;
    cb->truncate_cb = pg_decode_truncate;
    cb->commit_cb = pg_decode_commit_txn;
    cb->filter_by_origin_cb = pg_decode_filter;
    cb->shutdown_cb = pg_decode_shutdown;
    cb->message_cb = pg_decode_message;
}

... each of which conforms to a specific signature and is invoked at specific points in execution.

Many of these share a common state structure defined in PostgreSQL's headers and passed to each callback in the interface. For logical decoding that's LogicalDecodingContext from include/replication/logical.h.

To allow extensions to track their own state PostgreSQL usually defines a void* private data member in such state structures, e.g. output_plugin_private in LogicalDecodingContext.

See contrib/test_decoding/test_decoding.c for example usage.
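Here is a stripped-down model of the pattern in plain C. The "core" only knows a struct of function pointers and an opaque private pointer. The plugin fills in the pointers from an init function and keeps its own state behind output_plugin_private. ModelPluginCallbacks and ModelDecodingContext are hypothetical stand-ins for OutputPluginCallbacks and LogicalDecodingContext, and the sum_* plugin stands in for a real output plugin. The model treats startup and shutdown as optional and change as required, and uses calloc where a real plugin would palloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down analogue of LogicalDecodingContext. */
typedef struct ModelDecodingContext
{
    void       *output_plugin_private;  /* opaque to the core, owned by the plugin */
} ModelDecodingContext;

/* Hypothetical, cut-down analogue of OutputPluginCallbacks. */
typedef struct ModelPluginCallbacks
{
    void        (*startup_cb) (ModelDecodingContext *ctx);
    void        (*change_cb) (ModelDecodingContext *ctx, int value);
    void        (*shutdown_cb) (ModelDecodingContext *ctx);
} ModelPluginCallbacks;

typedef void (*ModelPluginInit) (ModelPluginCallbacks *cb);

/* ---- Plugin side ---- */

typedef struct SumState
{
    long        sum;
} SumState;

static long last_reported_sum;  /* stands in for the plugin's output stream */

static void
sum_startup(ModelDecodingContext *ctx)
{
    SumState   *st = calloc(1, sizeof(SumState));

    assert(st != NULL);
    ctx->output_plugin_private = st;
}

static void
sum_change(ModelDecodingContext *ctx, int value)
{
    SumState   *st = ctx->output_plugin_private;

    st->sum += value;
}

static void
sum_shutdown(ModelDecodingContext *ctx)
{
    SumState   *st = ctx->output_plugin_private;

    last_reported_sum = st->sum;
    free(st);
    ctx->output_plugin_private = NULL;
}

/* Analogue of _PG_output_plugin_init: only fills in the function pointers. */
static void
sum_plugin_init(ModelPluginCallbacks *cb)
{
    cb->startup_cb = sum_startup;
    cb->change_cb = sum_change;
    cb->shutdown_cb = sum_shutdown;
}

/* ---- Core side: knows the interface, never the plugin's types ---- */

static void
model_run_decoding(ModelPluginInit init, const int *changes, int n)
{
    ModelPluginCallbacks cb = {NULL, NULL, NULL};
    ModelDecodingContext ctx = {NULL};

    init(&cb);
    assert(cb.change_cb != NULL);   /* required in this model */
    if (cb.startup_cb)
        cb.startup_cb(&ctx);
    for (int i = 0; i < n; i++)
        cb.change_cb(&ctx, changes[i]);
    if (cb.shutdown_cb)
        cb.shutdown_cb(&ctx);
}
```

Running model_run_decoding(sum_plugin_init, ...) over the changes 1, 2, 3 and 4 leaves 10 in last_reported_sum. The core never sees SumState.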

Extension of shared memory and IPC primitives

Extensions may use a wide variety of core features relating to shared memory, registering their own:

  • shared memory segments - RequestAddinShmemSpace, shmem_startup_hook and ShmemInitStruct in storage/shmem.h
  • lightweight lock tranches (LWLock) - LWLockRegisterTranche etc in storage/lwlock.h
  • latches - storage/latch.h
  • dynamic shared memory (DSM) - storage/dsm.h
  • dynamic shared memory areas (DSA) - utils/dsa.h
  • shared-memory queues (shm_mq) - storage/shm_mq.h
  • condition variables - storage/condition_variable.h

Extensions may use PostgreSQL's process latches too; most of the time they can just use their own &MyProc->procLatch or set another backend's latch from its PGPROC entry.

Background workers (bgworkers)

Extensions may register new PostgreSQL backends that exist independently of any client connection.

A bgworker runs as a child of the postmaster. It usually has full access to a particular database, as if it were a user backend started on that database. bgworkers control their own signal handling and run their own event loop. They can do their own file and socket I/O, and can link to or dynamically load arbitrary C libraries. They have broad access to most PostgreSQL internal facilities: they can do low-level table manipulation with genam or heapam, use the SPI, start other bgworkers, and so on.

There are two kinds of bgworker, static and dynamic. Static workers can only be registered at _PG_init time in shared_preload_libraries. Dynamic workers can be launched at any time *after* startup completes. New code usually uses dynamic workers launched from a hook on

Considerable care is needed to get background worker implementations correct. At time of writing they do not have any way to use

Logical decoding output plugins

The walsender and a related set of SQL-callable functions support plugins that interpret pre-processed WAL and transform it. This is used for logical replication, amongst other things. See the documentation on logical decoding.

Defining various server objects from extensions

Extensions can create all sorts of server objects. GUCs (configuration variables) are one of many such examples along with all the usual SQL-visible stuff implemented with SQL-callable C functions like index access methods.

A non-exhaustive list includes:

  • SQL-callable C functions
  • Data types
  • Security label providers

Generic WAL (generic xlog)

Generic WAL is a PostgreSQL feature that lets extensions create and use relations with pages in an extension-defined format.

The extension modifies the relation's pages through the generic WAL API, and PostgreSQL WAL-logs a delta of each changed page. PostgreSQL replays that WAL in a crash-safe, consistent manner on the master and any physical replicas. The extension may then read pages from the relation for whatever purpose it needs.

See generic_xlog.h and generic_xlog.c.

Note that 'extensions may not register redo callbacks for generic WAL' so they cannot run their own code during crash-recovery or replica WAL replay. Extensions may only read the relation's pages once the changes are applied.

See contrib/bloom for an index access method built on top of generic WAL.

There is not currently any logical decoding support for generic WAL records. They cannot be reorder-buffered and there is no output plugin callback that accepts them.

Logical WAL messages

Logical WAL messages provide a WAL-consistent, crash-safe and optionally-transactional one-way communication channel from upstream bgworkers and user backends/functions to downstream receivers of logical decoding output plugin data streams.

Extensions may write "logical WAL messages" with a label string and an arbitrary extension-defined payload to WAL. The label is used to allow extensions to identify their own messages and ignore messages from other extensions. These logical WAL messages are passed to a message handler callback on all logical decoding output plugins that implement the handler. The output plugin is expected to know which messages it is interested in and ignore the rest. The output plugin may use the message content to change plugin state internally and/or write a message in a plugin-defined format to its client output stream.

There are two message types. Transactional messages are reorder-buffered and decoded as part of a transaction. Non-transactional messages are not reorder-buffered; instead the output plugin's message handler callback is invoked as soon as the message is decoded from WAL.
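The difference between the two message types can be sketched with a toy decoder. Everything here is a hypothetical model, not PostgreSQL's reorder buffer: ModelRecord, model_decode, model_message_cb and the crude pending array are all invented for this sketch. Transactional messages are held per transaction, handed to the message callback at commit and dropped on abort. Non-transactional messages go to the callback as soon as they are decoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified decoded-WAL records. */
typedef enum
{
    REC_MESSAGE,
    REC_COMMIT,
    REC_ABORT
} RecKind;

typedef struct ModelRecord
{
    RecKind     kind;
    int         xid;
    bool        transactional;  /* only meaningful for REC_MESSAGE */
    const char *content;        /* only meaningful for REC_MESSAGE */
} ModelRecord;

/* Stand-in for an output plugin's message_cb: records what it was given. */
static char delivered[256];

static void
model_message_cb(int xid, bool transactional, const char *content)
{
    char        line[64];

    snprintf(line, sizeof(line), "%s:%d:%s;",
             transactional ? "tx" : "nontx", xid, content);
    strncat(delivered, line, sizeof(delivered) - strlen(delivered) - 1);
}

/* Crude "reorder buffer": transactional messages waiting for their xact to end. */
#define MAX_PENDING 32
static ModelRecord pending[MAX_PENDING];
static int  npending = 0;

static void
model_decode(const ModelRecord *recs, int n)
{
    for (int i = 0; i < n; i++)
    {
        const ModelRecord *r = &recs[i];

        if (r->kind == REC_MESSAGE && !r->transactional)
        {
            /* Non-transactional: handed to the callback as soon as it's decoded. */
            model_message_cb(r->xid, false, r->content);
        }
        else if (r->kind == REC_MESSAGE)
        {
            /* Transactional: held back until we know how the xact ends. */
            assert(npending < MAX_PENDING);
            pending[npending++] = *r;
        }
        else
        {
            /* Commit replays the xact's held messages in order; abort drops them. */
            int         kept = 0;

            for (int j = 0; j < npending; j++)
            {
                if (pending[j].xid != r->xid)
                    pending[kept++] = pending[j];
                else if (r->kind == REC_COMMIT)
                    model_message_cb(pending[j].xid, true, pending[j].content);
            }
            npending = kept;
        }
    }
}
```

Feed the decoder this sequence: a transactional message in xact 100, a non-transactional one tagged 101, a transactional one in xact 102, an abort of 102, then a commit of 100. The result in delivered is "nontx:101:b;tx:100:a;". The non-transactional message arrives first, and the aborted transaction's message never arrives.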

See replication/message.h and the message_cb callback in struct OutputPluginCallbacks in replication/output_plugin.h.

Logical WAL messages are treated as no-ops during crash recovery redo and physical replica replay. They have no effect on the heap and there are no callbacks or hooks that can handle them at redo time. They're ignored by everything except logical decoding.