Todo:HooksAndTracePoints

From PostgreSQL wiki
Revision as of 05:16, 6 August 2019 by Ringerc (talk | contribs) (TODO: Hooks, callbacks and trace points)

TODO: Hooks, callbacks and trace points

This TODO/wishlist sub-section is intended for all users and developers to edit to add their own thoughts on desired extension points within the core PostgreSQL codebase.

Wishlist

Add the hooks, callbacks, etc. you'd like to see here, along with why they'd be useful and any considerations such as performance impact, categorizing them where it makes sense.

Logical decoding

  • Hooks in reorder buffer management for memory accounting
  • Hooks in reorder buffer on spill to disk for memory accounting
  • Logical decoding output plugin callback to filter events as they are added to the reorder buffer

Definitions with existing examples

C implementations of SQL-callable functions

This is the most common extension point and very well known so I won't go into detail here. An extension exposes a C linkage symbol with the signature Datum funcname(PG_FUNCTION_ARGS) for the function and uses the PostgreSQL PG_FUNCTION_INFO_V1 macro to define a second symbol carrying metadata about the function. It then registers the function in its extension script with:

CREATE FUNCTION ... LANGUAGE 'c'

to expose it to SQL callers.

Pre-defined dlsym extension points

PostgreSQL defines a few function signatures that extensions may (or must) define. Each must be exposed under a specific symbol name with a specific signature. The most obvious is void _PG_init(void), which PostgreSQL calls when it loads an extension library, either into the postmaster (if it is listed in shared_preload_libraries) or into a backend.

We try not to define too many of these as they're an inconvenient interface. The server must dlsym(...) them from the extension after dlopen(...)ing it so they're a bit clumsy.

Try to avoid adding these. It's better to use hooks, callbacks, etc, where possible, and then register them from _PG_init.

Rendezvous variables

Rendezvous variables are a PostgreSQL facility to allow extensions to connect with each other once they're loaded and share functionality. They use the find_rendezvous_variable(...) entrypoint.

Why rendezvous variables?

Extensions are compiled independently from each other. They generally don't want to rely on a specific extension load order and often cannot access the shared library of other extensions at compile-time. So they cannot generally pass other extension libraries as -l arguments to their linker at link-time. If they did, it might confuse the other extension, as its _PG_init would not be called at the right point in the extension lifecycle.

Using extern symbols defined in other extensions still leaves unresolved symbols to be resolved at dynamic link time. But one extension's symbols are not visible to the dynamic linker when it resolves another extension's symbols, so you'll get an unresolved symbol error at load-time. That's because PostgreSQL doesn't load extensions with RTLD_GLOBAL (for good reasons).

So the usual "call an extern function and let the dynamic linker sort it out" approach won't work.

Using rendezvous variables

To handle these linkage difficulties PostgreSQL exposes 'rendezvous variables' via the fmgr. See include/fmgr.h:

extern void **find_rendezvous_variable(const char *varName);

These let one extension expose a named variable with a void pointer to a struct of extension-defined type. This is usually a struct full of callbacks to serve as an extension C API.

For a core usage example see plpgsql's plugin support in src/pl/plpgsql/src/pl_handler.c.
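
The mechanism can be modelled outside the server. The following standalone sketch is not PostgreSQL code: toy_find_rendezvous_variable is a hypothetical stand-in for find_rendezvous_variable(...) backed by a small fixed array, and ToyMathApi, the "toy_math_api" name, provider_init and consumer_twice are all invented for illustration. It shows the usual pattern, assuming the provider publishes a pointer to a struct of callbacks in the named slot and the consumer looks up the same name and checks for NULL before calling through it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for find_rendezvous_variable(): returns the address
 * of a void* slot for the given name, creating the slot (holding NULL) on
 * first lookup. This toy keeps slots in a small fixed array and stores the
 * name pointer as-is, so pass string literals.
 */
#define TOY_MAX_VARS 8

static struct
{
    const char *name;
    void       *value;
} toy_vars[TOY_MAX_VARS];

void **
toy_find_rendezvous_variable(const char *name)
{
    for (int i = 0; i < TOY_MAX_VARS; i++)
    {
        if (toy_vars[i].name == NULL)
            toy_vars[i].name = name;        /* first lookup creates the slot */
        if (strcmp(toy_vars[i].name, name) == 0)
            return &toy_vars[i].value;
    }
    return NULL;                            /* table full (toy limitation) */
}

/* The C API the providing "extension" exposes: a struct of callbacks. */
typedef struct ToyMathApi
{
    int (*twice) (int x);
} ToyMathApi;

static int
provider_twice(int x)
{
    return 2 * x;
}

static ToyMathApi provider_api = {provider_twice};

/* What the provider would do from its _PG_init(). */
void
provider_init(void)
{
    void **slot = toy_find_rendezvous_variable("toy_math_api");

    assert(slot != NULL);
    *slot = &provider_api;
}

/* What a consumer does at call time; copes with the provider being absent. */
int
consumer_twice(int x, int fallback)
{
    void      **slot = toy_find_rendezvous_variable("toy_math_api");
    ToyMathApi *api = (slot != NULL) ? (ToyMathApi *) *slot : NULL;

    return (api != NULL) ? api->twice(x) : fallback;
}
```

Because the consumer only looks at the slot when it needs the API, it does not matter which of the two is loaded first, and a missing provider shows up as a NULL slot rather than a link error.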

Hooks

A "hook" is a global variable of pointer-to-function type. At the relevant point in the execution of some core routine, PostgreSQL checks the variable and, if it is set, calls the hook function instead of the standard postgres function. Extension code, usually loaded via shared_preload_libraries or session_preload_libraries, sets the hook variable in order to run new code before and/or after the existing core code.

If the hook variable was already set when an extension loads, the extension must remember the previous hook value and call it from its own hook function; otherwise it generally calls the original core PostgreSQL routine.

See the separate article on entry points for extending PostgreSQL for a list of existing hooks.

An example is the ProcessUtility_hook which is used to intercept and wrap, or entirely suppress, utility commands. A utility command is any "non plannable" SQL command, that is, anything other than SELECT/INSERT/UPDATE/DELETE. A real example can be found in contrib/pg_stat_statements/pg_stat_statements.c, but a trivial demo is:

#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"          /* superuser() */
#include "nodes/parsenodes.h"   /* TransactionStmt */
#include "tcop/utility.h"       /* ProcessUtility_hook, standard_ProcessUtility */

PG_MODULE_MAGIC;

void _PG_init(void);

/* Previously installed hook, if any, so we can chain to it */
static ProcessUtility_hook_type next_ProcessUtility_hook = NULL;

/* This signature matches PostgreSQL 10 through 12; other major versions differ */
static void
demo_ProcessUtility_hook(PlannedStmt *pstmt,
                         const char *queryString,
                         ProcessUtilityContext context,
                         ParamListInfo params,
                         QueryEnvironment *queryEnv,
                         DestReceiver *dest, char *completionTag)
{
  Node *parsetree = pstmt->utilityStmt;

  /* Do something silly to show how the hook can work */
  if (IsA(parsetree, TransactionStmt))
  {
    TransactionStmt *stmt = (TransactionStmt *) parsetree;

    if (stmt->kind == TRANS_STMT_PREPARE && !superuser())
      ereport(ERROR,
              (errmsg("MyDemoExtension prohibits non-superusers from using PREPARE TRANSACTION")));
  }

  /* Call the next hook if registered, or the standard postgres routine */
  if (next_ProcessUtility_hook)
    next_ProcessUtility_hook(pstmt, queryString, context, params, queryEnv, dest, completionTag);
  else
    standard_ProcessUtility(pstmt, queryString, context, params, queryEnv, dest, completionTag);

  if (completionTag)
    ereport(LOG,
            (errmsg("MyDemoExtension allowed utility statement %s to run", completionTag)));
}

void
_PG_init(void)
{
  /* Remember any hook installed before us, then install ours */
  next_ProcessUtility_hook = ProcessUtility_hook;
  ProcessUtility_hook = demo_ProcessUtility_hook;
}
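
To see why saving and calling the previous hook value matters once several extensions install the same hook, here is a standalone model rather than PostgreSQL code. Every name in it (toy_hook, standard_routine, core_entry_point, ext_a_init, ext_b_init) is hypothetical; each toy "extension" appends a letter to a trace buffer so that the call order is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*toy_hook_type) (void);

static toy_hook_type toy_hook = NULL;   /* the global hook variable */
static char trace[16];                  /* records who ran, in order */

static void
note(char c)
{
    size_t n = strlen(trace);

    assert(n + 1 < sizeof(trace));
    trace[n] = c;
    trace[n + 1] = '\0';
}

/* The standard routine, used when no (further) hook is set. */
static void
standard_routine(void)
{
    note('S');
}

/* Core call site: call the hook instead of the standard routine if set. */
void
core_entry_point(void)
{
    if (toy_hook)
        toy_hook();
    else
        standard_routine();
}

/* Toy extension A: remembers the previous hook value and chains to it. */
static toy_hook_type a_prev_hook = NULL;

static void
a_hook(void)
{
    note('A');
    if (a_prev_hook)
        a_prev_hook();
    else
        standard_routine();
}

void
ext_a_init(void)
{
    a_prev_hook = toy_hook;
    toy_hook = a_hook;
}

/* Toy extension B: exactly the same pattern. */
static toy_hook_type b_prev_hook = NULL;

static void
b_hook(void)
{
    note('B');
    if (b_prev_hook)
        b_prev_hook();
    else
        standard_routine();
}

void
ext_b_init(void)
{
    b_prev_hook = toy_hook;
    toy_hook = b_hook;
}
```

Loading A and then B leaves B's function in the hook variable, so a call runs B, then A, then the standard routine. If either extension forgot to chain, everything loaded before it would silently stop running.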

Existing hooks

To list all hooks that follow the convention of `HookName_hook_type HookName_hook` and are exposed as public API, run

git grep "PGDLLIMPORT .*_hook_type" src/include/

At time of writing these hooks were:


src/include/catalog/objectaccess.h:extern PGDLLIMPORT object_access_hook_type object_access_hook;
src/include/commands/explain.h:extern PGDLLIMPORT ExplainOneQuery_hook_type ExplainOneQuery_hook;
src/include/commands/explain.h:extern PGDLLIMPORT explain_get_index_name_hook_type explain_get_index_name_hook;
src/include/commands/user.h:extern PGDLLIMPORT check_password_hook_type check_password_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorRun_hook_type ExecutorRun_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorFinish_hook_type ExecutorFinish_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorEnd_hook_type ExecutorEnd_hook;
src/include/executor/executor.h:extern PGDLLIMPORT ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook;
src/include/fmgr.h:extern PGDLLIMPORT needs_fmgr_hook_type needs_fmgr_hook;
src/include/fmgr.h:extern PGDLLIMPORT fmgr_hook_type fmgr_hook;
src/include/libpq/auth.h:extern PGDLLIMPORT ClientAuthentication_hook_type ClientAuthentication_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT set_rel_pathlist_hook_type set_rel_pathlist_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT set_join_pathlist_hook_type set_join_pathlist_hook;
src/include/optimizer/paths.h:extern PGDLLIMPORT join_search_hook_type join_search_hook;
src/include/optimizer/plancat.h:extern PGDLLIMPORT get_relation_info_hook_type get_relation_info_hook;
src/include/optimizer/planner.h:extern PGDLLIMPORT planner_hook_type planner_hook;
src/include/optimizer/planner.h:extern PGDLLIMPORT create_upper_paths_hook_type create_upper_paths_hook;
src/include/parser/analyze.h:extern PGDLLIMPORT post_parse_analyze_hook_type post_parse_analyze_hook;
src/include/rewrite/rowsecurity.h:extern PGDLLIMPORT row_security_policy_hook_type row_security_policy_hook_permissive;
src/include/rewrite/rowsecurity.h:extern PGDLLIMPORT row_security_policy_hook_type row_security_policy_hook_restrictive;
src/include/storage/ipc.h:extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
src/include/tcop/utility.h:extern PGDLLIMPORT ProcessUtility_hook_type ProcessUtility_hook;
src/include/utils/elog.h:extern PGDLLIMPORT emit_log_hook_type emit_log_hook;
src/include/utils/lsyscache.h:extern PGDLLIMPORT get_attavgwidth_hook_type get_attavgwidth_hook;
src/include/utils/selfuncs.h:extern PGDLLIMPORT get_relation_stats_hook_type get_relation_stats_hook;
src/include/utils/selfuncs.h:extern PGDLLIMPORT get_index_stats_hook_type get_index_stats_hook;

Callbacks

PostgreSQL accepts callback functions in a wide variety of places. Function pointers can be passed to individual postgres API functions for immediate use or to store in created objects for later invocation. They're distinct from hooks mainly in that they're scoped to some object or function call, not chained off a global variable.

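For example, extension-defined GUCs can register check and assign hooks. Despite the name these are callbacks in the sense used here: the check hook is called to validate a proposed new value, and the assign hook is called when the GUC value is actually changed. See include/utils/guc.h: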

/*...*/
typedef bool (*GucStringCheckHook) (char **newval, void **extra, GucSource source);
/*...*/
typedef void (*GucStringAssignHook) (const char *newval, void *extra);
/*...*/
extern void DefineCustomStringVariable(const char *name,
                                       /*...*/
                                       GucStringCheckHook check_hook,
                                       GucStringAssignHook assign_hook,
                                       GucShowHook show_hook);
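
The division of labour between a check callback and an assign callback can be shown with a much simplified standalone model. This is not the GUC machinery: ToySetting, toy_set and the "verbosity" callbacks are hypothetical names, and the real hooks take additional arguments (extra, source) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef bool (*ToyCheckHook) (const char *newval);
typedef void (*ToyAssignHook) (const char *newval);

/* A setting with callbacks scoped to this one object. */
typedef struct ToySetting
{
    const char   *value;
    ToyCheckHook  check_hook;   /* may veto a proposed value */
    ToyAssignHook assign_hook;  /* told about each accepted value */
} ToySetting;

/* Try to change a setting; returns false and leaves it untouched if vetoed. */
bool
toy_set(ToySetting *s, const char *newval)
{
    assert(s != NULL && newval != NULL);
    if (s->check_hook && !s->check_hook(newval))
        return false;
    if (s->assign_hook)
        s->assign_hook(newval);
    s->value = newval;
    return true;
}

/* Example callbacks for a hypothetical "verbosity" setting. */
static int cached_verbosity = 0;

static bool
verbosity_check(const char *newval)
{
    return strcmp(newval, "quiet") == 0 || strcmp(newval, "loud") == 0;
}

static void
verbosity_assign(const char *newval)
{
    /* Keep a derived value in sync so hot paths need no string compares. */
    cached_verbosity = (strcmp(newval, "loud") == 0) ? 1 : 0;
}
```

A setting built as { "quiet", verbosity_check, verbosity_assign } accepts "loud" and updates the cached value, while any other string is rejected by the check callback before the assign callback ever sees it.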

Another example is MemoryContext callbacks, where a callback can be registered to perform destructor-like actions via MemoryContextRegisterResetCallback(...).
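
The destructor-like behaviour can be sketched with a standalone model. It is loosely modelled on the reset-callback idea but is not the PostgreSQL implementation: ToyContext, ToyCallback, toy_register_reset_callback and toy_context_reset are hypothetical names, and the assumption that the caller supplies the storage for the callback record is part of the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ToyCallbackFn) (void *arg);

/* One registered callback; the caller provides the storage for it. */
typedef struct ToyCallback
{
    ToyCallbackFn       func;
    void               *arg;
    struct ToyCallback *next;
} ToyCallback;

/* The object the callbacks are scoped to. */
typedef struct ToyContext
{
    ToyCallback *reset_cbs;
} ToyContext;

void
toy_register_reset_callback(ToyContext *cxt, ToyCallback *cb)
{
    assert(cxt != NULL && cb != NULL && cb->func != NULL);
    cb->next = cxt->reset_cbs;      /* push on the front of the list */
    cxt->reset_cbs = cb;
}

/* Run each callback once, most recently registered first. */
void
toy_context_reset(ToyContext *cxt)
{
    ToyCallback *cb;

    while ((cb = cxt->reset_cbs) != NULL)
    {
        cxt->reset_cbs = cb->next;  /* unlink before calling */
        cb->func(cb->arg);
    }
}

/* Example destructor-like action: mark an external resource as closed. */
static void
close_resource(void *arg)
{
    *(int *) arg = 0;
}
```

Registering close_resource against a context means the resource is released whenever the context is reset, without the code that resets the context needing to know the resource exists. Because callbacks are unlinked before they run, a second reset does not call them again.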

Abstract interfaces with function pointer implementations

In many places PostgreSQL follows the pseudo-OO C convention of defining an interface as a struct of function pointers, then calling methods of the interface via the function pointers.

Some other extension point is generally used to register these, such as a dlsym'd function, a callback, etc.

One of many examples is the logical decoding interface. PostgreSQL calls:

void _PG_output_plugin_init(OutputPluginCallbacks *cb)

when loading an extension library as an output plugin. The extension's implementation of this function assigns extension-defined function pointers to members of the passed OutputPluginCallbacks struct, e.g.

void
_PG_output_plugin_init(OutputPluginCallbacks *cb)
{
    AssertVariableIsOfType(&_PG_output_plugin_init, LogicalOutputPluginInit);

    cb->startup_cb = pg_decode_startup;
    cb->begin_cb = pg_decode_begin_txn;
    cb->change_cb = pg_decode_change;
    cb->truncate_cb = pg_decode_truncate;
    cb->commit_cb = pg_decode_commit_txn;
    cb->filter_by_origin_cb = pg_decode_filter;
    cb->shutdown_cb = pg_decode_shutdown;
    cb->message_cb = pg_decode_message;
}

... each of which conforms to a specific signature and is invoked at specific points in execution.

Many of these share a common state structure defined in PostgreSQL's headers and passed to each callback in the interface. For logical decoding that's LogicalDecodingContext from include/replication/logical.h.

To allow extensions to track their own state PostgreSQL usually defines a void* private data member in such state structures, e.g. output_plugin_private in LogicalDecodingContext.

See contrib/test_decoding/test_decoding.c for example usage.
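
The combination of a struct of function pointers, a shared context and a private-data pointer can also be shown in miniature. The sketch below is a standalone model, not the logical decoding API: ToyDecodingContext, ToyPluginCallbacks, toy_plugin_init and toy_core_run are hypothetical names chosen to echo the real ones, and the "plugin" merely counts the changes it is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Shared state passed to every callback; defined by the "core". */
typedef struct ToyDecodingContext
{
    void *output_plugin_private;    /* owned by the plugin, opaque to the core */
} ToyDecodingContext;

/* The interface: a struct of function pointers filled in by the plugin. */
typedef struct ToyPluginCallbacks
{
    void (*startup_cb) (ToyDecodingContext *ctx);
    void (*change_cb) (ToyDecodingContext *ctx, const char *change);
    void (*shutdown_cb) (ToyDecodingContext *ctx);
} ToyPluginCallbacks;

/* ---- plugin side ---- */
typedef struct CounterState
{
    int nchanges;
} CounterState;

static int last_run_changes = -1;   /* result published at shutdown */

static void
counter_startup(ToyDecodingContext *ctx)
{
    CounterState *st = calloc(1, sizeof(CounterState));

    assert(st != NULL);
    ctx->output_plugin_private = st;
}

static void
counter_change(ToyDecodingContext *ctx, const char *change)
{
    CounterState *st = ctx->output_plugin_private;

    (void) change;                  /* a real plugin would format this */
    st->nchanges++;
}

static void
counter_shutdown(ToyDecodingContext *ctx)
{
    CounterState *st = ctx->output_plugin_private;

    last_run_changes = st->nchanges;
    free(st);
    ctx->output_plugin_private = NULL;
}

void
toy_plugin_init(ToyPluginCallbacks *cb)
{
    cb->startup_cb = counter_startup;
    cb->change_cb = counter_change;
    cb->shutdown_cb = counter_shutdown;
}

/* ---- core side: knows only the interface, never CounterState ---- */
void
toy_core_run(void (*plugin_init) (ToyPluginCallbacks *cb),
             const char **changes, int nchanges)
{
    ToyPluginCallbacks cb = {NULL, NULL, NULL};
    ToyDecodingContext ctx = {NULL};

    plugin_init(&cb);
    assert(cb.change_cb != NULL);   /* required; the others are optional */
    if (cb.startup_cb)
        cb.startup_cb(&ctx);
    for (int i = 0; i < nchanges; i++)
        cb.change_cb(&ctx, changes[i]);
    if (cb.shutdown_cb)
        cb.shutdown_cb(&ctx);
}
```

The core never sees CounterState; it only carries the void pointer between calls, which is what lets each plugin keep whatever state it likes.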

Defining various server objects from extensions

Extensions can create all sorts of server objects. GUCs (configuration variables) are one example of many, along with all the usual SQL-visible objects, such as index access methods, that are implemented with SQL-callable C functions.

TODO: list them?

Wishlist for other extension point types

There are other sorts of functionality in PostgreSQL that are not presently extensible at all. Some of these would be wonderful to be able to extend.

Wait Event types

Extensions have access to the PG_WAIT_EXTENSION WaitEvent type, but have no ability to define their own finer grained wait events. This limits how well complex extensions can be traced and monitored via pg_stat_activity and other wait-event aware interfaces.

Heavyweight lock types and tags

Being able to extend PostgreSQL's heavyweight locks with new lock types would be immensely useful for distributed and clustered applications. They often have to re-implement significant parts of the lock manager, and their own locks aren't then visible to the core deadlock detector etc.

TODO: set out example for how it might work

Parser syntax extension points

Mechanisms to allow the parser to be extended with addin-defined syntax are requested semi-regularly on the -hackers list. This is a much harder problem than it looks though, especially with PostgreSQL's flex and bison based LALR(1) parser, which is implemented using C code generation at compile time and compiled along with the rest of the server, then statically linked to the server executable.

Some more targeted extension points are probably possible in places where the syntax can be well-bounded. For example, it might be practical to allow extensions to register new elements in WITH(...) lists such as in COPY ... WITH (FORMAT CSV, ...).

Add your proposed points and use cases here.

DTrace/perf/SystemTap/etc. statically defined trace events (SDTs)

PostgreSQL accepts the configure option --enable-dtrace to generate DTrace-compatible statically defined tracepoint events. On Linux this usually uses systemtap.

Events are defined as markers in the source code using TRACE_POSTGRESQL_EVENTNAME(...) function-like macros, which are no-ops unless trace event generation is enabled.

These events can be used by trace-event aware utilities including perf (Linux), ebpf-tools (Linux), systemtap (Linux), DTrace (Solaris/FreeBSD), etc to observe PostgreSQL's behaviour non-invasively. (They can also be used by gdb.)

The PostgreSQL build translates src/backend/utils/probes.d to a C header, src/backend/utils/probes.h, that defines the TRACE_POSTGRESQL_ events as wrappers for DTRACE_PROBE macros, which in turn are defined by /usr/include/sys/sdt.h as wrappers for _STAP_PROBE. That injects some asm placeholders that are used by tracing systems.

At present PostgreSQL extensions don't have any way to use PostgreSQL's own tracepoint generation to add their own tracepoints in extension code.

Extensions may duplicate the same build logic and define their own providers though.
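
As a rough sketch of what that could look like on Linux, an extension can use the DTRACE_PROBE macros from sys/sdt.h directly instead of generating a header from a .d file. The provider name myext, the probe names batch_start and batch_done, the MYEXT_TRACE_ macro names and the function below are all hypothetical, and the probes fall back to no-ops when sys/sdt.h is not available, in the same spirit as the core TRACE_POSTGRESQL_ macros.

```c
#include <assert.h>

/* Use the system SDT header when present; otherwise compile probes away. */
#if defined(__linux__) && defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define MYEXT_HAVE_SDT 1
#  endif
#endif

#ifdef MYEXT_HAVE_SDT
#define MYEXT_TRACE_BATCH_START(n)  DTRACE_PROBE1(myext, batch_start, n)
#define MYEXT_TRACE_BATCH_DONE(sum) DTRACE_PROBE1(myext, batch_done, sum)
#else
#define MYEXT_TRACE_BATCH_START(n)  ((void) (n))
#define MYEXT_TRACE_BATCH_DONE(sum) ((void) (sum))
#endif

/*
 * Some extension work bracketed by trace points. The probes cost close to
 * nothing when no tracer is attached and do not change the result.
 */
int
myext_process_batch(const int *items, int nitems)
{
    int sum = 0;

    assert(nitems >= 0);
    MYEXT_TRACE_BATCH_START(nitems);
    for (int i = 0; i < nitems; i++)
        sum += items[i];
    MYEXT_TRACE_BATCH_DONE(sum);
    return sum;
}
```

This keeps the extension's probes under its own provider name, so they cannot collide with the core postgresql provider, but it also means tracing scripts have to name the extension's shared library rather than the postgres executable.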