ExtensionPackaging

See [[1]]

PostgreSQL Extensions

Here's the first round of User Design about PostgreSQL Extensions. I tried to put together the ideas expressed by a lot of different people. The aim here is to first agree on the naming and the goals, then talk about what user design we propose.

name

The contenders are extension, module, bundle and package. Winner is extension.

The module is something else in the SQL standard, a bundle is an ok choice, a package would certainly make people think we're Oracle compatible (and we don't want to have Ada like skeleton and bodies), and extension is what PGXS is make for and what we -you-name-it- authors made.

v1.0 goals

We're not trying to be feature complete on first round.

must have

- dump & restore support (when upgrading a cluster or just restoring)

- easy install and uninstall

- support for home grown SQL/PLpgSQL only extensions in order to make life easier for in-house PG based development (you don't have to code in C to benefit from extensions)

- support for "basic" modules, providing a type and its operators and indexing support, such as ip4r, hstore, temporal, prefix and many others, you name it, of even simpler things like preprepare or backports/min_update.

- support for procedural languages (a priori easily covered within basic modules but I'm not sure) like plproxy, pllolcode, pllua, plscheme, plsh et al.

- support for all what you find in contrib/ for 8.4 (covered already?)

would be great (target later commit fest)

- versioning support with upgrade in place facility (hooks?)

- supporting more than one version of the same module installed in the same time, possibly (I suppose always but...) in different schemas

- custom variables?

- PostGIS complete support, with user data dependancy, even if an extensible typmod system would certainly solve this problem in a better place. Maybe someone will come up with another existing extension sharing the problem and not the typmod solution?

- a core team approved list of extensions (replacing contribs, maybe adding to it), where approved means code has been reviewed and the only reason why it's not in the core itself is that core team feels that it's not part of a RDBMS per-se, or feel like the code should be maintained and released separately until it gets some more field exposure... (think plproxy).

later please

- CPAN or ports like infrastructure for auto downloading a more or less prepared "bundle", place it at the right place on the filesystem and install it in the database(s) of choice

- complex support for ad-hoc bootstrap of uncommon modules such as pljava

- dependancy graph solving and automatic installation, with depends, recommends and suggest sections and with rules/setup to choose what to pull in by default...

dump & restore

We want pg_dump to issue only one line per extension, the one installing the extension in the database, see syntax.

SYNTAX

Extensions will need metadata, and after reading several proposals, what I propose here is to have a first explicit step to register the extension name and metadata, then have "basic" tools to play with it.

creating extensions (authoring)

The 'foo' extension author is meant to provide a +foo.sql+ file containing this:

 create extension foo 
   with version 1.0
        install [script] 'foo.install.sql'
        uninstall [script] 'foo.uninstall.sql'
        upgrade function upgrade_foo(old version, new version)
        [ custom_variable_classes 'a,b' 
          configuration file 'foo.conf' ]
   depends on bar version 0.3
       and on baz version >= 1.2;

Here we suppose we have also a new datatype "version" to host the versioning information, with the associated operators. See http://packages.debian.org/sid/postgresql-9.0-debversion

Doing it this way, we skip the need to provide a way of telling "next comands are meant for creating SQL objects which belongs to such extension", at the expense of forcing authors to manage upgrades to add objects.

The upgrade function is mandatory, and has to return the installed version or null, meaning "please run the install script again, that's how I upgrade". The error management is to be made by means of RAISE EXCEPTION.

If a specific function is to get called at install or uninstall time, it's easy enough to SELECT install_function(); from within the install script, after having defined it. To support this, internal GUCs (not exposed in postgresql.conf) will be provided and set by PG when running those scripts, named current_extension and current_extension_version.

installing and removing an extension

 begin;
 install extension foo;
 commit;

The extension and its objects end up in pg_extension, authors are not supposed to explicitly create a schema where to live in, because that means users will have to think about adapting their search_path, which can get pretty hairy sometimes. There are exceptions to this rule, like PGQ / Londiste for example, that you install in the database then use command line tools to install triggers. The triggers can of course reference easily pgq.logutrigga() function, so the user has no direct contact with the schema choice. In such situation, the author is free to create a specific schema.

 begin;
 drop extension foo [cascade];
 commit;

The "cascade" option is there to care about reverse depends.

ACLs

The "bulk" ACL management of an extension's objects is pushed to the globing support project for GRANT/REVOKE, so we don't have to speak about what it'll look like here :)

OS Filesystem Interaction

PostgreSQL already provides standard paths where to install extensions by means of PGXS, and distribution packagers have been able to adapt those. We should just stick with this, meaning the problem is solved.

Packaging extensions at the OS level

Using PGXS and some glue around, we could propose a source-based extension packaging facility for PostgreSQL:

pg_ext add-mirror http://extensions.postgresql.org/
pg_ext list [remote | available]
pg_ext add plproxy prefix citext
pg_ext install mydatabase plproxy prefix
pg_ext drop [--force] plproxy mydatabase
pg_ext remove <package> ...
...

The add-mirror allows to fetch from remote url directly, what you fetch is a source tarball. The repository will have to be organized with "branches" for major PostgreSQL versions, and the pg_ext behavior will be determined at its "compilation" time, so you get it hard-coded for the major version it came with.

The add command will retrieve the source tarball of <extension> and run make && make install from a (configurable) temporary location. For an extension to be supported it'll have to use PGXS standards and to provide a simple make && make install mecanism. Is this realistic? Would postgis fit in?

Then install will issue the SQL for you, as drop would, so those are convenience wrappers.

The command remove will rip out the extension on disk only if it's no more used in your clusters databases, and it'll be responsible for removing modules objects at the SQL level. Mrmmm.... or will it?

Distributing extensions

If you're a distribution packager it's now easy to package extension based on pg_ext tool. The job consists in providing compiled .so for the archs you support, when the distribution level package is installed next step the used is confronted to is installing the extension in his database(s). So it goes like this:

apt-get install postgresql-8.5-plproxy
psql mydb
=# install extension plproxy;

Or maybe like this:

apt-get install postgresql-8.5-plproxy
pg_ext install mydb plproxy

Code design

http://archives.postgresql.org/pgsql-hackers/2009-07/msg01425.php

- make the pg_catalog.pg_extension entry and the associated commands with version as text, one thing at a time please

- bootstrap core components in pg_extension for dependancy on them (plpgsql, ...)

- implement a backend function pg_execute_commands_from_file('path/ to/file.sql'); reserved to superuser, file in usual accepted places

- implement INSTALL EXTENSION with the previous function

- add a static backend local variable installing_extension (oid)

- modify each SQL object create statement to add entries in pg_depend - add an specific type for handling version numbers and operators comparing them

pg_execute_commands_from_file

http://archives.postgresql.org/pgsql-hackers/2009-07/msg01468.php

in particular, what do you think about implementing a general purpose backend function similar to psql's \i (except without support for \commands and :variables):

 SELECT pg_execute_commands_from_file('path/ to/file.sql');

Your recent work about having a re-entrant parser should make it possible to implement, by either "extending" or copy/pasting the postgres.c:exec_simple_query, right?

(Difference is about not overriding current unnamed portal and maybe

forcing PortalRunMulti() usage, and that there's already a started
transaction (but start_xact_command() is a noop in this case))

Open Items

Here are from memory the problems we don't have a solution for yet:

- how to give user the ability to install the extension's objects in another schema than the pg_extension default one

- how to provide extension author a way to have major PG version dependant code without having to implement and maintain a specific function in their install.sql file