User:Ziga/xpg dump

From PostgreSQL wiki
Jump to navigationJump to search

Often it is desirable to dump the database as SQL statements; fine database tools (pgAdmin, phpPgAdmin) need to support this. It is also useful for migration and backup.

There appears to be a Google SoC 2007 idea for this functionality.

Many programming languages (Prolog,ML,Lisp) offer fascilities to dump code as source; with the use of functions this can be done in SQL too.

Currently, the 'official' way of generating DDL dumps from PostgreSQL is pg_dump command, which is external to the backend. This makes it somewhat impractical to use in SQL queries.

Currently there are several implementations of DDL dumper:

  • pg_dump command line utility that comes with PostgreSQL
  • PgAdminIII has DDL dumping fascilities
  • others?

Variant: dump as ALTERs when possible. these can sometimes better be used for upgrading existing schemas

Variant: support dumping in strict ANSI SQL.

Use cases

  • backup
  • schema replication / comparison
  • support for DB admin tools
  • create packages for xpg_package
  • relocation of code from where it is (pg_dump) to where it belongs (backend). This makes this code more accesible to more software.

Problems and limitations of pg_dump

  • external to the database. This probably violates some of Codd's 12 rules for RDBMS.
  • complex internals, difficult to maintain
  • can't use same tool for different versions of backend, it has to be upgraded
  • it cannot dump functions only
  • it will not properly dump end restore my somewhat hairy database (7.4)

Quick and dirty solution

CREATE OR REPLACE FUNCTION system(text) RETURNS text 
AS 'my $cmd=shift; return `cd /tmp;$cmd`;' LANGUAGE plperlu;

CREATE OR REPLACE FUNCTION ddl_pgdump(name, name)
  RETURNS text AS '  
SELECT
   system(PGUSER=||current_user||
           pg_dump --attribute-inserts --no-owner --no-reconnect  ||
           --schema || $1 ||
           --table || $2 || || current_database());
'
LANGUAGE sql;

Of course, we would prefer something more sophisticated.

Dumper backend API

Plan is to implements a set of DDL generating functions withing the backend and an external pg_dump compatible replacement.

High-level language (plpgsql and plperl) is preffered and sufficient.

Generally, the dumper functions are partitioned according to what they dump; this way it is possible to use a high-level program (ie. the before mentioned pg_dump replacement) for ordering dumps. High level tool should also take care of creating pg_restore compatible compressed dump.

Domains:

  • sql_expression
  • sql_statement

DDL functions (should) return sql_statement type.

When dumping objects not in current schema, SQL identifier names are prefixed with schema name. This means you can set the search_path and have functions behave as expected.

Currently we have these end-user functions:

ddl_package(name) - dump a whole schema as one neat SQL
ddl_script(name, name) - dump a particular object. this is intended to be used in GUI tools and such
ddl_script(name, name, text) - dump a user-friendly script. $3 specifies what to dump.

Perhaps:

ddl_script(statement_type,object_type,namespace,name,name)
statement type: create, alter od drop
object type: table, view, function, type, domain, constraint, index, rule, trigger, grant, default

These are support functions used by the ones above.

ddl_all(name)
ddl_class(regclass) - dump a particular class (table or view)
ddl_constraints(name) - dump constraints for a whole schema
ddl_constraints(regclass) - dump constraints on a particular class
ddl_contrib(name) - dump contrib/dependencies. TO BE REPLACED.
ddl_defaults(name) - dump column defaults for a whole schema
ddl_defaults(regclass) - dump column defaults for a particular class
ddl_export(name, uri)
ddl_function(regproc) - dump a particular function
ddl_functions(name) - dump functions for a whole schema
ddl_grants(regclass) - dump grants for a particular class
ddl_grants(regclass, name)
ddl_indexes(name) - dump indexes for a whole schema
ddl_indexes(regclass) - dump indices on a particular class
ddl_pgdump(name, name, text) - generate SQL dump by calling external program pg_dump. TO BE REMOVED.
ddl_rules(name) - dump rules for a whole schema
ddl_rules(regclass) - dump rules on a particular class
ddl_schema(name) - dump schema creation
ddl_table(regclass)
ddl_tables(name)
ddl_triggers(name) - dump triggers for a whole schema
ddl_triggers(regclass) - dump triggers on a particular class
ddl_type(regclass) - dump a type (composite or sequence)
ddl_type(regtype) - dump a type (domain)
ddl_types(name) - dump types for a whole schema
ddl_view(regclass)
ddl_views(name)

Dumping order

Not so important for individual objects. More important for whole schemas.

  • by dependancy (difficult)
  • by increasing oid

Some things (functions, defaults, null constraints) can be CREATE OR REPLACEd or ALTERed without problems.

Apparently the smart/hairy stuff about this is in pg_dump C code.

Current status

  • Implemented for Postgres 7.4 (or later)
  • DDL generator functions largely implemented in plpgsql and plperl(u), as a part of WDBI schema.
  • No C code (this is a feature)
  • external dumper implemented as unix command wdbi-dump, which is designed to be option-compatible with pg_dump

Friendly system views

Dumper depends on friendly system views (currently those in WDBI schema). Using pg_catalog views directly is somewhat difficult.

Perhaps INFORMATION_SCHEMA should be used, but it's not entirely sufficient at the moment (eg. no OID). SQL spec says INFORMATION_SCHEMA views can be extended by implementation specific columns, though...

Investigate: maybe newsysviews could be used instead; consider implementing the dumper as an add on to newsysviews.