User:Ziga/xpg dump
Often it is desirable to dump the database as SQL statements; fine database tools (pgAdmin, phpPgAdmin) need to support this. It is also useful for migration and backup.
There appears to be a Google SoC 2007 idea for this functionality.
Many programming languages (Prolog,ML,Lisp) offer fascilities to dump code as source; with the use of functions this can be done in SQL too.
Currently, the 'official' way of generating DDL dumps from PostgreSQL is pg_dump command, which is external to the backend. This makes it somewhat impractical to use in SQL queries.
Currently there are several implementations of DDL dumper:
- pg_dump command line utility that comes with PostgreSQL
- PgAdminIII has DDL dumping fascilities
- others?
Variant: dump as ALTERs when possible. these can sometimes better be used for upgrading existing schemas
Variant: support dumping in strict ANSI SQL.
Use cases
- backup
- schema replication / comparison
- support for DB admin tools
- create packages for xpg_package
- relocation of code from where it is (pg_dump) to where it belongs (backend). This makes this code more accesible to more software.
Problems and limitations of pg_dump
- external to the database. This probably violates some of Codd's 12 rules for RDBMS.
- complex internals, difficult to maintain
- can't use same tool for different versions of backend, it has to be upgraded
- it cannot dump functions only
- it will not properly dump end restore my somewhat hairy database (7.4)
Quick and dirty solution
CREATE OR REPLACE FUNCTION system(text) RETURNS text AS 'my $cmd=shift; return `cd /tmp;$cmd`;' LANGUAGE plperlu; CREATE OR REPLACE FUNCTION ddl_pgdump(name, name) RETURNS text AS ' SELECT system(PGUSER=||current_user|| pg_dump --attribute-inserts --no-owner --no-reconnect || --schema || $1 || --table || $2 || || current_database()); ' LANGUAGE sql;
Of course, we would prefer something more sophisticated.
Dumper backend API
Plan is to implements a set of DDL generating functions withing the backend and an external pg_dump compatible replacement.
High-level language (plpgsql and plperl) is preffered and sufficient.
Generally, the dumper functions are partitioned according to what they dump; this way it is possible to use a high-level program (ie. the before mentioned pg_dump replacement) for ordering dumps. High level tool should also take care of creating pg_restore compatible compressed dump.
Domains:
- sql_expression
- sql_statement
DDL functions (should) return sql_statement type.
When dumping objects not in current schema, SQL identifier names are prefixed with schema name. This means you can set the search_path and have functions behave as expected.
Currently we have these end-user functions:
ddl_package(name) - dump a whole schema as one neat SQL ddl_script(name, name) - dump a particular object. this is intended to be used in GUI tools and such ddl_script(name, name, text) - dump a user-friendly script. $3 specifies what to dump.
Perhaps:
ddl_script(statement_type,object_type,namespace,name,name) statement type: create, alter od drop object type: table, view, function, type, domain, constraint, index, rule, trigger, grant, default
These are support functions used by the ones above.
ddl_all(name) ddl_class(regclass) - dump a particular class (table or view) ddl_constraints(name) - dump constraints for a whole schema ddl_constraints(regclass) - dump constraints on a particular class ddl_contrib(name) - dump contrib/dependencies. TO BE REPLACED. ddl_defaults(name) - dump column defaults for a whole schema ddl_defaults(regclass) - dump column defaults for a particular class ddl_export(name, uri) ddl_function(regproc) - dump a particular function ddl_functions(name) - dump functions for a whole schema ddl_grants(regclass) - dump grants for a particular class ddl_grants(regclass, name) ddl_indexes(name) - dump indexes for a whole schema ddl_indexes(regclass) - dump indices on a particular class ddl_pgdump(name, name, text) - generate SQL dump by calling external program pg_dump. TO BE REMOVED. ddl_rules(name) - dump rules for a whole schema ddl_rules(regclass) - dump rules on a particular class ddl_schema(name) - dump schema creation ddl_table(regclass) ddl_tables(name) ddl_triggers(name) - dump triggers for a whole schema ddl_triggers(regclass) - dump triggers on a particular class ddl_type(regclass) - dump a type (composite or sequence) ddl_type(regtype) - dump a type (domain) ddl_types(name) - dump types for a whole schema ddl_view(regclass) ddl_views(name)
Dumping order
Not so important for individual objects. More important for whole schemas.
- by dependancy (difficult)
- by increasing oid
Some things (functions, defaults, null constraints) can be CREATE OR REPLACEd or ALTERed without problems.
Apparently the smart/hairy stuff about this is in pg_dump C code.
Current status
- Implemented for Postgres 7.4 (or later)
- DDL generator functions largely implemented in plpgsql and plperl(u), as a part of WDBI schema.
- No C code (this is a feature)
- external dumper implemented as unix command wdbi-dump, which is designed to be option-compatible with pg_dump
Friendly system views
Dumper depends on friendly system views (currently those in WDBI schema). Using pg_catalog views directly is somewhat difficult.
Perhaps INFORMATION_SCHEMA should be used, but it's not entirely sufficient at the moment (eg. no OID). SQL spec says INFORMATION_SCHEMA views can be extended by implementation specific columns, though...
Investigate: maybe newsysviews could be used instead; consider implementing the dumper as an add on to newsysviews.