Compensating for Unimplemented Features in PostgreSQL 7.1

From PostgreSQL wiki
Jump to navigationJump to search



John Pagakis, President, Toolsmythe Software Services, Inc.
Todd Gauthier, Database Administrator, DevelopOnline, Inc.

August 21, 2001

Abstract

In most areas PostgreSQL 7.1 is as robust a SQL database as any. Unfortunately, there are some unimplemented features that make altering a schema after the fact a somewhat daunting task. Specifically missing from this release:

  • support for dropping a column from a table
  • changing a column from NULL to NOT NULL (or vice versa)
  • adding a primary key
  • changing column types

Other features, like defining ON DELETE/UPDATE behavior after the fact require knowledge of PostgreSQL internal administration tables.

In addition, there are some features that have been implemented, but are not as robust as they need to be. There is, for example, ALTER TABLE syntax for adding a column but that syntax does not accommodate specifying the column as a primary key, or specifying it NOT NULL. To accomplish this you are again forced down into the bowels of PostgreSQL.

This paper proposes an alternative to hacking administration tables or editing the output of pg_dump, and identifies some pitfalls you will no doubt encounter. The proposed alternative uses familiar SQL syntax and does not require knowledge of the internals of PostgreSQL. Best of all, it will work for you and keep you in your comfort zone even with your boss standing over your shoulder!

Acknowledgements

The information presented in this paper was uncovered over a period of three months at DevelopOnline, Inc. located in sunny Tempe, AZ. I did the research on the company dime and am donating the time for this write-up, but it is through DevelopOnline?s commitment to the Open Source community that you are reading this. They didn?t have to allow me to share this with you - they chose to. Be sure to drop by www.developonline.com to say thank you!

I'd also like to acknowledge Joel Burton's excellent paper 'Referential Integrity Tutorial & Hacking the Referential Integrity Tables' (Burton, J. 2001a) -- the definitive tome on the topic.

Last but not least, I'd like to acknowledge Todd Gauthier, DBA at DevelopOnline, Inc. for his help with debugging and testing during a very stressful time for us both ‹grin›.

The information in this paper is accurate to the best of my knowledge, but as with all things involving computers, your mileage may vary. Neither I, nor any member of the PostgreSQL community can be held responsible for any losses or damage incurred by using the knowledge or tools presented herein. Caveat emptor!

-- John Pagakis, President, Toolsmythe Software Services, Inc.

Problem Domain

If you are reading this, chances are you have discovered that it is difficult to make changes once you have implemented a schema in PostgreSQL 7.1. Generally, you will first learn about this with some tight deadline looming large. This paper is for you.

First and foremost I urge you, do not use this paper as an excuse to avoid learning about the internals of PostgreSQL. If you have the time, there is no substitute for knowledge. However, if you have schema mods due tomorrow and three department heads breathing down your neck, now is not the time to experiment with the whys and wherefores of hacking referential integrity.

Solutions

So, let?s get right to it. Assume the following table:

	CREATE TABLE
		user_table( user_id char( 8 ) NOT NULL PRIMARY KEY,
			first_name text,
			last_name text,
			social_security_number text NULL,
			unneeded_column text );

If you wanted to, for example, drop the unneeded_column column and change social_security_number to NOT NULL UNIQUE, you will need to execute the following steps:

  • Preserve the data
  • Drop the old table table
  • Recreate the table with your new definition
  • Re-populate the new table from the data you saved

The SQL to accomplish this is shown in Exhibit A :

Exhibit A

	DROP TABLE
		temp_user_table;

	/* Preserve the data */
	SELECT 
		user_id,
		first_name,
		last_name,
		social_security_number,
		unneeded_column
	INTO
		temp_user_table
	FROM
		user_table;


	/* Drop the old table */
	DROP TABLE
		user_table;




	/* Recreate the table with your new definition */
	CREATE TABLE
		user_table( user_id char( 8 )NOT NULL UNIQUE PRIMARY KEY,
				first_name text,
				last_name text,
				social_security_number text NOT NULL UNIQUE );


	/* Re-populate the new table from the data you saved */
	INSERT INTO
		user_table( user_id,
			    first_name,
			    last_name,
			    social_security_number )
	SELECT
		user_id,
		first_name,
		last_name,
		social_security_number 
	FROM
		temp_user_table;

Note the DROP of the temp_user_table at the beginning of this script could fail if the table does not already exist. That is fine, the important thing is that the table must not exist for the next statement to work. The SELECT INTO FROM syntax will actually create a table for you and populate it from the source table. Note this table serves only as a place to park the data ? it will not copy constraints, default values, or anything else. You should not try to use this table for any other purpose. The rest of this script should be pretty straightforward.

There are some important caveats here. First, never apply the technique shown in Exhibit B on a table with a primary key:

Exhibit B

	/* Dangerous!!!  Avoid!! */
	CREATE TABLE
   		temp_user_table( user_id char( 8 ) NOT NULL UNIQUE PRIMARY KEY,
				 first_name text,
				 last_name text,
				 social_security_number text );


	INSERT INTO
		temp_user_table( user_id,
				 first_name,
				 last_name,
				 social_security_number )
	SELECT
		user_id,
		first_name,
		last_name,
		social_security_number 
	FROM
		user_table;


	DROP TABLE
		user_table;



	ALTER TABLE
		temp_user_table
	RENAME TO
		user_table;

By using the technique in Exhibit B on a table with a primary key, you will find out the next time you have to drop and recreate the table that you have a misnamed index on the primary key. In our example, the second time you do this, the CREATE TABLE will fail because index temp_user_table_pkey already exists.

PostrgeSQL generates a name of ‹table name›_pkey as the name of the index for a table?s primary key. When you executed CREATE TABLE temp_user_table, under the covers an index named temp_user_table_pkey was created. When you executed ALTER TABLE RENAME to rename temp_user_table to user_table, the table was renamed - BUT THE INDEX WAS NOT. The ALTER TABLE RENAME was smart enough to rename the table, but not smart enough to rename the primary key index.

It could be months later when you discover this and you will spend a lot of time trying to figure out what the problem is. To avoid this problem, always use the technique outlined in Exhibit A, never the Exhibit B technique.

Please note, the technique in Exhibit B is proposed in ?Upgrading Columns in PostgreSQL Tables Effectively? (Burton, J. 2001b). Mr. Burton's example in this paper works without the aforementioned complications but only because it was performed on a table with no primary key.

On to the next caveat - if you have any tables that hold references to the table you just dropped, you will discover that referential integrity is no longer being enforced between these two tables.

Consider the following schema:

Given this schema, the equipment table will hold a foreign key reference to the user_table table. The act of dropping and recreating the user_table table will break the constraint between these two tables. This means you will be able to delete a user_table entry that has references from the equipment table; you will be able to add equipment with a reference to a user that does not exist in the user_table table (neither of these actions would be allowed with referential integrity working).

To resolve this problem you will need to add the foreign key constraint back with a bit of SQL:

Exhibit D

	ALTER TABLE	
		equipment
	ADD CONSTRAINT
		fk__equipment__user_table
	FOREIGN KEY
		( fk_user )
	REFERENCES
		user_table( user_id )
	MATCH FULL;

Exhibit D assumes that the column name of the foreign key reference in equipment to the user_table is named fk_user.

Bear in mind that the act of adding back the constraint will cause that constraint to be enforced as part of the act of adding it. In other words, if there are orphaned references in the child table (in our example, equipment) the ALTER TABLE ADD CONSTRAINT will fail with a referential integrity violation. You will then have to locate and delete the orphans before attempting again to add the constraint.

It is important to get all of this correct because it is a rather insipid problem if you miss a foreign key reference. It?s not as though a big red sign pops up saying ?WARNING!! REFERENTIAL INTEGRITY IS BROKEN!!? You could go quite some time before you find out that you have a referential integrity problem, and spend even more time tracing it back to the dropping and recreation of a table.

To that end, we have developed a Perl script ? genConstraints.pl (see Appendix A) ? to automate the creation of the SQL statements to put the foreign key constraints back. This script will generate the ALTER TABLE ADD CONSTRAINT statements for tables referencing any tables you specify. To use this script type the following:

	./genConstraints.pl dbName username password [table1 [table2 ?]]

The dbName, username and password arguments are required (if there is no password, use a pair of quotes?? as a placeholder). If you specify no table name(s), all foreign key constraints will be scripted. You may provide a list of tables ? space delimited ? in which case only references to those tables will be scripted. Review the general comments in the Perl script for further information.

It is also a very good idea to implement the system catalog view defined by Joel Burton in paper ?Referential Integrity Tutorial & Hacking the Referential Integrity Tables? (Burton, J. 2001a). This will give you a fairly easy way to see what constraints are in place.

Another caveat must be noted when switching a column from NULL to NOT NULL. Keep in mind, once you recreate the table, any NULL data in the column you changed to NOT NULL will cause the reload of the table to fail with a message indicating that NULLs were found in a field were NULL is not allowed. If possible, when you switch over to NULL, also define a default value. If that is not feasible, bear in mind you will have to modify those NULL values in the temp table prior to repopulating the real table.

In summary, to get around some of the unimplemented features in PostgreSQL 7.1, the following procedure works nicely:

  1. Create SQL to preserve constraints referencing the table(s) you will modify. The genConstraints.pl utility can help you here. If you use it, remember to redirect the output to a file.
  2. Create a SQL script using the technique outlined in Exhibit A.
  3. Execute that script. Be sure to use the ?s argument in psql so you can verify the success of each step before executing the next. Also, please note when you use ?s to execute a script held in a file, you must use ?f to indicate the file. If you redirect the file into psql, -s is ignored.
  4. Re-apply the constraints referencing the tables you have just manipulated. Execute the SQL script you created in Step 1. You must verify the success of adding the foreign key constraints ? make sure the add did not generate a foreign key violation.

Appendix A

GenConstraints.pl

#!/usr/bin/perl -w
################################################################
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful,
## but WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU Library General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program; if not, write to the Free Software
## Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307,
## USA.
################################################################
## genConstraints.pl
##
## Input:
##        databaseName - name of the database you wish to process
##        user - valid user name in postgres
##        password - user's password (or "" for nothing)
##        tableName(s) - optional.  One or more tables in the
##                       database.  Omitted, it will process all
##                       tables.
##
## Output:
##        SQL script to regenerate foreign key constraints on the
##        tables in the tableName list (or, if no tableName list
##        all foreign key constraints).  Output goes to standard
##        output - redirect to file if you want to save it.
##
## Description:
##        If you drop and rebuild a table that has foreign key
##        references from other tables, those references will be
##        lost.  This script allows you to create the SQL
##        statements to add those foreign key constraints back
##        once you have recreated and repopulated the parent
##        table.
##
## Requirements:
##        This program must be run before you drop the table.
##        Once you drop the table, the constraints are lost.
##
##        When you try to add the constraints back, the rules of
##        referential integrity will be enforced.  This means that
##        there must be an entry in the parent table (the one you
##        dropped and re-added) for every reference in the child
##        table.
############
## Modification History
############
## 2000.08.16 - Genisis
## By John Pagakis, Toolsmythe Software Services, Inc. for
## DevelopOnline, Inc.
################################################################

use DBI;

################################################################
##  main
##
##  Input:
##        See program input in general comments above
##
##  Output:
##        See program output in general comments above
##
##  Description:
##        00) Check for valid command line args.  If wrong number of
##            args, show usage.
##
##        01) Attempt to connect to the database.  If successful,
##            select all non-postgres-internal triggers from
##            pg_trigger.
##
##        02) Initialization of control variables.
##
##        03) For each trigger ....
##                00) Parse out the trigger arguments.
##                    Postgres 7.x stores constraints as triggers.
##                    The relationship between the parent and child
##                    table is held by postgres in the tgargs column
##                    of the pg_trigger table.  The targs column is
##                    a byte array, the arguments are separated by
##                    the string "\000".  For a foreign key constraint
##                    there are six arguments:
##                        0) Trigger name (or  if no trigger
##                           name was defined.
##                        1) Referencing table name.
##                        2) Referenced table name.
##                        3) Constraint type (or "UNSPECIFIED" if none).
##                        4) Referencing column name.
##                        5) Referenced column name.
##                    We are going to use this information to generate
##                    the ALTER TABLE ADD CONSTRAINT syntax needed to
##                    put the constraint back after it has been dropped.
##                    This step parses out the arguments and holds them
##                    in the @splitArgs array.
##
##                01) Now that we have the arguments parsed out, we need
##                    to see if this is a constraint we need to generate
##                    SQL for.  The answer is yes if no table list was
##                    included in the command line args, or if the
##                    referenced table name (@splitArgs[2]) is in the
##                    command line table list.  If either of these
##                    conditions is met ....
##
##                        00) Have we hit a control break?
##                            There will be duplicate triggers in the
##                            trigger table.  This is because constraints
##                            must be checked on inserts, mods, and
##                            deletes.  Each one of these is a different
##                            trigger, but the information (arguments ) is the same,
##                            so we don't want to process what for our
##                            purposes are duplicates.  On a control
##                            break ....
##
##                                00) Save off the new control info.
##
##                                01) Generate the first part of the
##                                    SQL statement.
##
##                                02) If the trigger is unnamed, generate
##                                    a name using the following format:
##                                        fk__referencingTableName__referencedTableName
##                                    If a name exists, use it.
##
##                                03) Generate the next part of the SQL
##                                    statement.
##
##                                04) If a constraint type is specified, use it,
##                                    otherwise do nothing.
##
##                                05) Finsh off the SQL statement.
##
##        04) Finish the result set.
##
##        05) Close the database connection
##
##
##  Side effects:
##        Any unnamed constraint will be given a name matching
##                fk__referencingTableName__referencedTableName
################################################################


######
# 00 #
if ( @ARGV < 3 )
{
        print "USAGE ....\n\tgenConstraints dbName user password [table1 [table2 ...]]\n";
        exit;
}

######
# 01 #
my $dbase = shift;
my $user = shift;
my $password = shift;
$host = "localhost";

######
# 02 #
my $driver = "dbi:Pg:dbname=" . $dbase . ";host=" . $host;

my $dbh = DBI->connect( $driver, $user, $password ) or die "\nError ($DBI::err):$DBI::errstr\n";

my $targResultSet = $dbh->prepare( "SELECT tgargs FROM pg_trigger WHERE tgname NOT LIKE 'pg_%';" );

$targResultSet->execute;


######
# 03 #
$saveReferedTable = "x#";
$saveReferingTable = "x#";
$saveReferedKey = "x#";
$saveReferingKey = "x#";

$numberOfArgs = @ARGV;


######
# 04 #
while ( @targs = $targResultSet->fetchrow_array )
  {

        #########
        # 04.00 #
        my @arg = $targs[0];

        my $argAsStr = pack "a*", $arg[0];

        my @splitArgs = split /\\000/, $argAsStr;


        #########
        # 04.01 #
        if ( $numberOfArgs == 0
        ||   isInTableList( $splitArgs[2], @ARGV ) )
        {
                ############
                # 04.01.00 #
                if ( $saveReferedTable ne $splitArgs[2]
                ||   $saveReferingTable ne $splitArgs[1]
                ||   $saveReferedKey ne $splitArgs[4]
                ||   $saveReferingKey ne $splitArgs[3] )
                {
                        ###############
                        # 04.01.00.00 #
                        $saveReferingTable = $splitArgs[1];
                        $saveReferedTable = $splitArgs[2];
                        $saveReferingKey = $splitArgs[3];
                        $saveReferedKey = $splitArgs[4];

                        ###############
                        # 04.01.00.01 #
                        print "ALTER TABLE\n\t$splitArgs[1]\nADD CONSTRAINT";

                        ###############
                        # 04.01.00.02 #
                        if ( $splitArgs[0] eq "" )
                        {
                                print "\n\tfk__";
                                print $splitArgs[1];
                                print "__";
                                print $splitArgs[2];
                        }
                        else
                       {
                                print "\n\t$splitArgs[0]";
                        }

                        ###############
                        # 04.01.00.03 #
                        print "\nFOREIGN KEY\n\t( $splitArgs[4] )\nREFERENCES\n\t$splitArgs[2]( $splitArgs[5] )";

                        ###############
                        # 04.01.00.04 #
                        if ( $splitArgs[3] ne "UNSPECIFIED" )
                        {
                                print "\nMATCH $splitArgs[3]";
                        }

                        ###############
                        # 04.01.00.05 #
                        print ";\n\n\n";
                }
        }

  }


######
# 05 #
$targResultSet->finish;


######
# 06 #
$dbh->disconnect;


######################## end of main ############################

sub isInTableList
################################################################
##  isInTableList
##
##  Input:
##        $candidate - This is the table name that we are testing
##                     to see if there is a matching entry in the
##                     table array.
##
##        @tableArray - This is a list of tables (from @ARGV) that
##                      we must generate SQL statements for.
##
##  Output:
##        True ( 1 ) if $candidate is in @tableArray.
##        False ( 0 ) if $candidate is not in @tableArray.
##
##  Description:
##        See output.
##
##  Side effects:
##        None
################################################################
{
        my $candidate = $_[0];
        my @tableArray = $_[1];

        foreach $tableName ( @tableArray )
        {
                if ( $tableName eq $candidate )
                {
                        return 1;
                }
        }

        return 0;
}

References

About the Postgres System Catalogs (2001). Retrieved August 22, 2001 from the World Wide Web: http://www.postgresql.org/docs/7.1/static/pg-system-catalogs.html

Burton, J. (2001a). Referential integrity tutorial & hacking the referential integrity tables. Retrieved August 22, 2001 from the World Wide Web: Referential Integrity Tutorial & Hacking the Referential Integrity tables

Burton, J. (2001b). Upgrading columns in PostgreSQL tables effectively. Retrieved August 22, 2001 from the World Wide Web.

PostgreSQL Global Development Group, The (2001). PostgreSQL 7.1 developer's guide, Chapter 3: System catalogs (pp. 9 ? 24). Retrieved August 22, 2001 from the World Wide Web: http://www.postgresql.org/docs/7.1/static/catalogs.html

John Pagakis, President, Toolsmythe Software Services, Inc.
Todd Gauthier, DBA, DevelopOnline, Inc.
Update September 28th, 2003
Marc Stosberg(marc at summersault dot com) sent in the following email:



User Patch

We have a Postgres 7.1 installation that we are /still/ running. :)

There were a couple of bugs in the related script which I have included a patch for below. If you could post it to save some trouble for other poor souls like myself, it would be appreciated.

The two issues addressed were:

  • Remote hosts were not supported. I added basic support for the PGHOST environment variable.
  • Parsing the bytea results from Postgres were broken. I assume this is because a newer version of DBD::Pg works differently. I updated the code to work with a modern DBD::Pg, 1.20. I don't know what this means for older versions.

- Mark

#####


--- GenConstraints.pl   Wed Nov  7 06:37:53 2001
+++ GenConstraints.pl.new       Sun Sep 28 16:01:53 2003
@@ -159,7 +159,7 @@
 my $dbase = shift( @ARGV );
 my $user = shift( @ARGV );
 my $password = shift( @ARGV );
-$host = "localhost";
+$host = $ENV ||  "localhost";
 
 ######
 # 02 #
@@ -193,8 +193,7 @@
 
         my $argAsStr = pack "a*", $arg[0];
 
-        my @splitArgs = split /\\000/, $argAsStr;
-
+        my @splitArgs = split /\0/, $argAsStr;
 
         #########
         # 04.01 #
@@ -300,4 +299,4 @@
         }
 
         return 0;
-}
\ No newline at end of file
+}