Error logging in COPY

From PostgreSQL wiki
Jump to navigationJump to search

History

Error logging in COPY was a proposed feature developed by Aster Data against the PostgreSQL 9.0 code base. It was submitted and reviewed (1) but not accepted into the core product for that or any other version so far.

Overview

The purpose of error logging in COPY is to prevent the backend from erroring out if a malformed tuple is encountered during a COPY operation. Bad tuples can either be skipped or logged into an error logging table.

The format of the error logging table is as follows:

 CREATE TABLE error_logging_table(
   tupletimestamp TIMESTAMP WITH TIME ZONE,
   targettable    VARCHAR,
   dmltype        CHAR(1),
   errmessage     VARCHAR,
   sqlerrcode     CHAR(5),
   label          VARCHAR,
   key            BIGINT,
   rawdata        BYTEA
 );

The COPY command returns the number of successfully copied tuples only.

COPY options

Error logging is set by adding options to the COPY command. Here is the list of the available options:

Variable name Description Default value
ERROR_LOGGING Enables error handling for COPY commands (when set to true). true
ERROR_LOGGING_SKIP_BAD_ROWS Enables the ability to skip malformed tuples that are encountered in COPY commands (when set to true). true
ERROR_LOGGING_MAX_ERRORS Maximum number of bad rows to log before stopping the COPY operation (0 means unlimited). 0
ERROR_LOGGING_SCHEMA_NAME Schema name of the table where malformed tuples are inserted by the error logging module 'public'
ERROR_LOGGING_TABLE_NAME Relation name where malformed tuples are inserted by the error logging module. The table is automatically created if it does not exist. 'error_table'
ERROR_LOGGING_LABEL Optional label that is used to identify malformed tuples COPY command text
ERROR_LOGGING_KEY Optional key to identify malformed tuples Index of the tuple in the COPY stream

Bad tuples can be rejected for a number of reasons (extra or missing column, constraint violation, ...). The error table tries to capture as much context as possible about the error. If the table does not exist it is created automatically. The format of the error logging table is as follows:

 CREATE TABLE error_logging_table(
   tupletimestamp TIMESTAMP WITH TIME ZONE,
   targettable    VARCHAR,
   dmltype        CHAR(1),
   errmessage     VARCHAR,
   sqlerrcode     CHAR(5),
   label          VARCHAR,
   key            BIGINT,
   rawdata        BYTEA
 );

tupletimestamp stores the time at which the error occured. targettable describes the table in which the row was inserted when the error occured. The exact error message and sql error code are recorded in errmessage and sqlerrcode, respectively. The original data of the row can be found in rawdata.

Example

 CREATE TEMP TABLE foo (a bigint, b text);

-- input_file.txt --

 1	one
 2	
 3	three	111
 four    4
 5	five

-- end of input_file.txt --

error logging off

 COPY foo FROM 'input_file.txt';
 ERROR:  missing data for column "b"
 CONTEXT:  COPY foo, line 2: "2"

skip bad rows

 --skip bad rows
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SKIP_BAD_ROWS);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)

turn error logging on (default logs in error_logging_table)

 --turn error logging on (default logs in error_logging_table)
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING);
 SELECT * from foo;
  a |  b   
 ---+------
  1 | one
  5 | five
 (2 rows)
 SELECT * FROM error_logging_table;
  key |           tupletimestamp            |              label              |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+---------------------------------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | \x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | \x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | COPY foo FROM 'input_file.txt'; | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | \x666f75720934
 (3 rows)

Redirect to another table with a specific label

 -- Redirect to another table with a specific label
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_SCHEMA_NAME 'error', ERROR_LOGGING_TABLE_NAME 'table1', ERROR_LOGGING_LABEL 'batch1');
 SELECT * FROM error.table1;
  key |           tupletimestamp            | label  |  targettable  | dmltype |                errmessage                | sqlerrcode |         rawdata          
 -----+-------------------------------------+--------+---------------+---------+------------------------------------------+------------+--------------------------
    2 | Thu Sep 10 07:09:17.869521 2009 PDT | batch1 | pg_temp_2.foo | C       | missing data for column "b"              | 22P04      | \x32
    3 | Thu Sep 10 07:09:17.86953 2009 PDT  | batch1 | pg_temp_2.foo | C       | extra data after last expected column    | 22P04      | \x3309746872656509313131
    4 | Thu Sep 10 07:09:17.869538 2009 PDT | batch1 | pg_temp_2.foo | C       | invalid input syntax for integer: "four" | 22P02      | \x666f75720934
 (3 rows)

Limit to 2 bad rows:

 -- Limit to 2 bad rows:  
 COPY foo FROM 'input_file.txt' (ERROR_LOGGING, ERROR_LOGGING_MAX_ERRORS 2);
 ERROR:  invalid input syntax for integer: "four"
 CONTEXT:  COPY foo, line 4, column a: "four"
 SELECT count(*) from error_logging_table;
  count 
  -------
       0
  (1 row)