From PostgreSQL wiki
Jump to navigationJump to search

General Modification Trigger and Generalized Data Queue Spec

Problem Statement

Solving different problems:

  • Replication ala Slony/Londiste transactional queues
  • Replication to heterogenous systems, possibly non-transactional
  • Differential update in Materialized Views.

Issues with current "queue table" implementations:

  • Data write and seek/scan overhead.
    • and vacuum overhead
  • No good way to include transactional DDL changes
  • Issues with large data
  • Cannot ensure that replication trigger gets fired last
  • Delay inherent in "poll" model

Benefits of "queue table" implementations:

  • Can ensure that only committed data gets replicated
  • Persistent storage

Suggestions for Implementation

Per Itagaki:

  1. Non-table storage for row modification
  2. Send items to another server
  3. Implement non-persistent option (global temp tables)

Per Josh:

  1. Write a DEFERRABLE trigger based on the current constraint-trigger code.
  2. Work on a compact format for row storage (protobuf?)
  3. Use 3rd-party queues (AMQ, AMQP, Spread?) for transport.
  4. Come up with way to inject DDL into queue at appropriate point.
    Suggestion (cbbrowne)... This seems like an orthogonal matter in two ways:
    1. Capturing DDL information is different from capturing data modifications
    2. Capturing the point at which to inject the DDL into the queue is separate, perhaps using a combination of:
      1. Shared sequence to indicate ordering
      2. Commit timestamp information (may also be a sequence number of some sort) to control access to boundaries between sets of changes

Per Marko:

  1. Replication trigger must NOT be fired last, but after all BEFORE triggers and before all (non-replication) AFTER triggers...
  2. DDL event positioning is solved problem - the usual after trigger "agreeable order" logic applies here:
    1. Start transaction.
    2. Execute DDL, this will take locks on appropriate objects.
    3. Take event number from seq - here all potentially conflicting operations are waiting on locks, thus have not yet their event-id assigned from seq.
    4. Insert event into queue.
    5. Commit, releasing locks.
  3. Queue table efficiency seems also solved problem:
    1. Use efficient fetch query (PgQ)
    2. Tables should be INSERT-only and rotated often - eg with 5-10min period - to keep tables as small as possible. See PgQ on how to do it without locking problems.
      1. Minimal reading overhead - active data is in memory.
      2. Minimal VACUUM overhead.
      3. Minimal index maintenance overhead.



Modification triggers must be fired after all BEFORE triggers and before all AFTER triggers. They can access the statement type (INSERT/UPDATE/DELETE) and old and new tuples, but cannot modify the new tuple nor cancel the statement unless an error case.

DDL triggers are discussed in the separated section. However, we might handle TRUNCATEs in the queue layer because modification events before TRUNCATEs will be discardable in some cases.


We have some choices to record modifications in the queues; Full description vs. Primary keys only. They have trade-off, so we would be better to have options to choose which we use.

Full description queues
They include not only primary keys of modified tuples but also the non-key fields. They are used by Slony and Londiste.
  • We can avoid JOINs the queue and the original table on replay.
  • We can replay all of the actual operations.
  • The size of queues will be relatively larger then PK-only queues.
PK-only queues
They include only primary keys of modified tuples. An UPDATE of PK is recorded with a combination of DELETE and INSERT. They are used by Oracle's Materialized View Log.
  • The size of queues will be relatively smaller then full description queues.
  • We can easily merge multiple UPDATE operations on the same tuples.
  • We need JOINs of the queue and the original table on replay to retrieve actual data.
  • We can only reconstruct the final status of the table. Transition state is impossible.


See also discussion in GDQ Implementation.

The queue can be implemented with standard tables, and Slony and PgQ also use them, but we can use another data structure for them because GDQ requires only INSERT/SELECT/TRUNCATE. Fully-features of standard tables are not always needed. However, the consumers (subscribers) of the queues don't always read the data FIFO. Some consumers read the queue in random order because they requires snapshot based grouping of the modification records.

  • A queue table should consist of multiple tables to be INSERT-only. We rotate them with some period, and TRUNCATE old log records to avoid overhead of DELETEs.
  • The queue could be implemented with on-memory or non-WAL-logged storage engines. If we use such kinds of optimizations, the replicated servers must re-copy all of the table contents after the master's crash.
  • Use non-table storage for the queue, like some WAL style flat file, to avoid overheads.
  • Use custom WAL records for replication. WALSender can send replication records to standby servers.
    • A disadvantage of using WAL is that consumer might require old records in some cases. WALSender needs to read old WAL records from archive, but it's impossible.
  • Custom storage for GDQ might be kind of a SQL/MED foreign tables. We could extend the SQL standard for foreign tables to handle INSERT commands, and use them as the storage.


We have a choice of syntax to create queues and register consumers for the queues. Which is better?

PgQ has pgq.create_queue() and pgq.register_consumer().
Flexible, but the table name appears as a string literal (single-quoted).
Another SQL idea is to use ALTER TABLE, like ADD MODIFICATION QUEUE.


Generalized Data Queue should have enough capabilities to support existing replication products like Slony and Londiste.

Slony's sl_log

Slony's sl_log stores each change to be propagated to subscriber nodes. The log consists of two tables, sl_log_1 and sl_log_2, to sweep old records effectively.

  log_origin    integer, -- Origin node from which the change came 
  log_xid       xxid,    -- Transaction ID on the origin node 
  log_tableid   integer, -- The table ID that this log entry is to affect
  log_actionseq bigint,
  log_cmdtype   char(1), -- U = Update, I = Insert, D = DELETE 
  log_cmddata   text,    -- The data needed to perform the log action

In version 2.1, log_cmdtype extends a little, and further is expected in 2.2. Values are:

Value Description
I Insert
U Update
D Delete
T Truncate
S DDL Script

Londiste's PgQ

PgQ defines a text-based generalized queue, and Londiste uses those queues through triggers. Each queue consists of multiple tables (default is 3) for the same reason as Slony.

CREATE TABLE pgq.event_template (
  ev_id     bigint,      -- event's id, supposed to be unique per queue
  ev_time   timestamptz, -- when the event was inserted
  ev_txid   bigint,      -- transaction id which inserted the event
  ev_owner  int4,        -- subscription id that wanted to retry this
  ev_retry  int4,        -- how many times the event has been retried, NULL for new events
  ev_type   text,        -- I/U/D
  ev_data   text,        -- partial SQL statement or column values urlencoded
  ev_extra1 text,        -- table name
  ev_extra2 text,        -- (not used in Londiste)
  ev_extra3 text,        -- (not used in Londiste)
  ev_extra4 text         -- (not used in Londiste)

Materialized View Logs in Oracle Database

Oracle Database supports row-based replication based on Materialized View Logs. The logs are used for differential updates in Materialized Views, but also for replication combined with DATABASE LINK. See Planning Your Replication Environment for details. PostgreSQL might learn the design of Replication/Generalized Data Queue from Oracle Database, where the row-based logical replication is based on 3 fundamental modules:

  1. Generalized Data Queue
  2. Materialized Views with Differential Updates
  3. Inter-database Connections

Materialized View Logs only includes primary keys in default, but users can add any columns in the log. Simple materialized views require only primary keys are required for differential updates. The actual data seem to be retrieved from the original table using JOINed with the primary keys. On the other hand, complex materialized views (ex. including aggregations) require referencing column data. Note that Oracle's logs do not contain modified or inserted columns values unlike Slony and Londiste. Changes of primary keys are described as a combination of DELETE and INSERT.

CREATE TABLE oracle_mlog_pk (
  pk_1     pk_type_1,   -- first primary key
  pk_2     pk_type_2,   -- second primary key
  pk_N     pk_type_N,   -- Nth primary key
  snaptime timestamptz, -- first snapshot sets initial refresh time
  dmltype  char(1),     -- Type of DML
  old_new  char(1),     -- O/N/U (= D/I/U in Slony)
  change_vector bytea   -- Used for subquery and LOB snapshots (?)