BDR User Guide

From PostgreSQL wiki

Revision as of 06:10, 3 May 2013 by Ringerc (Talk | contribs)

Jump to: navigation, search

This page is the users and administrators guide for BDR. If you're looking for technical details on the project plan and implementation, see BDR Project.


BDR User Guide

BDR stands for BiDrectional Replication. It is a multi-master replication system built on top of Log Level Streaming Replication (LLSR).

BDR is *not* multi-master clustering as some vendors use the term, in that it doesn't have a distributed lock manager, global transaction co-ordinator, etc. Each member server is separate but connected. This allows for greater flexibility with server connectivity and workload at the cost of weaker global consistency guarantees.

Guidance on getting a testing setup established are in #Initial setup. Please read the full documentation if you intend to put BDR into production.

Logical Log Streaming Replication

Logical log streaming replication (LLSR) allows one PostgreSQL master (the "upstream master") to stream a sequence of changes to another read/write PostgreSQL server (the "downstream master"). Data is sent in one direction only over a normal libpq connection.

Multiple LLSR connections can be used to set up bi-directional replication as discussed later in this guide.

Overview of logical replication

In some ways LLSR is similar to "streaming replication" i.e. physical log streaming replication (PLSR) from a user perspective; both replicate changes from one server to another. However, in LLSR the receiving server is also a full master database that can make changes, unlike the read-only replicas offered by PLSR hot standby. Additionally, LLSR is per-database, whereas PLSR is per-cluster and replicates all databases at once. There are many more differences discussed in the relevant sections of this document.

In LLSR the data that is replicated is change data in a special format that allows the changes to be logically reconstructed on the downstream master. The changes are generated by reading transaction log (WAL) data, making change capture on the upstream master much more efficient than trigger based replication, hence why we call this "logical log replication". Changes are passed from upstream to downstream using the libpq protocol, just as with physical log streaming replication.

One connection is required for each PostgreSQL database that is replicated. If two servers are connected, each of which has 50 databases then it would require 50 connections to send changes in one direction, from upstream to downstream. Each database connection must be specified, so it is possible to filter out unwanted databases simply by avoiding configuring replication for those databases.

Setting up replication for new databases is not (yet?) automatic, so additional configuration steps are required after CREATE DATABASE. A restart of the downstream master is also required. The upstream master only needs restarting if the max_logical_slots parameter is too low to allow a new replica to be added. Adding replication for databases that do not exist yet will cause an ERROR, as will dropping a database that is being replicated. Setup is discussed in more detail below.

Changes are processed by the downstream master using bdr plug-ins. This allows flexible handing of replication input, including:

  • BDR apply process - applies logical changes to the downstream master. The apply process makes changes directly rather than generating SQL text and then parse/plan/executing SQL.
  • Textual output plugin - a demo plugin that generates SQL text (but doesn't apply changes)
  • pg_xlogdump - examines physical WAL records and produces textual debugging output. This server program is included in PostgreSQL 9.3.

Replication of DML changes

All changes are replicated: INSERT, UPDATE, DELETE and TRUNCATE.

(TRUNCATE is not yet implemented, but will be implemented before the feature goes to final release).

Actions that generate WAL data but don't represent logical changes do not result in data transfer, e.g. full page writes, VACUUMs, hint bit setting. LLSR avoids much of the overhead from physical WAL, though it has overheads that mean that it doesn't always use less bandwidth than PLSR.

Locks taken by LOCK and SELECT ... FOR UPDATE/SHARE on the upstream master are not replicated to downstream masters. Locks taken automatically by INSERT, UPDATE, DELETE or TRUNCATE *are* taken on the downstream master and may delay replication apply or concurrent transactions - see the locking section below.

TEMPORARY and UNLOGGED tables are not replicated. In contrast to physical standby servers, downstream masters can use temporary and unlogged tables.

DELETE and UPDATE statements that affect multiple rows on upstream master will cause a series of row changes on downstream master. These are likely to go at same speed as on origin, as long as an index is defined on the Primary Key of the table on the downstream master. INSERT on upstream master do not require a unique constraint in order to replicate correctly. UPDATEs and DELETEs require some form of unique constraint, either PRIMARY KEY or UNIQUE NOT NULL. A warning is issued in the downstream master's logs if the expected constraint is absent.

UPDATEs that change the value of the Primary Key of a table will be replicated correctly.

The values applied are the final values from the UPDATE on the upstream master, including any modifications from before-row triggers, rules or functions. Any reflexive conditions, such as N = N+ 1 are resolved to their final value. Volatile or stable functions are evaluated on the master side and the resulting values are replicated. Consequently any function side-effects (writing files, network socket activity, updating internal PostgreSQL variables, etc) will not occur on the replicas as the functions are not run again on the replica.

All columns are replicated on each table. Large column values that would be placed in TOAST tables are replicated without problem, avoiding de-compression and re-compression. If we update a row but do not change a TOASTed column value, then that data is not sent downstream.

All data types are handled, not just the built-in datatypes of PostgreSQL core. The only requirement is that user-defined types are installed identically in both upstream and downstream master (see "Limitations").

The current LLSR plugin implementation uses the binary libpq protocol, so it requires that the upstream and downstream master use same CPU architecture and word-length, i.e. "identical servers", as with physical replication. A textual output option will be added later for passing data between non-identical servers, e.g. laptops or mobile devices communicating with a central server.

Changes are accumulated in memory (spilling to disk where required) and then sent to the downstream server at commit time. Aborted transactions are never sent. Application of changes on downstream master is currently single-threaded, though this process is efficiently implemented. Parallel apply is a possible future feature, especially for changes made while holding AccessExclusiveLock.

Changes are applied to the downstream master in the sequence in which they were commited on the upstream master. This is a known-good serialization ordering of changes, so no replication failures are possible, as can happen with statement based replication (e.g. MySQL) or trigger based replication (e.g. Slony version 2.0). Users should note that this means the original order of locking of tables is not maintained. Although lock order is provably not an issue for the set of locks held on upstream master, additional locking on downstream side could cause lock waits or deadlocking in some cases. (Discussed in further detail later).

Larger transactions spill to disk on the upstream master once they reach a certain size. Currently, large transactions can cause increased latency. Future enhancement will be to stream changes to downstream master once they fill the upstream memory buffer, though this is likely to be implemented in 9.5.

SET statements and parameter settings are not replicated. This has no effect on replication since we only replicate actual changes, not anything at SQL statement level. We always update the correct tables, whatever the setting of search_path. Values are replicated correctly irrespective of the values of bytea_output, TimeZone, DateStyle, etc.

NOTIFY is not supported across log based replication, either physical or logical. NOTIFY and LISTEN will work fine on the upstream master but an upstream NOTIFY will not trigger a downstream LISTENer.

In some cases, additional deadlocks can occur on apply. This causes an automatic retry of the apply of the replaying transaction and is only an issue if the deadlock recurs repeatedly, delaying replication.

From a performance and concurrency perspective the BDR apply process is similar to a normal backend. Frequent conflicts with locks from other transactions when replaying changes can slow things down and thus increase replication delay, so reducing the frequency of such conflicts can be a good way to speed things up. Any lock held by another transaction on the downstream master - LOCK statements, SELECT ... FOR UPDATE/FOR SHARE, or INSERT/UPDATE/DELETE row locks - can delay replication if the replication apply process needs to change the locked table/row.

Table definitions and DDL replication

DML changes are replicated between tables with matching "Schemaname"."Tablename" on both upstream and downstream masters. e.g. changes from upstream's public.mytable will go to downstream's public.mytable while changes to the upstream mychema.mytable will go to the downstream myschema.mytable. This works even when no schema is specified on the original SQL since we identify the changed table from its internal OIDs in WAL records and then map that to whatever internal identifier is used on the downstream node.

This requires careful synchronization of table definitions on each node otherwise ERRORs will be generated by the replication apply process. In general, tables must be an exact match between upstream and downstream masters.

There are no plans to implement working replication between dissimilar table definitions.

Tables must meet the following requirements to be compatible for purposes of LLSR:

  • The downstream master must only have constraints (CHECK, UNIQUE, EXCLUSION, FOREIGN KEY, etc) that are also present on the upstream master. Replication may initially work with mismatched constraints but is likely to fail as soon as the downstream master rejects a row the upstream master accepted.
  • The table referenced by a FOREIGN KEY on a downstream master must have all the keys present in the upstream master version of the same table.
  • Storage parameters must match except for as allowed below
  • Inheritance must be the same
  • Dropped columns on master must be present on replicas
  • Custom types and enum definitions must match exactly
  • Composite types and enums must have the same oids on master and replication target
  • Extensions defining types used in replicated tables must be of the same version or fully SQL-level compatible and the oids of the types they define must match.

The following differences are permissible between tables on different nodes:

  • The table's pg_class oid, the oid of its associated TOAST table, and the oid of the table's rowtype in pg_type may differ;
  • Extra or missing non-UNIQUE indexes
  • Extra keys in downstream lookup tables for FOREIGN KEY references that are not present on the upstream master
  • The table-level storage parameters for fillfactor and autovacuum
  • Triggers and rules may differ (they are not executed by replication apply)

Replication of DDL changes between nodes will be possible using event triggers, but is not yet integrated with LLSR (see LLSR Limitations).

Triggers and Rules are NOT executed by apply on downstream side, equivalent to an enforced setting of session_replication_role = origin.

In future it is expected that composite types and enums with non-identical oids will be converted using text output and input functions. This feature is not yet implemented.

Parameter Reference

  • bdr.connections - list of nodes that this server will connect to. For each name listed here there must be one bdr.<nodename>.dsn entry
  • bdr.<nodename>.dsn - "data source name" - connection info for connecting to upstream master. Security for LLSR is identical to physical log streaming replication
  • max_logical_slots - LLSR uses persistent slots in memory which are reserved for each node at server start
  • wal_level - allows a new setting of 'logical' which produces mildly enhanced WAL contents to allow decoding of the WAL back into a logical change stream.

LLSR limitations

Table definitions, types, extensions, etc must be near identical between upstream and downstream masters. See Table definitions and DDL replication.

DDL replication is not yet supported.

No feedback from downstream masters to the upstream master is implemented for asynchronous LLSR, so upstream masters must be configured to keep enough WAL. See Configuration.

Initial setup

To set up BDR you first need a patched PostgreSQL that can support LR/BDR, then you need to create one or more LR/BDR senders and one or more LR/BDR receivers.

Installing the patched PostgreSQL binaries

Currently BDR is only available in builds of the 'bdr' branch on Andres Freund's git repo on PostgreSQL 9.2 and below do not support BDR, and 9.3 requires patches, so this guide will not work for you if you are trying to use a normal install of PostgreSQL.

First you need to clone, configure, compile and install like normal. Clone the sources from git:// and checkout the bdr branch.

If you have an existing local PostgreSQL git tree specify it as --reference /path/to/existing/tree to greatly speed your git clone.


mkdir -p $HOME/bdr
cd bdr
git clone git:// $HOME/bdr/postgres-bdr-src
cd postgres-bdr-src
./configure --prefix=$HOME/bdr/postgres-bdr-bin
make install
cd contrib/bdr
make install

This will put everything in $HOME/bdr, with the source code and build tree in $HOME/bdr/postgres-bdr-src and the installed PostgreSQL in $HOME/bdr/postgres-bdr-bin. This is a convenient setup for testing and development because it doesn't require you to set up new users, wrangle permissions, run anything as root, etc, but it isn't recommended that you deploy this way in production.

To actually use these new binaries you will need to:

export PATH=$HOME/bdr/postgres-bdr-bin/bin:$PATH

before running initdb, postgres, etc. You don't have to use the psql or libpq you compiled but you're likely to get version mismatch warnings if you don't.


The configuration for a simple single-master to single-replica configuration looks like:

Upstream (sender):

 wal_level = 'logical'       # Include enough info for logical replication
 max_logical_slots = X       # Number of LLSR senders + any receivers
 max_wal_senders = Y         # Y = max_logical_slots plus any physical 
                             # streaming requirements
 wal_keep_segments = 5000    # Master must retain enough WAL segments to let 
                             # replicas catch up. Correct value depends on
                             # rate of writes on master, max replica downtime
                             # allowable. 5000 segments requires 78GB
                             # in pg_xlog

Downstream (receiver):

 shared_preload_libraries = 'bdr'
 bdr.connections="name_of_upstream_master"      # list of upstream master nodenames
 bdr.<nodename>.dsn = 'dbname=postgres'         # connection string for connection
                                                # from downstream to upstream master
 bdr.<nodename>.local_dbname = 'xxx'            # optional parameter to cover the case 
                                                # where the databasename on upstream 
                                                # and downstream master differ. 
                                                # (Not yet implemented)
 bdr.<nodename>.apply_delay                     # optional parameter to delay apply of
                                                # transactions, time in milliseconds 
 bdr.synchronous_commit = ...;                  # optional parameter to set the
                                                # synchronous_commit parameter the
                                                # apply processes will be using
 max_logical_slots = X                          # set to the number of remotes

wal_keep_segments should be set to a value that allows for some downtime of server/network and for heavy bursts of write activity on the master. Keep in mind that enough disk space must be available for the WAL segments, each of which is 16MB. See "Insufficient WAL segments retained".

Note that a server can be both sender and receiver, either two servers to each other or more complex configurations like replication chains/trees.

For more details on these parameters, see Parameter Reference.


Could not access file "bdr": No such file or directory

If you see the error:

FATAL:  could not access file "bdr": No such file or directory

when starting a database set up to receive BDR replication, you probably forgot to install contrib/bdr. See above.

Invalid value for parameter

An error like:

LOG:  invalid value for parameter ...

when setting one of these parameters means your server doesn't support logical replication and will need to be patched or updated.

Insufficient WAL segments retained ("requested WAL segment ... has already been removed")

If wal_keep_segments is insufficient to meet the requirements of a replica that has fallen far behind, the master will report errors like:

ERROR:  requested WAL segment 00000001000000010000002D has already been removed

Currently the replica errors look like:

WARNING:  Starting logical replication
LOG:  data stream ended
LOG:  worker process: master (PID 23812) exited with exit code 0
LOG:  starting background worker process "master"
LOG:  master initialized on master, remote dbname=master port=5434 replication=true fallback_application_name=bdr
LOG:  local sysid 5873181566046043070, remote: 5873181102189050714
LOG:  found valid replication identifier 1
LOG:  starting up replication at 1 from 1/2D9CA220

but a more explicit error message for this condition is planned.

The only way to recover from this fault is to re-seed the replica database.

This fault could be prevented with feedback from the replica to the master, but this feature is not planned for the first release of BDR. Another alternative considered for future releases is making wal_keep_segments a dynamic parameter that is sized on demand.

Monitoring of maximum replica lag and appropriate adjustment of wal_keep_segments will prevent this fault from arising.

Couldn't find logical slot

An error like:

ERROR:  couldn't find logical slot "bdr: 16384:5873181566046043070-1-24596:"

on the upstream master suggests that a downstream master is trying to connect to a logical replication slot that no longer exists. The slot can not be re-created, so it is necessary to re-seed the downstream replica database.

Operational Issues and Debugging

In LLSR there are no user-level (ie SQL visible) ERRORs that have special meaning. Any ERRORs generated are likely to be serious problems of some kind, apart from apply deadlocks, which are automatically re-tried.


The following views are available for monitoring replication activity:

  • pg_stat_replication
  • pg_stat_logical_replication


The pg_stat_replication view applies to both physical and logical replication. It is covered by the PostgreSQL user manual.


The new pg_stat_logical_replication view is specific to logical replication. It is based on the underlying pg_stat_get_logical_replication_slots function and has the following structure:

 View "pg_catalog.pg_stat_logical_replication"
          Column          |  Type   | Modifiers 
 slot_name                | text    | 
 plugin                   | text    | 
 database                 | oid     | 
 active                   | boolean | 
 xmin                     | xid     | 
 last_required_checkpoint | text    | 

It contains one row for every connection from a downstream master to the server being queried (the upstream master).

  • slot_name: An internal name for a given logical replication slot (a connection from a downstream master to this upstream master). This slot name is used by the downstream master to uniquely identify its self and is used with the pg_receivellog command when managing logical replication slots.
  • plugin: The logical replication plugin being used to decode WAL archives. You'll generally only see bdr_output here.
  • database: The oid of the database being replicated by this slot. You can get the database name by joining on pg_database.oid.
  • active: Whether this slot currently has an active connection.
  • xmin: The lowest transaction ID this replication slot can "see".
  • last_required_checkpoint: The checkpoint identifying the oldest WAL record required to bring this slot up to date with the upstream master.

Table and index usage statistics

Statistics on table and index usage are updated normally by the downstream master. This is essential for correct function of auto-vacuum. If there are no local writes on the downstream master and stats have not been reset these two views should show matching results between upstream and downstream:

  • pg_stat_user_tables
  • pg_statio_user_tables

Since indexes are used to apply changes, the identifying indexes on downstream side may appear more heavily used with workloads that perform UPDATEs and DELETEs than non-identifying indexes are.

The built-in index monitoring views are:

  • pg_stat_user_indexes
  • pg_statio_user_indexes

All these views are discussed in the PostgreSQL documentation on the statistics views.

Starting, stopping and managing replication

TODO: Extension to improve this?

Starting a new LLSR connection

Logical replication is started automatically when a database is configured as a downstream master in postgresql.conf (see Configuration) and the postmaster is started. No explicit action is required to start replication, but replication will not actually work unless the upstream and downstream databases are identical within the requirements set by LLSR in the |Table definitions and DDL replication section.

Viewing logical replication slots

Examining the state of logical replication is discussed in Monitoring.

Temporarily stopping an LLSR replica

LLSR replicas can be temporarily stopped by shutting down the downstream master's postmaster.

If the replica is not started back up before the upstream master discards the oldest WAL segment required for the downstream master to resume replay, as identified by the last_required_checkpoint column of pg_catalog.pg_stat_logical_replication then the replica will not resume replay. The error Insufficient WAL segments retained will be reported in the upstream master's logs. The replica must be re-seeded for replication to continue.

TODO: Discuss any SQL-level, per-database functions for managing replication.

Removing an LLSR replica permanently

To remove a replication connection permanently, remove its entries from the downstream master's postgresql.conf then use pg_receivellog to remove the replication slot on the upstream master.

TODO pending merge of downstream control functions.

Cleaning up abandoned replication slots

To remove a replication slot that was used for a now-defunct replica, find its slot name from the pg_stat_logical_replication view on the upstream master then run:

pg_receivellog -p 5434 -h master-hostname -d dbname \
   --slot='bdr: 16384:5873181566046043070-1-16384:' --stop

where the argument to '--slot' is the slot name you found from the view.

Bi-Directional Replication

Bi-Directional replication is built directly on LLSR by configuring two or more servers as both upstream 'and' downstream masters of each other.

Bi-Directional Replication Use Cases

Bi-Directional Replication is designed to allow a very wide range of server connection topologies. The simplest to understand would be two servers each sending their changes to the other, which would be produced by making each server the downstream master of the other and so using two connections for each database.

Logical and physical streaming replication are designed to work side-by-side. This means that a master can be replicating using physical streaming replication to a local standby server, while at the same time replicating logical changes to a remote downstream master. Logical replication works alongside cascading replication also, so a physical standby can feed changes to a downstream master, allowing upstream master sending to physical standby sending to downstream master.

Simple multi-master pair

A simple mulit-master "HA Cluster" with two servers "Alpha" and "Bravo". Each server is the other's upstream and downstream master for the same database "dbtoreplicate".



wal_level = 'logical'
max_logical_slots = 3
max_wal_senders = 4
wal_keep_segments = 5000
shared_preload_libraries = 'bdr'
bdr.bravo.dsn = 'dbname=dbtoreplicate'


wal_level = 'logical'
max_logical_slots = 3
max_wal_senders = 4
wal_keep_segments = 5000
shared_preload_libraries = 'bdr'
bdr.alpha.dsn = 'dbname=dbtoreplicate'

See Configuration for an explanation of these parameters.

HA and Logical Standby

Downstream masters allow users to create temporary tables, so they can be used as reporting servers.

  • "HA Cluster"
    • Server "Alpha" - Current Master
    • Server "Bravo" - Physical Standby - unused, apart from as failover target for Alpha - potentially specified in synchronous_standby_names
    • Server "Charlie" - "Logical Standby" - downstream master

Very High Availability Multi-Master

A typical configuration for remote multi-master would then be:

  • Site 1
    • Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication
    • Server "Bravo" - Physical Standby - feeds changes to Charlie using logical streaming
  • Site 2
    • Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication
    • Server "Delta" - Physical Standby - feeds changes to Alpha using logical streaming

Bandwidth between Site 1 and Site 2 is minimised

3-remote site simple Multi-Master Plex

BDR supports "all to all" connections, so the latency for any change being applied on other masters is minimised. (Note that early designs of multi-master were arranged for circular replication, which has latency issues with larger numbers of nodes)

  • Site 1
    • Server "Alpha" - Master - feeds changes to Charlie, Echo using logical streaming
  • Site 2
    • Server "Charlie" - Master - feeds changes to Alpha, Echo using logical streaming replication
  • Site 3
    • Server "Echo" - Master - feeds changes to Alpha, Charlie using logical streaming replication

Using node names that match port numbers, for clarity

  • config for 5440:
    • port = 5440
    • bdr.connections='node_5441,node_5442'
    • bdr.node_5441.dsn='port=5441 dbname=postgres'
    • bdr.node_5442.dsn='port=5442 dbname=postgres'
  • config for 5441:
    • port = 5441
    • bdr.connections='node_5440,node_5442'
    • bdr.node_5440.dsn='port=5440 dbname=postgres'
    • bdr.node_5442.dsn='port=5442 dbname=postgres'
  • config for 5442:
    • port = 5442
    • bdr.connections='node_5440,node_5441'
    • bdr.node_5440.dsn='port=5440 dbname=postgres'
    • bdr.node_5441.dsn='port=5441 dbname=postgres'

3-remote site simple Multi-Master Circular Replication

Simpler config uses "circular replication". This is simpler but results in higher latency for changes as the number of nodes increases.

  • Site 1
    • Server "Alpha" - Master - feeds changes to Charlie using logical streaming replication
  • Site 2
    • Server "Charlie" - Master - feeds changes to Echo using logical streaming replication
  • Site 3
    • Server "Echo" - Master - feeds changes to Alpha using logical streaming replication

Using node names that match port numbers, for clarity

  • config for 5440:
    • port = 5440
    • bdr.connections='node_5441'
    • bdr.node_5441.dsn='port=5441 dbname=postgres'
  • config for 5441:
    • port = 5441
    • bdr.connections='node_5442'
    • bdr.node_5442.dsn='port=5442 dbname=postgres'
  • config for 5442:
    • port = 5442
    • bdr.connections='node_5440'
    • bdr.node_5440.dsn='port=5440 dbname=postgres'

Regrettably this doesn't actually work yet because we don't cascade logical changes (yet).

3-remote site Max Availability Multi-Master Plex

  • Site 1
    • Server "Alpha" - Master - feeds changes to Bravo using physical streaming with sync replication
    • Server "Bravo" - Physical Standby - feeds changes to Charlie, Echo using logical streaming
  • Site 2
    • Server "Charlie" - Master - feeds changes to Delta using physical streaming with sync replication
    • Server "Delta" - Physical Standby - feeds changes to Alpha, Echo using logical streaming
  • Site 3
    • Server "Echo" - Master - feeds changes to Foxtrot using physical streaming with sync replication
    • Server "Foxtrot" - Physical Standby - feeds changes to Alpha, Charlie using logical streaming

Bandwidth and latency between sites is minimised.

Config left as an exercise for the reader.

N-site symmetric cluster replication

Symmetric cluster is where all masters are connected to each other.

N=19 has been tested and works fine.

N masters requires N-1 connections to other masters, so practical limits are <100 servers, or less if you have many separate databases.

The amount of work caused by each change is O(N), so there is a much lower practical limit based upon resource limits. A future option to limit to filter rows/tables for replication becomes essential with larger or more heavily updated databases, which is planned.

Complex/Assymetric Replication

Variety of options are possible.

Conflict Avoidance

Distributed Locking

Some clustering systems use distributed lock mechanisms to prevent concurrent access to data. These can perform reasonably when servers are very close but cannot support geographically distributed applications. Distributed locking is essentially a pessimistic approach, whereas BDR advocates an optimistic approach: avoid conflicts where possible though allow some types of conflict to occur and then resolve them when that happens.

Global Sequences

Many applications require unique values be assigned to database entries. Some applications use GUIDs generated by external programs, some use database-supplied values. This is important with optimistic conflict resolution schemes because uniqueness violations are "divergent errors" and are not easily resolvable.

SQL Standard requires Sequence objects which provide unique values, though these are isolated to a single node. These can then used to supply default values using DEFAULT nextval('mysequence'), as with the SERIAL datatype.

BDR requires Sequences to work together across multiple nodes. This is implemented as a new SequenceAccessMethod API (SeqAM), which allows plugins that provide get/set functions for sequences. Global Sequences are then implemented as a plugin which implements the SeqAM API and communicates across nodes to allow new ranges of values to be stored for each sequence.

Conflict Detection & Resolution

Lock Conflicts

Changes from the upstream master are applied on the downstream master by a single apply process. That process needs to RowExclusiveLock on the changing table and be able to write lock the changing tuple(s). Concurrent activity will prevent those changes from being immediately applied because of lock waits. Use the log_lock_waits facility to look for issues there.

By concurrent activity on a row, we include

  • explicit row level locking
  • locking from foreign keys
  • implicit locking because of row updates or deletes, from local activity or apply from other servers

Data Conflicts

Concurrent updates and deletes may also cause data-level conflicts to occur, which then require conflict resolution. It is important that these conflicts are resolved in an idempotent, similar manner so that all servers end with identical results.

Concurrent updates are resolved using last-update-wins strategy using timestamps. Should timestamps be identical, the tie is broken using system identifier from pg_control (currently).

Updates and Inserts may cause uniqueness violation errors because of primary keys, unique indexes and exclusion constraints when changes are applied at remote nodes. These are not easily resolvable and represent severe application errors that cause the database contents of multiple servers to diverge from each other. Hence these are known as "divergent conflicts". Currently, replication stops should a divergent conflict occur.

Updates which cannot locate a row are presumed to be Delete/Update conflicts. These are counted but the Update is discarded.

Conflicts are logged if we specify bdr.log_conflicts = on

All conflicts are resolved at row level. Concurrent updates that touch completely separate columns can result in "false conflicts", where there is conflict in terms of the data, just in terms of the row update. Such conflicts will result in just one of those changes being made, the other discarded according to last update wins.

Changing unlogged and logged tables in same transaction can result in apparently strange outcomes since the unlogged tables aren't replicated.


As an example, lets say we have two tables Activity and Customer. There is a Foreign Key from Activity to Customer, constraining us to only record activity rows that have a matching customer row.

  • We update a row on Customer table on NodeA. The change from NodeA is applied to NodeB just as we are inserting an activity on NodeB. The inserted activity causes a FK check....
Personal tools