Atomic commit provides functionality to commit and rollback either all of foreign servers or nothing. This ensures that the database data is always left in a consistent state in terms of a federated database.

Discussion is here

To achieve an atomic commit of distributed transactions we employee two-phase commit protocol(2PC), a type of atomic commitment protocol

Distributed transaction involves, atomic commit, atomic visibility, and global consistency. 2PC is the only practical solution for atomic commit.

Usage

Users Overview

Coordinator node and participant node
- Coordinator node manages all foreign transactions on participant nodes.
- max_prepared_foreign_transactions GUC parameter controls the upper bound of the number of foreign transactions the coordinator server controls. Since one local transaction could access to multiple foreign servers, this parameter should be set,

(max_connetions) * (# of foreign servers that are capable of 2PC)

Foreign transaction launcher
- A background worker process launches foreign transaction resolver processes.
Foreign transaction resolver
- Background workers processes to resolve distributed transactions.
- One resolver process is responsible for resolving distributed transactions on one database.
  - This can be improved by having multiple resolvers on one database.
- The Maximum number of resolver processes is controlled by max_foreign_transaction_resolvers GUC parameter.
- Interval for retrying foreign transaction resolution is controlled by the foreign_transaction_resolution_interval GUC parameter.
- The timeout for resolver processes is controlled by foreign_transaction_resolver_timeout GUC parameter.
  - Resolver processes exit when there no unresolved foreign transactions on the database for foreign_transacton_resovler_timeout sec.
Enable and disable distributed atomic commit feature.
- Two-phase commit protocol is used when foreign_twophase_commit is enabled (i.g., required) and the transaction meets either following conditions:
  - The distributed transaction modifies more than two servers that are capable of 2PC, including the coordinator nodes.
  - The client executes PREPARE TRANSACTION command. In this case, 2PC is used unconditionally regardless of the number of writes.

Pseudo Code

RM: involved foreign transactions (excluding the transaction on the local node)

Procedure TransactionCommit()
    foreach RM
        lock RM;
        append a log of adding RM to WAL;
        prepare RM;
        if failed then call termination protocol;
    append a log of local commit to WAL;
    unlock all RMs;
    return;

Procedure TerminationProtocol()
    append a log of local abort to WAL;
    foreach RM
        rollback RM;
        unlock RM;
        append a log of removing RM to WAL;

Procedure Resolution()
    foreach prepared, unlocked RM
        get the status of the corresponding local transaction;
        if committed then
            commit prepared RM;
        else
            rollback prepared RM;
        append a log of removing RM to WAL;

The backend process who received SQL from the client calls TransactionCommit() and TerminationProtocol() whereas a resolver process calls Resolution().

New FDW Routines for Transaction Management

Four callback functions are newly added.

typedef void (*PrepareForeignTransaction_function) (FdwXactRslvState *frstate);
typedef void (*CommitForeignTransaction_function) (FdwXactRslvState *frstate);
typedef void (*RollbackForeignTransaction_function) (FdwXactRslvState *frstate);
typedef char *(*GetPrepareId_function) (TransactionId xid, Oid serverid,
                                        Oid userid, int *prep_id_len);

In terms of FDW developers, there are three possible choices:

1. Implement no transaction API
- Foreign transactions are not managed by the core.
2. Implement commit and rollback APIs
- The core does only commit and rollback foreign transactions. This type of FDW cannot participate in foreign two-phase commit.
3. Implement commit, rollback and prepare API
- This type of FDW supports foreign two-phase commit. The core does commit, rollback, and prepare foreign transactions.
- Optionally FDW can support GetPreapreId API for a case, for example, where an FDW has a limit of the length of global transaction id.

Synchronous and Asynchronous

In synchronous resolution, the backend keeps waiting for all foreign transactions to be committed/rolled back until the user sends an explicitly cancel request. This ensures the subsequent transaction can assume that all precedent distributed transactions are completed at the time when the client gets OK of commit. The backend can exit from the wait loop by canceling the transaction commit (e.g., pressing ctl-c or pg_cancel_backend()). Then, foreign transactions becomes in-doubt state and are processed by a resolver in background.

Asynchronously foreign transaction resolution can also be an option. The backend returns an acknowledgment to the client immediately after the local commit. This is known as the early acknowledgment technique. The downside is that the user who wants to see the result of the precedent transaction needs to make sure the precedent transaction is committed on all foreign servers.

Who is responsible for foreign transaction resolution?

There are two ideas regarding who is responsible for foreign transaction resolution (i.e., 2nd phase of 2pc) at transaction commit: the backend and the resolver.

1. The backend performs resolution.

This is a simple and straightforward idea. The backend process who received SQL from the client prepares foreign transactions and then commit them after the local commit.

2. The resolver performs resolution.

With this idea, the backend prepares foreign transactions and commit locally but ask the resolver process to commit/rollback prepared transactions. That is, different processes performs 1st phase and 2nd phase of 2PC.

Performance

In #2 idea, the resolver process could be a bottleneck if there are concurrent processes that want to use 2PC. Having resolver processes as much as the number of backend processes concurrently using 2PC is equivalent to #1 idea in terms of performance but #1 idea still might win because of no overhead of inter-process communication.

Error Handling

Foreign transaction resolution is performed after the local transaction commit. Therefore, even if FDW's commit routine raises an error during that it's too late, confusing the user because it gets an error in spite of having been committed the local transaction. #2 idea is designed to deal with this problem; having a resolver process perform 2nd phase of 2PC the process who receives SQL from the client is not affected by any error happening during resolution.

Query Cancellation

The 2nd phase of 2PC could take a long time because it involves network communication and disk access. What if the user wants to cancel the waiting for foreign transaction resolution? Given that some client (e.g., odbc_fdw?) library does not support asynchronous execution, with #1 idea, user cancellation doesn’t work in some FDWs. On the other hand with #2 idea, the user can safely cancel because the backend is just waiting. #2 wins in terms of this aspect but note that this enables query cancellation of only foreign transaction resolution but not all queries sent via FDW.

Conclution

It's not concluded yet.

How to Use

Configuration

In this section, we describe how to use this feature with one coordinator and two nodes. The step with [C] means operation on the coordinator node, and with [P] means operation on foreign servers (participant no\des).

1. [C] Set GUC parameter max_foreign_prepared_transactions

A transaction can involve multiple foreign servers and prepares on these, so max_foreign_prepared_transaction should be at least more than (max_connections) * (# of foreign servers that are capable of 2pc). To test, we set max_prepared_transactions more than 1.

$ $EDITOR postgresql.conf
max_connections = 100
max_prepared_foreign_transactions = 200 # max_connections = 100 and two foreign servers
max_foreign_transaction_resolvers = 1
foreign_twophase_commit = required
foreign_transaction_resolution_interval = 5s
froeign_transaction_resolver_timeout = 60s

2. [P] Set GUC parameter max_prepared_transactions

Also it would be easy to confirm the behavior of this feature, we set log_statement = all on all foreign servers.

$ $EDITOR postgresql.conf
max_prepared_transactions = 100 # same as max_connections of the coordinator server
log_statement = all # for testing
log_line_prefix = '<F1> ' # for fs2 server we can set '<F2> '

3. [C] Create postgres_fdw extension
4. [C] Create foreign servers with two_phase_commit parameter = on

=# CREATE SERVER fs1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'fs1', dbname 'postgres', port '5432');
CREATE SERVER
=# CREATE SERVER fs2 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'fs2', dbname 'postgres', port '5342');
CREATE SERVER
=# SELECT * FROM pg_foreign_server;
 oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl |              srvoptions
-------+---------+----------+--------+---------+------------+--------+--------------------------------------
16451 | fs1     |       10 |  16387 |         |            |        | {host=fs1,dbname=postgres,port=5432}
16452 | fs2     |       10 |  16387 |         |            |        | {host=fs2,dbname=postgres,port=5432}

(2 rows)

5. [C] Create user mapping

=# CREATE USER MAPPING FOR PUBLIC SERVER fs1;
CREATE USER MAPPING
=# CREATE USER MAPPING FOR PUBLIC SERVER fs2;
CREATE USER MAPPING

Example

After set up according to #How to Use section, we create two tables; ft1 on fs1 server and ft2 on fs2 server, also create corresponding foreign tables on coordinator node.

Example 1. Transaction commit using two-phase commit protocol

=# BEIGN;
=# INSERT INTO ft1 VALUES (1);
=# INSERT INTO ft2 VALUES (1);
=# COMMIT;

We will see the following server logs on fs1 server and fs2 server.

<FS1> LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
<FS1> LOG:  execute pgsql_fdw_prep_1: INSERT INTO public.s1(col) VALUES ($1)
<FS1> DETAIL:  parameters: $1 = '1'
<FS1> LOG:  statement: DEALLOCATE pgsql_fdw_prep_1
<FS2> LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
<FS2> LOG:  execute pgsql_fdw_prep_2: INSERT INTO public.s2(col) VALUES ($1)
<FS2> DETAIL:  parameters: $1 = '1'
<FS2> LOG:  statement: DEALLOCATE pgsql_fdw_prep_2
<FS1> LOG:  statement: PREPARE TRANSACTION 'fx_68464475_515_16400_10'
<FS2> LOG:  statement: PREPARE TRANSACTION 'fx_658736079_515_16410_10'
<FS1> LOG:  statement: COMMIT PREPARED 'fx_68464475_515_16400_10'
<FS2> LOG:  statement: COMMIT PREPARED 'fx_658736079_515_16410_10'

Example 2. A transaction modifies single node and then commit it in one-phase

If a transaction modifies only one foreign server, we don't need to use two-phase commit protocol.

=# BEIGN;
=# INSERT INTO ft1 VALUES (1);
=# COMMIT;

On fs1 server, we will see the following server logs.

<FS1> LOG:  statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
<FS1> LOG:  execute pgsql_fdw_prep_3: INSERT INTO public.s1(col) VALUES ($1)
<FS1> DETAIL:  parameters: $1 = '1'
<FS1> LOG:  statement: DEALLOCATE pgsql_fdw_prep_3
<FS1> LOG:  statement: COMMIT TRANSACTION

FAQ

When is 2pc used?
- Basically, when the distributed transaction modified more than one server including the coordinator node.
- But when the client executes PREPARE TRANSACTION command, 2pc is used unconditionally.
Concurrent readers can see an inconsistent result, for example, when a reader starts a new foreign transactions on two foreign servers after the writer committed the prepared foreign transaction only on the one of the foreign server?
- Yes. This feature ensures only atomic commit, but not atomic visibility. To support the globally consistent results, other mechanisms such as providing a globally consistent snapshot will be required.
What if an involved server crashed before the local commit?
- The distributed transaction changes over rollback. It sends ROLLBACK to non-prepared foreign transactions and sends ROLLBACK PREPARED to already-prepared foreign transactions. For foreign transactions being prepared, it sends both because we are not sure the preparation has been completed. Therefore, FDWs have to tolerate ERRCODE_UNDEFINED_OBJECT error.
What if an involved server crashed after the local commit?
- The fate of distributed transaction doesn't change. That is, the transaction resolver process will keep trying to commit foreign transactions.
What if an involved server crashed and never be restored?
- pg_remove_foreign_xact() can be used to remove foreign transaction entry without resolution.
The transaction resolver running on the database prevent DROP DATABASE. How can we stop the resolver process?
- pg_stop_fdwxact_resolver() can stop the specified resolver process.
Can we disable automatic in-doubt transaction resolution?
- No, as of now.

Atomic Commit of Distributed Transactions

Contents