Distributed deadlock detection
Purpose of this project
This article describes how to extend PostgreSQL's lock system to represent a wait-for-graph that includes transactions running at remote database servers invoked by various means, for example FDW, libpq and dblink.
This makes it possible to detect deadlocks caused by remote transactions, that is, wait-for-graph cycles formed by chains of remote transactions.
Producing global deadlock with FDW
We can produce a global deadlock very easily with the existing FDW.
Suppose we have two servers, server1 and server2, each with a table t (c int).
From server1, server2's table t is seen as svr2_t, and from server2, server1's table t is seen as svr1_t.
Then we can produce a global deadlock as follows:
server1:
BEGIN;
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
server2:
BEGIN;
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
server1:
SELECT * FROM svr2_t;
server2:
SELECT * FROM svr1_t;
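To see why this interleaving deadlocks, it helps to model the resulting wait-for-graph. The following is an illustrative sketch (not PostgreSQL code); the transaction names and the edge representation are invented for this example.

```python
# Each blocked transaction waits for the holder of the lock it is blocked on.
# tx1 on server1 holds t@server1 and, via svr2_t, waits for t@server2;
# tx2 on server2 holds t@server2 and, via svr1_t, waits for t@server1.
waits_for = {
    "tx1@server1": "tx2@server2",  # SELECT * FROM svr2_t blocks on server2's lock
    "tx2@server2": "tx1@server1",  # SELECT * FROM svr1_t blocks on server1's lock
}

def find_cycle(start, graph):
    """Follow wait-for edges from `start`; return the cycle if a node repeats."""
    seen = []
    node = start
    while node in graph:
        if node in seen:
            return seen[seen.index(node):]
        seen.append(node)
        node = graph[node]
    return None

print(find_cycle("tx1@server1", waits_for))
# -> ['tx1@server1', 'tx2@server2']
```

The cycle spans both servers, so neither local deadlock detector alone can see it; this is exactly the gap the mechanism below addresses.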
How to represent wait-for-graph including remote transactions
PostgreSQL represents object-level locks with various internal functions, mainly in lock.c
. These are used to trace the wait-for-graph when a transaction fails to acquire a lock within the time defined by the
deadlock_timeout
GUC parameter; this is implemented in deadlock.c
.
We can extend this to represent the status of transactions waiting for the completion of remote transactions.
External Lock
Here, we define a new locktag type, LOCKTAG_EXTERNAL
in lock.h
, to represent that the transaction acquiring this lock is waiting for a remote transaction. Because it is held only by the transaction (the upstream transaction) waiting for a remote transaction (the downstream transaction) that it invoked, no other transaction in the database where the upstream transaction is running cares about this lock. The lock is acquired in exclusive mode only and is held only by the upstream transaction (which, in a sense, is itself waiting for this lock to be released).
The locktag information is similar to that of other lock types, with a reference to the additional properties described below. For this purpose, applications which invoke remote transactions should acquire an External Lock by calling ExternalLockAcquire()
. This is a wrapper around LockAcquireExtended()
that sets up the locktag for the External Lock.
External Lock property
Because we use this lock to trace the wait-for-graph, we need additional property information to connect to the remote database where the downstream transaction is running and continue the trace:
the connection string for that database and the identity of the downstream transaction.
A new function, ExternalLockSetProperties()
, is added to lock.c
for this purpose. To trace the wait-for-graph precisely and to handle changes in the remote transaction's status, this function requires the connection string and the remote transaction's pgprocno, pid and xid.
Deadlock Detection
The deadlock detection mechanism uses the existing code in deadlock.c
with an extension.
When deadlock detection (DeadLockCheck()
) is called, it begins to trace the wait-for-graph.
During this check, when DeadLockCheck()
finds an External Lock among the locks a PGPROC
is waiting on, it traces the remote transaction represented by that External Lock to build the global wait-for-graph.
This is repeated until the global wait-for-graph terminates or the trace comes back to the original upstream transaction, forming a cycle.
Dedicated functions are added to perform this check.
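The tracing step can be sketched as follows. This is an illustrative Python model, not the actual implementation (which lives in deadlock.c and lock.c); the data structures and the trace() function here are invented for the sketch.

```python
# local_waits[db] maps a blocked transaction to what it waits for; an External
# Lock edge points at a (remote_db, remote_tx) pair instead of a local lock.
local_waits = {
    "db1": {"tx1": ("db2", "tx2")},   # tx1 holds an External Lock -> remote tx2
    "db2": {"tx2": ("db1", "tx1")},   # tx2 holds an External Lock -> remote tx1
}

def trace(db, tx, path):
    """Depth-first trace of the global wait-for-graph.

    Returns the cycle as a list of (db, tx) nodes, or None if the graph
    terminates. In the real design, local LWLocks would be released before
    following an External Lock edge to a remote database."""
    node = (db, tx)
    if node in path:
        return path[path.index(node):]            # back at an earlier node: cycle
    edge = local_waits.get(db, {}).get(tx)
    if edge is None:
        return None                               # graph terminates: no deadlock
    return trace(edge[0], edge[1], path + [node])

print(trace("db1", "tx1", []))
# -> [('db1', 'tx1'), ('db2', 'tx2')]  (a cycle back to the upstream transaction)
```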
LWLocks during external lock trace
In local wait-for-graph tracing, all LWLocks are acquired by the deadlock checking functions to simplify the tracing code.
In global wait-for-graph tracing, we likewise acquire all LWLocks during the local portion of the trace.
When the trace goes out to the remote database to check further wait-for-graph edges, these LWLocks are all released so that other transactions can continue to run during the time-consuming remote wait-for-graph tracing.
When a cycle is found, all the databases involved in the global wait-for-graph cycle check that their local portion of the wait-for-graph is stable.
If it is not, at least one transaction involved in the wait-for-graph is still running and this is not a deadlock.
If it is stable, we determine that this is a deadlock.
During the stability check of the local wait-for-graph, we again acquire all the LWLocks locally.
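The stability check can be pictured as taking two snapshots of the local wait-for edges and comparing them. The following is a hedged sketch with invented names; the real check operates on lock-manager state under LWLocks.

```python
import itertools

def local_graph_stable(snapshot_fn):
    """Take two snapshots of the local wait-for edges and compare them.

    Identical snapshots mean no involved transaction made progress between
    them, so the local portion of the graph is stable."""
    first = snapshot_fn()
    second = snapshot_fn()   # in the real code, LWLocks are re-acquired here
    return first == second

# A stuck graph yields identical snapshots; a progressing one does not.
stuck = lambda: {"tx1": "tx2"}
progress = itertools.count()
moving = lambda: {"tx1": next(progress)}   # some transaction keeps advancing

print(local_graph_stable(stuck))    # True  -> local portion is stable
print(local_graph_stable(moving))   # False -> not a deadlock after all
```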
What applications should do
Applications (or extensions) should call two functions before they invoke a remote transaction:
ExternalLockAcquire()
ExternalLockSetProperties()
Applications do not need to release External Locks to follow the two-phase locking protocol.
External Locks are released at the end of the transaction as part of the cleanup process.
Because the global deadlock check is done in the background, applications do not need to care about it at all.
They should only acquire the lock and provide the information needed to trace the wait-for-graph.
External Lock Properties
Because the current lock structure is too small to hold the lock properties described above, we need extra space for them.
In the current implementation, they are held in files under $PGDATA/pg_external_locks
.
The file name is based on the values in the locktag.
For simplicity of implementation, External Lock properties are written in plain text; this may need improvement for security.
External Lock properties could also be stored elsewhere, such as in dynamic shared memory.
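A minimal sketch of such a file-based property store follows. The directory name pg_external_locks and the property fields (connection string, pgprocno, pid, xid) come from the text above; the file format and helper names here are invented for illustration (the real store is maintained by lock.c in C).

```python
import os
import tempfile

def property_path(lockdir, locktag):
    # File name derived from the locktag field values.
    return os.path.join(lockdir, "-".join(str(v) for v in locktag))

def write_properties(lockdir, locktag, connstr, pgprocno, pid, xid):
    # Plain-text format, one field per line (the article notes this may
    # need hardening for security).
    with open(property_path(lockdir, locktag), "w") as f:
        f.write(f"{connstr}\n{pgprocno}\n{pid}\n{xid}\n")

def read_properties(lockdir, locktag):
    with open(property_path(lockdir, locktag)) as f:
        connstr, pgprocno, pid, xid = f.read().splitlines()
    return connstr, int(pgprocno), int(pid), int(xid)

lockdir = tempfile.mkdtemp()          # stands in for $PGDATA/pg_external_locks
tag = (1, 42, 7)                      # invented locktag field values
write_properties(lockdir, tag, "host=server2 dbname=db2", 5, 12345, 731)
print(read_properties(lockdir, tag))
# -> ('host=server2 dbname=db2', 5, 12345, 731)
```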
Current Status
The code currently runs with PG 14.
You can freely clone the repo from https://github.com/koichi-szk/postgres.git
and check out the branch koichi/global_deadlock_detection_14_0
.
Because we do not have an actual workload to test this feature, I have a separate git repo containing the test environment and several useful functions for testing:
https://github.com/koichi-szk/gdd_test.git
, branch PG14_GDD
.
Please note that this repo depends on my local environment configuration, so you will need to adapt the environment on your own.
Future work
Because there is no actual workload around PG that causes global deadlocks, I will continue to port this code to future releases so that it is ready to be added to PG itself when this feature is really needed.