Synchronous Transfer
Synchronous Transfer is responsible for maintaining file system level consistency between master and standby server. In Streaming Replication when crash happens, there is possibility that WAL files which are present on master may not reach to the standby server.So at the time of fail-over the fresh backup from new master needs to be taken onto new standby.
Taking fresh backup may take very long time for large databases. Such a long recovery time is not acceptable in many cases and could break the service line agreement of recovery.
By using synchronous_transfer parameter user can configure fail-back safe standby which can avoid such a situation.
- Here is discussion about this.
Usage
Users Overview
synchronous_transfer specifies whether master will wait for data page modifications AND/OR for transaction commit before corresponding WAL record is replicated to standby server.
The valid values to synchronous_transfer parameter are 'all', 'data_flush' and 'commit':
- commit
- This value specifies that transaction commit wait for the transaction's WAL records to be replicated on the standby server. This setting is same as setting synchronous_commit to 'on'.
- Set synchronous_transfer to this value if there is no need to guard against the WAL inconsistency.
- Standby server will not behave as fail-back safe standby.
- data_flush
- This value specifies that master wait for data page modifications before corresponding WAL record is replicated to standby server but transaction commit will not wait for transaction's WAL records to be replicated on the standby server.
- Standby server will not behave as Asynchronous fail-back safe standby.
- all
- This value specifies that master wait for data page modifications as well as transaction commit before corresponding WAL record is replicated to standby server.
- Standby server will not behave as Synchronous fail-back safe standby.
Types of Fail-back safe standby
With the synchronous_transfer parameter now two new types of standby setups are possible: Asynchronous fail-back safe standby and Synchronous fail-back safe standby. Hence 4 different ways of configuring Standby servers are possible.
- Synchronous fail-back safe standby
- synchronous_standby_names = <server name>
- synchronous_transfer = all
- Asynchronous fail-back safe standby
- synchronous_standby_names = <server name>
- synchronous_transfer = data_flush
- Pure synchronous standby
- synchronous_standby_names = <server name>
- synchronous_transfer = commit
- Pure asynchronous standby
- No setting required
How to Use
- 1. First step is to setup Streaming Replication,here is the link for that: Streaming Replication
- 2. Set synchronous_transfer parameter to 'all' on the master server
$ $EDITOER postgresql.conf synchronous_transfer = all
(Above setting will configure synchronous fail-back safe standby,to configure Asynchronous fail-back safe standby set synchronous_transfer parameter to 'data_flush')
that's it! configuration of fail-back safe standby is complete.
You can test the setup of fail-back safe standby with following steps:
- 3. Keep running transactions on master which generates lots of WAL and crash the master server.
#For example : Kill the postmaster process of the primary server which is 'AAA' server, whose pid is 16764. $ ps x | grep postgres 16764 pts/4 S 0:00 /home/postgres/pgsql/AAA/bin/postgres -D data 16768 ? Ss 0:00 postgres: checkpointer process 16769 ? Ss 0:00 postgres: writer process 16770 ? Ss 0:00 postgres: wal writer process 16771 ? Ss 0:00 postgres: autovacuum launcher process 16772 ? Ss 0:00 postgres: stats collector process 16780 pts/4 S 0:00 /home/postgres/pgsql/BBB/bin/postgres -D data 16783 ? Ss 0:03 postgres: startup process recovering 00000001000000000000000B 16784 ? Ss 0:00 postgres: checkpointer process 16785 ? Ss 0:00 postgres: writer process 16786 ? Ss 0:00 postgres: stats collector process 16787 ? Ss 0:00 postgres: wal receiver process streaming 0/BD4C080 16788 ? Ss 0:00 postgres: wal sender process postgres [local] streaming 0/BD4C080 $ kill -s SIGKILL 16764
- 4. Promote the standby server to the primary server
$ pg_ctl promote
- 5. Synchronizing the WAL file between primary and standby server using rsync command
$ rsync -r 192.168.1.100:/home/postgres/pgsql/BBB/pg_xlog/ /home/postgres/pgsql/AAA/pg_xlog
- 6. Create recovery.conf for the old primary server
$ $EDITOR recovery.conf recovery_target_timeline = 'latest' standby_mode = on primary_conninfo = 'application_name=AAA port=5432 user=postgres'
- 7. Start the old primary server as new standby server
# The new standby server try to fetch WAL file which from latest checkpoint location, from pg_xlog directory at first. # But we removed the all WAL file on the pg_xlog firectory of new standby server. # So the new standby server will request to fetch WAL file from the new primary server. $ pg_ctl start
Restrictions
- Currently only one fail-back safe standby with SYNC replication.
- for example: if there is 2 standby servers which connects to master server (one is SYNC, another one is ASYNC) and synchronous_transfer is set 'all'. Then master server will wait only for SYNC standby server.
Impact for Performance
Todo
- Currently this patch supports only one fail-back safe standby. It can either be synchronous or an asynchronous standby. More discussion is needed about support of multiple fail-safe standby's.