MM demo pgcon2016

What is Multimaster? (MM)

MM = BDR (bidirectional replication) + DTM (Distributed Transaction Manager)

Currently Postgres has BDR, but no global consistency.

Without DTM: read-only queries can be load-balanced across nodes, but write queries must go to the master.

The application can logically split out the queries that update the database and send those to the master (a routing sketch follows below).
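
A minimal sketch of that application-level routing, assuming the psycopg2 driver and hypothetical connection strings (none of this code is from the talk):

 import random
 import psycopg2

 MASTER_DSN = "host=master dbname=app"            # hypothetical DSNs
 REPLICA_DSNS = ["host=replica1 dbname=app",
                 "host=replica2 dbname=app"]

 def route(query, params=None):
     """Send writes to the master; spread reads across replicas."""
     is_write = not query.lstrip().upper().startswith("SELECT")
     dsn = MASTER_DSN if is_write else random.choice(REPLICA_DSNS)
     with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
         cur.execute(query, params)
         return None if is_write else cur.fetchall()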

MM solves this: all nodes are symmetric, and any client can run any query against any node.

Multimaster is based on logical replication. A query executes normally, and its changes hit the XLOG. The XLOG is serialized and replicated to all other nodes, where a logical receiver applies the changes at the local node. Applying a transaction at a secondary node is slow and carries the potential for distributed deadlock. To avoid an infinite replication loop, we must distinguish between transactions that originated at this node and transactions of foreign origin.
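
A toy sketch of that loop-prevention idea (the message format and node IDs here are invented for illustration; MM's real protocol differs):

 LOCAL_NODE_ID = 1

 def execute_locally(sql):
     print("applying:", sql)          # stand-in for running the change locally

 def apply_changes(stream):
     """Apply replicated changes, skipping ones that originated here."""
     for origin, sql in stream:       # each change is tagged with its origin node
         if origin == LOCAL_NODE_ID:
             continue                 # our own change echoed back: don't re-apply
         execute_locally(sql)

 # Changes from node 2 are applied; our own (node 1) are filtered out.
 apply_changes([(2, "UPDATE t SET x = 1"), (1, "INSERT INTO t VALUES (2)")])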

Logical replication slots - see slides for structure

How to preserve consistency when a node crashes?

If all nodes had to preserve outstanding transactions until the failed node comes back up, then all nodes would have to preserve the state and ordering of those transactions. Instead, one node is chosen (currently at random) and the failed node performs recovery from that single node.
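
A sketch of that single-donor recovery scheme (the node and log interfaces here are invented; the real mechanism works over logical replication):

 import random
 from dataclasses import dataclass

 @dataclass
 class Txn:
     position: int
     payload: str

 class Node:
     def __init__(self, txns=()):
         self.log = list(txns)                    # ordered committed transactions
         self.last_applied = self.log[-1].position if self.log else 0

     def log_since(self, pos):
         return [t for t in self.log if t.position > pos]

     def apply(self, txn):
         self.log.append(txn)
         self.last_applied = txn.position

 def recover(failed, live_nodes):
     donor = random.choice(live_nodes)            # one donor, chosen at random
     for txn in donor.log_since(failed.last_applied):
         failed.apply(txn)                        # replay only what was missed

 donor_log = [Txn(1, "t1"), Txn(2, "t2"), Txn(3, "t3")]
 failed = Node(donor_log[:1])                     # crashed after txn 1
 recover(failed, [Node(donor_log), Node(donor_log)])
 print(failed.last_applied)                       # 3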

HA topology is maintained via the Raft protocol. We build a global graph of node availability, identify all critical nodes, and exclude nodes that are not part of the well-connected node subset.
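
My reading of the "well-connected subset" rule, as a brute-force sketch (fine for a handful of nodes; the talk did not spell out the exact algorithm):

 from itertools import combinations

 def well_connected_subset(nodes, links):
     """Largest group of nodes in which every pair can reach each other.
     links is a set of frozenset({a, b}) pairs from the availability graph."""
     for size in range(len(nodes), 0, -1):
         for group in combinations(nodes, size):
             if all(frozenset(p) in links for p in combinations(group, 2)):
                 return set(group)
     return set()

 nodes = [1, 2, 3]
 links = {frozenset({1, 2}), frozenset({2, 3})}   # link 1<->3 is down
 print(well_connected_subset(nodes, links))       # {1, 2}: node 3 gets excluded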

Performance comparison of MM using a custom test that allows specifying the percentage of updates: with 0% updates in the test set, MM with 3 nodes is about 2x standalone Postgres. Updates must be written at all nodes, so update performance is the same as standalone Postgres.

This means MM is best for applications with a high ratio of reads to writes.
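
A back-of-envelope model of this (my own arithmetic, not from the talk): a read costs one unit of work on one node, while an update costs one unit on every node.

 def speedup(n_nodes, update_fraction):
     # Average cluster-wide cost of one transaction.
     cost = update_fraction * n_nodes + (1 - update_fraction)
     return n_nodes / cost

 print(speedup(3, 0.0))   # 3.0 -- ideal read scaling (measured ~2x due to overhead)
 print(speedup(3, 1.0))   # 1.0 -- all-update workload: no better than standalone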

DTM is based on commit sequence numbers (CSNs), and the mapping between CSNs and SIDs adds overhead, so the 1-node MM case is significantly slower for reads than standalone Postgres.
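
Illustrative only (a toy stand-in, not MM's actual data structures): the cost comes from every visibility check paying an extra lookup in a shared, synchronized map.

 import threading

 csn_map = {}                    # xid -> commit sequence number (toy stand-in)
 csn_lock = threading.Lock()     # shared structure, so lookups contend

 def is_visible(xid, snapshot_csn):
     with csn_lock:              # extra synchronized lookup on every check
         csn = csn_map.get(xid)
     return csn is not None and csn <= snapshot_csn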

Comparison of Galera vs PostgreSQL MM running sysbench: (graph)

Live Demonstration

The postgres_xtm_docker repository is available on Github. It creates three containers: a master and two shards. The example shows DTM working with two nodes using FDW; without DTM, you will have problems with transaction isolation. (DTM can be enabled or disabled in the docker_compose.yaml file.)

Running xtmbench with XTM disabled will show the cluster inconsistencies as the test runs.
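
The flavor of the anomaly, as a hedged sketch (hypothetical DSNs and schema; the real benchmark is in the postgres_xtm_docker repo): a transfer spans two shards, and without a DTM the two commits are independent, so a concurrent reader can see the debit but not the credit.

 import psycopg2

 SHARD1 = "host=shard1 dbname=bench"    # hypothetical connection strings
 SHARD2 = "host=shard2 dbname=bench"

 def transfer(amount):
     c1, c2 = psycopg2.connect(SHARD1), psycopg2.connect(SHARD2)
     c1.cursor().execute("UPDATE accounts SET balance = balance - %s WHERE id = 1",
                         (amount,))
     c2.cursor().execute("UPDATE accounts SET balance = balance + %s WHERE id = 2",
                         (amount,))
     c1.commit()      # without a DTM these two commits are independent,
     c2.commit()      # and a reader can land between them
     c1.close(); c2.close()

 def total_balance():
     total = 0
     for dsn in (SHARD1, SHARD2):
         with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
             cur.execute("SELECT sum(balance) FROM accounts")
             total += cur.fetchone()[0]
     return total     # should be constant; fluctuates while DTM is disabled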

Q&A:

What work still needs to be done?

Failure detection (quite hard) - Suggestion: Jepsen for net-split fault injection?

Josh Berkus had another question, but I couldn't follow the question or the answer.

Question about the overhead of the CSN<->SID mapping: the implementation is currently single-threaded and could be improved, but the focus at this stage is on correct operation, not performance.

Possible to have streaming replication?

Yes; outside of the core DTM, you can stream-replicate to nodes that aren't part of the MM set.

Question: Can you promote slaves to masters?

You can insert new nodes by switching a node from being a replica to being part of the MM set.


Some slides from the pg_pathman lightning talk were discussed regarding better filter-condition processing, which simplifies the plans and execution, with and without the RuntimeAppend node. Without it, a join forms a nested scan that examines all partitions; with it, the scan may touch only one partition.
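
A hedged sketch of trying this out (the table, data, and DSN are invented; create_range_partitions() is pg_pathman's partitioning call): the partition key below is only known at run time, which is the case RuntimeAppend helps.

 import psycopg2

 with psycopg2.connect("host=localhost dbname=test") as conn:   # hypothetical DSN
     cur = conn.cursor()
     cur.execute("CREATE EXTENSION IF NOT EXISTS pg_pathman")
     cur.execute("CREATE TABLE journal (id serial, dt date NOT NULL)")
     cur.execute("SELECT create_range_partitions('journal', 'dt', "
                 "'2016-01-01'::date, '1 month'::interval, 12)")
     # With RuntimeAppend the plan can pick the single matching partition at
     # execution time; without it, every partition is scanned.
     cur.execute("EXPLAIN SELECT * FROM journal "
                 "WHERE dt = (SELECT '2016-06-15'::date)")
     for (line,) in cur.fetchall():
         print(line)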

Currently in beta, not production-ready.
https://github.com/postgrespro/pg_pathman
http://akorotkov.github.io/blog/categories/pg-pathman/

Example configurations for different clustering applications: https://github.com/postgrespro/postgres_cluster