PGECons CR Scaleout 2021
This is a summary of research on scale-out solutions in 2021 and of interviews with a few IT companies in Japan. Both were conducted by the community relations team of the PostgreSQL Enterprise Consortium (PGECons), a non-profit organization comprising companies in Japan that promote PostgreSQL (https://www.pgecons.org).
Scale-out Solutions
There are two types of scale-out solutions: one provides only read load-balancing, the other provides read/write load-balancing. Read load-balancing is based on replication and a load balancer; examples include Pgpool-II with streaming replication, and AWS Aurora's reader endpoint. Read/write load-balancing uses sharding and replication among multiple nodes; examples include FDW with partitioned tables, Postgres-XL, and Citus.
Read load-balancing
The most common method combines streaming replication with a load balancer such as Pgpool-II. Synchronous replication (remote_apply) can guarantee data consistency between nodes. Pgpool-II also has a feature for distributing write and read queries to the primary and replicas, respectively. There are other load-balancer alternatives such as HAProxy or the JDBC driver.
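For reference, the synchronous replication setting mentioned above can be sketched as follows (the standby name 'standby1' is a hypothetical application_name, not from the original study):

    -- On the primary: wait until commits are applied on the standby
    -- before returning, so reads routed to the standby see the new data.
    ALTER SYSTEM SET synchronous_standby_names = 'standby1';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();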
Other than using streaming replication, you can use Pgpool-II's native replication mode to implement both replication and load balancing; this is query-based replication, which sends write queries to all nodes. Pgpool-II's snapshot isolation mode (SI-mode) is similar to native replication mode but additionally provides global snapshot isolation. However, SI-mode can be used only with the REPEATABLE READ isolation level.
AWS Aurora's reader endpoint also provides load balancing. However, to distribute read and write queries, you need another solution such as Pgpool-II.
Read/Write load-balancing
FDW + partitioned table
As built-in sharding for PostgreSQL, an approach using partitioned tables and a foreign data wrapper (postgres_fdw) has been proposed. Its features have been continually improved, and joins, sorts, and aggregates can now be pushed down to foreign servers. PostgreSQL 14 newly supports asynchronous execution of foreign scans.
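As a minimal sketch of this approach (the server, table, and column names here are hypothetical), a partitioned table can place some partitions on a remote shard via postgres_fdw:

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- Register a shard server and user mapping (credentials are examples).
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', dbname 'appdb');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');

    -- Parent table partitioned by a shard key.
    CREATE TABLE measurements (
        id      bigint,
        region  text,
        payload jsonb
    ) PARTITION BY LIST (region);

    -- One partition is local, another lives on the remote shard.
    CREATE TABLE measurements_tokyo
        PARTITION OF measurements FOR VALUES IN ('tokyo');
    CREATE FOREIGN TABLE measurements_osaka
        PARTITION OF measurements FOR VALUES IN ('osaka')
        SERVER shard1 OPTIONS (table_name 'measurements_osaka');

Queries against measurements are then routed to local or foreign partitions, with joins, sorts, and aggregates pushed down where possible.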
However, distributed transactions are not currently supported. Atomic commit involving multiple foreign servers has been proposed but not yet adopted; regarding this, it was suggested that it first be implemented in postgres_fdw rather than in core. Also, this approach does not provide a global snapshot.
Postgres-XL
Postgres-XL is a distributed database based on PostgreSQL. It has multiple coordinator nodes, which receive queries and distribute them to multiple data nodes. It allows write scalability across multiple nodes, and read scalability by replicating static data.
Postgres-XL supports distributed transactions. It uses two-phase commit (2PC), and its global transaction manager (GTM) provides global snapshots.
However, development is now inactive; the last release, compatible with PostgreSQL 10, was in 2018.
Citus
Citus is an extension that turns PostgreSQL into a distributed database by providing sharding and replication features. Because it is an extension, it can be used with the latest version of PostgreSQL.
Citus has one coordinator and zero or more worker nodes; each worker can also act as a coordinator. Citus supports distributed transactions using 2PC, but it does not provide a global snapshot.
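As a minimal sketch of sharding a table with Citus (the table and column names are hypothetical):

    CREATE EXTENSION citus;

    CREATE TABLE events (
        tenant_id bigint NOT NULL,
        event_id  bigint NOT NULL,
        payload   jsonb
    );

    -- Run on the coordinator: shard the table across worker nodes
    -- by hashing the distribution column tenant_id.
    SELECT create_distributed_table('events', 'tenant_id');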
Development is active, with major updates at a pace of about four per year. Citus was developed as a successor to pg_shard and has been continuously improved. It has a distributed query planner that uses different query-planning strategies for a wide range of workloads. Specifically, Citus was reported to show good performance on four different workloads: multi-tenant (TPC-C), real-time analytics (GitHub archive data), high-performance CRUD (YCSB), and data warehouse (TPC-H). [1]
A fully-managed Citus cluster is provided under the name Hyperscale (Citus) in Azure Database for PostgreSQL.
Others
Greenplum
Greenplum is a massively parallel processing (MPP) database. It has a coordinator and multiple segment hosts, across which data is sharded or replicated. It supports distributed transactions: the coordinator works as the transaction manager, providing 2PC and global snapshots.
Greenplum has been known to be good at OLAP but limited in its ability to process OLTP workloads; recently, however, it has been improved with the aim of becoming a hybrid system that serves both OLTP and OLAP workloads. Specifically, global deadlock detection was implemented, 2PC was optimized, and a resource-group model was introduced to isolate resources among different kinds of workloads. [2][3]
However, the latest Greenplum is based on PostgreSQL 9.4, so there are some PostgreSQL features it does not support.
TimescaleDB
TimescaleDB is a PostgreSQL extension that allows you to work with time-series data more easily and quickly. Time-series tables are automatically and continuously "chunked" into smaller intervals. This lets the data and indexes for the latest chunks naturally remain in memory, ensuring fast inserts and performant queries on recent data.
TimescaleDB 2.0 supports horizontal scaling across many servers, distributing data among multiple nodes.
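As a minimal sketch (the table and column names are hypothetical):

    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE conditions (
        time        timestamptz NOT NULL,
        device_id   text,
        temperature double precision
    );

    -- Chunk the table automatically by time interval.
    SELECT create_hypertable('conditions', 'time');

    -- On a TimescaleDB 2.0 multi-node setup, a distributed hypertable
    -- spreads chunks across data nodes instead:
    -- SELECT create_distributed_hypertable('conditions', 'time');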
Summary
Read load-balancing is based on replication and a load balancer. A load balancer is required because PostgreSQL does not have a built-in load-balancing feature. Also, to distribute write and read queries to the primary and replicas respectively, you need another solution such as Pgpool-II.
Read/write load-balancing uses sharding and replication among multiple nodes. Discussion and development of FDW-based built-in sharding in PostgreSQL has progressed little by little, but it still lacks features. As for extensions, Citus is actively developed and can handle multiple workloads, and TimescaleDB has recently started to provide a sharding feature. As for distributed database systems forked from PostgreSQL, development of Postgres-XL seems to have stopped, while Greenplum has started to improve OLTP performance with hybrid workloads in mind.
Interviews with IT Companies in Japan
We asked three IT companies in Japan that participate in PGECons:
- What scale-out solutions do you use or want to use?
- How do you feel about these solutions?
Here is a summary of the solutions used by the interviewed companies.
About Read load-balancing
Streaming replication and Pgpool-II
One company answered that they often use streaming replication and Pgpool-II because this setup is simple and does not need additional servers for administration. They also said they sometimes use EDB Failover Manager.
They feel that one problem with streaming replication is conflicts with recovery. Conflicts can be reduced by parameter configuration but cannot be avoided entirely. To avoid recovery conflicts completely, using Pgpool-II's native replication mode or SI-mode instead of streaming replication could be one alternative.
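For reference, a sketch of standby-side parameters that mitigate recovery conflicts (the values are illustrative assumptions, and each setting has trade-offs such as table bloat on the primary or delayed replay):

    -- On the standby: let queries delay WAL replay instead of being
    -- canceled, and ask the primary to keep rows the standby still needs.
    ALTER SYSTEM SET max_standby_streaming_delay = '60s';
    ALTER SYSTEM SET hot_standby_feedback = on;
    SELECT pg_reload_conf();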
Another problem they see with streaming replication is inconsistency between the primary and standbys. This can be avoided by using remote_apply, or by using Pgpool-II's native replication mode or SI-mode.
Related to this, one problem with Pgpool-II concerns the delay_threshold parameter, which specifies the maximum tolerated replication delay. In Pgpool-II 4.3 or earlier, it has to be specified as the amount of WAL, in bytes, by which the standby lags the primary, but they wanted to specify it as a time interval, because that is easier to explain to their clients. They also wanted Pgpool-II to support Windows. These requests have already been conveyed to the Pgpool-II development group, and among them, specifying the delay threshold as a time interval is already implemented for Pgpool-II 4.4.
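For illustration, a pgpool.conf sketch of the two styles (the values are hypothetical):

    # Pgpool-II 4.3 and earlier: stop load-balancing to a standby once it
    # lags the primary by this many bytes of WAL.
    delay_threshold = 10000000
    # Pgpool-II 4.4: the same threshold expressed as a time interval
    # (in seconds).
    #delay_threshold_by_time = 10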
Another company is considering load balancing using Pgpool-II and read replicas. They were worried that Pgpool-II would become a single point of failure, but this can be avoided with Pgpool-II's built-in HA feature (watchdog), or HA can be delegated to Kubernetes.
Hyperscale (Citus)
One company used Azure Database for PostgreSQL Hyperscale (Citus) for read load-balancing rather than for sharding. They did not find any problems with the features or performance of Hyperscale for this purpose, although they felt the cost was high.
On the other hand, they said there was a case where they gave up migrating an application to Hyperscale because performance was low due to joins across multiple nodes.
JDBC driver
One company planned to use the JDBC driver's load-balancing feature with read replicas, but eventually gave up because the host name of an Azure Single Server read replica could change, which would break the setup. They also said they could find little information about the JDBC driver's load-balancing feature, especially case examples.
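For reference, the PostgreSQL JDBC driver's load balancing is configured through the connection URL; the host names below are hypothetical:

    jdbc:postgresql://primary.example.com:5432,replica.example.com:5432/mydb?targetServerType=preferSecondary&loadBalanceHosts=true

This illustrates why an unstable replica host name is a problem: the URL must list every candidate host explicitly.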
About Read/Write load-balancing
FDW-based sharding
There were some discussions about Postgres-XC and the FDW-based approach. It was shared that the FDW-based sharding project started because Postgres-XC was a fork of PostgreSQL, and following the development of upstream PostgreSQL was very hard. On the other hand, adding a big or complex feature to core is also difficult and takes a long time: the atomic commit feature has been discussed for a long time but has not been committed, and it took time for asynchronous foreign scan to be committed. In addition, more discussion would be needed on how to handle global transaction IDs, and on how to combine replication with FDW-based sharding as Postgres-XC did.
In addition, a talk (https://www.youtube.com/watch?v=Zjqmk0OTk0U) from the NTT DATA tech conference introducing a case study of FDW-based sharding was shared. The talk described a PostgreSQL cluster using postgres_fdw-based sharding that processed 3 billion records per day from 3000 smart wattmeters. It also mentioned an FDW problem: a connection to a foreign server did not recover when a shard DB failed, so health checks for the shard DBs were required.
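For reference, PostgreSQL 14's postgres_fdw provides functions that could support such health-check handling; a minimal sketch (the server name 'shard1' is hypothetical):

    -- List foreign-server connections cached in this session.
    SELECT * FROM postgres_fdw_get_connections();
    -- Discard the cached connection to a failed shard so that the next
    -- query opens a fresh connection.
    SELECT postgres_fdw_disconnect('shard1');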
Other feedback from the interviewed companies about scaling
Multi-master replication
One company said that they used EDB Replication Server to configure a multi-master replication cluster, but they also wanted to test BDR. It was agreed that there is a need for a multi-master solution, but that an open-source multi-master solution is missing because BDR is no longer open source.
Write scaling using Redis as a cache
One company uses Redis as a cache in front of PostgreSQL to handle a large volume of incoming data concentrated in a short time. The data is first stored in Redis and then reflected to PostgreSQL asynchronously.
Auto-scaling
One company uses the auto-scaling feature of Azure SQL Database (not PostgreSQL). Such a feature is needed for workloads whose load is concentrated in certain time periods, but it should perhaps be implemented in a cloud service, a clustering tool like Pgpool-II, or a Kubernetes operator rather than in PostgreSQL core.
Configuration Simplicity
It is desirable that cluster configurations and their behavior be simple and predictable, so that when a failure occurs it can be analyzed and recovered from easily and quickly. In other words, we want to easily understand what broke in a cluster and how. This is important for enabling on-site workers to handle problems rapidly.
Summary
Here is a summary of the interview results.
One company uses streaming replication and Pgpool-II as a read load-balancing solution; they considered conflicts with recovery and data inconsistency between nodes to be problems. Their requests concerning Pgpool-II were fed back to the development team, and one of them has already been implemented. Hyperscale (Citus) was also used for read load-balancing. Although the JDBC driver's load-balancing feature was mentioned as well, it was not adopted.
As a read/write load-balancing solution, there was a case where FDW-based sharding was used to process a huge amount of data. However, this solution still lacks some features.
As other feedback about scaling from the interviewed companies, there were opinions on multi-master replication, auto-scaling, and configuration simplicity. Also, one company used Redis as a cache in front of PostgreSQL for write scaling.