Pgpool-II
From PostgreSQL wiki
Project Overview
pgpool-II is middleware that sits between PostgreSQL servers and PostgreSQL database clients. It provides the following features:
- Connection Pooling: pgpool-II keeps connections to the PostgreSQL servers open and reuses them whenever a new connection with the same properties (i.e. user name, database and protocol version) comes in. This reduces connection overhead and improves the system's overall throughput.
- Replication: pgpool-II can manage multiple PostgreSQL servers. The replication feature keeps a real-time copy of the data on two or more physical disks, so the service can continue without stopping the servers if a disk fails.
- Load Balance: If a database is replicated, a SELECT query returns the same result on any server. pgpool-II takes advantage of this to reduce the load on each PostgreSQL server by distributing SELECT queries among the servers, improving the system's overall throughput. At best, performance improves in proportion to the number of PostgreSQL servers. Load balancing works best when many users execute many queries at the same time.
- Limiting Excess Connections: PostgreSQL limits the number of concurrent connections and rejects any connection beyond that limit. Raising the limit, however, increases resource consumption and hurts system performance. pgpool-II also limits the number of connections, but extra connections are queued instead of receiving an error immediately.
- Parallel Query: With the parallel query feature, data can be divided among multiple servers so that a query executes on all of them concurrently, reducing the overall execution time. Parallel query works best when searching large-scale data.
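The connection-pooling item above says a connection is reused only when a new client arrives with the same user name, database and protocol version. A minimal Python sketch of that lookup follows; `BackendConnection` and `ConnectionPool` are hypothetical names, and no real sockets are opened, so this illustrates the keying idea rather than pgpool-II's own implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class BackendConnection:
    """Hypothetical stand-in for an open connection to a PostgreSQL server."""
    conn_id: int
    user: str
    database: str
    protocol_version: int


class ConnectionPool:
    """Reuse idle connections whose (user, database, protocol version) match."""

    def __init__(self):
        self._idle = defaultdict(list)  # key -> idle connections with that key
        self._ids = count(1)
        self.opened = 0  # number of brand-new backend connections created

    def acquire(self, user, database, protocol_version):
        key = (user, database, protocol_version)
        if self._idle[key]:
            return self._idle[key].pop()  # reuse: skip connection setup
        self.opened += 1
        return BackendConnection(next(self._ids), user, database, protocol_version)

    def release(self, conn):
        # Keep the connection open for the next client with the same properties.
        key = (conn.user, conn.database, conn.protocol_version)
        self._idle[key].append(conn)


pool = ConnectionPool()
first = pool.acquire("alice", "sales", 3)
pool.release(first)
second = pool.acquire("alice", "sales", 3)  # same properties: reused
other = pool.acquire("bob", "sales", 3)     # different user: new connection
print(second is first, pool.opened)         # True 2
```

Keying the idle list on the full property tuple is what makes reuse safe: a connection authenticated as one user, or attached to one database, is never handed to a client that asked for something else.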
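The connection-limit item above contrasts PostgreSQL, which rejects connections past its limit, with pgpool-II, which makes extra clients wait. A minimal Python sketch of the queueing behavior, assuming a hypothetical `ConnectionLimiter` built on a semaphore: clients beyond the limit block until a slot frees up instead of getting an error. (In pgpool-II itself the cap is, to our understanding, tied to the number of pre-forked child processes set by `num_init_children`; the sketch does not model that.)

```python
import threading
import time


class ConnectionLimiter:
    """Admit at most max_connections clients; the rest wait their turn."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def __enter__(self):
        self._slots.acquire()  # blocks (queues) instead of raising an error
        return self

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False


limiter = ConnectionLimiter(max_connections=2)
state_lock = threading.Lock()
active = 0
peak = 0
served = []


def client(client_id):
    global active, peak
    with limiter:
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # simulate running a query on the backend
        with state_lock:
            active -= 1
            served.append(client_id)


threads = [threading.Thread(target=client, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(served), peak)  # every client is served; peak never exceeds 2
```

All five clients eventually run, but never more than two at once, which is the trade-off described above: clients see extra latency rather than a connection error.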
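The Parallel Query item above describes dividing data among servers and running one query on all of them concurrently. A minimal Python sketch of that pattern for a `sum()` aggregate, assuming hash partitioning on a key column; the helpers `partition`, `node_sum` and `parallel_sum` are hypothetical, each "node" is just an in-memory list, and threads stand in for separate servers (they give no real CPU speedup in Python). It shows the shape of the technique, not pgpool-II's actual partitioning rules.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, num_nodes):
    """Distribute rows across nodes by hashing the partitioning column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes


def node_sum(node_rows, column):
    # Stand-in for "SELECT sum(column) FROM t" executed on a single node.
    return sum(row[column] for row in node_rows)


def parallel_sum(nodes, column):
    # Run the partial aggregate on every node at the same time, then merge.
    with ThreadPoolExecutor(max_workers=len(nodes)) as executor:
        partials = list(executor.map(lambda rows: node_sum(rows, column), nodes))
    return sum(partials)


rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
nodes = partition(rows, "id", num_nodes=3)
print([len(n) for n in nodes])        # rows held by each node
print(parallel_sum(nodes, "amount"))  # 50500
```

Each node only scans its own share of the rows, and the final merge step is cheap because it combines one partial result per node; this is why the technique pays off mainly on large data sets.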
Project Status
In production. A number of commercial systems use pgpool-II.
Project Contacts
General Information
- Scalability: Yes (up to 128 DB nodes)
- Read Scaling: Yes
- Write Scaling: No (up to 128 DB nodes are possible, but write performance is 60-70% of plain PostgreSQL)
- Synchronous replication: Yes
- Triggers/procedures: Yes
- Parallel Query: Yes
- Failover/HA: Yes
- Online Provisioning: Yes
- PostgreSQL Upgrades: No
- Detached Node/WAN: No
- PostgreSQL Core Modifications Required: No
- Programming Languages: C
Clustering Model
pgpool-II is a query-based replication system.
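In query-based replication, every write statement is replayed on every node, which is what keeps the nodes identical and, in turn, lets any single node answer a SELECT. A minimal Python sketch of that routing rule combined with round-robin load balancing; `FakeNode` and `QueryRouter` are hypothetical, and the read/write test is deliberately naive (it treats anything starting with SELECT as a read), which is exactly the gap discussed under Drawbacks.

```python
from itertools import cycle


class FakeNode:
    """Hypothetical stand-in for one PostgreSQL server."""

    def __init__(self, name):
        self.name = name
        self.executed = []  # statements this node has run

    def execute(self, sql):
        self.executed.append(sql)
        return self.name


class QueryRouter:
    """Query-based replication with round-robin read load balancing."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._next_reader = cycle(nodes)

    @staticmethod
    def is_read_only(sql):
        # Naive check: any statement starting with SELECT counts as a read.
        return sql.lstrip().upper().startswith("SELECT")

    def execute(self, sql):
        if self.is_read_only(sql):
            # All replicas hold the same data, so one node is enough.
            return [next(self._next_reader).execute(sql)]
        # Writes go to every node so the replicas stay identical.
        return [node.execute(sql) for node in self.nodes]


router = QueryRouter([FakeNode("db0"), FakeNode("db1")])
print(router.execute("INSERT INTO t VALUES (1)"))  # ['db0', 'db1']
print(router.execute("SELECT * FROM t"))           # ['db0']
print(router.execute("SELECT * FROM t"))           # ['db1']
```

The cost of this design shows up on the write path: every INSERT or UPDATE runs once per node, which is consistent with the "Write Scaling: No" entry above.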
Use Case
- There are three modes: replication (R), master/slave (M) and parallel query (P)
- The following functionality is available in each mode:
  - Connection pooling (R, M, P)
  - Automatic failover (R, M)
  - Online recovery (R; M when used with Streaming Replication)
- Master/slave mode can be used with Slony-I or Streaming Replication
- A dedicated GUI tool is available (pgpoolAdmin)
Drawbacks
- Cannot correctly handle functions with side effects in SELECT
  - nextval() can be handled correctly
  - pgpool-II 3.0 removes this drawback by recognizing such functions in SELECT
- Table locks are needed when inserting into tables with SERIAL columns
  - pgpool-II 3.0 automatically issues row locks rather than table locks
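The side-effect drawback arises because a load balancer that sends a SELECT to only one node would run a call such as nextval() on that node alone, leaving the sequences on the other nodes behind. A minimal Python sketch of the kind of check that avoids this: a SELECT that calls a known side-effecting function is treated like a write and sent to every node. The hard-coded `SIDE_EFFECT_FUNCTIONS` set and the regex-based `must_replicate` helper are hypothetical illustrations, not pgpool-II's actual parser; a real setup would also have to list user-defined functions with side effects.

```python
import re

# Hypothetical list of functions that modify state when called in a SELECT.
SIDE_EFFECT_FUNCTIONS = {"nextval", "setval"}

# Matches an identifier immediately followed by "(", i.e. a function call.
FUNC_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def must_replicate(sql):
    """Return True if the statement has to run on every node."""
    stripped = sql.lstrip()
    if not stripped.upper().startswith("SELECT"):
        return True  # INSERT, UPDATE, DDL, ... always go to every node
    called = {name.lower() for name in FUNC_CALL.findall(stripped)}
    return bool(called & SIDE_EFFECT_FUNCTIONS)


for sql in [
    "SELECT * FROM orders",
    "SELECT count(*) FROM orders",
    "SELECT nextval('orders_id_seq')",
    "UPDATE orders SET paid = true",
]:
    print(must_replicate(sql), sql)
```

The regex does not understand string literals, so text such as `'nextval('` inside quotes would also trigger replication; that errs on the safe side, since sending a read to every node wastes work but never desynchronizes the nodes.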
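Locking is needed for SERIAL columns because, under query-based replication, each node assigns SERIAL values from its own sequence. If two concurrent INSERTs reach the nodes in different orders, the same row can end up with different ids on different nodes. A small Python simulation, assuming hypothetical `Node` and `replicate` helpers; the lock is modeled only by its effect (every node applies the INSERTs in the same order) rather than implemented.

```python
from itertools import count


class Node:
    """Hypothetical node holding one table with a SERIAL id column."""

    def __init__(self):
        self._seq = count(1)  # stands in for the table's SERIAL sequence
        self.rows = {}        # id -> inserted value

    def insert(self, value):
        self.rows[next(self._seq)] = value


def replicate(arrival_orders):
    """Build one node per arrival order and apply the INSERTs in that order."""
    nodes = []
    for order in arrival_orders:
        node = Node()
        for value in order:
            node.insert(value)
        nodes.append(node)
    return nodes


# No lock: two concurrent INSERTs may reach the two nodes in different orders.
unlocked = replicate([["alice", "bob"], ["bob", "alice"]])
print(unlocked[0].rows == unlocked[1].rows)  # False: ids diverge

# With a lock, clients insert one at a time, so every node sees the same order.
locked = replicate([["alice", "bob"], ["alice", "bob"]])
print(locked[0].rows == locked[1].rows)      # True
```

Any lock that forces a single global order of these INSERTs prevents the divergence; a row-level lock achieves that with less contention than locking the whole table.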
Future plans
- Replicate SERIAL columns and sequences more reliably
Project Sponsors
- Work sponsored by SRA OSS, Inc. Japan
Others
- Commercial support is available from SRA OSS, Inc. Japan