3rd Postgres Cluster Hackers Summit, pgCon 2012

Time and Place

Tuesday, May 15th, 10am to 5pm

Conference room at the University of Ottawa, room MRT 219

Agenda

Draft agenda follows. Please let me know of any contributions/changes to the agenda you have:

Project Reports: 5 minutes from each project

10am to 11am

Hot Standby/Binary Replication -- Simon
Pgpool-II -- Tatsuo
Postgres-XC -- Koichi
Slony -- Jan
Stado -- Jim

Technical Issues of common interest

11am to Noon, and 1pm to 5pm

SSI in cluster/replication - Koichi, Kevin
Parser export - Koichi, Tatsuo
Managing consistent views of data - Koichi
Command Triggers, Common API for logical replication - Dimitri
Detecting database change caused by triggers and cascade operations - Tatsuo
Fault detection and handling - Koichi
Node addition/removal - Koichi
Configuration and operation - Koichi
Cursor in replication/multi master - Koichi
Bi-Directional Replication - Simon

The Cluster Summit will be from 10am to 5pm, with a break for lunch, which will be provided, sponsored by NTT.

Attendees

Koichi Suzuki, Postgres-XC
Álvaro Herrera, Mammoth, Postgres-XC
Jim Mlodgenski, Stado
Mason Sharp, Stado, Postgres-XC
Dimitri Fontaine, Slony, Londiste, Bucardo
Christopher Browne, Slony
Steve Singer, Slony
Kevin Grittner, SSI
Simon Riggs, Binary Replication
Michael Paquier, Postgres-XC
Ashutosh Bapat, Postgres-XC
Tatsuo Ishii, pgPool
Kaori Inaba, pgPool
Daichi Matsusaka, pgPool
Guillaume Lelarge, pgPool
David Wheeler
Josh Berkus, moderator
Tetsuo Sakata, Postgres-XC
Andres Freund, Bi-Directional Replication
Greg Smith, Bi-Directional Replication
Jan Wieck, Slony
David Fetter

Joining the Meeting

If you will be able to attend, please email Josh (josh@agliodbs.com) with the following:

Your Name
Project(s) you work on
If you will be giving a Project Report
If you have additions to the agenda
Special dietary needs for lunch, if any
If you need travel assistance

Note that the availability of travel funding is not guaranteed; I can just agree to request it.

Notes from meeting

Clustering Meeting

Notes From the Clustering Meeting

Updates from Projects

Postgres-XC update

1.0 beta 2 released
features...

pgPool-II update

3.1.3 out
3.2 will have query cache
Simple literal results cache

Slony-II update

Slony 2.2 release coming soon.
Will include simplified, data-compressed update stream.

Stado Update

Fork from GridSQL
Being using in production now

Binary Replication update

Cascading replication
Using the replication protocol seperately

Future:

Need to reduce failover startup time
Compression, double buffering.
Slowdown for maintenance tasks, mass updates.

There was discussion about how to fix the bottlnecking on WAL replay.

Discussion Items

SSI in Clustering/Replication

Problem is how do we export "read footprint" to other servers? Suggestion is to have a master copy of each table, to make predicate locks work.

This is already a problem with hot standy. Have talked about it on the list. At certain points, we could export "safe" snapshots, but that means waiting. Exporting the commit sequence isn't sufficient, unless all transactions are serializable. Kevin suggested making a database serializable-only as one way to do it.

Multi-Tenancy

(change of topic due to communications)

Currently we can't backup and restore databases seperately. Roles and tablespaces are cluster-wide. WAL and PITR are per-instance.

This problem isn't particular to replication and clustering, though.

Parser Export

(Missed some discussion on this)

Current Postgres-XC approach is not to use the Postgres parser.

Two problems here: accessing the parser information, and extending the parser. One part of this is that the parser data structure isn't stable. We need an independant format above the internal format which can be stable.

Last year we talked about using the same format which the ECPG parser uses.

Tatsuo says that pgPool will try a totally different way. Currently creating pgpool from scratch, but could create pgpool from Postgres source code. Stado also has this need, but in Java.

Possibly generating the .y file should be produced by PostgreSQL for both ECPG and pgPool, and for Postgres-XC. Postgres-XC already has some of this done.

Command Triggers

Will be called "event triggers".

Could implement DDL support into logical replication with DDL triggers.

Things to solve:

which format do we want to present the DDL operation?
could we develop an extension in C which would interpret the DDL parsetree
core version has ability to deny DDL

Spec for DDL output extension:

canonical form (SQL)
schema-qualified (this is harder than it sounds)
environment settings which define DDL
- e.g. is it standard-confirming-strings? current user?
OIDs of changed objects? Names?
structure before alteration?
version2: hierarchical structure of touched objects? JSON?
- need to design this structure
- needs to include things like primary key

Do we need to have named environments for the above? Ones which can be saved? Would be useful anyway. Would need to replicate name/alter environment.

Problem: in a compound command, which event triggers get fired in which order? Still an issue in development. Currently each sub-command is a different event, which doesn't really work for replication.

Replicating Trigger and Cascade Operations

To invalidate query cache, need to invalidate which table is updated. If table is updated implicitly, via trigger or CASCADE, that's very difficult. Currently we can only do that by looking into the pg_depends system catalog. It would be good to have some API for this, like a system view.

We have a trigger in 9.2 which can attach to any table and automatically sends a notification with the PK whenever a table is updated. They could use that. The new facitity is "TCN", or "trigger change notification". pgPool would rather have a view, though, and it doesn't really solve their problem.

We need a user-friendly version of pg_depends.

Fault Detection and Handling

Currently we have to detect faults -- system down -- by polling. This takes much longer than the actual failover takes. How else would you do it though?

Maybe we should send out a message for expected events, so that we know an unexpected event is unexpected. Also we would like to aggregate failure notifications. Automated STONITH would be nice also.

This is not a solved problem for any network service. A watchdog service would be nice though -- a child of the postmaster which would answer polling faster. The watchdog would know if the postmaster shuts down. That could also ping the network broker, and shut down the postmaster if it can't work.

Need to spec the watchdog. There was discussion on how the watchdog would work. One question: should it be a child of the postmaster or separate? Should there be one for every network?

Configuration and Operations

This is related to error detection. PostgreSQL doesn't provide a tool for configuration of a cluster. Slony already has some tools for this. Maybe we should bring those into core.

Integrated simple archive command in core? What would it do, though? pg_receivexlog does a lot of this. Would be nice if it could be a proxy too.

Mostly we need a usability overhaul for replication.

Cursors In Replication/Clustering

SELECT FOR UPDATE, FOR CURRENT OF, scrollable cursors are issues when tables might be in another node. Currently Postgres-XC is using the primary key and various ORDERBYs, but not sure that is the best way.

There was some discussion of tables with/without primary keys. In general, people couldn't think of another approach, although people pointed out various problems with it.

Alvaro suggested instead throwing an error if there's no ORDER BY PK on the request. Large Objects are still a problem given the awkwardness of our LO support. We might need to overhaul LO. People shouldn't use OIDs in user tables is the solution to the OID issue.

Node Addition/Removal

Working on adding/removing nodes for Postgres-XC. Need to redistribute data when you add a node.

Work includes background ALTER TABLE -- concurrently. Also WAL filtering. These might be useful for other tools.

(missed a lot of discussion here)

So one issue with background ALTER TABLE is knowing when the changes are done, so that you can go ahead with the new queries. Discussion of details of ALTER ... CONCURRENTLY ensued.

Bi-Directional Replication

Brief summary...

Simon Riggs: Some people say I've been "copying" Oracle DataGuard features for a number of years now; it's a very well-specified system, so there is bound to be some parallel developments. I'm happy that in many ways we're ahead, since the PostgreSQL replication is easier to use and has some innovative features like transaction-controlled replication. The gap analysis with replication is that we don't support what Oracle calls logical replication. Jan calls it that too.

What we'd like to do is partial replication, selective by-table replication, heterogenous replication, transforming during replication. We can't easily do those things using physical replication, so the best way to approach those features is by a more flexible approach, which logical replication would give us.

The idea is to evaluate lots of different ways to do logical replication. We might need more than one way, so what I really mean by "logical" is anything-apart-from-physical replication.

Oracle's next replication system is Golden Gate which is a more flexible system, so we can do a gap analysis there as well. Specifically, bi-directional replication is about moving data around, including a global interconnected database. In use cases, we're talking about geographically distributed databases.

The main idea is to create a data stream which would use the walsender/walreceiver mechanism to sent logical data, not just binary data. Details beyond that are still TBD.

PgCon2012CanadaClusterSummit

Contents