From PostgreSQL wiki
(Edit quotes for accuracy and additional explanatory text)
|Line 221:||Line 221:|
=== Bi-Directional Replication ===
=== Bi-Directional Replication ===
'Oracle 's systemis . , -replication is we '.
to do replication. We way .
The main idea is to create a data stream which would use the walsender to sent logical data, not just binary data. Details beyond that are still TBD.
The main idea is to create a data stream which would use the walsenderto sent logical data, not just binary data. Details beyond that are still TBD.
Revision as of 18:09, 29 May 2012
3rd Postgres Cluster Hackers Summit, pgCon 2012
Time and Place
Tuesday, May 15th, 10am to 5pm
Conference room at the University of Ottawa, room MRT 219
Draft agenda follows. Please let me know of any contributions/changes to the agenda you have:
Project Reports: 5 minutes from each project
10am to 11am
- Hot Standby/Binary Replication -- Simon
- pgPoolII -- Tatsuo
- PostgresXC -- Koichi
- Slony -- Jan
- Stado -- Jim
Technical Issues of common interest
11am to Noon, and 1pm to 5pm
- SSI in cluster/replication - Koichi, Kevin
- Parser export - Koichi, Tatsuo
- Managing consistent views of data - Koichi
- Command Triggers, Common API for logical replication - Dimitri
- Detecting database change caused by triggers and cascade operations - Tatsuo
- Fault detection and handling - Koichi
- Node addition/removal - Koichi
- Configuration and operation - Koichi
- Cursor in replication/multi master - Koichi
- Bi-Directional Replication - Simon
The Cluster Summit will be from 10am to 5pm, with a break for lunch, which will be provided, sponsored by NTT.
- Koichi Suzuki, Postgres-XC
- Álvaro Herrera, Mammoth, Postgres-XC
- Jim Mlodgenski, Stado
- Mason Sharp, Stado, Postgres-XC
- Dimitri Fontaine, Slony, Londiste, Bucardo
- Christopher Browne, Slony
- Steve Singer, Slony
- Kevin Grittner, SSI
- Simon Riggs, Binary Replication
- Michael Paquier, Postgres-XC
- Ashutosh Bapat, Postgres-XC
- Tatsuo Ishii, pgPool
- Kaori Inaba, pgPool
- Daichi Matsusaka, pgPool
- Guillaume Lelarge, pgPool
- David Wheeler
- Josh Berkus, moderator
- Tetsuo Sakata, Postgres-XC
- Andres Freund, Bi-Directional Replication
- Greg Smith, Bi-Directional Replication
- Jan Wieck, Slony
- David Fetter
Joining the Meeting
If you will be able to attend, please email Josh (firstname.lastname@example.org) with the following:
- Your Name
- Project(s) you work on
- If you will be giving a Project Report
- If you have additions to the agenda
- Special dietary needs for lunch, if any
- If you need travel assistance
Note that the availability of travel funding is not guaranteed; I can just agree to request it.
Notes from meeting
Notes From the Clustering Meeting
Updates from Projects
- 1.0 beta 2 released
- 3.1.3 out
- 3.2 will have query cache
- Simple literal results cache
- Slony 2.2 release coming soon.
- Will include simplified, data-compressed update stream.
- Fork from GridSQL
- Being using in production now
Binary Replication update
- Cascading replication
- Using the replication protocol seperately
- Need to reduce failover startup time
- Compression, double buffering.
- Slowdown for maintenance tasks, mass updates.
There was discussion about how to fix the bottlnecking on WAL replay.
SSI in Clustering/Replication
Problem is how do we export "read footprint" to other servers? Suggestion is to have a master copy of each table, to make predicate locks work.
This is already a problem with hot standy. Have talked about it on the list. At certain points, we could export "safe" snapshots, but that means waiting. Exporting the commit sequence isn't sufficient, unless all transactions are serializable. Kevin suggested making a database serializable-only as one way to do it.
(change of topic due to communications)
Currently we can't backup and restore databases seperately. Roles and tablespaces are cluster-wide. WAL and PITR are per-instance.
This problem isn't particular to replication and clustering, though.
(Missed some discussion on this)
Current Postgres-XC approach is not to use the Postgres parser.
Two problems here: accessing the parser information, and extending the parser. One part of this is that the parser data structure isn't stable. We need an independant format above the internal format which can be stable.
Last year we talked about using the same format which the ECPG parser uses.
Tatsuo says that pgPool will try a totally different way. Currently creating pgpool from scratch, but could create pgpool from Postgres source code. Stado also has this need, but in Java.
Possibly generating the .y file should be produced by PostgreSQL for both ECPG and pgPool, and for Postgres-XC. Postgres-XC already has some of this done.
Will be called "event triggers".
Could implement DDL support into logical replication with DDL triggers.
Things to solve:
- which format do we want to present the DDL operation?
- could we develop an extension in C which would interpret the DDL parsetree
- core version has ability to deny DDL
Spec for DDL output extension:
- canonical form (SQL)
- schema-qualified (this is harder than it sounds)
- environment settings which define DDL
- e.g. is it standard-confirming-strings? current user?
- OIDs of changed objects? Names?
- structure before alteration?
- version2: hierarchical structure of touched objects? JSON?
- need to design this structure
- needs to include things like primary key
Do we need to have named environments for the above? Ones which can be saved? Would be useful anyway. Would need to replicate name/alter environment.
Problem: in a compound command, which event triggers get fired in which order? Still an issue in development. Currently each sub-command is a different event, which doesn't really work for replication.
Replicating Trigger and Cascade Operations
To invalidate query cache, need to invalidate which table is updated. If table is updated implicitly, via trigger or CASCADE, that's very difficult. Currently we can only do that by looking into the pg_depends system catalog. It would be good to have some API for this, like a system view.
We have a trigger in 9.2 which can attach to any table and automatically sends a notification with the PK whenever a table is updated. They could use that. The new facitity is "TCN", or "trigger change notification". pgPool would rather have a view, though, and it doesn't really solve their problem.
We need a user-friendly version of pg_depends.
Fault Detection and Handling
Currently we have to detect faults -- system down -- by polling. This takes much longer than the actual failover takes. How else would you do it though?
Maybe we should send out a message for expected events, so that we know an unexpected event is unexpected. Also we would like to aggregate failure notifications. Automated STONITH would be nice also.
This is not a solved problem for any network service. A watchdog service would be nice though -- a child of the postmaster which would answer polling faster. The watchdog would know if the postmaster shuts down. That could also ping the network broker, and shut down the postmaster if it can't work.
Need to spec the watchdog. There was discussion on how the watchdog would work. One question: should it be a child of the postmaster or separate? Should there be one for every network?
Configuration and Operations
This is related to error detection. PostgreSQL doesn't provide a tool for configuration of a cluster. Slony already has some tools for this. Maybe we should bring those into core.
Integrated simple archive command in core? What would it do, though? pg_receivexlog does a lot of this. Would be nice if it could be a proxy too.
Mostly we need a usability overhaul for replication.
Cursors In Replication/Clustering
SELECT FOR UPDATE, FOR CURRENT OF, scrollable cursors are issues when tables might be in another node. Currently Postgres-XC is using the primary key and various ORDERBYs, but not sure that is the best way.
There was some discussion of tables with/without primary keys. In general, people couldn't think of another approach, although people pointed out various problems with it.
Alvaro suggested instead throwing an error if there's no ORDER BY PK on the request. Large Objects are still a problem given the awkwardness of our LO support. We might need to overhaul LO. People shouldn't use OIDs in user tables is the solution to the OID issue.
Working on adding/removing nodes for Postgres-XC. Need to redistribute data when you add a node.
Work includes background ALTER TABLE -- concurrently. Also WAL filtering. These might be useful for other tools.
(missed a lot of discussion here)
So one issue with background ALTER TABLE is knowing when the changes are done, so that you can go ahead with the new queries. Discussion of details of ALTER ... CONCURRENTLY ensued.
Simon Riggs: Some people say I've been "copying" Oracle DataGuard features for a number of years now; it's a very well-specified system, so there is bound to be some parallel developments. I'm happy that in many ways we're ahead, since the PostgreSQL replication is easier to use and has some innovative features like transaction-controlled replication. The gap analysis with replication is that we don't support what Oracle calls logical replication. Jan calls it that too.
What we'd like to do is partial replication, selective by-table replication, heterogenous replication, transforming during replication. We can't easily do those things using physical replication, so the best way to approach those features is by a more flexible approach, which logical replication would give us.
The idea is to evaluate lots of different ways to do logical replication. We might need more than one way, so what I really mean by "logical" is anything-apart-from-physical replication.
Oracle's next replication system is Golden Gate which is a more flexible system, so we can do a gap analysis there as well. Specifically, bi-directional replication is about moving data around, including a global interconnected database. In use cases, we're talking about geographically distributed databases.
The main idea is to create a data stream which would use the walsender/walreceiver mechanism to sent logical data, not just binary data. Details beyond that are still TBD.