PGCon2009JapanClusterDeveloperMeeting

Participants' Input to the Meeting

This is the first PostgreSQL cluster developers meeting. The meeting is held as an associated event of PostgreSQL Conference 2009 Japan. For details of the meeting, please see the Call for Participants.

Participants in the meeting are encouraged to submit their input to this page. Organizational information can be found here: PostgreSQL_Conference_2009_Japan.

Links to Information about Clustering Solutions

Please put links here to software, background material, presentations, and other information about your particular clustering software.

Notes On General Design

Both for this page and for your 5-minute presentation, please try to answer the following questions about your current solution:

What is the primary use-case your solution is designed for?
General Design
- What's the general clustering architecture of your software? (e.g. statement replication, shared memory, GCS, clustered table, etc.)
- Does it consist of a collection of tools or a monolithic control-and-management architecture?
- Does your solution supply administration and monitoring tools?
- Is your solution intended to be generic, or does it require the user's application to be built around the clustering architecture?
- Does the solution require patches on core PostgreSQL, or is it entirely external?
Availability: how does your software deal with uptime availability?
- Is failover automated?
- Is data synchronous between nodes or asynchronous?
- What failure conditions can it protect against, and which can it not?
- Does it work over a WAN or do clustered machines have to be in the same data center?
Scalability: how does your software help with horizontal scaling?
- How does it scale for reads, if at all?
- How does it scale for writes, if at all?
- Is it designed more to scale for many small queries, a few large queries, or for geographic distribution?
- What kinds of non-simple-query operations (reporting queries, stored procedures, triggers, etc.) can it handle, and which can it not?
Status: what's the project's current development and adoption status?
- Is it still under active development?
- How mature is it? prototype / beta / first release / second release / in maintenance
- How widely adopted is it? Is it used only by customers of the developers, or by PostgreSQL users in general?

Links

Some theory:
- Basics about DB clustering by Tamer Özsu & Patrick Valduriez

DB cluster software:
- Slony-I: Documentation
- pgpool-II: Home Page
- pgbouncer: Project Home Page
- PL/Proxy: PGFoundry Home Page
- PgCluster: PGFoundry Home Page
- Postgres-R: Project Home Page (see esp. the concept document and the references for related scientific papers)
- PostgresForest: Home Page in Japanese
- Bucardo: Project Home Page
- GridSQL: Architecture Page
- Postgres-2: Postgres-2 Introduction Page
- Streaming Replication: Project Home Page
- Mammoth Replicator: Project Home Page
- rubyrep: Project Home Page

Clustering Marketplace

What do you think the current commercial and user demand for clustering is? Are users trying to get scalability, availability, or other benefits from clustering? Has interest in clustering waned or grown?

Josh Says: I've seen that the desire for an "Oracle RAC Replacement" is less prominent, at least in the USA, than it was a few years ago. It's possible that folks are realizing that RAC has a lot of drawbacks, or it may just be that I don't talk to Oracle users as much. People seem to be looking more for clustering to help with horizontal scalability, especially to help with performance on cloud hosting platforms, which is a big source of demand now. With MySQL in trouble, people are really looking for an "approximate consistency, low administration" solution to replace MySQL Replication & NDB.

Challenges and Issues

What challenges are you currently facing in working on clustering and replication? What things do you think should be different about core Postgres or developed in common?

Agenda For Day

Please contribute to this agenda! It is not yet final, but we do need to have a list of items people want to talk about before the meeting itself. If you add an item to the agenda, please put your name next to it so we know who to call to start the item.

The meeting will run from 9:30AM to 5:30PM.

Introduction

Current clustering definitions, use cases and customer goals (1/2 hour short presentation, Koichi-san and Josh Berkus)

User goals
Specific use cases
Current market for PostgreSQL clustering
Competitive market of other DBMSes

Notes from Josh Berkus' presentation

NTT's Keynote: Media:NTT Proposal091119.pdf

Review of Existing Projects

Each project team is welcome to give a 5-minute presentation about its current development work on its clustering or replication solution near the beginning of the event. Note that you are NOT obligated to give this presentation; if you feel that your current efforts are well enough known, or if you have no time to prepare, you may choose not to give a presentation.

In order to have a productive day, please design your presentation around the following:

1. Presentations will be 5 minutes only, strictly timed.
2. In order to support (1), presentations will be given on *my* laptop using PDF slides. Please bring a PDF with you to the meeting, or (better) e-mail it to me before the meeting.
3. Please discuss *current* work on your software and challenges you are currently facing. Summaries of the history or features of your solution are unnecessary unless they have changed in the last year. Instead, link to these on the wiki.

Each team which is giving a presentation should sign up below:

File:ClusterDeveloperMeeting - PostgresForest.pdf Postgres Forest status (Satoshi Nagayasu)
File:Bucardo in Five Minutes.pdf Bucardo (Selena Deckelmann)
File:Pgcluster4CDM.pdf PgCluster update
File:Postgres-R.pdf Postgres-R: Flashlight (Markus Wanner)
Streaming Replication: Media:SR ClusterSummit.pdf (Fujii)
File:Postgres-2 Write-Scalable Cluster.pdf Postgres-2
File:Gridsql jpug2009a.pdf GridSQL

Future Requirements and Expectations

Discussion: please add any discussion items you have around the future of database clustering, the demand for it, and user needs around clustering:

Common issues to several products
- Usability
- Administration
Application or industry specific issues

ClusterFeatures

Technical Issues in clustering design

Please add any items you have around specific technical issues in clustering design, especially unsolved or recently solved ones:

Challenges
- High Availability
- Scalability (read/write)
Specifications and APIs

Plans for future development

Please add any items you want to discuss around future development, especially development involving a collaboration between teams or with the general PostgreSQL community.

To be developed in core PostgreSQL
- ReplicationHooks -- where did they go?
- Standby/replication (sync/async)/partitioning
- Transaction Management
- 2PC callback functions?
- APIs and interfaces (ClusterFeatures)
- Tools
- (MW) common unit and/or regression testing harness?
- (MW) common benchmark framework?
- libpq improvements (keepalive / query timeout, full duplex)

To be developed separately
Merging clustering projects/products?

Visibility to the market

provide information to general users
How can we make things visible to non-development people
Make agreed matrix to describe each product
How it is measured
Info Pages / Videos / Howtos (needed later on)
Still need some introductory material
PostgreSQL Manual - but not simple to find which to use
See letspostgres.jp -> focus on practical information

Documentation sprint
- DBAs --> core developers there to help document
- Availability ->
- Send a DBA to do this: from all the different groups
- NTT wants to offer resources

How to implement specific solutions
- Use cases
- two cases: technical implementer, AND their boss to convince
- Updated clustering survey presentation (video)
- Webinar

Packaging
- Because some projects don't have them, they don't look officially supported
- E.g. stackbuilder and one-click installers
- Clustering packages
External module docs

Follow-up

Summary of session
Clustering portal page

Schedule and Map

The meeting schedule and a map from nearby stations can be found in Media:Schedule_and_Map.pdf.

Contact Information

Communication for the clustering summit has been on Josh Berkus's clustering@berkus.org mailing list.

Phone numbers for Koichi Suzuki, Michael Paquier and Josh Berkus have been sent by e-mail. Note that many/most foreign cell phones do not work in Japan.

Please list below your name and the hotel you are staying at in case we need to find you:

Josh Berkus: Shiba Park Hotel
Bruce Momjian: Park Hotel Tokyo
Jan Wieck: Park Hotel Tokyo
Markus Wanner: Shiba Park Hotel

Miscellaneous Travel Tips

Suica Card: Foreign passport holders can purchase a Suica + NEX package at Narita Airport for 3500 Yen. It consists of a one-way NEX fare to Tokyo and a Suica chip card precharged with 1500 Yen, which can be used on underground trains in Tokyo (plus a 500 Yen deposit for the card). Considering that the one-way NEX fare alone is 3000 Yen, that looks like a great deal. See http://www.japan-guide.com/e/e2359_002.html for details. -- Jan