PgCon 2012 Developer Meeting

From PostgreSQL wiki

Revision as of 16:16, 16 May 2012 by Gsmith (Talk | contribs)

Jump to: navigation, search

A meeting of the most active PostgreSQL developers is being planned for Wednesday 16th May, 2012 near the University of Ottawa, prior to pgCon 2012. In order to keep the numbers manageable, this meeting is by invitation only. Unfortunately it is quite possible that we've overlooked important code developers during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org).

Please note that this year the attendee numbers have been cut to try to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.2 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies, unlike in previous years.

This is a PostgreSQL Community event. Room and refreshments/food sponsored by EnterpriseDB. Other companies sponsored attendance for their developers.

Contents

Time & Location

The meeting will be from 8:30AM to 5PM, and will be in the "Red Experience" room at:

Novotel Ottawa
33 Nicholas Street
Ottawa
Ontario
K1N 9M7

Food and drink will be provided throughout the day, including breakfast from 8AM.

View on Google Maps

Attendees

The following people have RSVPed to the meeting (in alphabetical order, by surname):

  • Oleg Bartunov
  • Josh Berkus (Secretary)
  • Jeff Davis
  • Andrew Dunstan
  • Dimitri Fontaine
  • Stephen Frost
  • Peter Geoghegan
  • Kevin Grittner
  • Robert Haas
  • Magnus Hagander
  • Shigeru Hanada
  • Hitoshi Harada
  • KaiGai Kohei
  • Tom Lane
  • Noah Misch
  • Bruce Momjian
  • Dave Page (Chair)
  • Simon Riggs
  • Teodor Sigaev
  • Greg Smith

Proposed Agenda Items

Please list proposed agenda items here:

  • Agree CommitFest schedule for 9.3 (Strawman from Simon)
    • CF1 June 15, 2012 - 1 month
    • CF2 Sep 15, 2012 - 1 month
    • CF3 Nov 15, 2012 - 1 month
    • CF4 Jan 15, 2013 - 2 months
  • Queuing [Dimitri, Kevin]
    • Description: efficient and transactional queuing is a very common need for application using databases, and could help implementing some internal features
    • Goals: get an agreement that core is the right place where to solve that problem, and what parts of it we want in core exactly
  • Materialized views [Kevin]
    • Description: Declarative materialized views are a frequently requested feature, but means many things to many people. It's not likely that an initial implementation will address everything. We need a base set of functionality on which to build.
    • Goals: Reach consensus on what a minimum feature set for commit would be.
  • Partitioning and Segment Exclusion [Dimitri]
    • Description: to solve partitioning, we need to agree on a global approach
    • Goals: agreeing on SE as a basis for better partitioning, having a "GO" on working on SE
  • MERGE: Challenges and priorities [Peter G]
    • Description: Implementing the MERGE statement for 9.3. It is envisaged specifically as an atomic "upsert" operation.
    • Goals: To get buy-in on various aspects of the feature's development, and, ideally, to secure reviewer resources or other support. Because of the complexity of the feature, early interest from reviewers is preferable.
  • Row-level Access Control and SELinux [KaiGai]
    • Security label on user tables
    • Dynamic expandable enum data types
    • Enforcement of triggers by extension
  • Enhancement of FDW at v9.3 [KaiGai]
    • Writable foreign tables
    • Stuffs to be pushed down (Join, Aggregate, Sort, ...)
    • Inheritance of foreign/regular tables
    • Constraint (PK/FK) & Trigger support.
  • Type registry [Andrew]
    • Provide for known OIDs for non-builtin types, and possibly for their IO functions too
    • Would make it possible to write code in core or in extension X that handles a type defined in extension Y.
  • Ending CommitFests in a timely fashion, especially the last one. Avoiding a crush of massive feature patches at the end of the cycle. Handling big patches that aren't quite ready yet. Getting more people to help with patch review. [Robert]
  • What Developers Want [Josh]
    • Description: a top-5 list of features and obstacles to developer adoption of PostgreSQL (with slides)
    • Goal: to set priorities for some features aimed at application users
  • In-Place Upgrades & Checksums [Greg Smith, Simon]
    • Description: Revisit in-place upgrades of the page format, now that pg_upgrade is available and multiple checksum implementations needing it have been proposed.
    • Goal: Nail down some incremental milestones for 9.3 development to aim at.
  • Autonomous Transactions [Simon]
    • Overview of idea, relationship to stored procedures
    • Feedback, buy-in and/or alternatives
  • Parallel Query [Bruce Momjian]
    • Hope to get buy-in for what parallel operations we are hoping to add in upcoming releases
  • Report from Clustering Meeting [Josh] (10 min)
    • Description: to summarize the discussions of the cluster-hackers meeting from the previous day
    • Goal: inter-team synchronization. Possibly, decisions requested on specific in-core features.
  • Double Write Buffers [Simon]
    • Is anyone committing to do that for 9.3?
  • Goals, priorities, and resources for 9.3 [All]
    • For roadmap and planning purposes, set expectations and coordinate work schedules for 9.3. Confirm who is doing what, identify interested reviewers at start, and check for gaps.

Agenda

Time Item Presenter
08:00 Breakfast
08:30 - 08:45 Welcome and introductions Dave Page
08:45 - 09:15 Autonomous transactions Simon Riggs
09:15 - 09:40 Queuing Dimitri Fontaine/Kevin Grittner
09:40 - 09:50 Report from the Clustering Meeting Josh Berkus
09:50 - 10:10 Type registry Andrew Dunstan
10:10 - 10:30 Access control and SELinux KaiGai Kohei
10:30 - 10:45 Coffee break
10:45 - 11:15 Enhancement of FDWs in 9.3 KaiGai Kohei
11:15 - 11:30 What developers want Josh Berkus
11:30 - 12:00 Parallel Query Bruce Momjian
12:00 - 12:30 MERGE: Challenges and priorities Peter Geoghegan
12:30 - 13:30 Lunch
13:30 - 14:00 Materialised views Kevin Grittner
14:00 - 14:20 In place upgrades and checksums Simon Riggs/Greg Smith
14:20 - 14:45 Partitioning and segment exclusion Dimitri Fontaine
14:45 - 15:00 Commitfest Schedule All
15:00 - 15:15 Tea break
15:15 - 15:40 Commitfest management Robert Haas
15:40 - 16:45 Goals, priorities, and resources for 9.3 All
16:45 - 17:00 Any other business/group photo Dave Page
17:00 Finish

Minutes

2012 Developer Meeting Minutes

Started with introductions.

Autonomous Transactions

Simon brought this to get some feedback on the idea. Autonomous transactions (ATX) are a transaction inside a transaction ... a new top-level transaction. In Oracle, it's not just one new transaction, it's a whole new context which can submit multiple new transactions. There is no connection between parent and child transactions, which can result in new types of deadlocks.

Each new transaction context would allocate a new pg_exec from a pg_proc call. Implementation is straightforwards, just have to handle locking. Allows us to implement stored procedures in an interesting way. If we treat a stored procedure as an autonomous transaction, then this solves some problems. We can put COMMIT< ROLLBACK, other things in stored procedures.

Tom suggested that ATX don't need to conflict with parent transaction locks. Noah pointed out some issues with that. We'd need to have a switch for Stored Procedures in order to indicate they are autonomous, like using CREATE STORED PROCEDURE. We'd be using an additional client slot for each ATX, which could be a problem. Oracle's limit on ATX is 70 per connection, which seems like a lot. Maybe we should try to hold them all to a single session like it was a subtransaction. Not sure if we can do this, Simons will need to take a look at is.

ATX also need to eventually be able to run utility commands, like VACUUM and CREATE INDEX CONCURRENTLY.

Queueing

Ultimately the materialized views will need some kind of queueing. Once we have queueing in core, it could be generally useful. CLUSTER CONCURRENTLY would need it, or application queues will need queueing structure. We might want to have it exposed at the SQL level. You put things in the queue, and at commit, others can see it. LISTEN/NOTIFY is sort of a queue, but is only one item and vanishes if you're not listening.

Like a table, but access semantics are different. Would need logged/unlogged queues. Some discussion about how queues are different from tables. Haas wondered about whether what we need for interal queues are the same as what users need for user-visible queues.

Queue-tables also need different performance characteristics. We don't need queues so much as we need deferred action. We also need background processes which wake up and check the queue. Queues could be built on top of tables. Discussion about uses, designs for queues ensued.

We need a really clear design spec for how queues would work. There are specific performance improvements we want for queueing, but they're likely to be just improvements on table performance. The idea is to have a generalized API instead of reinventing a bunch of times.

Next steps is to collect use cases. Kevin & Dimitri will collect use cases on a wiki page, to design an API. Performance optimization needs to look at access pattern. Simon pointed out that this works similar to fact tables where you want to move stuff forward constantly. Users might not use queues as pure FIFO.

Unlinking segments works for deleting from the beginning of a table but indexes could be a problem. Block numbers could be a practical problem, we might need wraparound, or reset-to-zero.

Report on Clustering Meeting

See minutes.

Type Registry

WIP idea. Hstores aren't build in, so they get an arbitrary OID, which causes issues with writing generic code. Looking up they type name is expensive. It would be nice to have a registry for types where people writing extensions are allocated an OID. Andrew gave example of hacking Postgres to support upgrading from the optional JSON type in 9.1 to the built-in type in 9.2.

We need to expose the pg_upgrade stuff as well, set_binary_upgrade. Should we use something other than and OID? We need the OIDs for upgrade and for drivers. Driver identicalness isn't the same as pg_upgradability, so we might want two different switches for that. Maybe we should have a new OID if you change the storage of a type?

What's the criterion for allocating an OID? We'll need some kind of judgement. We'll also need to block off the OID reserved space into sections. People generally found this to be a good idea. Andrew will create a wiki page and follow-up. We could just do this for contrib, but that's not really a good idea.

We could have CREATE TYPE ... WITH OID = ###, for base types only. The folks who want it for ENUM etc. are just replication/clustering authors. There was discussion of other approaches to handling these problems. Users will create types with OIDs which conflict.

Access Control and SE-Linux

Several components: to add security around user tables. Second, to add additional conditions around user queries. Third, a condition around new tuples which are inserted. Fourth, we should have ENUMs to represent user-defined security labels. Did some performance testing on the last part, having labels as OIDs was much faster and closer to non-SE performance.

There's concurrency issues around seeing new labels -- we'd have a huge issue with inserting the labels into the system table. Creating a new label could be a downtime event; we can have a utility command, and we can require users to create a new label first manually. But what happens if the new label isn't there? Should error just like a constraint.

Is there a way to query SE-linux to get all of the security labels? That's hard, because it's four fields. The last field is an issue for prediction. There's a lot of value in having row-level security be completely type-agnosic; we just have a string and we don't care what's in it.

An SE Label consists of: user, row field, type field, and (something inaudible). That last part is a kind of bitmap. Do we actually need that part, though? What's multi-category security, will we support that? How many different labels would you have on a specific table?

The idea of row-level security is to force quals on people. Currently it's not transparent. The discussion on labels needs to continue elsewhere.

Also we need to address FK and PK implementation for security labels.

What Developers want

PostgreSQL is becoming the default for many web applications like Ruby and Django. But there are plenty of users complaints. They don't show up on the PostgreSQL mailing lists. The developer complaints are on stackoverflow, forums for virtual hosting companies, and application specific lists like ORM/framework layers.

Two categories of developer comments: blockers that cause to use another tool, and enhancers that would expand the market into new areas. Many of these are available features, but they seem to hard to use.

Blockers

1. Installation onto developer laptops (Windows / OS X)

  • Re-installs problematic in Windows
  • Reinstall of Redis is the competitor here, it is a closer to a true one-click installer.
  • People use Redis because it's "easy to install", while PostgreSQL ran into one of multiple problems (reported on lists like pgsql-general)
  • postgres.app is aiming at simplifying things for Mac developers, is in beta
  • Kevin: also seen issues with Rails + Rake, lots of questions on Stackoverflow.

2. Complexity of configuring PostgreSQL, i.e. postgresql.conf

  • Shared memory issues on the Mac
    • Could use POSIX shared memory instead Sys V
  • Need a configuration generator and hints for settings that are set incorrectly
    • Example: need to increase size of the transaction log with pg_xlog having X GB of space. Math to determine settings like checkpoint_segements given a GB target is complicated.

3. Better analysis and troubleshooting

  • Expose everything via SQL, i.e. autovacuum ; no parsing logs.
  • EXPLAIN needs to be easier to understand, suggest what needs to be done when planner mistakes are made.
  • Freeze a stable query plan needed for some apps.

4. Easier to understand replication

  • External projects that try to help are often less maintained/robust/documented than core
  • Same thing is true for pooling projects

5. Better pg_upgrade

  • More trustworthy
  • Handle version upgrades across large clusters
  • Deliver on <5 minutes promise. Can take a long time for statistics ANALYZE. Needs to save/restore that instead.

6. MERGE UPSERT

Enabling features to broaden userbase

1. Finish JSON support

  • Most popular new feature on news sites LWN etc. since 9.0 replication
  • Some people want simple document storage like NoSQL, but with PostgreSQl reliability
  • Needs indexing performance improvements
  • More extract from JSON features
  • Schemaless PostgreSQL is possible with JSON or hstore, but it's not obvious that's true.

2. Better extensions

  • Packaging for popular extensions on popular
  • Extensions should follow replication; move .so to standby? Lots of resistance to that idea.
  • Better visibility of extensions, and extension aggregators like PGXN.

3. Client language queries

  • Straight from, say, Python to a parse tree
  • SQL Server/.Net does move in this direction for C#
  • Competition here is the non-relational databases

4. Built-in sharding

  • PL/Proxy: must find it, minimal docs, questions around support situation
  • Target user base here doesn't like SQL or functions much either
  • Base on writable FDW?
  • Borrow ideas from notable sharded PostgreSQL deployments?
Personal tools