PgCon 2015 Developer Meeting

A meeting of the interested PostgreSQL developers is being planned for Tuesday 16 June, 2015 at the University of Ottawa, prior to pgCon 2015. In order to keep the numbers manageable, this meeting is by invitation only. Unfortunately it is quite possible that we've overlooked important individuals during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org).

Please note that the attendee numbers have been kept low in order to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.5 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies.

This is a PostgreSQL Community event.

Changes from Previous Developer Meetings

Note that the goals for this year's "Developer Meeting" have shifted to account for the Unconference which is being held at pgCon immediately following the Developer meeting and lasting for 1.5 days (Tuesday afternoon and all day Wednesday). This year, the "Developer meeting" will be focused on non-technical issues such as timing/schedule, policies, procedures, and Wicked problems, be they technical or non-technical in nature. The nature of such Wicked problems is that they require a sufficient number of interested individuals to make progress and generally involve both technical and non-technical issues (trade-off decisions, no clear true or false answer, no way to test if a given solution is correct, etc). The Unconference will be focused on technical discussions and design. If you have any questions regarding the nature of the Developer meeting, please contact Dave Page (dpage@pgadmin.org).

Meeting Goals

  • Define the schedule for the 9.6 release cycle
  • Address any proposed timing, policy, or procedure issues
  • Address any proposed Wicked problems

Time & Location

The meeting will be:

  • 9:00AM to 12PM
  • Room DMS 3120
  • Desmarais Building
  • University of Ottawa.

Coffee, tea and snacks will be served starting at 8:45am. Lunch will be after the meeting.

Note that this meeting is intentionally shorter this year. This is due to the Unconference being held at pgCon.

RSVPs

The following people have RSVPed to the meeting (in alphabetical order, by surname):

  • Oleg Bartunov
  • Josh Berkus
  • Jeff Davis
  • Andrew Dunstan (plane delayed)
  • Andres Freund
  • Stephen Frost
  • Masao Fujii
  • Peter Geoghegan
  • Kevin Grittner
  • Robert Haas
  • Magnus Hagander
  • Álvaro Herrera
  • Amit Kapila
  • Konstantin Knizhnik
  • Alexander Korotkov
  • Tom Lane
  • Heikki Linnakangas
  • Noah Misch
  • Bruce Momjian
  • Dave Page
  • Simon Riggs
  • Teodor Sigaev

Agenda

Time           Item                                                  Presenter
09:00 - 09:10  Welcome and introductions                             Dave Page
09:10 - 09:40  9.5 Release Schedule / Restore Reliability            Bruce Momjian
09:40 - 10:00  Should We Have a Release Team? Who?                   Robert Haas
10:00 - 10:30  Remaining Multixact Cleanup                           Andres Freund
10:30 - 10:40  Promoting Committers: new system?                     Josh Berkus
10:40 - 10:55  Coffee break                                          All
10:55 - 11:20  9.6 Schedule / Quarterly Security Releases            Stephen Frost
11:20 - 11:45  Getting sponsored more time for review/commit/bugfix  ???
11:45 - 12:00  Any other business                                    Dave Page
12:00          Finish

Developer Meeting Notes

Attendees

Dunstan, Davis absent

No introductions

9.5 Release Schedule / Restore Reliability

Bruce: I was talking to people yesterday and I feel a lot better than I felt 3 months ago. We've gotten complacent about reliability, but now we have stuff which doesn't get fixed with a simple bugfix. We can work on problems instead of staying on the release schedule. Because we haven't had a case like this. Mostly multixacts, but it could have been any failure in our process. We're super-reliable, but we're so used to it that we haven't tried to focus on reliability. It appeared in LWN.net, I think that was neutral. We need to do things differently, and we're doing that.

Heikki: what could we do differently? Bruce: test harness for the WAL. That's a good example, we can't just wait for user bug reports.

Heikki: if we develop new features, we should make them testable. Josh: we don't have organized crash recovery testing. Tom: a crash test recovery framework would need to evolve; we couldn't have caught some of the issues with tests we knew about.

Noah: we choose our balance between reliability and adding new features. We're getting the bug level we can expect. Question is, do we like that balance?

Kevin: but we fell down on this because people missed posts about issues. Alvaro actually posted about some of the outstanding bugs, but people missed the emails. We should have a more visible TODO list. Haas: we should create a wiki page now. Noah: should also be on the 9.5 Open Items list. Haas: we keep finding other things with multixacts. The list needs to be more visible. Let's start with a wiki page. Andres: sounds like a bugtracker implemented in a wiki page.

Bruce: we made a joke about the bugtracker. I don't want to address that but I'm going to. We've been blessed in being able to turn bugs around quickly; now we have a bug which we can't fix quickly. Other projects have to deal with this all the time. Haas: most of the stuff we don't deal with nobody who can code cares about. We have 1100 things open against EDBAS, and most of those things are not worth fixing. We need to do feature development. I'm not opposed to a bug tracker, but it will fill up with unimportant stuff. The Multixact thing is different: we need to fix that. You need to curate a bugtracker.

Amit: new people like to look at bugtrackers as a place to get started contributing. Lowers barrier of entry for contributing. Dave, Haas: people pick things off the TODO list, but that's aged out. Noah: "If there's something I sort of think I should feel guilty about not fixing, I put it in the TODO list." Oleg: we should hire a full-time project manager. Some issues brought up. Discussion ensued.

What does having a bugtracker have to do with this? Well, it might have allowed us to not lose track of the Multixact issues. How do we close bugs? People need to take no for an answer. Frost: Debian does a good job of this.

9.5 Schedule: should we put a beta or something out next week? Andres says a new round of fixes will be added in the next 2 weeks. But that doesn't affect beta. Heikki: what are the outstanding items? It actually looks pretty good. Magnus: "It's a little too good looking." We need to release a beta so people find bugs.

Haas: what do we think in this release will cause us horrible regrets? Noah: the most risky things affect persistent state. WAL format change, UPSERT. But if there's a bug in the WAL format, then we can fix that during beta. The things we need to fix before beta are UI issues, not bugs. Do we have open questions on RLS? We need to fix those before beta.

RLS needs a lot more documentation (Simon). Heikki asked for a show of hands as to who knew how RLS worked; fewer than half the room raised hands. Heikki is worried that too many people have not looked at it. Peter brought up the issue with planquals; folks pointed out that that's not a new issue to 9.5. Frost thinks there are potentially changes to the UI in RLS, especially to RETURNING sets. How do we handle these? There are issues with RLS which need to be resolved before beta.

Andres says that UPSERT doesn't really affect stability. There are no persistent state changes, so corruption risk is very low. There may be race conditions and such, but it can't even abort a transaction.

Simon said we should have a guide to beta testing. Vote on alpha/beta/whatever. Heikki pointed out that we need an early beta so that people will actually try features and give feedback. The vote was for putting an alpha out immediately, with more than a 2/3 majority.

Should We Have a Release Team

Haas: There seems to be a lot of inertia around creating a release when there's a bug fix. It seems to be very slow before we even talk about doing a release. I was disappointed when nobody was following what was going on with the multixact stuff. I think having a bigger group of people on a closed mailing list would make things happen -- when are we going to do a release and why are we going to do a release. More technical people who understand the bug.

Dave: ultimately the packagers need to decide when the release happens, since they're doing the work. Kevin: the bigger issue was that if you were following the discussion you'd know that there was a data-eating bug, and the core team missed that. The problem with the process was that the people who know about the severity couldn't see that the core team had missed the problem. Dave: you know that I don't hack on the server. I miss those discussions. Someone can propose that clearly on Hackers.

Simon: we don't have a mechanism for deciding which things are really severe. Oleg: like on Facebook? Alvaro: we need a way to tag things as really important. Add a tag which says "release". Or we could create a separate list. Simon: that would just have the same issue for whoever is not on the list. Dave: use the packagers list. Magnus: that's not what that list is for. Frost: maybe we should just use the security list, most of the relevant people are already on it.

Packagers is not the part which is not working. They do fine. The problem is that nobody started the process. You can't tell from the outside if core discussed something or not. There's no reason why the discussion around should we do a release should be confined to 6 people. Last year we went 5 months without doing a release, and you can't tell from the outside what's going on.

Haas thinks we should have a new release list with Core + active committers + a couple other people. There is a strong overlap with security. Dave brought up primary and secondary packagers. Primary packagers should be on this list. But we need to not disclose security issues.

Dave proposed to create a pgsql-releases@ mailing list, initially including the committers in the room and Core. Discussion over sharing details of security releases. Need to work out details of who's on it. Need to figure out split between security and releases list. Should the security list be able to decide that we're having a release?

Regular Security Releases

(quick schedule rearrangement to continue discussion)

Heikki thinks we should have a quarterly update release. Dave is concerned that users will be confused by extra urgent releases. But we still need to do that. It's at least once a quarter, even without serious bugs.

We don't want to do 3 releases in 4 weeks again.

Doing an update at least quarterly. "At least one update per quarter", rather than specific dates. Target dates would be nice, but we might not want to make them public, just shared with packagers.

Noah pointed out that we need a way to know we've made a decision to release or not. Dave thinks we want one person whose responsibility is to make a final decision. More discussion about different systems for doing final determination.

Haas suggested we should just set up the mailing list, and we'll be able to try and figure out what works and what doesn't. And then we'll discuss what worked next year. Bruce says the value of a release list is that it will add specialists on different areas of the code.

Remaining Multixact Cleanup

Andres: there's a number of relatively bad, but hard to hit bugs. Only 2 or 3 people really understand how multixacts work. The big thing is that mxact truncation does not work correctly on standbys and during crash recovery. So we'll need to add a WAL record for truncation, but it happens fairly infrequently.

We need to make changes and test this. Heikki's harness doesn't test SLRUs. It's a combination failure which is hard to simulate. Tom: if we're saying "we're going to take the technology which works for CLOG and apply it to mxact", that seems pretty straightforward.

Haas: we should make the wiki page so that everyone understands what's going on. Tom: should we push a fix for this into the alpha? Andres was wondering about this. We should push to all branches, but we should do it before alpha. Tom says that we shouldn't do that. Definitely should be committed to the alpha.

Kevin: do we have any remaining data loss bugs? Andres: yes. During crash recovery. Alvaro: the problem generally shows up in unusual configurations. For example, if you replay multiple checkpoints. But that happens during PITR.

We'll put up a wiki page, and then have a more meaningful discussion.

Committers

Josh: currently core chooses committers in closed session. Should we have a different system?

Andres: core should decide more than once a year.

Haas: people who are involved in the CF process should be discussing who should be nominated as committers. Magnus: Core often polls people, but there isn't an outside discussion. Haas feels that there are people outside core who could make recommendations on people we can trust to commit.

Dave: the question is, should we always poll the committers?

Andres: should we have docs-only committers? Core had a recent discussion on this, but didn't want to go ahead.

Nobody in the room was willing to push forward anyone as a new committer on discussion. Haas is worried that we give short shrift to the Japanese contributors. We tend to lump them together, which isn't fair. "They're not one big Japanese guy." One Japanese contributor was mentioned as potential, but not sure he's ready yet.

People should definitely propose people to the Core team for committers. Haas brought up the issue that discussions with Core don't get enough input because it's Core and one individual. Why are committers discussions secret? Tom: because we don't want it to be public that we passed over them.

There was discussion about setting up a closed committers list for discussing nominations. Kind of sounds like the release list. Tom was doing some stats: 13 very active committers, a few who have done commits in the last 6 months, and then 4-5 who haven't committed in years. We need a policy on bouncing inactive committers, and returning inactive committers. Are we worried about old committers coming back and committing stuff? Not really, but having the keys out there is a security risk.

Haas suggested that only active committers can be included on the closed lists. Dave suggested a policy that anyone who doesn't commit in X months becomes an "inactive committer" and gets taken off mailing lists. Need to figure out numbers; Tom just did numbers in the last couple of weeks. Frost: a few people are active in the community even if they aren't committing.

Dave proposed some action items:

  • we have a list of the active committers and all of Core.
  • use that list for proposing committers moving forward.
  • no archives. under security monitoring list

Use that list for discussion of committers to determine status. Motion passed.

Dave proposed some rules on retiring inactive committers. 24 months zero commits, they get removed from the active lists. Passed.
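
The 24-month rule can be expressed as a simple date check. The following is only a minimal sketch, not anything the meeting specified: the function names and sample data are hypothetical, and treating exactly 24 months without a commit as "inactive" is an assumption, since the notes do not say how the boundary is handled.

```python
from datetime import date

# Threshold agreed at the meeting: 24 months with zero commits.
INACTIVITY_MONTHS = 24


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def inactive_committers(last_commit: dict, today: date) -> list:
    """Names whose most recent commit is at least INACTIVITY_MONTHS old.

    Assumption: exactly 24 months counts as inactive.
    """
    return sorted(
        name
        for name, last in last_commit.items()
        if months_between(last, today) >= INACTIVITY_MONTHS
    )


# Hypothetical data for illustration only; not real commit history.
example = {
    "committer_a": date(2015, 6, 1),
    "committer_b": date(2013, 6, 16),
    "committer_c": date(2012, 1, 10),
}
print(inactive_committers(example, date(2015, 6, 16)))
```

In practice the last-commit dates would presumably come from the repository history rather than a hand-written table.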

9.6 Schedule

We decided to release a 9.5 alpha very soon. Then betas, etc. Haas predicts that if we do an alpha now, we'll do a beta in the fall. And the final won't come out until the end of the year.

Simon says that there needs to be significant Dev time between final release and feature freeze of next version. Josh suggested releasing a beta right after the alpha, maybe a month later. Magnus says that we should release betas more frequently. Dave suggested doing a beta release every month until final.

We will do either an alpha or a beta every month until we're done. Simon wants to target major events like the Europe conference. But many release people are associated with conferences. Josh suggested that we really need regular frequent releases for adoption.

Magnus said maybe September is a bad target. Maybe we should shift the target to late October. Dave pointed out issues with Diwali getting in the way of packaging and testing. Late October/early November is good timing. Simon points out that letting things slip makes it hard for anyone to get anything done.

A longer release cycle (18 months, 2 years) would maybe be better? Josh said that it would kill adoption. Haas pointed out that it takes 5 months from closing commits to final release. Maybe someday we can do that faster, but not soon. Discussion about November and October.

Let's say Mid-October at this meeting as an ambitious target and work our way backwards.

So, now set the 9.6 schedule.

Overlapping CFs with beta has been an issue. Should we revisit that? Haas thinks that we have to accept that the early CFs are less productive. So when's the first CF?

Is our developer bandwidth decreasing? Not so much, but the complexity of code is increasing.

Discussion of CommitFest dates ensued. Lots of discussion of how to set dates etc. The dates selected were:

  • CF1: July 1 to July 31 2015
  • CF2: Sept 1 to Sept 30 2015
  • CF3: November 1 to November 30 2015
  • CF4: Jan 2 to Jan 31 2016
  • CF5: March 1 to March 31 2016, to end on time (sudden death patch rejection)
  • Feature Freeze (committer freeze): April 15
  • Beta mid-June
  • Release mid-October

This was followed by discussion of how we arrange the CFs. Noah suggested a prioritization system. He suggested a system where various committers can provide feedback on stuff as triage. Simon agreed and volunteered. Haas suggested a 2-committer veto. Frost said we need a formal way to "nack" something. We need to triage stuff earlier in the process. Haas says the big problem is the endless arguments because we don't want to say "no" to people because we have scarce committer time.

Heikki suggested that having a serious hacker as the CFM worked, like when he was CFM. Simon suggested using +1 and -1. Haas says that the method isn't as important as coming up with a way to kick out the stuff early.

Sponsorship of Reviewers

PostgreSQL Europe is running a cut-rate training thing about how to be a Postgres hacker in order to train people up in hacking Postgres. This is a way to encourage more developers.

Peter suggested that the main issue is not just employer time. The issue is that it's draining. Especially rejecting patches is draining, so you can't necessarily do so many.

Other Business

Holding the developer meeting at some other conference.

Many people would like to move the meeting around the world. Dave said that it's important to attach it to an established conference. Suggested that New York could work. Europe was good, but it's the wrong time of year. We need to decide way in advance because companies need to sponsor travel.

For Europe, what about FOSDEM? Some pros and cons.

People in the room are OK with Ottawa. But what about the folks who aren't in the room? We'll start the discussion again on email with this group. The deadline to make a decision will be July 31.

Bug Reports

Simon pointed out that we don't give any credit for bugs which are reported, or bug fixes in beta. He thinks this has led to a decrease in bug reports. Dave suggests that we should credit folks in the commit.

Magnus pointed out that we need a standard format for crediting people in commit logs, so we can extract names. Someone suggested we use the same format as Linux.
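
To illustrate why a standard format helps, here is a minimal sketch of extracting credits from a commit message that uses Linux-kernel-style trailers such as "Reported-by:" and "Reviewed-by:". The notes do not record which format, if any, was adopted, so the tag set, the function name and the sample message below are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Linux-kernel-style trailer line: "Tag-name: Person Name <email>"
TRAILER_RE = re.compile(r"^([A-Z][A-Za-z]*(?:-[A-Za-z]+)*):\s+(.+?)\s*$")

# Assumed set of credit tags; the meeting did not settle on a list.
CREDIT_TAGS = {"Reported-by", "Reviewed-by", "Tested-by"}


def extract_credits(message: str) -> dict:
    """Map each credit tag found in a commit message to the people named."""
    credits = defaultdict(list)
    for line in message.splitlines():
        match = TRAILER_RE.match(line)
        if match and match.group(1) in CREDIT_TAGS:
            credits[match.group(1)].append(match.group(2))
    return dict(credits)


# Hypothetical commit message, not taken from any real repository.
sample = """Fix hypothetical off-by-one in example code.

Reported-by: Alice Example <alice@example.org>
Reviewed-by: Bob Example <bob@example.org>
"""
print(extract_credits(sample))
```

With a fixed tag vocabulary, release-note credits could be gathered by running a function like this over the commit log instead of reading free-form text.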

Conclusion

Several attendees remarked that this was the most productive Developer Meeting in years. People thought that it was because we removed the technical issues and only discussed project management. Also, the Wi-Fi wasn't working. Mostly it was that people had discussed most of the issues on email before the meeting.