FOSDEM/PGDay 2016 Developer Meeting
A meeting of the interested PostgreSQL developers is being planned for Thursday 28th January, 2016 at the Brussels Marriott Hotel, prior to FOSDEM/PGDay 2016. In order to keep the numbers manageable, this meeting is by invitation only. Unfortunately it is quite possible that we've overlooked important individuals during the planning of the event - if you feel you fall into this category and would like to attend, please contact Dave Page (dpage@pgadmin.org).
Please note that the attendee numbers have been kept low in order to keep the meeting more productive. Invitations have been sent only to developers that have been highly active on the database server over the 9.5 release cycle. We have not invited any contributors based on their contributions to related projects, or seniority in regional user groups or sponsoring companies.
This is a PostgreSQL Community event.
Meeting Goals
- Review the progress of the 9.6 schedule, and formulate plans to address any issues
- Address any proposed timing, policy, or procedure issues
- Address any proposed Wicked problems
Time & Location
The meeting will be:
- 9:00AM to 5:00PM
- Brussels Marriott Hotel
Coffee, tea and snacks will be served starting at 8:45AM. Lunch will be provided.
RSVPs
The following people have RSVPed to the meeting (in alphabetical order, by surname) and will be attending:
- Joe Conway
- Dimitri Fontaine
- Andres Freund
- Peter Geoghegan
- Kevin Grittner
- Magnus Hagander
- Álvaro Herrera
- Petr Jelinek
- Tom Lane
- Heikki Linnakangas
- Bruce Momjian
- Dave Page
- Dean Rasheed
- Simon Riggs
- Craig Ringer
- David Rowley
- Teodor Sigaev
- Tomas Vondra
The following people have sent their apologies:
- Josh Berkus
- Jeff Davis
- Andrew Dunstan
- Peter Eisentraut
- Stephen Frost
- Etsuro Fujita
- Amit Kapila
- Kohei Kaigai
- Robert Haas
- Fujii Masao
- Noah Misch
- Michael Paquier
- Masahiko Sawada
- Pavel Stehule
Agenda
Time | Item | Presenter |
---|---|---|
09:00 - 09:10 | Welcome and introductions | Dave |
09:10 - 10:10 | 9.6 and beyond - improving the process | All |
10:10 - 10:30 | 9.6 Release Schedule | All |
10:30 - 11:00 | Coffee break | All |
11:00 - 11:45 | Future of storage | Álvaro et al |
11:45 - 12:30 | pglogical, BDR and logical decoding | Petr, Craig, Simon |
12:30 - 13:30 | Lunch | All |
13:30 - 15:00 | Future shared infrastructure for tightly and loosely coupled clustering and multimaster replication | Petr, Craig, Simon |
15:00 - 15:30 | Tea break | All |
15:30 - 16:30 | Testing | Tom, Craig (?) |
16:30 - 17:00 | Any other business | Dave |
17:00 | Finish |
Minutes
Present:
- Joe Conway
- Dimitri Fontaine
- Andres Freund
- Magnus Hagander
- Petr Jelinek
- Peter Geoghegan
- Álvaro Herrera
- Tom Lane
- Bruce Momjian
- Dave Page
- Dean Rasheed
- David Rowley
- Craig Ringer
- Simon Riggs
- Tomas Vondra
Expected to arrive late due to travel:
- Kevin Grittner
- Heikki Linnakangas
Missing:
- Teodor Sigaev
9.6 and beyond - improving the process
Tom: People moved straight to 9.6 development when 9.5 was branched
Andres: 9.4 was the same
Bruce: Too many commitfests?
Craig: Lots of pressure to get more stuff done.
Magnus: More fun to build new features
Dave: PG was a hobby in the early days. Now we're all making a living from it.
Peter: Why was release in January, not December?
Andres: We didn't do anything until deadlines were set?
Dave: So do we need pre-defined deadlines?
Magnus: Backing out large patches causes serious delays
Craig: Open items on RLS caused delays
Simon: Coordination is the bigger problem. Individual issues are small but mount up.
Joe: Finding time is hard and sporadic
Dave: Is it harder for everyone now this is a paid gig, not a hobby we pick up whenever we like?
Others seem to agree.
Andres wonders if Magnus has graphed commitfest submissions, expecting activity has gone way up.
Tom thinks the level has remained around 100 for a while. Simon suggests 80 patches in the beginning, up to 120 at the end of a cycle.
Andres thinks there are more patches, but less review.
Dave: Is it that employers prioritise new features, but not patch review?
Andres: Should we force reviews in return for submissions?
Dean: We should encourage more patch review/reviewers to help find bugs early.
Alvaro: We can't force people to review
Andres: We can force people to write tests, but they'll end up being stupid ones in some cases.
Peter: People are only interested in reviewing what they consider to be important.
Simon: Should we have official "reviewers"?
Dave: That might have the opposite effect - not my job!
Andres: Should we credit reviewers at the end of the release notes?
Dave: It's certainly worthy of credit
Bruce: I've been one of the big pooh-poohers of that. It's a slippery slope of saying those names are there for credit.
Dave/Peter: That's not a bad thing.
Simon: Regardless of patch credits, reviewer credit should be given.
Simon: We need a bar for inclusion
Where do we get the list from? Commit logs? Not everyone includes reviewers. Commitfest app? It often only contains the principal reviewer, who may remove themselves when they're done.
Magnus: We should properly credit in the commit logs
Dave: We can have review of release notes to make sure we didn't miss anyone
Dave: This doesn't fix everything. Let's resolve to include reviewers on commit messages, and list them on release notes
Joe: Let's define a commit message structure
Action point: Include reviewers in commit messages and list them on release notes. AND TELL EVERYONE!! (all)
Action point: Bruce to propose commit message template (Bruce)
Tom: Can't just let commitfest drag on. Need to close out firmly on the correct date.
Andres: The author should be responsible for pushing things to the next commitfest.
Simon: Need to handle patches for which we get no response from the author - it's not rejected, and may well be wanted, but is currently inactive
Peter: Need a culture of really tracking status of patches on the cf app so people can see a good overview from one page. Need to handle patch dependencies
Dim: Readme file for patch series?
David: Should the CF app remind reviewers if they haven't submitted review notes?
Andres: That has a negative effect on me
Magnus: Personal contact from the CF manager is better
Alvaro: CF app has a mail feature, and that has worked for me to nag people
Craig: We need to avoid CF manager burnout, and keep the process simple.
Petr: Would be good to have activity log updates for my patches
Andres: RSS feed doesn't work well. Update notifications would be good.
Tomas: Would like to see enforced review for each patch submitted. Problem with balance though - is one big patch worth 2 small reviews?
Action point: Magnus to add a subscribe button to the CF app, to allow users to receive email updates when metadata is changed. (Magnus)
Simon: We have CF managers, but we have no one managing releases. I propose a team of three people for an RM team - 3 for redundancy and voting on challenges. Team would be able to veto patches.
Magnus: Would they take over after the last CF?
Simon: Not sure
Peter: This could have meant that UPSERT wouldn't have gone into 9.5 which would have been bad.
Dave: But for every UPSERT, there are 10 patches that wouldn't have been whipped into shape in time.
Joe: The release team should decide when we branch, based on current status
Tom: Should we stop branching early to keep focus on the current release?
Petr: Commitfest schedule should be based on branch date
Simon: Fixed dates are helpful for planning holidays etc.
Tom: We already avoid Christmas etc. Maybe we should keep the existing schedule, but just drop the July/September fest if we're delayed. We should probably drop July regardless.
Simon: Make July a "Review fest"?
Action point: Propose release management team to release mailing list (not a public list), and form for 9.6. (Simon)
Action point: Propose dropping July commitfest and making September conditional on progress. (Simon)
9.6 Release Schedule
Magnus: Do we want to try to get back to a September release?
Bruce: If we end up on 3 CFs, is that enough? If the last is in March, we still won't hit the date
Dave: That's what the release team is for - to progress things through March to September.
Magnus: Need to get out of cycle of slippage.
Dave: We should release end of September
Tom: Beta in June, release in September
Magnus: What about the betas? We have too much time between them.
Simon: Do betas need to be full releases?
Dave: Yes, as we're also testing installers, build systems, test systems etc.
Joe: How can we encourage beta testers
Dave: Give out t-shirts for good bug reports
Peter: I like that idea
Simon: Present shirts at a conference
We could have a custom T-shirt.
Action point: Investigate SPI funding and t-shirt design and logistics
Action point: Propose September release, May beta (before PGCon?) to the release mailing list (Dave).
Future of storage
Alvaro: et al is Tomas, David and Simon! We're working on columnar storage. Posted a patch on hackers, but not happy with the result. Want to improve performance of massive queries - looking for 10x - 100x increases.
The current patch shows maybe 10 - 25% improvements. It is essentially vertical partitioning, moving data off the heap into another relation. Not really columnar storage - just moving a column to its own relation. Looking at a new approach based on that experience.
One of the first ideas is to split the concept of a tuple descriptor into 2 pieces - one is coming from the main table, the other a smaller descriptor for each column store on the table. Proposed here as this would require splitting up pg_attribute, and wants buy-in before doing so.
One option is to split up pg_attribute
One option is to have a new storage abstraction layer which can handle columns that are not part of the vestigial heap in HeapLockTuple.
Andres: Wonders how much this would help as you'd still need cmin, cmax et al.
Alvaro: That data could be centralised on the heap to some extent (paraphrased, not entirely convinced I understood correctly)!
Simon: We don't want to radically change Postgres. Look at Monet - they proved columnar worked, but you have to effectively turn the database off to load.
Tomas: Initial patch was written to avoid breaking as many things as possible. This is the first step to abstract the locking. In the next step we need to do more radical restructuring.
Andres: There's a reason why practically no column store supports features like HeapLockTuple, but making the API more general has other advantages.
Tomas: In the next step we could do locking of blocks of tuples
Simon: If we accept restrictions on DB functionality, we'll end up with something so far from Postgres that you'll end up choosing one thing or another - it'll be a separate product.
Tomas: Greenplum only supports append-only columnar stores
Alvaro: Would want to make incremental changes to storage/catalogs as DDL support is added
Alvaro: Another proposal is to allow multiple tuple datums to be stored consecutively
David: [Splitting pg_attribute]. The idea is to have a physical and logical descriptor so things can be easily rearranged into an efficient storage order.
Alvaro: The new design allows us to have attributes stored in different places, which really doesn't work well with just a couple of other columns on pg_attribute.
Bruce: You should be getting much higher performance. We don't do two things - columnar and graph. Is it about compression, header, row format?
Simon: Vectorising the executor has a massive performance benefit
Dim: This is what Greenplum does for seq scans etc.
Simon: We have someone experienced to review our work, but can't talk to him yet as he's on a review committee for funding a project of 2ndQ's. The restriction should be lifted soon.
Simon: Vectorised column storage can be up in the 100s of % performance increase.
Andres: Just vectorising has given 300%
Simon: If we go down the road of allowing restrictions, we'll end up like MySQL with MyISAM and InnoDB.
Petr: Sucky updates are fine - just don't use columnar for regularly updated stuff.
Tomas: Columnar updates can be fast if done in a batch manner. No good for OLTP of course.
Tomas: I'd be happy with 25x performance if I get updates
Alvaro: Was looking for any objection to restructuring. Seems like there is none - will move forwards with multiple patches.
Tom: Looked at this at Salesforce. Putting in anything that looks like a storage manager API is a *much* bigger task than you might think. There is much more chance of making this work if you can avoid changing catalog access and thus having to touch DDL code.
Dave: Maybe we need a wiki page to write up the evolution of the patch, so people know how the current state was reached.
Simon: I disagree with Stonebraker - having restricted features in lots of systems may be lucrative, but we want everything in Postgres.
Joe: Columnar storage is not a feature - it's a solution to a particular problem. We should know the use cases, as maybe there are other solutions.
Alvaro: This is why we want a generic infrastructure for this, to allow future alternative storage options.
Craig: We don't want restrictions, like you can't use BRIN indexes or FTS on a table with columnar storage.
Action point: Setup a wiki page to describe the project and work to date (Alvaro & Tomas)
pglogical, BDR and logical decoding
Simon: We've submitted pglogical for 9.6. Wanted to discuss whether people feel it should be committed to this release, and discuss the roadmap of future items. When we originally wrote BDR, we were driven by our funding model. Now we need to get that into core, so we started with pglogical. Once we have data transport in core, then we add multimaster.
Craig: pglogical came from the guts of BDR. We took the parts that were usable with PG 9.4, and turned them into a data transport mechanism that allows replication in a flexible arrangement of nodes. This can allow others to build multimaster, sharding, DW etc. Hooks are present to allow filtering. Initial code allows selective replication, online upgrade (bar sequences), data merge into DW. Looking at adding an audit feature where changes are fed into an audit table or text file.
Petr: Working on data transformation
Craig: Looking at adding selective replication within a table, e.g. only replicate data for a particular customer, sharding.
Peter: So you're putting logical replication on a level playing field with Slony?
Craig: Yes. And to make ETL easy.
Kevin: Can you support disjointed multi-master, where different nodes contain different data sets?
Craig: Yes
Kevin: What about where a change on one node can cause a delete on another?
Craig: You mean like a re-shard? [should be possible]
Dim: What about skipping columns in replication to hide them?
Craig: If we can't do it yet, it should be easy.
Petr: We tried to make the plugin so you can use it on its own to send data to things other than Postgres
Craig: We've tried to avoid having too many plugins on plugins, but we could make the wire protocol pluggable as well. We're really trying to make it so people can use this for everything.
Andres: One of the dangers here is making the output plugin too complex.
Craig: That's what's so cool - it's actually really simple! I'm really surprised how the output plugin naturally formed boundaries and allowed for hooks.
Craig: pglogical is another part of us getting bits of BDR into core. We'll eventually wrap BDR around it.
Petr: No action items - this is really a status update. We'll talk more after lunch about the infrastructure we need.
Andres: We have 10 minutes now, let's talk about sequences etc.
Craig: We need the ability to decode a sequence advance out of WAL.
Petr: Craig is working on sequence advance. We also need to work on sequence access methods for multimaster, clustered setups etc. We need the generator to not be locally owned. Have taken multiple approaches to storing sequence AM data in the catalogs.
Craig: We need this for sequences across multimaster and sharding, as well as an idiot proof gapless sequence.
Heikki: What are the AMs you need? Cluster-wide, gapless
Petr: Gapless locks the sequence until commit to ensure numbers don't end up unused.
Andres: Every German will thank you for that!
Lunch
Craig: Wanted to talk to the Russians about their work exposing the transaction and lock manager. First though, failover slots.
Kevin: I was invited to Russia to talk about SSI with an eye toward how that could be spread across multiple nodes.
Peter: There's a restriction on SSI with parallelism - you can't use them together. Probably something to do with predicate locks.
Kevin: On a related note, after SSI went in Berkus told me a customer had implemented their own sharded solution, but performance wasn't great (10x latency)
Craig: If you have data distributed across slow WANs on a 4-node MM cluster and one node fails, you can't switch over because you can't create new slots at the right point in time. Failover slots provide a minimal way to allow logical replication to play nicely with HA. The patch is mostly done; it still needs the ability to follow a timeline change, and review.
Simon: Interested in transaction manager work, but it should be in core, not an external extension.
Heikki: Simon; how far do we want to go to allow people to write custom extensions? WAL logging in extensions seems popular. Do we want it though?
Simon: Opens the door for people to write patent-encumbered extensions
Dave: I don't really care - we're a BSD project. The question for me, is would it be useful for other OSS projects like PostGIS?
Andres: cstore_fdw could use such a feature for replication and crash safety
Simon: I want a way to skip a broken index during recovery for example
Heikki & Andres question if this would even work. How would you know what is broken during recovery?
Craig: What scares me is lack of disk space checks and crash testing.
Discussion moves onto database consistency checking - Heikki says people don't want it until they need it.
Dave: EDB customers often have it as a check-list due diligence question (that's why EDB wrote pg_catcheck). Peter wrote a tool for checking btree consistency that users have used.
Peter: It's loosely based on pageinspect. It takes shared locks on buffers one at a time.
Various: We need a secret option hidden in the docs without which people cannot run pg_resetxlog!!
Craig: We need something like "rm -rf / --including-root"
Tom: That works fine until Google archives a post with the magic parameter in it.
Action point: Post new version of btree consistency checker patch (Peter)
Action point: Add warning notice and confirmation requirement to pg_resetxlog (Craig)
Action point: Reword delete backup label hint (Kevin)
We need a safer mechanism for start/stop backup...
Magnus: We could disallow disconnection during backup - i.e. if you disconnect, pg_stop_backup() is run automatically.
Kevin: Need some reliable way of telling the difference between a tarball and a crashed datadir.
Tom: By definition you can't, or tar is broken.
Magnus: We need a new robust API for non-exclusive backups
Simon: Keep but deprecate the existing API.
Need to find a better way to ensure users have the required xlog in backups
Craig: Our docs are in the wrong order. pg_basebackup should be first, ahead of manual methods.
Action point: Re-arrange backup docs page (Bruce)
Andres: We could rename pg_xlog to pg_wal
Simon: pg_clog to pg_commit
Magnus: Renaming pg_xlog will break all backup scripts
Bruce: If we're telling users to check their scripts for renamed directories, we should tell them what else to check as well.
[Much discussion about trying to figure out the difference between a crashed data directory and incorrectly created backup]
Magnus: We should include links in the docs to trusted backup management tools and encourage users to use them rather than roll their own low level processes.
Magnus: We need to make sure people are aware that their backups are broken if we rename directories, e.g. by changing pg_start_backup() so it barfs unless they update their scripts.
Action point: Finish sanitizing the backup API (Magnus)
By a show of hands, most people favour renaming pg_xlog/pg_clog and risking breaking user scripts.
Heikki: I have no issue renaming pg_clog as that shouldn't break anything
Kevin: One user of ours had a filesystem configured for a huge default allocation size, thus pg_clog took a huge amount of space. Not really something that would break though.
Action point: Submit patch to rename pg_clog/pg_xlog (Bruce)
Action point: Allow tablespaces to use relative paths to avoid issues during testing with multiple instances on one box (Andres)
Testing
Tom: We need to start thinking harder about testing infrastructure. We have the buildfarm and the isolation tester, but no performance farm or crash safety testing. Would be good to get Heikki's test tool into common use.
Dave: Wasn't Stephen (Frost) working on the performance farm?
Joe: I think it's still on his mind but don't know much more.
Heikki: I found my tool useful when hacking on xlog stuff, and found some existing bugs. I ran it just before 9.5
Alvaro: We could setup a buildfarm animal to run the test.
Heikki: We need a data generator to do this testing for different index types etc.
Alvaro: In BRIN page evacuation is not tested, but other coverage is complete. We should have a machine running tests constantly.
Heikki: Didn't Peter E have something running?
Kevin: It only ran make check I think, not make check-world.
Heikki: We need the workload, and the regression suite which we should keep adding to.
Kevin: We don't want to overload make check to do this, but what about make check-world?
Kevin: We could have a numeric level for make check, to add more and more tests.
Tom: This may require infrastructure that machines don't have, so it doesn't make sense to use the same targets.
Alvaro: Michael Paquier had a patch to run tap tests with a master and standby
Andres: I had a test for multixact testing but it wrote ~500G of data. Is this sort of thing worth keeping?
Kevin: Yes, so we don't lose it. But in a different target.
Alvaro: Need to ensure modules don't write data to the same path. May need a BF fix.
Heikki: We need to ensure these tests don't bitrot.
Heikki: I also ran a test for SSL stuff, but that's probably broken now.
Andres: We can ask Andrew to fix that.
Heikki: We didn't do that because it uses TCP connectivity, which is a potential security issue. We could have animal owners enable it if they're happy to.
Joe: Would be nice to have an easy way to identify "special" animals on the buildfarm.
Simon: Maybe a different view of the BF database to show animals doing certain tests, e.g. all SELinux tests.
Magnus: Access to the BF database should not be an issue for known community members.
Peter: Jeff Janes had a useful test for UPSERT. Originally simulated torn pages, then checked everything was consistent.
Heikki: Would be nice to polish that up.
Andres: Over three releases that test has found bugs
Craig: I'd like to look at using Docker or KVM for simulating power loss
Heikki: Many of these tools are only interesting whilst writing code, and not in the long term
Kevin: But someone will likely modify that code again in the future. I'd like to see them run at least yearly as a check.
Joe: make annual check? :-)
Various: Stress testing often needs to be run for periods of time before bugs are seen
Action item: Heikki to look at polishing his test tool (Heikki)
Action item: Alvaro to push Mr. Paquier's patch for recovery testing (Alvaro)
Action item: Alvaro to setup machine for public 'make coverage' html reports (Alvaro)
Andres: Concurrency primitives are woefully untested. This is difficult because bugs can take a long time to show up, and we often don't run such tests on non-Intel hardware.
Dave: And presumably this can be difficult on virtualised environments anyway, e.g. PowerKVM where NUMA node affinity may be configured in different ways
Kevin: Performance testing is hard. Machine config, kernel versions etc. can make a huge difference
Dave: We should do simple baseline testing on a per machine basis and do comparative benchmarks. Having both stable and daily updated machines could show things we break and things the OS vendor breaks (or fixes)
Tomas: I'm willing to spend some time on this work if we can get machines.
Action item: Dave & Tomas to look into getting some basic hardware and writing a framework to get started (Dave & Tomas)
Alvaro: Can we use the buildfarm
Tomas: I don't know Perl
Dave: Neither do I. Something new in Python would probably be quicker and easier to write.
Magnus: The BF schema probably isn't good for this anyway.
Kevin: Flame graphs are very helpful
Tomas: I don't think this should be for diagnostics, just regression testing
Craig: Does anyone see a problem with asking some BF owners to run Docker or similar for crash testing?
Dave: Might be difficult to fit into BF framework
Heikki: Wouldn't hurt to ask people though
Joe: Has someone been doing fuzz testing?
Dave: Yes, Greg, with libfuzzer
Joe: Can that be scripted?
Andres: Probably not worth it - Greg may have exhausted the usefulness
AOB
Dave: Simon had a couple of topics, plus we may want a quick meeting of the security team
Simon: One item was the optimiser roadmap. Should we consider specific optimisation cases from TPC-H for example? We haven't even started on TPC-DS yet (which has 100 queries)
Heikki: Is the question should we bother because some of this work may be long and complex and not pay off in the real world?
Simon: There are various branches to this - sharding etc. materialised views
David: Parallel query - we're adding more brawn, but not brain. Planner improvements here may apply only in a small number of cases, but have a massive effect on some OLAP queries. I feel like we're in an OLTP world in the optimiser, but moving to an OLAP world in the executor.
Simon: This is not to talk about specific decisions, but what we see happening and where we want to go, and making sure we go in avenues that make sense.
Tomas: We rely on costs being accurate, which reflect query runtime in some way. We could enable some optimisations only when we expect the overall cost to be high. David proposed 2-phase optimisation - plan as we do now, and if the cost remains high, try again.
Heikki: That sounds good, but let's look at specific optimisations first.
David: We have some of that now - e.g. left join removals.
Simon: It would be useful to begin documenting what we do already and why
[Discussion on specific optimiser cases]
Simon: We're not allowed to keep adding optimisations that keep adding a microsecond each, but where does that take us?
Heikki: We need test cases to know what we need to look at
Joe: Maybe for us we can just say "this will be long running - optimise the hell out of it" or treat as normal.
Tom: It's like self tuning - if you're searching one table you're not going to spend time doing join optimisation. If you have lots, you spend more.
Bruce: I want to talk about 9.6. Big three things - seq scan and join parallelism, and FDW sort pushdown. Open items include parallel computation of sorting and aggregates, Peter's faster sorting, Tomas' multivariate statistics, pglogical, auditing, high concurrency performance, relation extension lock, snapshot caching, partitioning syntax and join pushdown in the FDWs themselves.
Action point: Bruce to add a link to his slides to the meeting wiki page (Bruce)
Agenda Items
Please list any agenda items below for inclusion on the schedule.
- 9.6ff Release Schedule
- Future of storage (Álvaro Herrera et al)
- pglogical, BDR and logical decoding (Petr, Craig, Simon)
- Future shared infrastructure for tightly and loosely coupled clustering and multimaster replication (Petr, Craig, Simon)
- Multiple sync replicas
- Sequence replication, sequence access methods
- Exposing transaction management and lock management for distributed xact managers/lock managers
- ...
Possibly also to consider:
- Getting in-core and buildfarm test coverage of replication/failover/promotion (re multixact 9.3 issues etc)
- Better testing infrastructure in general (crash safety, performance, etc)