EnterpriseDB database server roadmap

For the coming PostgreSQL development cycle, 2016-2017, probable version number 10 or 10.0, the database server team at EnterpriseDB intends to work on the following projects. As with any other code contribution, there's no guarantee that this work will be finished on time or that the community will accept it. EnterpriseDB and its staff welcome other work in these same areas and will collaborate with people working at other companies or with individual contributors where there is overlap. Our plans may change based on community input or corporate priorities, so we can't guarantee that everything will work out exactly as described here.

(Note that not everyone at EnterpriseDB who works on PostgreSQL is a member of EnterpriseDB's database server group. In particular, Bruce Momjian is not a member of the database server group and any development plans he may have are not reflected here.)

Parallelism

We intend to work on improving parallel query, and also on adding parallel utility commands.

  • Thomas Munro is working on a hash table that stores its keys, values, and metadata in dynamic shared memory segments. It's not quite ready to be shared with the PostgreSQL community yet, but we intend to do that early in this release cycle. It's based on earlier work by Robert Haas which was posted but never committed. We intend to use this dynamic hash table to build a Parallel Hash node, so that a parallel Hash Join can use a single shared hash table instead of a separate hash table per worker (see the first sketch after this list).
  • We also intend to use this hash table to build a Parallel Bitmap Heap Scan node. Analysis of TPC-H query plans suggests that this will significantly improve performance on several of those queries.
  • Rushabh Lathia is working on a Gather Merge node, which is like Gather except that it merges the workers' sorted tuple streams into a single sorted output stream, preserving the sort order established in each worker (see the second sketch after this list).
  • Rahila Syed is working on a Parallel Index Scan node, so that the driving table for a parallel plan can be scanned via an index rather than sequentially. The first version of Rahila's patch will most likely support only btree indexes; we would welcome help from experts on other index access methods in extending this to all index types.
  • Parallel query has severe restrictions related to subqueries which we would like to see relaxed. We're not sure anyone on our team will have time to work on this during this release cycle, but we think it's important. If someone else has time to work on it, that would be great; if not, we'll get to it if time permits.
  • Robert Haas and maybe others will help review and commit any patches proposed to add parallel utility commands. If we work on any ourselves, the most likely candidate would be parallel VACUUM.
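
To make the Parallel Hash idea concrete, here is a hedged sketch. The tables and columns are hypothetical, and the plan shape is illustrative of the intent rather than committed behavior; today each worker would build its own private copy of the hash table on the inner side.

  -- Hypothetical schema: a large fact table joined to a smaller dimension table.
  EXPLAIN (COSTS OFF)
  SELECT count(*)
  FROM orders o
  JOIN customers c ON c.id = o.customer_id;

  -- With a shared hash table, the plan could look something like:
  --
  --   Finalize Aggregate
  --     ->  Gather
  --           Workers Planned: 2
  --           ->  Partial Aggregate
  --                 ->  Parallel Hash Join
  --                       Hash Cond: (o.customer_id = c.id)
  --                       ->  Parallel Seq Scan on orders o
  --                       ->  Parallel Hash
  --                             ->  Parallel Seq Scan on customers c

The same shared data structure is what would allow a Parallel Bitmap Heap Scan to build its bitmap once and share it among all the workers.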
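
Gather Merge and Parallel Index Scan also combine naturally: if each worker emits tuples in index order, Gather Merge can preserve that order without a separate sort. A hedged sketch, with a hypothetical table and an illustrative plan shape:

  -- Hypothetical table with a btree index on the sort column.
  CREATE TABLE measurements (ts timestamptz, reading numeric);
  CREATE INDEX ON measurements (ts);

  EXPLAIN (COSTS OFF)
  SELECT * FROM measurements
  WHERE ts > now() - interval '1 day'
  ORDER BY ts;

  -- Intended plan shape (illustrative, not committed behavior):
  --
  --   Gather Merge
  --     Workers Planned: 2
  --     ->  Parallel Index Scan using measurements_ts_idx on measurements
  --           Index Cond: (ts > (now() - '1 day'::interval))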

Partitioning

  • Once declarative partitioning is in place, Ashutosh Bapat will work on making the query planner smarter about partitioned tables. For example, a join of two Append nodes can be converted to an Append of per-partition joins if the tables are compatibly partitioned; this figures to perform much better, since each per-partition join is smaller and matching rows can only appear in matching partitions (see the example below). Similar optimizations are possible for aggregation.
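
A hedged sketch of the idea. The tables are hypothetical, and the partitioning syntax follows the in-progress declarative partitioning work, so details may change:

  CREATE TABLE t1 (a int, b int) PARTITION BY RANGE (a);
  CREATE TABLE t1_p1 PARTITION OF t1 FOR VALUES FROM (0) TO (100);
  CREATE TABLE t1_p2 PARTITION OF t1 FOR VALUES FROM (100) TO (200);

  CREATE TABLE t2 (a int, c int) PARTITION BY RANGE (a);
  CREATE TABLE t2_p1 PARTITION OF t2 FOR VALUES FROM (0) TO (100);
  CREATE TABLE t2_p2 PARTITION OF t2 FOR VALUES FROM (100) TO (200);

  -- Today:  Join(Append(t1_p1, t1_p2), Append(t2_p1, t2_p2))
  -- Goal:   Append(Join(t1_p1, t2_p1), Join(t1_p2, t2_p2))
  -- Rows in t1_p1 can only match rows in t2_p1, and likewise for the
  -- second pair, so the planner can join partition-by-partition.
  SELECT * FROM t1 JOIN t2 USING (a);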

Foreign Data Wrappers

  • We hope to work on adding aggregate pushdown to postgres_fdw. This will use the new planner hooks added by Tom Lane late in the PostgreSQL 9.6 development cycle. It will be similar to the join pushdown work done during the 9.5 and 9.6 development cycles, and will make queries like SELECT COUNT(*) FROM remote_table fast, because the aggregate will be computed on the remote server and only the result shipped back, instead of fetching every row just to count it locally (see the first sketch after this list).
  • We hope to add support for asynchronous execution to the PostgreSQL executor. Suppose we have an inheritance hierarchy in which some of the children are foreign tables, or, once partitioning is available, some of the partitions are foreign tables. Currently, we query each child table one at a time, to completion, before starting the next. It would be better to start all of the remote queries at once and consume whichever results arrive first; asynchronous execution would let us do that (see the second sketch after this list). It also seems likely to be useful for other applications like parallel query, but more work is needed to figure out exactly how it can help in that context.
  • We are interested in seeing a pluggable API for heap storage in core, either as an extension of the FDW API or as a separate thing. Currently, we don't have a specific design proposal, but we think that there are important performance optimizations (read-only data, column stores, index-only tables, non-transactional tables) that can only be done using a specialized storage format. Also, this would open up room for experimentation on replacing the existing heap format, which might be something that someone wants to do some day.
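
A hedged sketch of the intended aggregate pushdown behavior. The server, user mapping, and table definitions below are hypothetical, and the pushed-down plan is the goal rather than current behavior:

  CREATE EXTENSION postgres_fdw;
  CREATE SERVER remote FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'remote-host', dbname 'remote_db');
  CREATE USER MAPPING FOR CURRENT_USER SERVER remote;
  CREATE FOREIGN TABLE remote_table (id int) SERVER remote;

  EXPLAIN (VERBOSE, COSTS OFF)
  SELECT count(*) FROM remote_table;

  -- Today: a local Aggregate over a Foreign Scan that fetches every row.
  -- Goal:  a single Foreign Scan whose Remote SQL is roughly
  --        SELECT count(*) FROM public.remote_table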
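
And a hedged sketch of a setup where asynchronous execution would pay off; the foreign servers shard1 and shard2 are hypothetical and assumed to exist:

  CREATE TABLE events (id int, payload text);
  CREATE FOREIGN TABLE events_shard1 () INHERITS (events) SERVER shard1;
  CREATE FOREIGN TABLE events_shard2 () INHERITS (events) SERVER shard2;

  -- Today this scans events_shard1 to completion before it even starts
  -- events_shard2. With asynchronous execution, both remote queries
  -- could be in flight at once, and the Append node would consume rows
  -- as they arrive from either server.
  SELECT count(*) FROM events;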

Replication

  • Robert Haas will help with review of any patches related to logical replication that may be posted by others.

Vertical Scalability

  • Mithun Cy will continue to pursue the patch to cache MVCC snapshots in order to reduce contention on ProcArrayLock. (This is based on work originally done by Andres Freund, who currently works at Citus Data.)

Performance

  • We're very interested in trying to rejigger executor data structures so that it becomes possible to operate on batches of tuples instead of individual tuples. We think this could speed things up considerably. We're also interested in other ideas for making the executor faster; we will review patches posted by others and may also write patches of our own.
  • Amit Kapila is working on improving the concurrency and performance of hash indexes and on adding WAL logging for them, so that they become crash-safe and usable on standby servers (see the first example after this list).
  • nodeSeqscan.c has a comment which says that it is very bad that we do not push down scan keys into the heapam layer. Dilip Kumar has done some prototyping which suggests that pushing down scan keys could improve performance, so we will probably pursue work in this area (see the second example after this list).
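
For context on the hash index work, here is the current limitation; the table and index are hypothetical:

  -- Hypothetical table with an equality-only lookup pattern, the case
  -- where a hash index can beat a btree.
  CREATE TABLE sessions (token text, data jsonb);
  CREATE INDEX sessions_token_idx ON sessions USING hash (token);
  -- In 9.6 this emits:
  --   WARNING:  hash indexes are not WAL-logged and their use is discouraged
  -- WAL logging would remove that caveat, making hash indexes crash-safe
  -- and usable on streaming replicas.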
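
And a hedged illustration of where scan keys are evaluated today; the table is hypothetical:

  CREATE TABLE items (id int, flag boolean);

  EXPLAIN (COSTS OFF) SELECT * FROM items WHERE id = 42;
  --   Seq Scan on items
  --     Filter: (id = 42)
  -- The Filter is applied in the executor: the heapam layer returns
  -- every tuple on every page, and the Seq Scan node then discards the
  -- non-matching ones. Pushing the scan key down would let the heap
  -- layer skip non-matching tuples while it still has the page at hand,
  -- avoiding per-tuple round trips through the executor.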

SQL Features

  • Kevin Grittner still plans to pursue his patch to add OLD and NEW tuplestores, known in the SQL standard as transition tables, to AFTER STATEMENT triggers that request them (see the first example below). This is intended as infrastructure toward incrementally updated materialized views.
  • We may work on some patches to add additional wait events on top of what was added in PostgreSQL 9.6, and will help review such patches posted by others (see the second example below). We think it would be good for PostgreSQL to extend this system to cover essentially everything that can produce a significant wait.
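
A hedged sketch of the transition table syntax from the SQL standard; the exact syntax could change before any patch is committed, and the table, trigger, and function names are hypothetical:

  CREATE TABLE accounts (id int, balance numeric);

  CREATE FUNCTION check_transfer() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      -- old_rows and new_rows can be queried like ordinary tables and
      -- hold every row affected by the triggering statement.
      IF (SELECT sum(balance) FROM new_rows) IS DISTINCT FROM
         (SELECT sum(balance) FROM old_rows) THEN
          RAISE EXCEPTION 'transfer does not balance';
      END IF;
      RETURN NULL;
  END;
  $$;

  CREATE TRIGGER accounts_balance_check
      AFTER UPDATE ON accounts
      REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
      FOR EACH STATEMENT EXECUTE PROCEDURE check_transfer();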
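
For reference, the infrastructure added in 9.6 already reports waits through pg_stat_activity; adding more wait events would make a query like this one informative in more situations:

  -- What is each backend currently waiting on, if anything?
  SELECT pid, state, wait_event_type, wait_event
  FROM pg_stat_activity
  WHERE wait_event IS NOT NULL;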