Fujitsu roadmap

From PostgreSQL wiki

Fujitsu is developing many features for PostgreSQL to increase its adoption in enterprises, especially very large enterprises that require very high-frequency transaction processing.

Listed below are some of the features that we are open-sourcing to the community. We welcome any input, ideas, and suggestions on their implementation.

Server

Vertical Clustered Index (columnar storage extension)

An extension that stores data in columnar format without major changes to the PostgreSQL core. The columnar storage is maintained both on disk and in memory, providing fault tolerance and increased performance.

Columnar storage is implemented through Vertical Clustered Indexes (VCI). Index access methods are used to create a new type of index that stores data in two forms to provide high performance for both OLAP and OLTP: 1) Write Optimized Storage (WOS) and 2) Read Optimized Storage (ROS).
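The split between the two stores can be pictured with a toy model. The sketch below is not the VCI implementation: the class name, the `flush_threshold` parameter, and the conversion policy are all assumptions made for illustration. It only shows the general idea of appending rows cheaply to a write-optimized buffer and later moving them into per-column arrays that are cheap to scan.

```python
class ToyColumnStore:
    """Toy model: a row-wise write buffer (WOS) plus per-column arrays (ROS)."""

    def __init__(self, columns, flush_threshold=3):
        self.columns = list(columns)
        self.flush_threshold = flush_threshold    # hypothetical tuning knob
        self.wos = []                             # recent rows, cheap to append
        self.ros = {c: [] for c in self.columns}  # one array per column

    def insert(self, row):
        """OLTP-style write: append the row to the WOS."""
        self.wos.append(dict(row))
        if len(self.wos) >= self.flush_threshold:
            self.convert()

    def convert(self):
        """Move buffered rows from the WOS into the column arrays of the ROS."""
        for row in self.wos:
            for c in self.columns:
                self.ros[c].append(row[c])
        self.wos.clear()

    def column_sum(self, column):
        """OLAP-style read: scan one ROS column, then add rows still in the WOS."""
        return sum(self.ros[column]) + sum(r[column] for r in self.wos)


store = ToyColumnStore(["id", "amount"])
for i, amount in enumerate([10, 20, 30, 40]):
    store.insert({"id": i, "amount": amount})

total = store.column_sum("amount")
```

In this model an aggregate over one column touches only that column's array plus the small write buffer, which is the property a columnar layout is meant to provide.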

Vertical Clustered Index


Multi-model database

Big data is often characterized by three words -- variety, velocity, and volume. This work focuses mainly on data variety, which means addressing various database models beyond the relational model. Needless to say, velocity and volume are also relevant.

Multi-model will be the near future of many databases. By supporting it, we would like to make PostgreSQL more popular and adoptable for more use cases. PostgreSQL is in a good position: it holds fourth place in the DB-Engines.com DBMS popularity ranking.

First of all, in version 11, we will address the following:

  • key-value: Provide a dedicated API that bypasses the SQL layer for minimal latency and maximum concurrency, similar to MySQL's Memcached API.
  • document: Standard compliance. Help review and test the SQL/JSON patch submitted by Oleg.
  • wide column: We do not intend to adopt the "millions of columns in a row" data model; the focus is on the velocity of accumulating large amounts of data (targeting sensor data for IoT). Investigate how a log-structured merge tree can speed up INSERTs.
  • graph: Natively support the graph data model. Implement Cypher and/or Gremlin as the query language through UDFs.
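For the wide column item, the reason a log-structured merge tree helps INSERT-heavy workloads can be shown with a minimal sketch. This is a generic toy LSM structure, not a proposed PostgreSQL design: the class name and `memtable_limit` are assumptions, and compaction of runs is left out. Writes only touch an in-memory table, which is flushed as a sorted run when it fills up; reads check the newest data first.

```python
import bisect


class ToyLSM:
    """Toy log-structured merge tree: in-memory table plus sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}
        self.runs = []  # newest first; each run is a sorted list of (key, value)

    def put(self, key, value):
        """Insert or overwrite: only the in-memory table is modified."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush as one sequential, sorted run instead of updating in place.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Look in the memtable first, then in runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


lsm = ToyLSM()
lsm.put("sensor-1", 20.5)
lsm.put("sensor-2", 19.0)   # memtable is full: flushed as a sorted run
lsm.put("sensor-1", 21.0)   # newer value stays in the memtable
```

The trade-off visible even in this sketch is that reads may have to consult several runs, which real LSM designs limit by merging runs in the background.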

In the future, we are also looking into embracing RDF, multi-dimensional array, time series, event store, and search engine.

Takayuki Tsunakawa will lead the initial work on multi-model. The full range of data variety is probably beyond the scope of what one person can do, so cooperation from others would be appreciated. He will create a wiki page with more details and link it from here.

Transaction control in stored programs

Currently, PL/pgSQL does not allow committing or rolling back transactions inside user-defined functions. This is often a barrier to migrating from other DBMSs to PostgreSQL. Users of other DBMSs write long and complex stored programs (e.g. in PL/SQL) that involve multiple transactions.

Takayuki Tsunakawa will work on enabling stored programs to commit or roll back the running transaction and start a new one. The community suggested that this requires the stored procedure feature rather than stored functions.


Statement-level rollback

Like transaction control in stored programs, this is another barrier for people who try to migrate from other DBMSs. They expect that a failure of one SQL statement does not abort the entire transaction, so that their apps (client programs and stored procedures) can continue the transaction with a different SQL statement.

psqlODBC has a connection parameter to simulate statement-level rollback. But this is not performant, because the driver issues SAVEPOINT and RELEASE SAVEPOINT statements before and after each SQL statement the application executes. That is, it requires three round-trips to the server for each SQL statement.
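The driver-side emulation described above can be sketched as follows. This is an illustration only, not psqlODBC code: it uses Python's built-in sqlite3 module as a stand-in server because it also accepts SAVEPOINT, ROLLBACK TO SAVEPOINT, and RELEASE SAVEPOINT, and the helper name `execute_with_statement_rollback` is hypothetical. Note that SQLite, unlike PostgreSQL, does not abort the whole transaction when a statement fails, so the sketch shows the statement pattern rather than PostgreSQL's error behavior.

```python
import sqlite3


def execute_with_statement_rollback(conn, sql, params=()):
    """Run one statement in its own savepoint; undo only that statement on failure."""
    conn.execute("SAVEPOINT stmt")                  # round trip 1
    try:
        conn.execute(sql, params)                   # round trip 2
    except sqlite3.Error:
        conn.execute("ROLLBACK TO SAVEPOINT stmt")  # undo the failed statement
        conn.execute("RELEASE SAVEPOINT stmt")
        return False
    conn.execute("RELEASE SAVEPOINT stmt")          # round trip 3
    return True


def demo():
    # isolation_level=None leaves transaction control to the explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
    conn.execute("BEGIN")
    ok1 = execute_with_statement_rollback(conn, "INSERT INTO t VALUES (1)")
    ok2 = execute_with_statement_rollback(conn, "INSERT INTO t VALUES (1)")  # duplicate key
    ok3 = execute_with_statement_rollback(conn, "INSERT INTO t VALUES (2)")
    conn.execute("COMMIT")
    rows = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
    conn.close()
    return ok1, ok2, ok3, rows


result = demo()
```

Each application statement costs three commands against the server in the success path, which is the overhead a server-side statement-level rollback is meant to remove.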

Takayuki Tsunakawa is working on the statement-level rollback by providing a START TRANSACTION option and a new GUC parameter, which controls the rollback scope when an SQL statement fails.

Statement-level rollback


Monitoring

SQL statements statistics counter view (pg_stat_sql)

A view that displays the number of times a command or a group of commands has been executed in a running PostgreSQL instance. The view can be reset at any time, so SQL command execution can be observed over any given period. This feature is very useful for the DBA to know the profile of a dormant database.
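The counting-and-reset behavior can be modeled in a few lines. This sketch is not how pg_stat_sql is implemented and does not show its columns: the class name is hypothetical, and taking the first keyword of the statement as the command tag is a simplifying assumption.

```python
from collections import Counter


class CommandCounter:
    """Toy per-command execution counter that can be reset at any time."""

    def __init__(self):
        self.counts = Counter()

    def record(self, sql):
        """Count one execution, keyed by the statement's first keyword."""
        words = sql.strip().split(None, 1)
        if words:
            self.counts[words[0].upper()] += 1

    def reset(self):
        """Start a new measurement period."""
        self.counts.clear()


stats = CommandCounter()
for statement in ["select 1", "SELECT 2", "insert into t values (1)"]:
    stats.record(statement)

snapshot = dict(stats.counts)   # counts for the period so far
stats.reset()                   # the next period starts from zero
```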

pg_stat_sql statistics view


WAL writes statistics view (pg_stat_walwrites)

This view shows details of WAL write operations, such as the number of blocks written and the time it took to write them. This information can be very helpful for the DBA in tuning WAL-related GUC variables efficiently.

pg_stat_walwrites view


Pluggable Storage

This feature lets extensions provide their own storage by using storage access methods. It removes references to heap-specific structures and functions, such as HeapTuple and HeapScanDesc, from other parts of the code, which instead use them through the access method functions.
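The idea of routing all storage access through access method functions can be sketched with an abstract interface. The names below (`TableAccessMethod`, `HeapLikeStorage`, `ColumnLikeStorage`, `count_rows`) are hypothetical and do not correspond to PostgreSQL's C API; the point is only that calling code depends on the interface, not on one storage layout.

```python
from abc import ABC, abstractmethod


class TableAccessMethod(ABC):
    """Hypothetical interface that every storage implementation must provide."""

    @abstractmethod
    def insert(self, row):
        ...

    @abstractmethod
    def scan(self):
        ...


class HeapLikeStorage(TableAccessMethod):
    """Row-oriented storage: keeps whole rows together."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(tuple(row))

    def scan(self):
        return iter(self.rows)


class ColumnLikeStorage(TableAccessMethod):
    """Column-oriented storage: keeps one list per column."""

    def __init__(self, ncols):
        self.cols = [[] for _ in range(ncols)]

    def insert(self, row):
        for col, value in zip(self.cols, row):
            col.append(value)

    def scan(self):
        return zip(*self.cols)   # reassemble rows from the columns


def count_rows(storage):
    """Caller code: works with any storage that implements the interface."""
    return sum(1 for _ in storage.scan())


heap, columnar = HeapLikeStorage(), ColumnLikeStorage(ncols=2)
for row in [(1, "a"), (2, "b")]:
    heap.insert(row)
    columnar.insert(row)
```

Here `count_rows` never mentions a row or column layout, which mirrors the goal of removing heap-specific references from the rest of the code.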


Parallel Queries

The existing parallelism framework in PostgreSQL supports only read queries. This feature extends the framework to write queries as well.

Utility statements support

This feature allows utility statements that have queries underneath, such as CREATE TABLE AS, CREATE MATERIALIZED VIEW, and REFRESH MATERIALIZED VIEW, to benefit from parallel plans.

Write operations support


DML operations (INSERT, DELETE and UPDATE) support

This feature allows DML statements that contain queries or scans underneath to benefit from parallel plans.

Write operations support


Utility Commands

Optional clause 'AS' and 'OR REPLACE' for a trigger

This feature adds the optional 'AS' and 'OR REPLACE' clauses to the CREATE TRIGGER syntax. With this implementation, a trigger can be defined efficiently in a single CREATE TRIGGER command.

Create trigger option clause


Client Applications

Refactor handling of database attributes between pg_dump and pg_dumpall

This feature adds support for dumping database ACLs and other configuration settings with the pg_dump --create option. This change also removes the duplicate code between pg_dump and pg_dumpall.

pg_dump and pg_dumpall refactoring


Client Interfaces

libpq batch pipelining

Allow libpq clients to avoid excessive round trips by pipelining multiple commands into batches. A sync is only sent at the end of a batch. Commands in a batch succeed or fail together.
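The effect on round trips, and the all-or-nothing behavior of a batch, can be illustrated with a small simulation. This is not the libpq API: `FakeServer`, `run_one`, and `run_batch` are hypothetical names, and the "server" is just a Python object that counts how often the client has to wait for a reply.

```python
class FakeServer:
    """Stand-in for a database server that counts network round trips."""

    def __init__(self):
        self.round_trips = 0
        self.data = []

    def run_one(self, value):
        """Unbatched: the client waits for a reply after every command."""
        self.round_trips += 1
        self._apply(self.data, value)

    def run_batch(self, values):
        """Batched: all commands are sent, then a single sync ends the batch."""
        self.round_trips += 1
        staged = list(self.data)
        for value in values:
            self._apply(staged, value)   # an error here leaves self.data untouched
        self.data = staged

    @staticmethod
    def _apply(target, value):
        if value is None:
            raise ValueError("null value rejected")
        target.append(value)


unbatched = FakeServer()
for v in [1, 2, 3]:
    unbatched.run_one(v)

batched = FakeServer()
batched.run_batch([1, 2, 3])

failed = FakeServer()
try:
    failed.run_batch([1, None, 3])   # the second command fails
except ValueError:
    pass
```

In this model the three inserts cost three round trips unbatched and one round trip batched, and the failing batch leaves no partial result behind.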

Batch support for libpq


"DECLARE STATEMENT" syntax in ECPG

This feature gives users an option to connect to a non-default target server. The "AT" clause can be used once in "DECLARE STATEMENT"; thereafter, all dynamic SQL statements will be executed on the specified target server.

Declare statement


Keep comment in ECPG

The current ECPG preprocessor removes comments from the SQL statements in an ECPG application. But these comments are useful to database administrators, because they identify exactly which ECPG application is executing a given query.

Keep ecpg comments


Bulk Insert support for ECPG

This feature adds support for bulk insert operations in ECPG with arrays, using the batch pipelining support of libpq.