New in postgres 10

From PostgreSQL wiki

Jump to: navigation, search

Contents

What's New In PostgreSQL 10

Big Data

Native Partitioning

In 10, partitioning tables is now an attribute of the table:

   CREATE TABLE table_name ( ... )
   [ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) }
   CREATE TABLE table_name
   PARTITION OF parent_table [ (
   ) ] FOR VALUES partition_bound_spec

Example

Before:

   CREATE TABLE padre (
      id                 serial,
      pais             integer,
      fch_creado  timestamptz not null
   );
   CREATE TABLE hija_2017 (
     CONSTRAINT pk_2017 PRIMARY KEY (id),
     CONSTRAINT ck_2017 CHECK (fch_creado < DATE '2015-01-01' )
   ) INHERITS (padre);
   CREATE INDEX idx_2017 ON hija_2017 (fch_creado);

Today:

   CREATE TABLE padre (
     id             serial not null,
     nombre         text not null,
     fch_creado     timestamptz not null
    )
    PARTITION BY RANGE ( id );
   CREATE TABLE hijo_0
      partition of padre (id, primary key (id), unique (nombre))
      for values from (unbounded) to (9);
   CREATE TABLE hijo_1
      partition of padre (id, primary key (id), unique (nombre))
      for values from (10) to (unbounded);

This means that users no longer need to create triggers for routing data; it's all handled by the system.

Another Example:

For example, we might decide to partition the `book_history` table, probably a good idea since that table is liable to accumulate data forever. Since it's a log table, we'll range partition it, with one partition per month.

First, we create a "master" partition table, which will hold no data but forms a template for the rest of the partitions:


   libdata=# CREATE TABLE book_history (
   book_id INTEGER NOT NULL,
   status BOOK_STATUS NOT NULL,
   period TSTZRANGE NOT NULL )
   PARTITION BY RANGE ( lower (period) );


Then we create several partitions, one per month:


   libdata=# CREATE TABLE book_history_2016_09
   PARTITION OF book_history
   FOR VALUES FROM ('2016-09-01 00:00:00') TO ('2016-10-01 00:00:00');
   CREATE TABLE
   libdata=# CREATE TABLE book_history_2016_08
   PARTITION OF book_history
   FOR VALUES FROM ('2016-08-01 00:00:00') TO ('2016-09-01 00:00:00');
   CREATE TABLE
   libdata=# CREATE TABLE book_history_2016_07
   PARTITION OF book_history
   FOR VALUES FROM ('2016-07-01 00:00:00') TO ('2016-09-01 00:00:00');
   ERROR:  partition "book_history_2016_07" would overlap partition "book_history_2016_08"


As you can see, the system even prevents accidental overlap. New rows will automatically be stored in the correct partition, and SELECT queries will search the appropriate partitions.


Additional Parallelism

Some additional plan nodes can be executed in parallel, particularly Index Scans.

Example:

For example, if we wanted to search financial transaction history by an indexed column, I can now execute it in one-quarter the time by using four parallel workers:

   accounts=# \timing
   Timing is on.
   accounts=# select bid, count(*) from account_history
     where delta > 1000 group by bid;
   ...
   Time: 324.903 ms
   accounts=# set max_parallel_workers_per_gather=4;
   SET
   Time: 0.822 ms
   accounts=# select bid, count(*) from account_history
   where delta > 1000 group by bid;
   ...
   Time: 72.864 ms

(this assumes an index on bid, delta)

Links:

Additional FDW Push-Down

Faster Analytics Queries

Replication and Scaling

Logical Replication

example:

Suppose I decide I want to replicate just the fines and loans tables from my public library database to the billing system so that they can process amounts owed. I would create a publication from those two tables with this command:


   libdata=# CREATE PUBLICATION financials FOR TABLE ONLY loans, ONLY fines;
   CREATE PUBLICATION


Then, in the billing database, I would create two tables that looked identical to the tables I'm replicating, and have the same names. They can have additional columns and a few other differences. Particularly, since I'm not copying the patrons or books tables, I'll want to drop some foreign keys that they origin database has. I also need to create any special data types or other database artifacts required for those tables. Often the easiest way to do this is selective use of the `pg_dump` and `pg_restore` backup utilities:


   origin# pg_dump libdata -Fc -f /netshare/libdata.dump
   replica# pg_restore -d libdata -s -t loans -t fines /netshare/libdata.dump


Following that, I can start a Subscription to those two tables:


   libdata=# CREATE SUBSCRIPTION financials
   CONNECTION 'dbname=libdata user=postgres host=172.17.0.2'
   PUBLICATION financials;
   NOTICE:  synchronized table states
   NOTICE:  created replication slot "financials" on publisher
   CREATE SUBSCRIPTION


This will first copy a snapshot of the data currently in the tables, and then start catching up from the transaction log. Once it's caught up, you can check status in pg_stat_subscription:


   libdata=# select * from pg_stat_subscription;
   -[ RECORD 1 ]---------+---------------------
   subid                 | 16475
   subname               | financials
   pid                   | 167
   relid                 |
   received_lsn          | 0/1FBEAF0
   last_msg_send_time    | 2017-06-07 00:59:44
   last_msg_receipt_time | 2017-06-07 00:59:44
   latest_end_lsn        | 0/1FBEAF0
   latest_end_time       | 2017-06-07 00:59:44


blogs:

Quorum Commit for Synchronous Replication

While version 9.6 introduced quorum based synchronous replication,

synchronous_commit = 'remote_apply'

All quorum nodes must be up to date, When are many, should everyone expect to be updated by the Master?

version 10, improves the synchronous_standby_names GUC by adding the FIRST and ANY keywords.

synchronous_standby_names = ANY 2(node1,node2,node3);

synchronous_standby_names = FIRST 2(node1,node2);

FIRST was the previous behaviour, and the nodes priority is following the list order in order to get a quorum. ANY now means that any node in the list is now able to provide the required quorum. This will give extra flexibility to complex replication setups.

Connection "Failover" in libpq

Implement failover on libpq connect level

Traceable Commit

Traceable commit for PostgreSQL 10

Physical Replication

Improved performance of the replay of 2-phase commits

Improved performance of replay when access exclusive locks are held on objects on the standby server. This can significantly improve performance in cases where temporary tables are being used.

Administration

Compression support for pg_receivewal

SQL features

Identity Columns

PostgreSQL 10 identity columns explained

Crash Safe, Replicable Hash Indexes

Transition Tables for Triggers

This feature makes AFTER STATEMENT triggers both useful and performant by exposing, as appropriate, the old and new rows to queries. Before this feature, AFTER STATEMENT triggers had no direct access to these, and the workarounds were byzantine and had poor performance. Much trigger logic can now be written as AFTER STATEMENT, avoiding the need to do the expensive context switches at each row that FOR EACH ROW triggers require.

XML and JSON

XMLTable

XMLTABLE is a SQL-standard feature that allows transforming an XML document to table format, making it much easier to process XML data in the database. Coupled with foreign tables pointing to external XML data, this can greatly simplify ETL processing.

Full Text Search support for JSON and JSONB

You can now create Full Text Indexes on JSON and JSONB columns.

This involves converting the JSONB field to a `tsvector`, then creating an specific language full-text index on it:


   libdata=# CREATE INDEX bookdata_fts ON bookdata
   USING gin (( to_tsvector('english',bookdata) ));
   CREATE INDEX

Once that's set up, you can do full-text searching against all of the values in your JSON documents:

   libdata=# SELECT bookdata -> 'title'
   FROM bookdata
   WHERE to_tsvector('english',bookdata) @@ to_tsquery('duke');            
   ------------------------------------------
    "The Tattooed Duke"
    "She Tempts the Duke"
    "The Duke Is Mine"
    "What I Did For a Duke"

Security

SCRAM Authentication

New "monitoring" roles for permission grants

Restrictive Policies for Row Level Security

Performance

Cross-column Statistics

Real-world data frequently contains correlated data in table columns, which can easily fool the query planner into thinking WHERE clauses are more selective than they really are, which can cause some queries to become very slow. Multivariate statistics objects can be used to let the planner learn about this, which proofs it against making such mistakes. This manual section explains the feature in more detail, and this section shows some examples. This feature in PostgreSQL represents an advance in the state of the art for all SQL databases.

Latch Wait times in pg_stat_activity

Query Planner Improvements

In join planning, detect cases where the inner side of the join can only produce a single row for each outer side row. During execution this allows early skipping to the next outer row once a match is found. This can also remove the requirement for mark and restore during Merge Joins, which can significantly improve performance in some cases.

Other Features

ICU Collation Support

More robust collations with ICU support in PostgreSQL 10

Backwards-Incompatible Changes

Version 10 has a number of backwards-incompatible changes which may affect system administration, particularly around backup automation. Users should specifically test for the incompatibilities before upgrading in production.

Change in Version Numbering

As of Version 10, PostgreSQL no longer uses three-part version numbers, but is shifting to two-part version numbers. This means that version 10.1 will be the first patch update to PostgreSQL 10, instead of a new major version. Scripts and tools which detect PostgreSQL version may be affected.

The community strongly recommends that tools use either the GUC server_version_num (on the backend), or the libpq status function PQserverVersion in libpq to get the server version. This returns a six-digit integer version number which will be consistently sortable and comparable between versions 9.6 and 10.

Version String Major Version Update Number version_num
9.6.0 9.6 0 090600
9.6.3 9.6 3 090603
10.0 10 0 100000
10.1 10 1 100001

Renaming of "xlog" to "wal" Globally

Drop Support for FE/BE 1.0 Protocol

PostgreSQL's original client/server protocol, version 1.0, will no longer be supported as of PostgreSQL 10. Since version 1.0 was superceded by version 2.0 in 1998, it is unlikely that any existing clients still use it.

Change Defaults around Replication and pg_basebackup

Drop Support for Floating Point Timestamps

Remove contrib/tsearch2

Tsearch2, the older, contrib module version of our built-in full text search, has been removed from contrib and will no longer be built as part of PostgreSQL packages. Users who have been continuously upgrading since before version 8.3 will need to either manually modify their databases to use the built-in tsearch objects before upgrading to PostgreSQL 10, or will need to compile tsearch2 themselves from scratch and install it.

Drop pg_dump Support for Databases Older than 8.0

Databases running on PostgreSQL version 7.4 and earlier will not be supported by 10's pg_dump or pg_dumpall. If you need to convert a database that old, use version 9.6 or earlier to upgrade it in two stages.

Personal tools