What's New In PostgreSQL 10

Big Data

Native Partitioning

In PostgreSQL 10, partitioning is now a declarative attribute of the table:

   CREATE TABLE table_name ( ... )
   [ PARTITION BY { RANGE | LIST } ( { column_name | ( expression ) } [, ...] ) ]

   CREATE TABLE table_name
   PARTITION OF parent_table [ (
     { column_name [ column_constraint [ ... ] ] | table_constraint } [, ... ]
   ) ] FOR VALUES partition_bound_spec

Example

Before:

   CREATE TABLE padre (
      id          serial,
      pais        integer,
      fch_creado  timestamptz not null
   );
   CREATE TABLE hija_2017 (
     CONSTRAINT pk_2017 PRIMARY KEY (id),
     CONSTRAINT ck_2017 CHECK (fch_creado >= DATE '2017-01-01' AND fch_creado < DATE '2018-01-01')
   ) INHERITS (padre);
   CREATE INDEX idx_2017 ON hija_2017 (fch_creado);

Today:

   CREATE TABLE padre (
     id             serial not null,
     nombre         text not null,
     fch_creado     timestamptz not null
   )
   PARTITION BY RANGE ( id );
   CREATE TABLE hijo_0
      PARTITION OF padre (id, PRIMARY KEY (id), UNIQUE (nombre))
      FOR VALUES FROM (UNBOUNDED) TO (10);
   CREATE TABLE hijo_1
      PARTITION OF padre (id, PRIMARY KEY (id), UNIQUE (nombre))
      FOR VALUES FROM (10) TO (UNBOUNDED);

This means that users no longer need to create triggers for routing data; it's all handled by the system.

Another Example:

We might decide to partition the `book_history` table, probably a good idea since that table is liable to accumulate data forever. Since it's a log table, we'll range partition it, with one partition per month.

First, we create a "master" partition table, which will hold no data but forms a template for the rest of the partitions:


   libdata=# CREATE TABLE book_history (
   book_id INTEGER NOT NULL,
   status BOOK_STATUS NOT NULL,
   period TSTZRANGE NOT NULL )
   PARTITION BY RANGE ( lower (period) );


Then we create several partitions, one per month:


   libdata=# CREATE TABLE book_history_2016_09
   PARTITION OF book_history
   FOR VALUES FROM ('2016-09-01 00:00:00') TO ('2016-10-01 00:00:00');
   CREATE TABLE
   libdata=# CREATE TABLE book_history_2016_08
   PARTITION OF book_history
   FOR VALUES FROM ('2016-08-01 00:00:00') TO ('2016-09-01 00:00:00');
   CREATE TABLE
   libdata=# CREATE TABLE book_history_2016_07
   PARTITION OF book_history
   FOR VALUES FROM ('2016-07-01 00:00:00') TO ('2016-09-01 00:00:00');
   ERROR:  partition "book_history_2016_07" would overlap partition "book_history_2016_08"


As you can see, the system even prevents accidental overlap. New rows will automatically be stored in the correct partition, and SELECT queries will search the appropriate partitions.


Additional Parallelism

Some additional plan nodes can be executed in parallel, particularly Index Scans.

Example:

For example, if I wanted to search financial transaction history by an indexed column, I can now execute the query in roughly one-quarter of the time by using four parallel workers:

   accounts=# \timing
   Timing is on.
   accounts=# select bid, count(*) from account_history
     where delta > 1000 group by bid;
   ...
   Time: 324.903 ms
   accounts=# set max_parallel_workers_per_gather=4;
   SET
   Time: 0.822 ms
   accounts=# select bid, count(*) from account_history
   where delta > 1000 group by bid;
   ...
   Time: 72.864 ms

(This assumes an index on (bid, delta).)


Additional FDW Push-Down

Faster Analytics Queries

Replication and Scaling

Logical Replication

Example:

Suppose I decide I want to replicate just the fines and loans tables from my public library database to the billing system so that they can process amounts owed. I would create a publication from those two tables with this command:


   libdata=# CREATE PUBLICATION financials FOR TABLE ONLY loans, ONLY fines;
   CREATE PUBLICATION


Then, in the billing database, I would create two tables that look identical to the tables I'm replicating and have the same names. They can have additional columns and a few other differences. In particular, since I'm not copying the patrons or books tables, I'll want to drop some foreign keys that the origin database has. I also need to create any special data types or other database artifacts required for those tables. Often the easiest way to do this is selective use of the `pg_dump` and `pg_restore` backup utilities:


   origin# pg_dump libdata -Fc -f /netshare/libdata.dump
   replica# pg_restore -d libdata -s -t loans -t fines /netshare/libdata.dump


Following that, I can start a Subscription to those two tables:


   libdata=# CREATE SUBSCRIPTION financials
   CONNECTION 'dbname=libdata user=postgres host=172.17.0.2'
   PUBLICATION financials;
   NOTICE:  synchronized table states
   NOTICE:  created replication slot "financials" on publisher
   CREATE SUBSCRIPTION


This will first copy a snapshot of the data currently in the tables, and then start catching up from the transaction log. Once it's caught up, you can check status in pg_stat_subscription:


   libdata=# select * from pg_stat_subscription;
   -[ RECORD 1 ]---------+---------------------
   subid                 | 16475
   subname               | financials
   pid                   | 167
   relid                 |
   received_lsn          | 0/1FBEAF0
   last_msg_send_time    | 2017-06-07 00:59:44
   last_msg_receipt_time | 2017-06-07 00:59:44
   latest_end_lsn        | 0/1FBEAF0
   latest_end_time       | 2017-06-07 00:59:44



Quorum Commit for Synchronous Replication

Version 9.6 introduced synchronous replication with multiple standbys, along with the remote_apply level for synchronous_commit:

   synchronous_commit = 'remote_apply'

Version 10 improves the synchronous_standby_names GUC by adding the FIRST and ANY keywords:

   synchronous_standby_names = 'ANY 2 (node1, node2, node3)'
   synchronous_standby_names = 'FIRST 2 (node1, node2)'

FIRST matches the previous behaviour: standby priority follows the order of the list, and the first N listed nodes must confirm. With ANY, confirmation from any N nodes in the list satisfies the quorum. This gives extra flexibility to complex replication setups.

Connection "Failover" in libpq

Implement failover on libpq connect level
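
For example, libpq in version 10 accepts a list of hosts in the connection string and, with target_session_attrs=read-write, will skip over servers that cannot accept writes. The hostnames below are hypothetical:

   psql 'postgresql://db1.example.com,db2.example.com/libdata?target_session_attrs=read-write'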

Traceable Commit

Traceable commit for PostgreSQL 10

Physical Replication

Improved performance of the replay of 2-phase commits

Improved replay performance when access exclusive locks are held on objects on the standby server; this can be significant in cases where temporary tables are heavily used.

Administration

Compression support for pg_receivewal
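
pg_receivewal (formerly pg_receivexlog) can now gzip-compress the WAL segments it streams, via the new --compress option. A minimal sketch, with a hypothetical target directory and slot name:

   pg_receivewal -D /mnt/wal_archive --slot=archive_slot --compress=5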

SQL features

Identity Columns

PostgreSQL 10 identity columns explained
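
A minimal sketch of the new SQL-standard syntax (the table is hypothetical). Unlike serial, the underlying sequence is owned by and managed through the column itself:

   CREATE TABLE items (
     id    integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
     name  text NOT NULL
   );
   -- The sequence is managed via the column, not a separate object:
   ALTER TABLE items ALTER COLUMN id RESTART WITH 1000;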

Crash Safe, Replicable Hash Indexes
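
Hash indexes are now WAL-logged, making them crash-safe and usable on streaming replicas. A minimal sketch (the table is hypothetical):

   -- Hash indexes support only equality lookups, but are now safe to use
   CREATE INDEX idx_sessions_token ON sessions USING hash (token);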

Transition Tables for Triggers

This feature makes AFTER STATEMENT triggers both useful and performant by exposing, as appropriate, the old and new rows to queries. Before this feature, AFTER STATEMENT triggers had no direct access to these, and the workarounds were byzantine and had poor performance. Much trigger logic can now be written as AFTER STATEMENT, avoiding the need to do the expensive context switches at each row that FOR EACH ROW triggers require.
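
As a sketch, assuming the loans table from the replication example (the function and trigger names are hypothetical), an AFTER STATEMENT trigger can read all inserted rows at once from a transition table:

   CREATE FUNCTION log_loan_inserts() RETURNS trigger
   LANGUAGE plpgsql AS $$
   BEGIN
     -- new_rows is the statement-level transition table; it is queried
     -- once per statement rather than once per row
     RAISE NOTICE '% rows inserted', (SELECT count(*) FROM new_rows);
     RETURN NULL;
   END;
   $$;

   CREATE TRIGGER loans_insert_log
     AFTER INSERT ON loans
     REFERENCING NEW TABLE AS new_rows
     FOR EACH STATEMENT
     EXECUTE PROCEDURE log_loan_inserts();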

XML and JSON

XMLTable

XMLTABLE is a SQL-standard feature that allows transforming an XML document to table format, making it much easier to process XML data in the database. Coupled with foreign tables pointing to external XML data, this can greatly simplify ETL processing.
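
A minimal sketch (the inline document is purely illustrative):

   SELECT t.*
   FROM xmltable('/catalog/book'
          PASSING '<catalog><book><title>Ulysses</title><pages>732</pages></book></catalog>'::xml
          COLUMNS title text PATH 'title',
                  pages integer PATH 'pages') AS t;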

Full Text Search support for JSON and JSONB

You can now create Full Text Indexes on JSON and JSONB columns.

This involves converting the JSONB field to a `tsvector`, then creating a language-specific full-text index on it:


   libdata=# CREATE INDEX bookdata_fts ON bookdata
   USING gin (( to_tsvector('english',bookdata) ));
   CREATE INDEX

Once that's set up, you can do full-text searching against all of the values in your JSON documents:

   libdata=# SELECT bookdata -> 'title'
   FROM bookdata
   WHERE to_tsvector('english',bookdata) @@ to_tsquery('duke');            
   ------------------------------------------
    "The Tattooed Duke"
    "She Tempts the Duke"
    "The Duke Is Mine"
    "What I Did For a Duke"

Security

SCRAM Authentication
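
PostgreSQL 10 adds SCRAM-SHA-256 password authentication, a more secure replacement for the md5 method. Enabling it looks roughly like this (the pg_hba.conf line is illustrative, and passwords must be re-set after changing password_encryption):

   # postgresql.conf
   password_encryption = scram-sha-256

   # pg_hba.conf
   host    all    all    10.0.0.0/8    scram-sha-256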

New "monitoring" roles for permission grants

Restrictive Policies for Row Level Security
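
Policies may now be declared AS RESTRICTIVE, meaning they are ANDed with existing (permissive) policies rather than ORed. A sketch, with a hypothetical rule on the loans table:

   CREATE POLICY same_branch ON loans
     AS RESTRICTIVE
     USING (branch_id = current_setting('app.branch_id')::integer);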

Performance

Cross-column Statistics

Real-world data frequently contains correlated values across table columns. This can easily fool the query planner into thinking WHERE clauses are more selective than they really are, which can cause some queries to become very slow. Multivariate statistics objects let the planner learn about such correlations, which protects it against making these mistakes. The manual explains the feature in more detail and shows some examples. This feature in PostgreSQL represents an advance in the state of the art for all SQL databases.
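
A minimal sketch of creating an extended statistics object (the table and columns are hypothetical):

   CREATE STATISTICS patron_geo_stats (dependencies, ndistinct)
     ON city, zip FROM patrons;
   ANALYZE patrons;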

PG Phriday: Crazy Correlated Column Crusade

Latch Wait times in pg_stat_activity
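
Backends waiting on latches and I/O now report what they are waiting for, so a simple query shows where idle or stalled sessions are stuck:

   SELECT pid, wait_event_type, wait_event, state
   FROM pg_stat_activity
   WHERE wait_event IS NOT NULL;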

Query Planner Improvements

In join planning, detect cases where the inner side of the join can only produce a single row for each outer side row. During execution this allows early skipping to the next outer row once a match is found. This can also remove the requirement for mark and restore during Merge Joins, which can significantly improve performance in some cases.

Other Features

ICU Collation Support

More robust collations with ICU support in PostgreSQL 10
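
When the server is built with ICU support, collations can be created from ICU locales and used anywhere a collation is accepted. A sketch (the collation name and table are hypothetical):

   CREATE COLLATION german_icu (provider = icu, locale = 'de-DE');
   SELECT name FROM patrons ORDER BY name COLLATE german_icu;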

amcheck B-Tree consistency/corruption checking tool

PostgreSQL 10 amcheck documentation
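
A sketch of basic usage (the index name is hypothetical); bt_index_check raises an error if it finds a violation of B-Tree invariants:

   CREATE EXTENSION amcheck;
   SELECT bt_index_check('idx_loans_fines'::regclass);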

Backwards-Incompatible Changes

Version 10 has a number of backwards-incompatible changes which may affect system administration, particularly around backup automation. Users should specifically test for the incompatibilities before upgrading in production.

Change in Version Numbering

As of Version 10, PostgreSQL no longer uses three-part version numbers, but is shifting to two-part version numbers. This means that version 10.1 will be the first patch update to PostgreSQL 10, instead of a new major version. Scripts and tools which detect PostgreSQL version may be affected.

The community strongly recommends that tools use either the server_version_num GUC (on the backend) or the libpq status function PQserverVersion to get the server version. Both return a six-digit integer version number which will be consistently sortable and comparable between versions 9.6 and 10.

Version String   Major Version   Update Number   version_num
9.6.0            9.6             0               090600
9.6.3            9.6             3               090603
10.0             10              0               100000
10.1             10              1               100001
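
For example, checking from SQL on a hypothetical 10.1 server:

   libdata=# SHOW server_version_num;
    server_version_num
   --------------------
    100001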

Renaming of "xlog" to "wal" Globally (and location/lsn)

In order to avoid confusion leading to data loss, everywhere we previously used the abbreviation "xlog" to refer to the transaction log, including directories, functions, and parameters for executables, we now use "wal". Similarly, the word "location" in function names, where used to refer to transaction log location, has been replaced with "lsn".

This will require many users to reprogram custom backup and transaction log management scripts, as well as replication monitoring.

Two directories have been renamed:

9.6 Directory   10 Directory
pg_xlog         pg_wal
pg_clog         pg_xact

Additionally, depending on where your installation packages come from, the default activity log location may have been renamed from "pg_log" to just "log".

Many administrative functions have been renamed to use "wal" and "lsn":

9.6 Function Name                  10 Function Name
pg_current_xlog_flush_location     pg_current_wal_flush_lsn
pg_current_xlog_insert_location    pg_current_wal_insert_lsn
pg_current_xlog_location           pg_current_wal_lsn
pg_is_xlog_replay_paused           pg_is_wal_replay_paused
pg_last_xlog_receive_location      pg_last_wal_receive_lsn
pg_last_xlog_replay_location       pg_last_wal_replay_lsn
pg_switch_xlog                     pg_switch_wal
pg_xlog_location_diff              pg_wal_lsn_diff
pg_xlog_replay_pause               pg_wal_replay_pause
pg_xlog_replay_resume              pg_wal_replay_resume
pg_xlogfile_name                   pg_walfile_name
pg_xlogfile_name_offset            pg_walfile_name_offset

Some system views and functions have had attribute renames:

  • pg_stat_replication:
    • write_location -> write_lsn
    • sent_location -> sent_lsn
    • flush_location -> flush_lsn
    • replay_location -> replay_lsn
  • pg_create_logical_replication_slot: wal_position -> lsn
  • pg_create_physical_replication_slot: wal_position -> lsn
  • pg_logical_slot_get_changes: location -> lsn
  • pg_logical_slot_peek_changes: location -> lsn

Several command-line executables have had parameters renamed:

  • pg_receivexlog has been renamed to pg_receivewal.
  • pg_resetxlog has been renamed to pg_resetwal.
  • pg_xlogdump has been renamed to pg_waldump.
  • initdb and pg_basebackup have a --waldir option rather than --xlogdir.
  • pg_basebackup now has --wal-method rather than --xlog-method.

Drop Support for FE/BE 1.0 Protocol

PostgreSQL's original client/server protocol, version 1.0, will no longer be supported as of PostgreSQL 10. Since version 1.0 was superseded by version 2.0 in 1998, it is unlikely that any existing clients still use it.

Change Defaults around Replication and pg_basebackup

Drop Support for Floating Point Timestamps

Remove contrib/tsearch2

Tsearch2, the older, contrib module version of our built-in full text search, has been removed from contrib and will no longer be built as part of PostgreSQL packages. Users who have been continuously upgrading since before version 8.3 will need to either manually modify their databases to use the built-in tsearch objects before upgrading to PostgreSQL 10, or will need to compile tsearch2 themselves from scratch and install it.

Drop pg_dump Support for Databases Older than 8.0

Databases running on PostgreSQL version 7.4 and earlier will not be supported by 10's pg_dump or pg_dumpall. If you need to convert a database that old, use version 9.6 or earlier to upgrade it in two stages.
