What's new in PostgreSQL 9.3

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(LATERAL JOIN: add self-contained example query)
m (LATERAL JOIN: link to docs)
Line 118: Line 118:
 
'''Links'''
 
'''Links'''
  
 +
* [http://www.postgresql.org/docs/9.3/static/sql-select.html Documentation: SELECT] ''(see section <tt>LATERAL</tt>)''
 
* [http://www.depesz.com/2012/08/19/waiting-for-9-3-implement-sql-standard-lateral-subqueries/ Waiting for 9.3: Implement SQL standard lateral subqueries]
 
* [http://www.depesz.com/2012/08/19/waiting-for-9-3-implement-sql-standard-lateral-subqueries/ Waiting for 9.3: Implement SQL standard lateral subqueries]
 
* [http://www.postgresonline.com/journal/archives/284-PostgreSQL-9.3-Lateral-Part-1-Use-with-HStore.html PostgreSQL 9.3 Lateral Part 1: Use with HStore]  
 
* [http://www.postgresonline.com/journal/archives/284-PostgreSQL-9.3-Lateral-Part-1-Use-with-HStore.html PostgreSQL 9.3 Lateral Part 1: Use with HStore]  

Revision as of 11:00, 15 May 2013

This page contains additional information about PostgreSQL Version 9.3's features, including descriptions, testing information, and usage information. See also the Release Notes and this page for a list of blog posts explaining some of the new features.

Some features may still have outstanding issues; see PostgreSQL_9.3_Open_Items.

Contents

New features

... in no particular order ...

Items marked as "(DONE)" have a basic description + links, but can of course be improved / expanded.

  • LATERAL JOINs
  • Additional JSON constructor and extractor functions
  • Indexed regular expression search
  • Disk page checksums to detect filesystem failures
  • Performance and locking improvements for Foreign Key locks
  • 64-bit Large Object API
  • COPY FREEZE for reduced IO bulk loading


  • Switch to Posix shared memory and mmap(). (DONE)
  • Writeable Foreign Tables: write to external databases as well as read from them (DONE)
  • pgsql_fdw driver for federation of PostgreSQL databases (DONE)
  • Replication improvements (DONE)
    • Streaming-only remastering of replicas
    • Streaming replication protocol is now architecture-independent.
    • Faster promotion of a streaming standby to primary ("Standby promotion is almost instant, allowing 99.999% availability for a replicated cluster.")
  • VIEW features
    • Automatically updatable VIEWs (DONE)
    • MATERIALIZED VIEWs declaration (DONE)
    • Recursive view declaration (DONE)
  • Parallel pg_dump for faster backups (DONE)
  • Directories for configuration files (DONE)
  • pg_isready database connection checker (DONE)
  • User-defined background workers for automating database tasks (DONE)

Configuration directive 'include_dir'

In addition to including separate configuration files via the 'include' directive, postgresql.conf now also provides the 'include_dir' directive which reads all files ending in ".conf" in the specified directory or directories.

Directories can be specified either as an absolute path or relative from the location of the main configuration file. Directories will be read in the order they occur, while files will be read sorted by C locale rules. It is possible for included files to contain their own 'include_dir' directives.

Links

COPY FREEZE for more efficient bulk loading

To improve initial bulk loading of tables, a FREEZE parameter has been added to the COPY command to enable data to be copied with rows already frozen. See the documentation for usage and caveats.

Links

Custom Background Workers

This functionality enables modules to register themselves as "background worker processes", effectively operating as customised server processes. This is a powerful new feature with a wide variety of possible use cases, such as monitoring server activity, performing tasks at pre-defined intervals, customised logging etc.

Background worker processes can attach to PostgreSQL's shared memory area and to connect to databases internally; by linking to libpq they can also connect to the server in the same way as a regular client application. Background worker processes are written in C, and as server processes they have unrestricted access to all data and can potentially impact other server processes, meaning so they represent a potential security / stability risk. Consequently background worker processes should be developed and deployed with appropriate caution.

Providing an example would go beyond the scope of this article; please refer to the blogs linked below, which provide annotated sample code. The PostgreSQL source also contains a sample background worker process in contrib/worker_spi.

Links

JSON: Additional functionality

The JSON datatype and two supporting functions for converting rows and arrays were introduced in PostgreSQL 9.2. With PostgreSQL 9.3, dedicated JSON operators have been introduced and the number of functions expanded to 12, including JSON parsing support. The JSON parser has exposed for use by other modules such as extensions as an API.

Additionally, the hstore extension has gained two JSON-related functions, hstore_to_json(hstore) and hstore_to_json_loose(hstore). The former is used when an hstore value is cast to json.

Links

LATERAL JOIN

Put simply, a LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list.

The following is a self-contained (if quite pointless) example of the kind of clause it is sometimes useful to be able to write:

SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base
  JOIN (SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple) multiples
       ON multiples.b_nr = base.nr

but which produces an error message like the following:

 LINE 4:    JOIN (SELECT generate_series(1,10) AS base, base.nr * 2 A...
                                                       ^
 HINT:  There is an entry for table "base", but it cannot be referenced from this part of the query.

Using LATERAL JOIN, it's now possible for the second subquery to reference a value from the first:

SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base,
LATERAL (
  SELECT multiples.multiple FROM
   ( SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple ) multiples
    WHERE multiples.b_nr = base.nr
  ) multiples;

See the articles linked below for some more realistic examples.

Links

Parallel pg_dump for faster backups

The new -j njobs (--jobs='njobs') option enables pg_dump to dump njobs tables simultaneously, reducing the time it takes to dump a database. Example:

 pg_dump -U postgres -j4  -Fd -f /tmp/mydb-dump mydb

This dumps the contents of database "mydb" to the directory "/tmp/mydb-dump" using four simultaneous connections.

Caveats:

  • Parallel dumps can only be in directory format
  • Parellel dumps will place more load on the database, although total dump time should be shorter
  • pg_dump will open njobs + 1 connections to the database, so max_connections should be set appropriately
  • Requesting exclusive locks on database objects while running a parallel dump could cause the dump to fail
  • Parallel dumps from pre-9.2 servers need special attention

An ad-hoc test of this feature on a 4.5GB database (which compresses to around 370MB as a dump) with different values of -j produced following timings:

  • (no -j): 1m3s
  • -j2: 0m28s
  • -j3: 0m24s
  • -j4: 0m24s
  • -j5: 0m25s

Links

'pg_isready' server monitoring tool

pg_isready is a wrapper for PQping created as a standard client application. It accepts a libpq-style connection string and returns one of four exit statuses:

  • 0: server is accepting connections normally
  • 1: server is rejecting connections (for example during startup)
  • 2: server did not response to the connection attempt
  • 3: no connection attempt was made (e.g. due to invalid connection parameters)

Example usage:

barwick@localhost:~$ pg_isready
/tmp:5432 - accepting connections
barwick@localhost:~$ pg_isready --quiet && echo "OK"
OK
barwick@localhost:~$ pg_isready -p5431 -h localhost
localhost:5431 - accepting connections
barwick@localhost:~$ pg_isready -h example.com
example.com:5432 - no response

Links

Switch to Posix shared memory and mmap()

In 9.3, PostgreSQL has switched from using SysV shared memory to using Posix shared memory and mmap for memory management. This allows easier installation and configuration of PostgreSQL, and means that except in usual cases, system parameters such as SHMMAX and SHMALL no longer need to be adjusted. We need users to rigorously test and ensure that no memory management issues have been introduced by the change.

Links

VIEW Features

Materialized Views

Materialized views are a special kind of view which cache the view's output as a physical table, rather than executing the underlying query on every access. Conceptually they are similar to "CREATE TABLE AS", but store the view definition so it can be easily refreshed.

Note that materialized views cannot be auto-refreshed; refreshes are not incremental; and the base table cannot be manipulated. It is currently uncertain whether they will be automatically populated by pg_restore (??? need to confirm this)

Contrived example

Create and populate a table with some arbitrary data:

CREATE TABLE matview_test_table (
 id SERIAL PRIMARY KEY,
 ts TIMESTAMPTZ NOT NULL
)
INSERT INTO matview_test_table VALUES (
 DEFAULT,
 ((NOW() - '2 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ
)

Create a materialized view which lists the 5 most recent entries:

CREATE MATERIALIZED VIEW matview_test_view AS
  SELECT id, ts
    FROM matview_test_table
ORDER BY id DESC 
   LIMIT 5
postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 1000 | 2013-05-06 12:02:10.974711+09
  999 | 2013-05-06 12:02:09.974711+09
  998 | 2013-05-06 12:02:08.974711+09
  997 | 2013-05-06 12:02:07.974711+09
  996 | 2013-05-06 12:02:06.974711+09
(5 rows)

Add more data to the table:

INSERT INTO matview_test_table VALUES (
 DEFAULT,
 ((NOW() - '1 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ
)

View output does not change:

postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 1000 | 2013-05-06 12:02:10.974711+09
  999 | 2013-05-06 12:02:09.974711+09
  998 | 2013-05-06 12:02:08.974711+09
  997 | 2013-05-06 12:02:07.974711+09
  996 | 2013-05-06 12:02:06.974711+09
(5 rows)

Refresh the view to display the latest table entries:

postgres=# REFRESH MATERIALIZED VIEW matview_test_view ;
REFRESH MATERIALIZED VIEW
postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 2001 | 2013-05-07 12:03:10.696626+09
 2000 | 2013-05-07 12:03:09.696626+09
 1999 | 2013-05-07 12:03:08.696626+09
 1998 | 2013-05-07 12:03:07.696626+09
 1997 | 2013-05-07 12:03:06.696626+09
(5 rows)

The links below contain more detailed information and examples.

Links

Recursive View Syntax

The CREATE RECURSIVE VIEW syntax provides a shorthand way of formulating a recursive common table expression (CTE) as a view.

Taking the example from the CTE documentation:

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT * FROM t;

This can be created as a recursive view as follows:

CREATE RECURSIVE VIEW t(n) AS
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100;

Links

Updatable Views

Simple views can now be updated in the same way as regular tables. The view can only reference one table (or another updatable view) and must not contain more complex operators, join types etc.

If the view has a WHERE condition, UPDATEs and DELETEs on the underlying table will be restricted to those rows it defines. However UPDATEs may change a row so that it is no longer visible in the view, and an INSERT command can potentiall insert rows which do not satisfy the WHERE condition.

More complex views can be made updatable as before using INSTEAD OF triggers or INSTEAD rules.

Simple example using the following table and view:

CREATE TABLE postgres_versions (
 version VARCHAR(3) PRIMARY KEY,
 nickname TEXT NOT NULL
);
INSERT INTO postgres_versions VALUES
 ('8.0', 'Excitable Element'),
 ('8.1', 'Fishy Foreign Key'),
 ('8.2', 'Grumpy Grant'),
 ('8.3', 'Hysterical Hstore'),
 ('8.4', 'Insane Index'),
 ('9.0', 'Jumpy Join'),
 ('9.1', 'Killer Key'),
 ('9.2', 'Laconical Lexer'),
 ('9.3', 'Morose Module');
CREATE VIEW postgres_versions_9 AS
 SELECT version, nickname
   FROM postgres_versions
  WHERE version LIKE '9.%';

postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Morose Module
(4 rows)

postgres=# UPDATE postgres_versions_9 SET nickname='Maniac Master' WHERE version='9.3';
UPDATE 1
postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Maniac Master
(4 rows)

Links

Writeable Foreign Tables

"Foreign Data Wrappers" (FDW) were introduced in PostgreSQL 9.1, providing a way of accessing external data sources from within PostgreSQL using SQL. The original implementation was read-only, but 9.3 will enable write access as well, provided the individual FDW drivers have been updated to support this. At the time of writing, only the Redis and PostgreSQL drivers have write support (need to verify this).

See below for more information on the PostgreSQL driver and a simple example.

Links

postgres_fdw

A new contrib module, postgres_fdw, provides the eponymous foreign data wrapper for read/write access to remote PostgreSQL servers (or to another database on the local server).

A simple usage example (connecting to a different database on the same server for ease of testing).

1. Build the postgres_fdw contrib module

cd contrib/postgres_fdw
make install

2. Install the module as an extension

postgres=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

3. Create a test "remote" database

postgres=# CREATE DATABASE fdw_test;
CREATE DATABASE
postgres=# \c fdw_test
You are now connected to database "fdw_test" as user "barwick".
fdw_test=# CREATE TABLE world (hello TEXT);
CREATE TABLE

4. Create the server, user and table mapping so that the local PostgreSQL server knows about the remote database:

postgres=# CREATE SERVER postgres_fdw_test FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'fdw_test');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_fdw_test OPTIONS (password '');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE other_world (greeting TEXT) SERVER postgres_fdw_test OPTIONS (table_name 'world');
CREATE FOREIGN TABLE
postgres=# \det
          List of foreign tables
 Schema |    Table    |      Server       
--------+-------------+-------------------
 public | other_world | postgres_fdw_test
(1 row)

5. Manipulate the remote table as if it were a local one:

postgres=# INSERT INTO other_world VALUES('Take me to your leader');
INSERT 0 1 
postgres=# \c fdw_test
You are now connected to database "fdw_test" as user "barwick".
fdw_test=# SELECT * FROM world;
         hello          
------------------------
 Take me to your leader
(1 row)

Here's another example, where we link to the "account" and "branches" tables on a remote pgbench database:

 create extension postgres_fdw;
 create user mapping for current_user server remotesrv options ( user 'postgres', password 'password' );
 create server remotesrv foreign data wrapper postgres_fdw options ( host '192.168.1.5', port '5433', dbname 'bench');
 create foreign table remoteacct (aid int, bid int, abalance int, filler char(84)) 
 server remotesrv options ( table_name 'pgbench_accounts' );
 create foreign table remotebranch ( bid int, bbalance int, filler char(88) ) 
 server remotesrv  options ( table_name 'pgbench_branches');

Having set this up, we can query the remote server:

   explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ----------------------------------------------------------------------------
   Nested Loop  (cost=200.00..225.40 rows=1 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..112.66 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.00..112.73 rows=1 width=352)

Notice a couple of things: first, JOIN push-down to the remote server isn't implemented yet (wait for 9.4!). Second, we're not getting real estimates for the remote tables. This is fixable, but telling Postgres to query the remote DB for EXPLAIN information:

   alter foreign table remotebranch options (add use_remote_estimate 'true');
   alter foreign table remoteacct options (add use_remote_estimate 'true');
   bench=# explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ------------------------------------------------------------------------------
   Nested Loop  (cost=200.42..7648.07 rows=99400 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..101.14 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.42..6552.93 rows=99400 width=97)

Links

Replication Improvements

PostgreSQL's built-in binary replication has been improved in four ways: streaming-only remastering, fast failover, and architecture-independent streaming, and pg_basebackup conf setup.

Streaming-Only Remastering

"Remastering" is the process whereby a replica in a set of replicas becomes the new master for all of the other replicas. For example:

  1. Master M1 is replicating to replicas R1, R2 and R3.
  2. Master M1 needs to be taken down for a hardware upgrade.
  3. The DBA promotes R1 to be the master.
  4. R2 and R3 are reconfigured & restarted, and now replicate from R1

That's remastering in a nutshell. It's even more useful in combination with cascading replication (introduced in 9.2).

In prior versions of PostgreSQL, remastering required using WAL file archiving. Cascading replicas could not switch masters using streaming alone; they would have to be re-cloned. That restriction has now been lifted, allowing remastering from just the stream. This makes it much easier to set up large replication clusters; administrators no longer have to set up an online WAL archive if they don't need one for disaster recovery.

Incidentally, this also makes it possible to set up "cycles" where replication is going in a circle. Whether that's a feature or a bug depends on your perspective.

Links:

Fast Failover

Allows replicas to be promoted in less than a second, permitting 99.999% uptime. More details TBD.

Architecture-Independent Streaming

Allows replicating between different OSes. More details TBD.

pg_basebackup conf setup

If you use the -R switch, pg_basebackup will create a simple (streaming-only) recovery.conf file in the newly cloned data directory. This means that you can immediately start the new database server without doing additional editing.

Backward compatibility

These changes may incur regressions in your applications.

CREATE TABLE output

CREATE TABLE will no longer output messages about implicit index and sequence creation unless the log level is set to DEBUG1.

Server settings

  • Parameter 'commit_delay' is restricted to superusers only
  • Parameter 'replication_timeout' has been renamed to 'wal_sender_timeout'
  • Parameter 'unix_socket_directory' has been replaced 'unix_socket_directories'
  • In-memory sorts to use their full memory allocation; if work_mem was set on the basis of the pre-9.3 behavior, its value may need to be reviewed.

WAL filenames may end in FF

WAL files will now be written in a continuous stream, rather than skipping the last 16MB segment every 4GB, meaning WAL filenames may end in FF. WAL backup or restore scripts may need to be adapted.

Personal tools