What's new in PostgreSQL 9.3

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
m (Switch to Posix shared memory and mmap())
(Architecture-Independent Streaming)
Line 490: Line 490:
 
=== Architecture-Independent Streaming ===
 
=== Architecture-Independent Streaming ===
  
Allows replicating between different OSes. More details TBD.
+
Allows streaming of base backups (using pg_basebackup) and log archiving (using pg_receivexlog) between different OSes and hardware architectures. (Note that you still need the same architecture to restore the backups, but this is useful for example with centralized backup servers)
  
 
=== pg_basebackup conf setup ===
 
=== pg_basebackup conf setup ===

Revision as of 11:45, 12 September 2013

This page contains an overview of PostgreSQL Version 9.3's features, including descriptions, testing and usage information, and links to blog posts containing further imformation. See also the Release Notes and PostgreSQL 9.3 Open Items.

Contents

Configuration directive 'include_dir'

In addition to including separate configuration files via the 'include' directive, postgresql.conf now also provides the 'include_dir' directive which reads all files ending in ".conf" in the specified directory or directories.

Directories can be specified either as an absolute path or relative from the location of the main configuration file. Directories will be read in the order they occur, while files will be read sorted by C locale rules. It is possible for included files to contain their own 'include_dir' directives.

Links

COPY FREEZE for more efficient bulk loading

To improve initial bulk loading of tables, a FREEZE parameter has been added to the COPY command to enable data to be copied with rows already frozen. See the documentation for usage and caveats.

Links

Custom Background Workers

This functionality enables modules to register themselves as "background worker processes", effectively operating as customised server processes. This is a powerful new feature with a wide variety of possible use cases, such as monitoring server activity, performing tasks at pre-defined intervals, customised logging etc.

Background worker processes can attach to PostgreSQL's shared memory area and to connect to databases internally; by linking to libpq they can also connect to the server in the same way as a regular client application. Background worker processes are written in C, and as server processes they have unrestricted access to all data and can potentially impact other server processes, meaning so they represent a potential security / stability risk. Consequently background worker processes should be developed and deployed with appropriate caution.

Providing an example would go beyond the scope of this article; please refer to the blogs linked below, which provide annotated sample code. The PostgreSQL source also contains a sample background worker process in contrib/worker_spi.

Links

Data Checksums

It is now possible for PostgreSQL to checksum data pages and report corruption. This is a cluster-wide setting and cannot be applied to individual databases or objects. Also be aware that this facility may incur a noticeable performance penalty. This option must be enabled during initdb and cannot be changed (although there is a new GUC parameter "ignore_checksum_failure" which will force PostgreSQL to continue processing a transaction even if corruption is detected).

Links

JSON: Additional functionality

The JSON datatype and two supporting functions for converting rows and arrays were introduced in PostgreSQL 9.2. With PostgreSQL 9.3, dedicated JSON operators have been introduced and the number of functions expanded to 12, including JSON parsing support. The JSON parser has exposed for use by other modules such as extensions as an API.

Additionally, the hstore extension has gained two JSON-related functions, hstore_to_json(hstore) and hstore_to_json_loose(hstore). The former is used when an hstore value is cast to json.

Links

LATERAL JOIN

Put simply, a LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list.

The following is a self-contained (if quite pointless) example of the kind of clause it is sometimes useful to be able to write:

SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base
  JOIN (SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple) multiples
       ON multiples.b_nr = base.nr

but which produces an error message like the following:

 LINE 4:    JOIN (SELECT generate_series(1,10) AS base, base.nr * 2 A...
                                                       ^
 HINT:  There is an entry for table "base", but it cannot be referenced from this part of the query.

Using LATERAL JOIN, it's now possible for the second subquery to reference a value from the first:

SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base,
LATERAL (
  SELECT multiples.multiple FROM
   ( SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple ) multiples
    WHERE multiples.b_nr = base.nr
  ) multiples;

Note that function calls can now directly reference columns from preceding FROM items, even without the LATERAL keyword. Example:

CREATE FUNCTION multiply(INT, INT)
  RETURNS INT
  LANGUAGE SQL
AS
$$
  SELECT $1 * $2;
$$

Query with function call in the FROM list:

SELECT base.nr,
       multiple
  FROM (SELECT generate_series(1,10) AS nr) base,
       multiply(base.nr, 2) AS multiple

In previous versions, this query would generate an error like this:

ERROR:  function expression in FROM cannot refer to other relations of same query level
LINE 4:        multiply(base.nr, 2) AS multiple

See the articles linked below for some more realistic examples.

Links

Parallel pg_dump for faster backups

The new -j njobs (--jobs='njobs') option enables pg_dump to dump njobs tables simultaneously, reducing the time it takes to dump a database. Example:

 pg_dump -U postgres -j4  -Fd -f /tmp/mydb-dump mydb

This dumps the contents of database "mydb" to the directory "/tmp/mydb-dump" using four simultaneous connections.

Caveats:

  • Parallel dumps can only be in directory format
  • Parellel dumps will place more load on the database, although total dump time should be shorter
  • pg_dump will open njobs + 1 connections to the database, so max_connections should be set appropriately
  • Requesting exclusive locks on database objects while running a parallel dump could cause the dump to fail
  • Parallel dumps from pre-9.2 servers need special attention

An ad-hoc test of this feature on a 4.5GB database (which compresses to around 370MB as a dump) with different values of -j produced following timings:

  • (no -j): 1m3s
  • -j2: 0m28s
  • -j3: 0m24s
  • -j4: 0m24s
  • -j5: 0m25s

Links

'pg_isready' server monitoring tool

pg_isready is a wrapper for PQping created as a standard client application. It accepts a libpq-style connection string and returns one of four exit statuses:

  • 0: server is accepting connections normally
  • 1: server is rejecting connections (for example during startup)
  • 2: server did not response to the connection attempt
  • 3: no connection attempt was made (e.g. due to invalid connection parameters)

Example usage:

barwick@localhost:~$ pg_isready
/tmp:5432 - accepting connections
barwick@localhost:~$ pg_isready --quiet && echo "OK"
OK
barwick@localhost:~$ pg_isready -p5431 -h localhost
localhost:5431 - accepting connections
barwick@localhost:~$ pg_isready -h example.com
example.com:5432 - no response

Links

Switch to Posix shared memory and mmap()

In 9.3, PostgreSQL has switched from using SysV shared memory to using Posix shared memory and mmap for memory management. This allows easier installation and configuration of PostgreSQL, and means that except in unusual cases, system parameters such as SHMMAX and SHMALL no longer need to be adjusted. We need users to rigorously test and ensure that no memory management issues have been introduced by the change.

Links

Trigger Features

Event Triggers

Triggers can now be defined on DDL events (CREATE, ALTER, DROP).

Links

VIEW Features

Materialized Views

Materialized views are a special kind of view which cache the view's output as a physical table, rather than executing the underlying query on every access. Conceptually they are similar to "CREATE TABLE AS", but store the view definition so it can be easily refreshed.

Note that materialized views cannot be auto-refreshed; refreshes are not incremental; and the base table cannot be manipulated. They will however be automatically populated by pg_restore (more precisely, pg_dump includes a "REFRESH MATERIALIZED VIEW" statement).

Contrived example

Create and populate a table with some arbitrary data:

CREATE TABLE matview_test_table (
 id SERIAL PRIMARY KEY,
 ts TIMESTAMPTZ NOT NULL
)
INSERT INTO matview_test_table VALUES (
 DEFAULT,
 ((NOW() - '2 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ
)

Create a materialized view which lists the 5 most recent entries:

CREATE MATERIALIZED VIEW matview_test_view AS
  SELECT id, ts
    FROM matview_test_table
ORDER BY id DESC 
   LIMIT 5
postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 1000 | 2013-05-06 12:02:10.974711+09
  999 | 2013-05-06 12:02:09.974711+09
  998 | 2013-05-06 12:02:08.974711+09
  997 | 2013-05-06 12:02:07.974711+09
  996 | 2013-05-06 12:02:06.974711+09
(5 rows)

Add more data to the table:

INSERT INTO matview_test_table VALUES (
 DEFAULT,
 ((NOW() - '1 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ
)

View output does not change:

postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 1000 | 2013-05-06 12:02:10.974711+09
  999 | 2013-05-06 12:02:09.974711+09
  998 | 2013-05-06 12:02:08.974711+09
  997 | 2013-05-06 12:02:07.974711+09
  996 | 2013-05-06 12:02:06.974711+09
(5 rows)

Refresh the view to display the latest table entries:

postgres=# REFRESH MATERIALIZED VIEW matview_test_view ;
REFRESH MATERIALIZED VIEW
postgres=# SELECT * from matview_test_view ;
  id  |              ts               
------+-------------------------------
 2001 | 2013-05-07 12:03:10.696626+09
 2000 | 2013-05-07 12:03:09.696626+09
 1999 | 2013-05-07 12:03:08.696626+09
 1998 | 2013-05-07 12:03:07.696626+09
 1997 | 2013-05-07 12:03:06.696626+09
(5 rows)

The links below contain more detailed information and examples.

Links

Recursive View Syntax

The CREATE RECURSIVE VIEW syntax provides a shorthand way of formulating a recursive common table expression (CTE) as a view.

Taking the example from the CTE documentation:

WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT * FROM t;

This can be created as a recursive view as follows:

CREATE RECURSIVE VIEW t(n) AS
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100;

Links

Updatable Views

Simple views can now be updated in the same way as regular tables. The view can only reference one table (or another updatable view) and must not contain more complex operators, join types etc.

If the view has a WHERE condition, UPDATEs and DELETEs on the underlying table will be restricted to those rows it defines. However UPDATEs may change a row so that it is no longer visible in the view, and an INSERT command can potentiall insert rows which do not satisfy the WHERE condition.

More complex views can be made updatable as before using INSTEAD OF triggers or INSTEAD rules.

Simple example using the following table and view:

CREATE TABLE postgres_versions (
 version VARCHAR(3) PRIMARY KEY,
 nickname TEXT NOT NULL
);
INSERT INTO postgres_versions VALUES
 ('8.0', 'Excitable Element'),
 ('8.1', 'Fishy Foreign Key'),
 ('8.2', 'Grumpy Grant'),
 ('8.3', 'Hysterical Hstore'),
 ('8.4', 'Insane Index'),
 ('9.0', 'Jumpy Join'),
 ('9.1', 'Killer Key'),
 ('9.2', 'Laconical Lexer'),
 ('9.3', 'Morose Module');
CREATE VIEW postgres_versions_9 AS
 SELECT version, nickname
   FROM postgres_versions
  WHERE version LIKE '9.%';

postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Morose Module
(4 rows)

postgres=# UPDATE postgres_versions_9 SET nickname='Maniac Master' WHERE version='9.3';
UPDATE 1
postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Maniac Master
(4 rows)

Links

Writeable Foreign Tables

"Foreign Data Wrappers" (FDW) were introduced in PostgreSQL 9.1, providing a way of accessing external data sources from within PostgreSQL using SQL. The original implementation was read-only, but 9.3 will enable write access as well, provided the individual FDW drivers have been updated to support this. At the time of writing, only the Redis and PostgreSQL drivers have write support (need to verify this).

See below for more information on the PostgreSQL driver and a simple example.

Links

postgres_fdw

A new contrib module, postgres_fdw, provides the eponymous foreign data wrapper for read/write access to remote PostgreSQL servers (or to another database on the local server).

A simple usage example (connecting to a different database on the same server for ease of testing).

1. Build the postgres_fdw contrib module

cd contrib/postgres_fdw
make install

2. Install the module as an extension

postgres=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

3. Create a test "remote" database

postgres=# CREATE DATABASE fdw_test;
CREATE DATABASE
postgres=# \c fdw_test
You are now connected to database "fdw_test" as user "barwick".
fdw_test=# CREATE TABLE world (greeting TEXT);
CREATE TABLE

4. Create the server, user and table mapping so that the local PostgreSQL server knows about the remote database:

postgres=# CREATE SERVER postgres_fdw_test FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'fdw_test');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_fdw_test OPTIONS (password '');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE other_world (greeting TEXT) SERVER postgres_fdw_test OPTIONS (table_name 'world');
CREATE FOREIGN TABLE
postgres=# \det
          List of foreign tables
 Schema |    Table    |      Server       
--------+-------------+-------------------
 public | other_world | postgres_fdw_test
(1 row)

5. Manipulate the remote table as if it were a local one:

postgres=# INSERT INTO other_world VALUES('Take me to your leader');
INSERT 0 1 
postgres=# \c fdw_test
You are now connected to database "fdw_test" as user "barwick".
fdw_test=# SELECT * FROM world;
         hello          
------------------------
 Take me to your leader
(1 row)

Here's another example, where we link to the "account" and "branches" tables on a remote pgbench database:

 create extension postgres_fdw;
 create user mapping for current_user server remotesrv options ( user 'postgres', password 'password' );
 create server remotesrv foreign data wrapper postgres_fdw options ( host '192.168.1.5', port '5433', dbname 'bench');
 create foreign table remoteacct (aid int, bid int, abalance int, filler char(84)) 
 server remotesrv options ( table_name 'pgbench_accounts' );
 create foreign table remotebranch ( bid int, bbalance int, filler char(88) ) 
 server remotesrv  options ( table_name 'pgbench_branches');

Having set this up, we can query the remote server:

   explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ----------------------------------------------------------------------------
   Nested Loop  (cost=200.00..225.40 rows=1 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..112.66 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.00..112.73 rows=1 width=352)

Notice a couple of things: first, JOIN push-down to the remote server isn't implemented yet (wait for 9.4!). Second, we're not getting real estimates for the remote tables. This is fixable, but telling Postgres to query the remote DB for EXPLAIN information:

   alter foreign table remotebranch options (add use_remote_estimate 'true');
   alter foreign table remoteacct options (add use_remote_estimate 'true');
   bench=# explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ------------------------------------------------------------------------------
   Nested Loop  (cost=200.42..7648.07 rows=99400 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..101.14 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.42..6552.93 rows=99400 width=97)

Links

Replication Improvements

PostgreSQL's built-in binary replication has been improved in four ways: streaming-only remastering, fast failover, and architecture-independent streaming, and pg_basebackup conf setup.

Streaming-Only Remastering

"Remastering" is the process whereby a replica in a set of replicas becomes the new master for all of the other replicas. For example:

  1. Master M1 is replicating to replicas R1, R2 and R3.
  2. Master M1 needs to be taken down for a hardware upgrade.
  3. The DBA promotes R1 to be the master.
  4. R2 and R3 are reconfigured & restarted, and now replicate from R1

That's remastering in a nutshell. It's even more useful in combination with cascading replication (introduced in 9.2).

In prior versions of PostgreSQL, remastering required using WAL file archiving. Cascading replicas could not switch masters using streaming alone; they would have to be re-cloned. That restriction has now been lifted, allowing remastering from just the stream. This makes it much easier to set up large replication clusters; administrators no longer have to set up an online WAL archive if they don't need one for disaster recovery.

Incidentally, this also makes it possible to set up "cycles" where replication is going in a circle. Whether that's a feature or a bug depends on your perspective.

Links:

Fast Failover

Allows replicas to be promoted in less than a second, permitting 99.999% uptime. More details TBD.

Architecture-Independent Streaming

Allows streaming of base backups (using pg_basebackup) and log archiving (using pg_receivexlog) between different OSes and hardware architectures. (Note that you still need the same architecture to restore the backups, but this is useful for example with centralized backup servers)

pg_basebackup conf setup

If you use the -R switch, pg_basebackup will create a simple (streaming-only) recovery.conf file in the newly cloned data directory. This means that you can immediately start the new database server without doing additional editing.

Backward compatibility

These changes may incur regressions in your applications.

CREATE TABLE output

CREATE TABLE will no longer output messages about implicit index and sequence creation unless the log level is set to DEBUG1.

Server settings

  • Parameter 'commit_delay' is restricted to superusers only
  • Parameter 'replication_timeout' has been renamed to 'wal_sender_timeout'
  • Parameter 'unix_socket_directory' has been replaced 'unix_socket_directories'
  • In-memory sorts to use their full memory allocation; if work_mem was set on the basis of the pre-9.3 behavior, its value may need to be reviewed.

WAL filenames may end in FF

WAL files will now be written in a continuous stream, rather than skipping the last 16MB segment every 4GB, meaning WAL filenames may end in FF. WAL backup or restore scripts may need to be adapted.

Personal tools