What's new in PostgreSQL 9.3
This page contains an overview of PostgreSQL Version 9.3's features, including descriptions, testing and usage information, and links to blog posts containing further information. See also the Release Notes and PostgreSQL 9.3 Open Items.
Configuration directive 'include_dir'
In addition to including separate configuration files via the 'include' directive, postgresql.conf now also provides the 'include_dir' directive which reads all files ending in ".conf" in the specified directory or directories.
Directories can be specified either as an absolute path or relative from the location of the main configuration file. Directories will be read in the order they occur, while files will be read sorted by C locale rules. It is possible for included files to contain their own 'include_dir' directives.
Links
COPY FREEZE for more efficient bulk loading
To improve initial bulk loading of tables, a FREEZE parameter has been added to the COPY command to enable data to be copied with rows already frozen. See the documentation for usage and caveats.
Links
- Documentation - see the FREEZE parameter
Custom Background Workers
This functionality enables modules to register themselves as "background worker processes", effectively operating as customised server processes. This is a powerful new feature with a wide variety of possible use cases, such as monitoring server activity, performing tasks at pre-defined intervals, customised logging etc.
Background worker processes can attach to PostgreSQL's shared memory area and to connect to databases internally; by linking to libpq they can also connect to the server in the same way as a regular client application. Background worker processes are written in C, and as server processes they have unrestricted access to all data and can potentially impact other server processes, meaning they represent a potential security / stability risk. Consequently background worker processes should be developed and deployed with appropriate caution.
Providing an example would go beyond the scope of this article; please refer to the blogs linked below, which provide annotated sample code. The PostgreSQL source also contains a sample background worker process in contrib/worker_spi.
Links
- Documentation
- Background worker processes
- Postgres 9.3 feature highlight: handling signals with custom bgworkers
- Custom background workers
- "Hello World" with custom bgworkers
- Custom Background Workers - a practical example
Data Checksums
It is now possible for PostgreSQL to checksum data pages and report corruption. This is a cluster-wide setting and cannot be applied to individual databases or objects. Also be aware that this facility may incur a noticeable performance penalty. This option must be enabled during initdb and cannot be changed (although there is a new GUC parameter "ignore_checksum_failure" which will force PostgreSQL to continue processing a transaction even if corruption is detected).
Links
JSON: Additional functionality
The JSON datatype and two supporting functions for converting rows and arrays were introduced in PostgreSQL 9.2. With PostgreSQL 9.3, dedicated JSON operators have been introduced and the number of functions expanded to 12, including JSON parsing support. The JSON parser has exposed for use by other modules such as extensions as an API.
Additionally, the hstore extension has gained two JSON-related functions, hstore_to_json(hstore) and hstore_to_json_loose(hstore). The former is used when an hstore value is cast to json.
Links
- Documentation
- Waiting for 9.3 – JSON generation improvements
- Waiting for 9.3 – Add new JSON processing functions and parser API
- Postgres 9.3 feature highlight: JSON data generation
- Postgres 9.3 feature highlight: JSON operators
- Postgres 9.3 feature highlight: JSON parsing functions
LATERAL JOIN
Put simply, a LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list.
The following is a self-contained (if quite pointless) example of the kind of clause it is sometimes useful to be able to write:
SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base
  JOIN (SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple) multiples
       ON multiples.b_nr = base.nr
but which produces an error message like the following:
 LINE 4:    JOIN (SELECT generate_series(1,10) AS b_nr, base.nr * 2 A...
                                                       ^
 HINT:  There is an entry for table "base", but it cannot be referenced from this part of the query.
Using LATERAL JOIN, it's now possible for the second subquery to reference a value from the first:
SELECT base.nr,
       multiples.multiple
  FROM (SELECT generate_series(1,10) AS nr) base,
LATERAL (
  SELECT multiples.multiple FROM
   ( SELECT generate_series(1,10) AS b_nr, base.nr * 2 AS multiple ) multiples
    WHERE multiples.b_nr = base.nr
  ) multiples;
Note that function calls can now directly reference columns from preceding FROM items, even without the LATERAL keyword. Example:
CREATE FUNCTION multiply(INT, INT) RETURNS INT LANGUAGE SQL AS $$ SELECT $1 * $2; $$
Query with function call in the FROM list:
SELECT base.nr,
       multiple
  FROM (SELECT generate_series(1,10) AS nr) base,
       multiply(base.nr, 2) AS multiple
In previous versions, this query would generate an error like this:
ERROR: function expression in FROM cannot refer to other relations of same query level LINE 4: multiply(base.nr, 2) AS multiple
See the articles linked below for some more realistic examples.
Links
- Documentation: SELECT (see section LATERAL)
- Waiting for 9.3: Implement SQL standard lateral subqueries
- PostgreSQL 9.3 Lateral Part 1: Use with HStore
- PostgreSQL 9.3 Lateral Part 2: The Lateral Left Join
Parallel pg_dump for faster backups
The new -j njobs (--jobs='njobs') option enables pg_dump to dump njobs tables simultaneously, reducing the time it takes to dump a database. Example:
pg_dump -U postgres -j4 -Fd -f /tmp/mydb-dump mydb
This dumps the contents of database "mydb" to the directory "/tmp/mydb-dump" using four simultaneous connections.
Caveats:
- Parallel dumps can only be in directory format
- Parellel dumps will place more load on the database, although total dump time should be shorter
- pg_dump will open njobs + 1 connections to the database, so max_connections should be set appropriately
- Requesting exclusive locks on database objects while running a parallel dump could cause the dump to fail
- Parallel dumps from pre-9.2 servers need special attention
An ad-hoc test of this feature on a 4.5GB database (which compresses to around 370MB as a dump) with different values of -j produced following timings:
- (no -j): 1m3s
- -j2: 0m28s
- -j3: 0m24s
- -j4: 0m24s
- -j5: 0m25s
Links
- pg_dump documentation
- Waiting for 9.3 – Add parallel pg_dump option
- Postgres 9.3 feature highlight: parallel pg_dump
'pg_isready' server monitoring tool
pg_isready is a wrapper for PQping created as a standard client application. It accepts a libpq-style connection string and returns one of four exit statuses:
- 0: server is accepting connections normally
- 1: server is rejecting connections (for example during startup)
- 2: server did not response to the connection attempt
- 3: no connection attempt was made (e.g. due to invalid connection parameters)
Example usage:
barwick@localhost:~$ pg_isready /tmp:5432 - accepting connections barwick@localhost:~$ pg_isready --quiet && echo "OK" OK barwick@localhost:~$ pg_isready -p5431 -h localhost localhost:5431 - accepting connections barwick@localhost:~$ pg_isready -h example.com example.com:5432 - no response
Links
In 9.3, PostgreSQL has switched from using SysV shared memory to using Posix shared memory and mmap for memory management. This allows easier installation and configuration of PostgreSQL, and means that except in unusual cases, system parameters such as SHMMAX and SHMALL no longer need to be adjusted. We need users to rigorously test and ensure that no memory management issues have been introduced by the change.
Links
Trigger Features
Event Triggers
Triggers can now be defined on DDL events (CREATE, ALTER, DROP).
Links
- Documentation:
VIEW Features
Materialized Views
Materialized views are a special kind of view which cache the view's output as a physical table, rather than executing the underlying query on every access. Conceptually they are similar to "CREATE TABLE AS", but store the view definition so it can be easily refreshed.
Note that materialized views cannot be auto-refreshed; refreshes are not incremental; and the base table cannot be manipulated. They will however be automatically populated by pg_restore (more precisely, pg_dump includes a "REFRESH MATERIALIZED VIEW" statement).
Contrived example
Create and populate a table with some arbitrary data:
CREATE TABLE matview_test_table ( id SERIAL PRIMARY KEY, ts TIMESTAMPTZ NOT NULL )
INSERT INTO matview_test_table VALUES ( DEFAULT, ((NOW() - '2 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ )
Create a materialized view which lists the 5 most recent entries:
CREATE MATERIALIZED VIEW matview_test_view AS
  SELECT id, ts
    FROM matview_test_table
ORDER BY id DESC 
   LIMIT 5
postgres=# SELECT * from matview_test_view ; id | ts ------+------------------------------- 1000 | 2013-05-06 12:02:10.974711+09 999 | 2013-05-06 12:02:09.974711+09 998 | 2013-05-06 12:02:08.974711+09 997 | 2013-05-06 12:02:07.974711+09 996 | 2013-05-06 12:02:06.974711+09 (5 rows)
Add more data to the table:
INSERT INTO matview_test_table VALUES ( DEFAULT, ((NOW() - '1 days'::INTERVAL) + (generate_series(1,1000) || ' seconds')::INTERVAL)::TIMESTAMPTZ )
View output does not change:
postgres=# SELECT * from matview_test_view ; id | ts ------+------------------------------- 1000 | 2013-05-06 12:02:10.974711+09 999 | 2013-05-06 12:02:09.974711+09 998 | 2013-05-06 12:02:08.974711+09 997 | 2013-05-06 12:02:07.974711+09 996 | 2013-05-06 12:02:06.974711+09 (5 rows)
Refresh the view to display the latest table entries:
postgres=# REFRESH MATERIALIZED VIEW matview_test_view ; REFRESH MATERIALIZED VIEW postgres=# SELECT * from matview_test_view ; id | ts ------+------------------------------- 2001 | 2013-05-07 12:03:10.696626+09 2000 | 2013-05-07 12:03:09.696626+09 1999 | 2013-05-07 12:03:08.696626+09 1998 | 2013-05-07 12:03:07.696626+09 1997 | 2013-05-07 12:03:06.696626+09 (5 rows)
The links below contain more detailed information and examples.
Links
- Documentation:
- Waiting for 9.3 – Add a materialized view relations
- Postgres 9.3 feature highlight: Materialized views
Recursive View Syntax
The CREATE RECURSIVE VIEW syntax provides a shorthand way of formulating a recursive common table expression (CTE) as a view.
Taking the example from the CTE documentation:
WITH RECURSIVE t(n) AS (
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100
)
SELECT * FROM t;
This can be created as a recursive view as follows:
CREATE RECURSIVE VIEW t(n) AS
    VALUES (1)
  UNION ALL
    SELECT n+1 FROM t WHERE n < 100;
Links
Updatable Views
Simple views can now be updated in the same way as regular tables. The view can only reference one table (or another updatable view) and must not contain more complex operators, join types etc.
If the view has a WHERE condition, UPDATEs and DELETEs on the underlying table will be restricted to those rows it defines. However UPDATEs may change a row so that it is no longer visible in the view, and an INSERT command can potentiall insert rows which do not satisfy the WHERE condition.
More complex views can be made updatable as before using INSTEAD OF triggers or INSTEAD rules.
Simple example using the following table and view:
 
CREATE TABLE postgres_versions (
 version VARCHAR(3) PRIMARY KEY,
 nickname TEXT NOT NULL
);
INSERT INTO postgres_versions VALUES
 ('8.0', 'Excitable Element'),
 ('8.1', 'Fishy Foreign Key'),
 ('8.2', 'Grumpy Grant'),
 ('8.3', 'Hysterical Hstore'),
 ('8.4', 'Insane Index'),
 ('9.0', 'Jumpy Join'),
 ('9.1', 'Killer Key'),
 ('9.2', 'Laconical Lexer'),
 ('9.3', 'Morose Module');
CREATE VIEW postgres_versions_9 AS
 SELECT version, nickname
   FROM postgres_versions
  WHERE version LIKE '9.%';
 
postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Morose Module
(4 rows)
postgres=# UPDATE postgres_versions_9 SET nickname='Maniac Master' WHERE version='9.3';
UPDATE 1
postgres=# SELECT * from postgres_versions_9;
 version |    nickname     
---------+-----------------
 9.0     | Jumpy Join
 9.1     | Killer Key
 9.2     | Laconical Lexer
 9.3     | Maniac Master
(4 rows)
Links
- Documentation
- Waiting for 9.3 – Support automatically-updatable views
- Postgres 9.3 feature highlight: auto-updatable views
Writeable Foreign Tables
"Foreign Data Wrappers" (FDW) were introduced in PostgreSQL 9.1, providing a way of accessing external data sources from within PostgreSQL using SQL. The original implementation was read-only, but 9.3 will enable write access as well, provided the individual FDW drivers have been updated to support this. At the time of writing, only the PostgreSQL driver has write support.
See below for more information on the PostgreSQL driver and a simple example.
Links
- CREATE SERVER
- CREATE FOREIGN DATA WRAPPER
- Documentation: Writing A Foreign Data Wrapper
- Postgres 9.3 feature highlight: writable foreign tables
- Waiting for 9.3 – Support writable foreign tables
postgres_fdw
A new contrib module, postgres_fdw, provides the eponymous foreign data wrapper for read/write access to remote PostgreSQL servers (or to another database on the local server).
A simple usage example (connecting to a different database on the same server for ease of testing).
1. Build the postgres_fdw contrib module
cd contrib/postgres_fdw make install
2. Install the module as an extension
postgres=# CREATE EXTENSION postgres_fdw; CREATE EXTENSION
3. Create a test "remote" database
postgres=# CREATE DATABASE fdw_test; CREATE DATABASE postgres=# \c fdw_test You are now connected to database "fdw_test" as user "barwick". fdw_test=# CREATE TABLE world (greeting TEXT); CREATE TABLE
4. Create the server, user and table mapping so that the local PostgreSQL server knows about the remote database:
postgres=# CREATE SERVER postgres_fdw_test FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', dbname 'fdw_test');
CREATE SERVER
postgres=# CREATE USER MAPPING FOR PUBLIC SERVER postgres_fdw_test OPTIONS (password '');
CREATE USER MAPPING
postgres=# CREATE FOREIGN TABLE other_world (greeting TEXT) SERVER postgres_fdw_test OPTIONS (table_name 'world');
CREATE FOREIGN TABLE
postgres=# \det
          List of foreign tables
 Schema |    Table    |      Server       
--------+-------------+-------------------
 public | other_world | postgres_fdw_test
(1 row)
5. Manipulate the remote table as if it were a local one:
postgres=# INSERT INTO other_world VALUES('Take me to your leader');
INSERT 0 1 
postgres=# \c fdw_test
You are now connected to database "fdw_test" as user "barwick".
fdw_test=# SELECT * FROM world;
         hello          
------------------------
 Take me to your leader
(1 row)
Here's another example, where we link to the "account" and "branches" tables on a remote pgbench database:
create extension postgres_fdw; create user mapping for current_user server remotesrv options ( user 'postgres', password 'password' ); create server remotesrv foreign data wrapper postgres_fdw options ( host '192.168.1.5', port '5433', dbname 'bench'); create foreign table remoteacct (aid int, bid int, abalance int, filler char(84)) server remotesrv options ( table_name 'pgbench_accounts' ); create foreign table remotebranch ( bid int, bbalance int, filler char(88) ) server remotesrv options ( table_name 'pgbench_branches');
Having set this up, we can query the remote server:
   explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ----------------------------------------------------------------------------
   Nested Loop  (cost=200.00..225.40 rows=1 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..112.66 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.00..112.73 rows=1 width=352)
Notice a couple of things: first, JOIN push-down to the remote server isn't implemented yet (wait for 9.4!). Second, we're not getting real estimates for the remote tables. This is fixable, but telling Postgres to query the remote DB for EXPLAIN information:
   alter foreign table remotebranch options (add use_remote_estimate 'true');
   alter foreign table remoteacct options (add use_remote_estimate 'true');
   bench=# explain select * from remotebranch join remoteacct using ( bid ) where bid = 5;
                                   QUERY PLAN
   ------------------------------------------------------------------------------
   Nested Loop  (cost=200.42..7648.07 rows=99400 width=712)
   ->  Foreign Scan on remotebranch  (cost=100.00..101.14 rows=1 width=364)
   ->  Foreign Scan on remoteacct  (cost=100.42..6552.93 rows=99400 width=97)
Links
Replication Improvements
PostgreSQL's built-in binary replication has been improved in four ways: streaming-only remastering, fast failover, and architecture-independent streaming, and pg_basebackup conf setup.
Streaming-Only Remastering
"Remastering" is the process whereby a replica in a set of replicas becomes the new master for all of the other replicas. For example:
- Master M1 is replicating to replicas R1, R2 and R3.
- Master M1 needs to be taken down for a hardware upgrade.
- The DBA promotes R1 to be the master.
- R2 and R3 are reconfigured & restarted, and now replicate from R1
That's remastering in a nutshell. It's even more useful in combination with cascading replication (introduced in 9.2).
In prior versions of PostgreSQL, remastering required using WAL file archiving. Cascading replicas could not switch masters using streaming alone; they would have to be re-cloned. That restriction has now been lifted, allowing remastering from just the stream. This makes it much easier to set up large replication clusters; administrators no longer have to set up an online WAL archive if they don't need one for disaster recovery.
Incidentally, this also makes it possible to set up "cycles" where replication is going in a circle. Whether that's a feature or a bug depends on your perspective.
Links:
Fast Failover
Allows replicas to be promoted in less than a second, permitting 99.999% uptime. More details TBD.
Architecture-Independent Streaming
Allows streaming of base backups (using pg_basebackup) and log archiving (using pg_receivexlog) between different OSes and hardware architectures. (Note that you still need the same architecture to restore the backups, but this is useful for example with centralized backup servers)
pg_basebackup conf setup
If you use the -R switch, pg_basebackup will create a simple (streaming-only) recovery.conf file in the newly cloned data directory. This means that you can immediately start the new database server without doing additional editing.
Backward compatibility
These changes may incur regressions in your applications.
CREATE TABLE output
CREATE TABLE will no longer output messages about implicit index and sequence creation unless the log level is set to DEBUG1.
Server settings
- Parameter 'commit_delay' is restricted to superusers only
- Parameter 'replication_timeout' has been renamed to 'wal_sender_timeout'
- Parameter 'unix_socket_directory' has been replaced 'unix_socket_directories'
- In-memory sorts to use their full memory allocation; if work_mem was set on the basis of the pre-9.3 behavior, its value may need to be reviewed.
WAL filenames may end in FF
WAL files will now be written in a continuous stream, rather than skipping the last 16MB segment every 4GB, meaning WAL filenames may end in FF. WAL backup or restore scripts may need to be adapted.
