Table partitioning

From PostgreSQL wiki

(Redirected from Todo:PartitionedTables)
Jump to: navigation, search



Status Quo

Currently we allow the user to (manually) create arbitrary nested tables with arbitrary constraints and then the planner tries to detect at run-time which child tables are candidates for the query. See PostgreSQL Partitioning for details. There are some 3rd party plugins that simplify the (manual) task/triggers, etc. see bottom of this page.

Today, at create time you create a master table, children that inherit from it (and how they are partitioned), separate indices for each child table, and create an insert trigger so that new rows are inserted to the appropriate (child) table (and/or more aggressive measures, such as allowing updates to the partitioned key [by default, updates to rows' partitioned key leave them in the same partition, possibly in error], dynamically allocating new child tables [be careful with race conditions], etc. see the various blogs out there).

Resolved Issues


  • We never exclude the parent table
  • We don't handle INSERTs on the parent table
  • It requires a lot of manual work to set it up

Overviews of Project Goals

List discussions

Possible Directions


Allow users to declare their intention with partitioned tables. Ie, declare what the partition key is and what range or values are covered by each partition.

I think this would mean two new types of relation. One "meta-table" that acts like a view, in that it doesn't have an attached filenode. It would also have some kind of meta data about the partition key but no view definition, it would act like parent tables in nested table structure do now. The other would be "partition" which would be a separate namespace from tables and would have attached information about what values of the partition key it covered.


  • Makes it more reasonable to handle inserts automatically since the structure is explicit and doesn't require making logical deductions.
  • More idiot-proof, ie you can't set up nonsensical combinations of constraints.
  • Consistent with other databases and DBA expectations.


  • Less flexible, you can't set up arbitrary non-traditional structures such as having some data in the parent table or having extra columns in some children.



First attempts to support auto-partitioning have been made using triggers.

  • avoid specific languages such as pgpsql that requires 'CREATE LANGUAGE'
  • performance of C trigger 4 to 5 times faster than pgpsql
  • insert/copy returns 0 rows when all rows have been routed by trigger from master to child tables
  • chaining triggers allow tunable behavior in case of rows not matching any partition: add an error trigger, move to an overflow table, create new partitions dynamically
  • constraint_exclusion does not work well with prepared statements. It might possible to convert CHECKs to One-Time Filter plan nodes if the condition is a variable.

Active Work In Progress


Syntax is proposed at "Syntax for partitioning", second version. The syntax resembles Oracle and MySQL. See also Todo#Administration (Simplify ability to create partitioned tables).

-- create partitioned table and child partitions at once.
CREATE TABLE parent (...)
PARTITION BY [ RANGE | LIST ] ( key ) [ opclass ]
[ (
     PARTITION child
           VALUES LESS THAN { ... | MAXVALUE } -- for RANGE
         | VALUES [ IN ] ( { ... | DEFAULT } ) -- for LIST
       [ WITH ( ... ) ] [ TABLESPACE tbs ]
     [, ...]
  ) ] ;

-- add a partition key to a table.
ALTER TABLE parent PARTITION BY  [ RANGE | LIST ] ( key ) [ opclass ] [ (...) ] ;

-- create a new partition on a partitioned table.

-- add a table as a partition.

-- Remove a partition as a normal table.

Internal representation

On-disk structure is included in the "Syntax for partitioning" patch. On-memory structure will be proposed in a future patch

On-disk structure

A new system table "pg_partition" added. Partition keys are stored in it.

CREATE TABLE pg_catalog.pg_partition
   partrelid   oid    NOT NULL, -- partitioned table oid
   partopclass oid    NOT NULL, -- operator class to compare keys
   partkind    "char" NOT NULL, -- kind of partition: RANGE or LIST
   partkey     text,            -- partition key expression

   PRIMARY KEY (partrelid),
   FOREIGN KEY (partrelid)   REFERENCES pg_class (oid),
   FOREIGN KEY (partopclass) REFERENCES pg_opclass (oid)

A new column "inhvalues" are added into pg_inherits. Partition values for each partition are stored in it.

ALTER TABLE pg_class.pg_inherits ADD COLUMN inhvalues anyarray ;
  • RANGE partition has an upper value of the range in inhvalues.
  • LIST partition has an array with multiple elements in inhvalues.
  • An overflow partition has an empty array in inhvalues.
  • A normal inherited table has a NULL in inhvalues.

On-memory structure

A cached list of partitions are sorted by partition values and stored in the relcache for the parent table. Changes to the partitions would need to invalidate parent caches to ensure the cache is accurately maintained.



INSERT TRIGGERs will be replaced with specialized tuple-routing feature using on-memory structure. Tuples will be routed in O(log N). It also solve "0 row affected" problem in INSERT TRIGGERs.


CHECK constraints continue to be used for a while.

It would be improved using on-memory structure; instead of CHECK constraints for each child tables, we can use a sorted list in the parent table. Constraint exclusion can be in O(log N) order instead of O(N) of now.


We don't expand those commands for now, but they might have to be expanded like as TRUNCATE.

Future improvements

They are hard to fix in 9.0, but should continue to be improved in the future releases.


  • Support SPLIT and MERGE for existing partitions. See also Kedar's patch
  • Support UPDATE of partition keys and values.
  • Support adding a partition between existing partitions. It requires SPLIT feature.
  • Support sub-partitions.
  • Support some partition kinds for GIS types. For example, "PARTITION BY GIST" holds partition keys as a GiST tree in on-memory structure.
  • Support HASH partitions. Each partition could be a FOREIGN TABLE in SQL/MED. In other words, it is PL/Proxy integration.


  • SELECT FOR SHARE/UPDATE for parent tables.
  • Prepared statements that uses partition keys in place holders.
    • An idea is to convert check constraints into One-Time_Filter [1]
  • Unique constraint over multiple partitions, when each partition has a unique index on set/superset of partition keys
  • Unique constraints over multiple partitions in the general case (typically called as "global index").


  • Optimization for min/max, LIMIT + ORDER BY, GROUP BY on partition keys.
  • Optimization when constraint exclusion are used with stable or volatile functions. It is a very common case that the partition key is timestamp and compared with now().
  • Join optimization for two partitioned tables.

Third-Party Tools

PG Partition Manager

  • Project Home Page
  • This is an extension that automates time & serial based partitioning (basically does interval partitioning setting up the right triggers for you).
  • Handles setting up, partitioning existing data, dropping unneeded child tables, & undoing partitioning.
Personal tools