Overview

This page contains details of a number of development projects that I'm either working on or planning to work on in the next Development Cycle.

Well, it did once. This was last updated in about 2010.

I'm sponsored by a number of different companies, so tracking my development interests in a central place makes sense for all concerned. All of the projects and work listed here will be released as BSD licenced code with copyright assigned to the PostgreSQL Global Development Group. If you are interested in sponsoring me, please visit http://www.2ndquadrant.com for contact details and other information. I'm only partially sponsored currently, so your contributions are welcome.

In some cases, projects may be done in collaboration with others, or even handed over to them completely. In that case, I'll put links through to those projects.

Please note that plans written here do not imply any level of acceptance by the PostgreSQL Hackers community. Each feature needs detailed planning and then development following the PostgreSQL community development process.

This is an open Wiki, so you can edit this, but please don't do that apart from minor edits. I will be editing pages as time progresses, either to enhance plans or update dev status.

Active Developments

I am likely to work on these things first, though some will take longer than others.

Truncate Triggers (committed)
COPY performance (patch)
Snapshot cloning for pg_dump performance (external project)

VLDB work

Segment Visibility Map

Partitioning: Segment Exclusion
ReadOnlyTables
CompressedTables
Partitioning, Segmented table options

Development Plans

Very Large Database (VLDB) - Enhancements focused around Terabyte-plus data stores, but not restricted to just Data Warehouses

Recovery and Replication - Further robustness enhancements

Enterprise-class Performance - Further performance and scalability improvements

Very Large Database (VLDB)

VLDB covers a whole host of topics. Another page here discussing this is Data Warehousing. Primary concerns are:

Table Maintenance
- VACUUMing
- Backup
- Software Upgrade
- Database size
Query Performance
- Advanced Partitioning
- Index-only Scans
- Parallelism
- Low-level scan performance improvements
- Additional issues
  - NOT IN
Data Loading
- Load performance
- Error Handling

Table maintenance

VACUUMs are clearly a problem for VLDBs, especially when much of the data may be read-only. Backup may also be required to WORM media or tape. The solution here is to implement Read-Only Tables that will never require VACUUMing. This also makes incremental backups smoother, but we also need migratable tables that can be moved easily from one server to another.

Read-only Tables
Migratable Tables
Block-level Binary Upgrades
Database size reductions
- NUMERIC with variable length headers (~1-3 bytes/col)
- NUMERIC scale reduction (2 bytes/col)
- Row Visibility Overhead reduction (8 bytes/row)
- Column-value compression
- Reduction in length of NULL bitmap
- Nirvana issue: remove need for column alignment

Currently we store xmin, xmax and xvac/combo (3 x 4 bytes) for every tuple, plus t_ctid (6 bytes). We must have xmax to delete rows and to lock them for write. We must have t_ctid to update rows; could we save those bytes if the table prevented UPDATEs? If we had block-level INSERTs, we could store xmin/combo at block level rather than tuple level, which might save 8 bytes/row. Could that be combined with the visibility map? Removing all of this is going to be very complex; it would be better to look at compression of whole blocks.
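The arithmetic behind those figures can be sketched as follows. This is only a back-of-envelope calculation using the field sizes quoted above; the field labels, the row count and the choice to ignore any new block-level storage cost are illustrative assumptions, not part of any actual design.

```python
# Per-tuple visibility fields and their sizes in bytes, as quoted in the text.
FIELD_BYTES = {
    "xmin": 4,
    "xmax": 4,
    "xvac_or_combo": 4,
    "t_ctid": 6,
}


def per_row_overhead(fields):
    """Total bytes of visibility overhead stored with each tuple."""
    return sum(fields.values())


current = per_row_overhead(FIELD_BYTES)

# Hypothetical: xmin and the combo field move to block level.
# The (small) per-block cost of storing them there is ignored here.
block_level = {"xmin", "xvac_or_combo"}
remaining = {name: size for name, size in FIELD_BYTES.items() if name not in block_level}
reduced = per_row_overhead(remaining)

saved_per_row = current - reduced

# Illustrative row count only, not a figure from the source.
rows = 1_000_000_000
saved_bytes = rows * saved_per_row

print(f"{current} bytes/row now, {reduced} bytes/row after, {saved_bytes} bytes saved")
```

Under these assumptions the 8 bytes/row saving comes from xmin plus the combo field; dropping t_ctid for tables that prevent UPDATEs would be a separate 6 bytes.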

The NULL bitmap includes one bit for each column in the table, including NOT NULL columns. It could be possible to reduce the size of the bitmap by leaving those columns out, though changing a column from NOT NULL to NULLable would then not be possible (ideas?).
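To get a feel for the size of the possible saving, here is a small sketch comparing a bitmap with one bit per column against one that covers only NULLable columns. The function names and the example column counts are assumptions for illustration, and any alignment padding that follows the bitmap is ignored, so the real saving could be smaller.

```python
def bitmap_bytes(bit_count):
    """Bytes needed to hold bit_count bits, rounded up to a whole byte."""
    return (bit_count + 7) // 8


def null_bitmap_sizes(total_columns, not_null_columns):
    """Return (current_size, reduced_size) in bytes.

    current_size: one bit for every column in the table.
    reduced_size: one bit only for columns that may be NULL (hypothetical layout).
    """
    if not 0 <= not_null_columns <= total_columns:
        raise ValueError("not_null_columns must be between 0 and total_columns")
    current_size = bitmap_bytes(total_columns)
    reduced_size = bitmap_bytes(total_columns - not_null_columns)
    return current_size, reduced_size


# Illustrative table: 40 columns, 30 of them declared NOT NULL.
full, reduced = null_bitmap_sizes(total_columns=40, not_null_columns=30)
print(f"bitmap: {full} bytes now, {reduced} bytes if NOT NULL columns are left out")
```

The sketch also shows why the saving is modest: it is bounded by one bit per NOT NULL column.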

Column-value compression should be possible, sort of like partial enums?
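One possible reading of "partial enums" is a dictionary encoding in which only the most frequent values of a column are replaced by small codes, while rare values are stored as-is. The sketch below illustrates that reading only; the function names and the sample data are hypothetical and it says nothing about how such a feature would be stored on disk.

```python
from collections import Counter


def build_dictionary(values, max_entries):
    """Map the max_entries most frequent values to small integer codes."""
    most_common = Counter(values).most_common(max_entries)
    return {value: code for code, (value, _count) in enumerate(most_common)}


def encode(values, dictionary):
    """Encode frequent values as ("code", n); keep rare values as ("raw", value)."""
    return [
        ("code", dictionary[value]) if value in dictionary else ("raw", value)
        for value in values
    ]


def decode(encoded, dictionary):
    """Invert encode() using the same dictionary."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[payload] if kind == "code" else payload for kind, payload in encoded]


# Hypothetical column of country names with a few dominant values.
column = ["GB", "US", "GB", "FR", "GB", "US", "Ruritania"]
dictionary = build_dictionary(column, max_entries=2)
encoded = encode(column, dictionary)
restored = decode(encoded, dictionary)
```

Because rare values fall through unencoded, the dictionary stays small and never needs to list every possible value, which is what distinguishes this from a full enum.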

Query Performance

statement_cost_limit (requested by Csaba Nagy)
Index-only scans (requested by Pablo Alcaraz and Gunther Schadow)
Advanced Partitioning
Parallelism (not likely for 8.4)
Lookaside tables (banned by TPC-D onwards 'cos they are too useful!)
Low-level scan performance improvements

Data Loading

Data loading performance needs to be improved. Currently COPY is CPU-bound, specifically in parsing the input data file into individual columns. Other issues are:

No batch-mode Referential Integrity
Need to handle data errors from COPY, rather than aborting at first error - pgloader does this, so may be less of a priority
Batch update of indexes - pg_bulkload pioneered this
Block-at-a-time inserts
Reduction of cache spoiling effect of COPY
Data loading can use fast-mode COPY if fine-grained partitioning is possible
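The error-handling item in the list above can be illustrated with a small sketch: rows that fail to parse are set aside with their line number and reason, and loading continues. This is not how COPY or any existing loader is implemented; the function name, the tab delimiter and the sample input are assumptions made for the example.

```python
def load_rows(lines, converters, delimiter="\t"):
    """Parse delimited lines into typed tuples.

    Returns (good_rows, rejects). A bad line is recorded in rejects as
    (line_number, raw_text, reason) instead of aborting the whole load.
    """
    good_rows, rejects = [], []
    for line_number, line in enumerate(lines, start=1):
        raw = line.rstrip("\n")
        fields = raw.split(delimiter)
        try:
            if len(fields) != len(converters):
                raise ValueError(
                    f"expected {len(converters)} columns, got {len(fields)}"
                )
            row = tuple(convert(field) for convert, field in zip(converters, fields))
        except ValueError as error:
            rejects.append((line_number, raw, str(error)))
            continue
        good_rows.append(row)
    return good_rows, rejects


# Hypothetical input: line 2 has a non-numeric id, line 4 is missing a column.
sample = ["1\talice\n", "two\tbob\n", "3\tcarol\n", "4\n"]
rows, rejects = load_rows(sample, converters=(int, str))
rejected_lines = [line_number for line_number, _raw, _reason in rejects]
```

Keeping the raw text and line number with each reject lets the bad rows be corrected and reloaded later without rerunning the whole file.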

Recovery and Replication

I am currently the maintainer for PITR and Log Shipping replication.

Replication

Truncate Triggers, mainly to allow Slony to replicate Truncates
Synchronous Replication
Hot Standby

Recovery

Recovery Parallelism

This needs to happen after Hot Standby, but Hot Standby is being planned so that this will be easy to do.

WAL size reduction
- 4 bytes removed from WAL record header
- Reduction in WAL from Updates by only logging changed columns
- Nirvana issue: remove need for full-page writes
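The idea of logging only changed columns for an UPDATE can be sketched as a column-level delta between the old and new versions of a row. This is a conceptual illustration only: the function names and sample row are hypothetical, and it assumes the old row version is available when the delta is replayed.

```python
def changed_columns(old_row, new_row):
    """Return {column_index: new_value} for the columns an UPDATE changed."""
    if len(old_row) != len(new_row):
        raise ValueError("rows must have the same number of columns")
    return {
        index: new_value
        for index, (old_value, new_value) in enumerate(zip(old_row, new_row))
        if old_value != new_value
    }


def apply_delta(old_row, delta):
    """Rebuild the new row from the old row plus the logged delta."""
    return tuple(delta.get(index, value) for index, value in enumerate(old_row))


# Hypothetical four-column row where only the price changes.
old = (42, "widget", 9.99, "in stock")
new = (42, "widget", 10.49, "in stock")

delta = changed_columns(old, new)
replayed = apply_delta(old, delta)
```

Here the log record would carry one column value instead of four, which is where the WAL reduction would come from for wide rows with small updates.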

xlogdump

Dropped Relation Cache

Enterprise-class Performance

Performance Regression Tests

Benchmark Development
- TPC-E harness

Advanced Schema Knowledge

Sort Improvements

http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php

Scalability Improvements

http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php

Simon Riggs' Development Projects
