Simon Riggs' Development Projects

From PostgreSQL wiki

Overview

This page contains details of a number of development projects that I'm either working on or planning to work on in the next Development Cycle.

Well, it did once. This was last updated in about 2010.

I'm sponsored by a number of different companies, so tracking my development interests in a central place makes sense for all concerned. All of the projects and work listed here will be released as BSD-licensed code with copyright assigned to the PostgreSQL Global Development Group. If you are interested in sponsoring me, please visit http://www.2ndquadrant.com for contact details and other information. I'm only partially sponsored currently, so your contributions are welcome.

In some cases, projects may be done in collaboration with others, or even handed over to them completely. In that case, I'll put links through to those projects.

Please note that plans written here do not imply any level of acceptance by the PostgreSQL hacker community. Each feature needs detailed planning and then development following the PostgreSQL Community Development process.

This is an open Wiki, so you can edit this, but please restrict yourself to minor edits. I will be editing pages as time progresses, either to enhance plans or update dev status.

Active Developments

I'm likely to work on these first, though some will take longer than others.

  • Truncate Triggers (committed)
  • COPY performance (patch)
  • Snapshot cloning for pg_dump performance (external project)

VLDB work

Development Plans

  • Very Large Database (VLDB) - Enhancements focused around Terabyte-plus data stores, but not restricted to just Data Warehouses
  • Recovery and Replication - Further robustness enhancements
  • Enterprise-class Performance - Further performance and scalability improvements

Very Large Database (VLDB)

VLDB covers a whole host of topics. Another page here discussing this is Data Warehousing. Primary concerns are:

  • Table Maintenance
    • VACUUMing
    • Backup
    • Software Upgrade
    • Database size
  • Query Performance
    • Advanced Partitioning
    • Index-only Scans
    • Parallelism
    • Low-level scan performance improvements
    • Additional issues
      • NOT IN
  • Data Loading
    • Load performance
    • Error Handling

Table maintenance

VACUUMs are clearly a problem for VLDBs, especially when much of the data may be read-only. Backups may also be required to WORM media or tape. The solution here is to implement Read-Only Tables that will never require VACUUMing. Incremental backups are simplified by this, but we also need migratable tables that can be moved easily from one server to another.

  • Read-only Tables
  • Migratable Tables
  • Block-level Binary Upgrades
  • Database size reductions
    • NUMERIC with variable length headers (~1-3 bytes/col)
    • NUMERIC scale reduction (2 bytes/col)
    • Row Visibility Overhead reduction (8 bytes/row)
    • Column-value compression
    • Reduction in length of NULL bitmap
    • Nirvana issue: remove need for column alignment

Currently we store xmin, xmax and xvac/combo (3 x 4 bytes) for all tuples, plus t_ctid (6 bytes). For deletion, and to lock rows for write, we must have xmax. To update rows we must have t_ctid - could we save those bytes if a table prevented UPDATEs? If we had block-level INSERTs we could store xmin/combo at block level rather than tuple level, perhaps saving 8 bytes/row. Could that be combined with the visibility map? Removing all of this is going to be very complex; better to look at compression of whole blocks.
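To make the arithmetic above concrete, here is a rough back-of-envelope sketch using the field sizes stated in the text (the function names and the rows-per-block figure are invented for illustration, not taken from PostgreSQL source):

```python
# Sketch: per-tuple visibility overhead as described above.
# Field sizes come from the text: xmin/xmax/xvac at 4 bytes each, t_ctid at 6.
XMIN = XMAX = XVAC = 4      # transaction-id fields, 4 bytes each
T_CTID = 6                  # tuple pointer used for UPDATE chaining

def visibility_overhead(rows):
    """Bytes spent on xmin/xmax/xvac + t_ctid across a table of `rows` tuples."""
    per_tuple = XMIN + XMAX + XVAC + T_CTID   # 18 bytes per tuple
    return rows * per_tuple

def overhead_if_block_level_xmin(rows, rows_per_block=100):
    """Hypothetical layout: xmin stored once per block, with xmax and t_ctid
    dropped entirely for tables that prevent UPDATE/DELETE (an assumption)."""
    blocks = -(-rows // rows_per_block)       # ceiling division
    return blocks * XMIN

# A billion-row table spends ~18 GB on these header fields today:
print(visibility_overhead(10**9))             # 18000000000
```

The gap between the two figures is why block-level storage of visibility information looks attractive, even if removing the fields outright is too complex.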

The NULL bitmap includes one bit for each column in the table, including NOT NULL columns. It could be possible to reduce the size of the bitmap, though that would mean that changing a column from NOT NULL to NULLable would not be possible (ideas?).

Column-value compression should be possible. Sort of like partial enums?
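One way to read the "partial enum" idea is dictionary encoding with a raw-value fallback: common values in a low-cardinality column get small integer codes, anything else is stored verbatim. This sketch is purely illustrative; all names and the 255-entry limit are invented:

```python
from collections import Counter

# Sketch of "partial enum" column-value compression: dictionary-encode the
# most common values, fall back to storing the raw value otherwise.
def build_dictionary(values, max_entries=255):
    """Map the most common column values to small integer codes."""
    common = [v for v, _ in Counter(values).most_common(max_entries)]
    return {v: i for i, v in enumerate(common)}

def encode(value, dictionary):
    # Known value -> small code (fits in 1 byte); unknown -> raw fallback.
    return dictionary.get(value, value)

codes = build_dictionary(["US", "US", "DE", "US", "FR"])
assert encode("US", codes) == 0        # most frequent value gets code 0
assert encode("JP", codes) == "JP"     # not in dictionary: stored raw
```

The "partial" part is the fallback: unlike a true enum, the column can still hold values that were never declared in advance.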

Query Performance

  • statement_cost_limit (requested by Csaba Nagy)
  • Index-only scans (requested by Pablo Alcaraz and Gunther Schadow)
  • Advanced Partitioning
  • Parallelism (not likely for 8.4)
  • Lookaside tables (banned by TPC-D onwards 'cos they are too useful!)
  • Low-level scan performance improvements

Data Loading

Data Loading performance needs to be improved. Currently COPY is CPU-bound, specifically in parsing the input data file into individual columns. Other issues are:

  • No batch-mode Referential Integrity
  • Need to handle data errors from COPY, rather than aborting at first error - pg_loader does this, so may be less of a priority
  • Batch update of indexes - pg_bulkload pioneered this
  • Block-at-a-time inserts
  • Reduction of cache spoiling effect of COPY
  • Data loading can use fast-mode COPY if fine-grained partitioning is possible
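The error-handling item above (pg_loader's approach) can be sketched as batch bisection: try the whole batch, and on failure split it until the bad rows are isolated, so one bad row no longer aborts the whole load. This is a toy illustration; a real implementation would need subtransactions or per-batch savepoints, and `insert_batch` stands in for a COPY of many rows:

```python
# Sketch: isolate bad rows by recursively bisecting a failed batch,
# instead of aborting the entire load at the first error.
def load_with_error_isolation(rows, insert_batch, bad_rows):
    """Insert all good rows; append rows that fail to bad_rows."""
    try:
        insert_batch(rows)
    except ValueError:
        if len(rows) == 1:
            bad_rows.append(rows[0])   # isolated the offending row
        else:
            mid = len(rows) // 2
            load_with_error_isolation(rows[:mid], insert_batch, bad_rows)
            load_with_error_isolation(rows[mid:], insert_batch, bad_rows)

# Toy load target that rejects any batch containing a negative value:
def toy_copy(batch):
    if any(r < 0 for r in batch):
        raise ValueError("bad row in batch")

rejected = []
load_with_error_isolation([1, 2, -3, 4], toy_copy, rejected)
print(rejected)    # [-3]
```

The good rows load in large batches; only the neighbourhood of a failure pays the cost of smaller retries.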

Recovery and Replication

I'm currently the maintainer for PITR and Log Shipping replication.

Replication

  • Truncate Triggers, mainly to allow Slony to replicate Truncates
  • Synchronous Replication
  • Hot Standby

Recovery

  • Recovery Parallelism

Recovery parallelism needs to happen after Hot Standby, but Hot Standby is being planned so that adding this later will be easy.

  • WAL size reduction
    • 4 bytes removed from WAL record header
    • Reduction in WAL from Updates by only logging changed columns
    • Nirvana issue: remove need for full-page writes
  • xlogdump
  • Dropped Relation Cache
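The "only log changed columns" item above can be sketched as a simple tuple diff: compare old and new versions of a row and record just the columns that differ. The record layout here is invented for illustration and is not the real WAL format:

```python
# Sketch: reduce UPDATE WAL volume by logging only the changed columns,
# rather than the whole new tuple.
def changed_columns(old_row, new_row):
    """Return {column_index: new_value} for columns that actually changed."""
    return {i: new for i, (old, new) in enumerate(zip(old_row, new_row))
            if old != new}

old = ("alice", "engineering", 50000)
new = ("alice", "engineering", 55000)
delta = changed_columns(old, new)
print(delta)    # {2: 55000} -- one column logged instead of three
```

For wide tables where updates touch one or two columns, the delta is a small fraction of the full tuple, which is where the WAL size reduction comes from.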

Enterprise-class Performance

  • Performance Regression Tests
  • Benchmark Development
    • TPC-E harness
  • Advanced Schema Knowledge
  • Sort Improvements

http://archives.postgresql.org/pgsql-hackers/2007-11/msg01101.php

  • Scalability Improvements

http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php