DataWarehousing
Data Warehousing Features
- On-disk Bitmap Index (anyone game to finish GP patch?)
- Error Handling in COPY
- Parallel Query
- Windowing Functions
- MERGE
- Parallel Index Build
Note that there's much overlap between here and the Simon Riggs' Development Projects planning.
On-Disk Bitmap Index
Error Handling in COPY
Handled OK by pg_loader, so lower priority
Parallel Query
2 main kinds are single-node and multi-node parallelism Fairly easy to get something working to improve SeqScan performance, but will be more difficult to get something working that applies further up into the executor. Challenges are
- Planner changes
- How to manage the pool of query slaves
- Deciding which parts of the data are accessed by which slave
Notably the last two aren't an issue at all in most multi-node parallel architectures, since there is one query slave per node, plus each node operates only on the data that has been statically partitioned to it, often using a hash partitioning scheme.
Windowing Functions
SQL:2003 feature
MERGE
SQL:2003 feature
Parallel Index Build
Josh Berkus: not sure how this works exactly, but it speeds Oracle up considerably
Slides
Overview of "Data warehousing with PostgreSQL" presented by Gabriele Bartolini at PGDay.EU 2009 (Data warehousing with PostgreSQL)