DataWarehousing

From PostgreSQL wiki
Jump to navigationJump to search

Data Warehousing Features

  • On-disk Bitmap Index (anyone game to finish GP patch?)
  • Error Handling in COPY
  • Parallel Query
  • Windowing Functions
  • MERGE
  • Parallel Index Build

Note that there's much overlap between here and the Simon Riggs' Development Projects planning.

On-Disk Bitmap Index

Error Handling in COPY

Handled OK by pg_loader, so lower priority

Parallel Query

2 main kinds are single-node and multi-node parallelism Fairly easy to get something working to improve SeqScan performance, but will be more difficult to get something working that applies further up into the executor. Challenges are

  • Planner changes
  • How to manage the pool of query slaves
  • Deciding which parts of the data are accessed by which slave

Notably the last two aren't an issue at all in most multi-node parallel architectures, since there is one query slave per node, plus each node operates only on the data that has been statically partitioned to it, often using a hash partitioning scheme.

Windowing Functions

SQL:2003 feature

MERGE

SQL:2003 feature

Parallel Index Build

Josh Berkus: not sure how this works exactly, but it speeds Oracle up considerably

Slides

Overview of "Data warehousing with PostgreSQL" presented by Gabriele Bartolini at PGDay.EU 2009 (Data warehousing with PostgreSQL)