DataWarehousing

From PostgreSQL wiki

Jump to: navigation, search

Contents

Data Warehousing Features

  • On-disk Bitmap Index (anyone game to finish GP patch?)
  • Error Handling in COPY
  • Parallel Query
  • Windowing Functions
  • MERGE
  • Parallel Index Build

Note that there's much overlap between here and the Simon Riggs' Development Projects planning.

On-Disk Bitmap Index

Error Handling in COPY

Handled OK by pg_loader, so lower priority

Parallel Query

2 main kinds are single-node and multi-node parallelism Fairly easy to get something working to improve SeqScan performance, but will be more difficult to get something working that applies further up into the executor. Challenges are

  • Planner changes
  • How to manage the pool of query slaves
  • Deciding which parts of the data are accessed by which slave

Notably the last two aren't an issue at all in most multi-node parallel architectures, since there is one query slave per node, plus each node operates only on the data that has been statically partitioned to it, often using a hash partitioning scheme.

Windowing Functions

SQL:2003 feature

MERGE

SQL:2003 feature

Parallel Index Build

Josh Berkus: not sure how this works exactly, but it speeds Oracle up considerably

Slides

Overview of "Data warehousing with PostgreSQL" presented by Gabriele Bartolini at PGDay.EU 2009 (Data warehousing with PostgreSQL)

Personal tools