ColumnOrientedSTorage

From PostgreSQL wiki

Revision as of 13:53, 21 April 2014 by Simon (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

= Project to implement Column Oriented STorage in PostgreSQL

As of 9.4, PostgreSQL is a row-oriented database server. The project and requirements described here have the objective of implementing performance improvements using column storage techniques.

= Overview of Column Oriented STorage (COST)

Row oriented storage means that columns in the table are generally stored in a single heap, with each column stored on a single tuple. Various database systems have implemented physical storage so that individual columns are stored separately, with a value for each tuple in the table stored.

[1]

This style of storage has become associated with Data Warehousing and Business Intelligence systems where in conjunction with other related features it can enhance performance.

[2]

= Advantages of COST

Column Oriented Storage would give PostgreSQL

  • Faster business intelligence queries
  • Ability to improve compression of individual columns
  • Ability for PostgreSQL to exceed the current 1600 column limit in one tuple

= What's wrong with TOAST?

The Oversized Attribute Storage Technique (TOAST) does a great job at what it does - storing large values for columns. There are some improvements that can be made, such as improving compression performance or making it more configurable. However, those don't change the basic nature of TOAST: it is designed for storage, not as a means to improve performance of queries.

Specifically, TOAST can be slow for queries because

  • TOAST stores all column values separately - multiple row values are not stored adjacent to each other
  • TOAST chops up values into chunks, though without much pattern - the column value must be reconstructed for use in search queries

= What would COST look like?

Just as we set storage parameters for each column for TOAST, we would set them for COST also...

ALTER TABLE foo

 ALTER COLUMN foocol
 SET ORIENTATION column;

We might also want to store columns in groups together, like so

ALTER TABLE foo

 ALTER COLUMN foocol1 SET COLUMN GROUP 2,
 ALTER COLUMN foocol2 SET COLUMN GROUP 2;

which would give us the ability to store columns in secondary relations in groups, like Cassandra.

Personal tools