Sample Databases

From PostgreSQL wiki

Revision as of 04:02, 21 June 2014 by Pgeoghegan (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Many database systems provide sample databases with the product. A good intro to popular ones that includes discussion of samples available for other databases is Sample Databases for PostgreSQL and More

One trivial sample that PostgreSQL ships with is the Pgbench. This has the advantage of being built-in and supporting a scalable data generator--you can make databases of any size ranging from 16MB to 600GB (approximately) with the current version.

PgFoundry Samples

The latest collection of PostgreSQL compatible database samples is at PgFoundry Sample Databases. It includes three commonly used benchmark databases:

  • World: Based on the MySQL World sample. Has a list of Cities, Countries, and what language they speak.
  • dellstore2: PostgreSQL port of a database-neutral e-commerce test application developed by Dell. The original code supports three size scales in their data generator (10MB, 1GB, 100GB), currently only the normal, smallest sized data set has been ported to PostgreSQL. PostgreSQL 8.4: Windowing Functions uses this test data to show some advanced queries.
  • Pagilia: Based on MySQL's replacement for World, Sakila, which is itself inspired by the Dell DVD Store.

There are some other sample databases there as well, such as a USDA Food database and a large list of country data via ISO-3166 standards.

Other Samples

  • Book Town - used for the examples in Practical PostgreSQL
  • Mouse Genome sample data set. See instructions. Custom format dump, 1.3GB compressed, but restored database is tens of GB in size. MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. MGI use PostgreSQL in production [1], providing direct protocol access to researchers, so the custom format dump is not an afterthought. Apparently updated frequently.
  • Benchmarking databases such as DBT-2 or TPC-H can be used as samples.
  • Freebase - Various wiki style data on places/people/things - ~600MB compressed
  • IMDB - the IMDB database - see also http://code.google.com/p/imbi/
  • [2] - US federal government data collection see also sunlightlabs
  • DBpedia - wikipedia data export project
  • eoddata - historic stock market data (requires registration - licence?)
  • RITA - Airline On-Time Performance Data
  • Openstreetmap - Openstreetmap source data
  • NCBI - biological annotation from NCBI's ENTREZ system (daily updated)
Personal tools