Sample Databases

From PostgreSQL wiki


Many database systems ship sample databases with the product. A good introduction to the popular ones, including discussion of samples available for other databases, is Sample Databases for PostgreSQL and More.

One trivial sample that PostgreSQL ships with is pgbench. It has the advantage of being built in and having a scalable data generator: you can create databases of any size from roughly 16MB to 600GB with the current version.
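Typical usage looks like the following sketch. The database name "bench" is an arbitrary example, and the setup commands are shown as comments because they need a running PostgreSQL server; the arithmetic below illustrates how the scale factor maps to database size:

```shell
# Typical pgbench setup against a running server ("bench" is an example name):
#
#   createdb bench
#   pgbench -i -s 10 bench    # -i initializes the four pgbench tables
#   pgbench -c 4 -T 30 bench  # TPC-B-like run: 4 clients, 30 seconds
#
# The scale factor -s N generates N * 100,000 rows in pgbench_accounts,
# at roughly 16MB of data per unit of scale:
scale=10
rows=$((scale * 100000))
approx_mb=$((scale * 16))
echo "scale factor $scale: $rows account rows, about ${approx_mb}MB"
```
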

PgFoundry Samples

The latest collection of PostgreSQL-compatible database samples is at PgFoundry Sample Databases. It includes three commonly used benchmark databases:

  • World: Based on the MySQL World sample. Contains cities, countries, and the languages spoken in each country.
  • dellstore2: PostgreSQL port of a database-neutral e-commerce test application developed by Dell. The original data generator supports three size scales (10MB, 1GB, 100GB); currently only the smallest ("normal") data set has been ported to PostgreSQL. PostgreSQL 8.4: Windowing Functions uses this test data to show some advanced queries.
  • Pagila: Based on MySQL's replacement for World, Sakila, which is itself inspired by the Dell DVD Store.

There are some other sample databases there as well, such as a USDA Food database and a large list of country data via ISO-3166 standards.
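As an illustration of the windowing queries mentioned for dellstore2 above, here is a sketch (the column names follow the dellstore2 orders table — orderid, customerid, orderdate, totalamount — so verify them against your copy of the schema):

```sql
-- Rank each customer's orders by value and compute a running average
-- of order totals (window functions, available since PostgreSQL 8.4).
SELECT customerid,
       orderid,
       totalamount,
       rank() OVER (PARTITION BY customerid
                    ORDER BY totalamount DESC) AS order_rank,
       avg(totalamount) OVER (PARTITION BY customerid
                              ORDER BY orderdate
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW) AS running_avg
FROM orders
ORDER BY customerid, order_rank;
```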

Other Samples

  • The UK land registry price paid data has details of land sales in the UK, going back several decades, and is 3.5GB as of August 2016 (this applies only to the "complete" file, "pp-complete.csv"). No registration required.
-- Download file "pp-complete.csv", which has all records.
-- If the schema changes or fields are added, consult the Land Registry's published documentation.

-- Create table:
CREATE TABLE land_registry_price_paid_uk(
  transaction uuid,
  price numeric,
  transfer_date date,
  postcode text,
  property_type char(1),
  newly_built boolean,
  duration char(1),
  paon text,
  saon text,
  street text,
  locality text,
  city text,
  district text,
  county text,
  ppd_category_type char(1),
  record_status char(1));

-- Copy CSV data, with appropriate munging:
COPY land_registry_price_paid_uk
  FROM '/path/to/pp-complete.csv'
  WITH (format csv, encoding 'win1252', header false, null '', quote '"',
        force_null (postcode, saon, paon, street, locality, city, district));
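Once loaded, the table supports straightforward aggregate queries. For example, a sketch against the table created above (percentile_cont requires PostgreSQL 9.4 or later):

```sql
-- Sale count, average, and median price per year.
SELECT extract(year FROM transfer_date)::int AS year,
       count(*) AS sales,
       round(avg(price)) AS avg_price,
       percentile_cont(0.5) WITHIN GROUP (ORDER BY price) AS median_price
FROM land_registry_price_paid_uk
GROUP BY 1
ORDER BY 1;
```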
  • AdventureWorks 2014 for Postgres - Scripts to set up the OLTP part of the go-to database used in training classes and for sample apps on the Microsoft stack. The result is 68 tables containing HR, sales, product, and purchasing data organized across 5 schemas. It represents a fictitious bicycle-parts wholesaler with a hierarchy of nearly 300 employees, 500 products, 20,000 customers, and 31,000 sales, each averaging 4 line items. It is big enough to be interesting but not unwieldy. In addition to being a well-rounded OLTP sample, it is also a good choice for demonstrating ETL into a data warehouse. The code in some of the views demonstrates effective techniques for querying XML data.
  • Mouse Genome sample data set. See instructions. Custom-format dump, 1.9GB compressed; the restored database is tens of GB in size. MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease. MGI uses PostgreSQL in production [1], providing direct protocol access to researchers, so the custom-format dump is not an afterthought. Apparently updated frequently.
  • Benchmarking databases such as DBT-2 or TPC-H can be used as samples.
  • Freebase - Various wiki style data on places/people/things - ~600MB compressed
  • IMDb - the IMDb movie database
  • [2] - US federal government data collection; see also sunlightlabs
  • DBpedia - Wikipedia data export project
  • eoddata - historic stock market data (requires registration; licence unclear)
  • RITA - Airline On-Time Performance Data
  • Openstreetmap - Openstreetmap source data
  • NCBI - biological annotation from NCBI's ENTREZ system (daily updated)
  • Airlines Demo Database (in Russian) - provides a database schema with several tables and meaningful content, which can be used for learning SQL and writing applications
  • Stack Exchange Data Dump - Anonymized dump of all user-contributed content on the Stack Exchange network (Stack Overflow, Server Fault...) under the cc-by-sa 3.0 license. An import tool is available for loading the XML dumps into PostgreSQL.
  • The Museum of Modern Art (MoMA) collection data - This research dataset contains more than 130,000 records, representing all of the works that have been accessioned into MoMA's collection and cataloged in its database. It includes basic metadata for each work, including title, artist, date made, medium, dimensions, and date acquired by the Museum. At this time, the datasets are available in CSV and JSON format, encoded in UTF-8.