PgCon 2009/Greg Smith Hardware Benchmarking notes

From PostgreSQL wiki


This is a compilation of notes taken by attendees at Greg Smith's lecture. Please feel free to flesh it out by adding the notes that you took, too.

Greg's slides can be found here


This was a really excellent presentation on the need to do hardware-level benchmarking on all systems before deploying them. Whenever a new piece of hardware comes in, it should be benchmarked and the results recorded; only with these benchmarks in hand can we truly tweak the installation for optimal performance. Another point stressed, one echoed all week in various lectures, was that hardware should be selected on a "try and buy" basis - i.e. never shell out money unless you have a money-back guarantee from the vendor stating that if the hardware does not meet the published specs, it goes back to the vendor.

Benchmark your hardware before deployment

  • Use the same OS, patch-level, drivers and so on as your production system
  • Some tests are only useful before you're in production
  • You should know what is slow/bad before you buy more hardware
  • Vendors lie :)

Systematic benchmarking

This talk is mostly about the first three tests:

  • Memory
  • CPU
  • Disk
  • Database server -- low-level tests that correlate well with your application's operations
  • Application

Memory Tests

  • DOS: memtest86+ (also on many Linux CDs)
    • Be sure to get a version as new as your hardware, so that the hardware-identification features work correctly, i.e. it reports what type of CPU and RAM you have.
  • Windows: SiSoftware Sandra
  • UNIX: sysbench
  • Linux: hdparm (only sometimes!)
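The sysbench memory test listed above can be run roughly like this (a sketch assuming sysbench 1.x command syntax; guarded so it skips cleanly where sysbench isn't installed):

```shell
# Memory bandwidth test via sysbench; option names assume sysbench 1.x.
if command -v sysbench >/dev/null 2>&1; then
  sysbench memory --memory-block-size=1M --memory-total-size=2G run \
    || echo "sysbench run failed"
  result=ran
else
  echo "sysbench not installed; skipping"
  result=skipped
fi
```

The interesting number in the output is the transfer rate in MiB/sec, which you can compare across DDR2/667 vs DDR2/800 configurations or slot arrangements.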


  • DDR2/800 should be about 20% faster than DDR2/667
  • A gain of up to 25% comes from improved clock multipliers -- you need integer multipliers for this. We really need to understand CPU/RAM multipliers in order to tweak the most performance out of hardware
  • Slow memory?
    • Only running in single channel instead of dual channel -- you put the RAM in the wrong slots (check output of memtest86+ - bottom right)
    • Incorrect SPD? Sometimes the motherboard doesn't read this correctly -- may need to adjust voltage or timing to match what the RAM is really capable of
    • Bad RAM/CPU multiplier combos can drop things down.
    • Poor quality RAM can cause problems (a 30% performance difference between the 1GB and 4GB configurations, even though they used exactly the same modules)
    • Choosing exactly the same modules from a different vendor showed a 20% difference


PostgreSQL and the CPU

(Here "CPU" means each core, i.e. how the OS views CPUs.)

  • Pg uses only a single CPU per query -- so the speed of each CPU is pretty important
  • Queries executing against cached data will bottleneck on the CPU -- upgrading the CPU without upgrading memory may not provide the same level of improvement
  • COPY is CPU-intensive -- limited by how fast a single CPU can parse the incoming data

CPU tests

  • Windows: SiSoftware Sandra
  • UNIX: sysbench CPU test
  • Custom test with timing and generate_series() command (examples coming!)
  • pgbench select-only on small database -- something small enough to fit in RAM (limited by how fast the CPU can execute the queries)
    • test each cpu
    • test how system performs when every CPU is saturated
    • Was able to find a kernel scheduler change that affected the results
  • Vary pgbench client count to test single or multiple CPUs
  • Try varying something independently to find out if CPU or memory is the problem
    • Go into the bios and throttle the cpu or memory and see if the benchmark changes
  • pgbench
    • GRAPH: SELECT transactions/sec: scaling factor vs database size, # tps over time
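A sketch of the pgbench invocations described above (it assumes a database named "pgbench" has already been initialized with `pgbench -i` at a small scale so the data fits in RAM; guarded in case pgbench isn't installed):

```shell
# Select-only pgbench as a CPU test: -S = built-in SELECT-only script,
# -c = client count, -T = duration in seconds (use e.g. -T 60 for real runs).
if command -v pgbench >/dev/null 2>&1; then
  pgbench -S -c 1 -T 5 pgbench || echo "run failed (is the pgbench database initialized?)"
  pgbench -S -c 8 -T 5 pgbench || echo "run failed (is the pgbench database initialized?)"
  ran=attempted
else
  echo "pgbench not installed; skipping"
  ran=skipped
fi
```

One client exercises a single CPU; setting the client count at or above the number of cores shows how the system behaves when every CPU is saturated.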

Sources for slow CPU results

  • Slow memory
  • Power management throttling -- turned on and optimized for something other than performance
    • Look at /proc/cpuinfo -- a throttled CPU may show a frequency of e.g. 1000 MHz instead of its rated speed
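A quick way to check for throttling (Linux-specific, since it reads /proc/cpuinfo; the field is absent on some architectures, so the sketch falls back gracefully):

```shell
# Report the current CPU clock; a value pinned well below the rated speed
# while under load suggests power-management throttling.
if [ -r /proc/cpuinfo ]; then
  cpu_mhz=$(awk -F': *' '/^cpu MHz/ {print $2; exit}' /proc/cpuinfo)
  [ -n "$cpu_mhz" ] || cpu_mhz="not reported"
else
  cpu_mhz="unknown (no /proc/cpuinfo)"
fi
echo "cpu MHz: $cpu_mhz"
```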

Disk Tests

  • Sequential writes: INSERT, COPY FROM (when not CPU limited)
  • Sequential read: SELECT * FROM and similar table sequential scans
  • Seeks: SELECT using index, UPDATE
  • Commit fsync rate: INSERT, UPDATE
    • First write to WAL
    • Then write to data store
    • Then indicate WAL was written
    • Caching
      • SATA drive? 8-32MB of write cache; Those aren't really safe :)
      • fsync() -- "I've written data, and I'd like to wait until the data is sent to disk"
      • Drive manufacturers have realized that they need to cheat to get good performance :)
      • Only able to commit as fast as the drive can spin! Divide the RPM by 60 -- that is your maximum commit rate.
      • 7200 RPM drive: 120 commits per second, and you can't get faster than that!
  • How does the filesystem play into testing?
    • Filesystem policies affect how the blocks are actually written to disk
    • If you configure your FS to *not* send fsyncs through, then you might experience some pain
    • Measure the commit rate, and if it is faster than what is possible - look for caches that you can turn off
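The commit-rate ceiling above can be computed directly from the rotation speed, since a drive with its write cache off can complete at most one fsync per platter rotation:

```shell
# Max safe commit (fsync) rate for a spinning disk: one commit per rotation,
# so RPM / 60 commits per second.
rpm=7200
max_commits_per_sec=$((rpm / 60))
echo "${rpm} RPM -> at most ${max_commits_per_sec} commits/sec"
```

If a benchmark reports a commit rate above this number for a single spinning disk, some cache between PostgreSQL and the platters is absorbing the fsyncs.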

dd test

  • Compute 2× the size of your RAM in 8KB blocks: blocks = 250,000 * gigabytes of RAM
  • Watch vmstat and iostat while the test runs: the bi and bo columns will match the current read/write rate
  • Note the CPU percentage required to reach the peak rate
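A sketch of the dd sequential write/read test, shrunk to a tiny temporary file so it runs anywhere (a real run uses 2× RAM as computed above, and GNU dd's conv=fdatasync so the write isn't just absorbed by the OS cache):

```shell
# Sequential write then sequential read with dd, using 8KB blocks.
testfile=$(mktemp)
blocks=1024                       # substitute 250000 * RAM_GB for a real run
dd if=/dev/zero of="$testfile" bs=8k count="$blocks" conv=fdatasync 2>/dev/null
dd if="$testfile" of=/dev/null bs=8k 2>/dev/null
filesize=$(wc -c < "$testfile")
rm -f "$testfile"
echo "wrote and read back ${filesize} bytes"
```

Run vmstat and iostat in another terminal while this executes; their bi/bo columns should track the rates dd reports.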


bonnie++

  • Ignore the per-char and create results; run the block output and input ones
  • Automatically sizes the test file to 2× the RAM in your system
  • bonnie++ gives CSV output; pipe it through bon_csv2html and the result will actually produce a nice HTML page
  • Random Seeks number:
    • Starts up 3 processes at once doing 8000 random seeks; 10% of the time, after reading a block, it writes it back out
    • Probably not good to compare to other seek tests

bonnie++ ZCAV

  • HD speed varies based on which part of the disk you're on (the outer tracks are way faster than the inner ones -- more bits pass under the head per unit of time on the outer edge)
  • ./zcav -f/dev/sda > t500
    • Must get a recent version for ZCAV to scale to TB drives; 1.03e works
    • Download a somewhat broken gnuplot script...
    • gnuplot script that works is in the slides...
    • Almost a 2:1 difference between inner and outer part of the disk in Greg's tests


Seek tests

  • seeks/second -- example configuration in the slides
  • Any random read/write test depends on which section of the disk is used and how wide an area it covers
    • "Short stroking": decide to only use the first 30-40GB of every disk
    • Whenever you're seeking between sections of the disk, a short maximum possible distance improves overall performance

Customizable seek tests

  • bonnie++ experimental (1.95)
  • More complex tools:
    • iozone
    • fio
  • Windows: HD Tune

Computed values to share with vendors

  • In the presentation, Greg lists out the figures he calculates to share with vendors when there's a performance issue

Write cache

  • With a single disk you see two different commit rates depending on whether the write cache is on; to get safe commits that are still fast, you need a disk controller with a battery-backed cache
  • 3-disk RAID 0 -- used a 256MB battery-backed cache in his tests
Quick CPU tests

  • select sum(generate_series) from generate_series(1,1000000); etc.
  • Quick insert/plan test: create a test table, fill it with generate_series, then explain analyze a count(*)
  • See David Wheeler's presentation that talks about function overhead
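The quick CPU test can be driven from the shell like this (a sketch assuming a reachable PostgreSQL server with default connection settings; turn on \timing inside psql to see elapsed time):

```shell
# Time a CPU-bound query: summing a million generated rows touches no disk.
if command -v psql >/dev/null 2>&1; then
  psql -c 'SELECT sum(generate_series) FROM generate_series(1,1000000);' 2>/dev/null \
    || echo "no server reachable; skipping"
  ran=attempted
else
  echo "psql not installed; skipping"
  ran=skipped
fi
```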


  • has a little scripting language
  • set of standard tests with examples from Greg


  • Trust no one
  • don't start benchmarks until you've baselined
  • don't trust vendors
  • use simple, standard tools
  • don't buy any hardware unless you have the option of returning it if it doesn't perform well.