PgCon 2009/Greg Smith Hardware Benchmarking notes

Preface

This is a compilation of notes taken by attendees at Greg Smith's lecture. Please feel free to flesh it out by adding the notes that you took, too.

Greg's slides can be found here

Commentary

This was a really excellent presentation on the need to do hardware-level benchmarking on all systems before deploying them. Whenever a new piece of hardware comes in, it should be benchmarked and the results recorded. Only with these benchmarks in hand can we truly tweak the installation for optimal performance. Another point stressed here, and echoed all week in other lectures, was that hardware should be selected on a "try and buy" basis: never shell out money unless you have a money-back guarantee from the vendor stating that if the hardware does not meet the published specs, it goes back to the vendor.

Benchmark your hardware before deployment

  • Use the same OS, patch-level, drivers and so on as your production system
  • Some tests are only useful before you're in production
  • You should know what is slow/bad before you buy more hardware
  • Vendors lie :)

Systematic benchmarking

This talk is mostly about the first three tests:

  • Memory
  • CPU
  • Disk
  • Database server -- low-level database tests that correspond to what your software actually does
  • Application

Memory Tests

  • DOS: memtest86+ (also included on many Linux CDs)
    • Be sure to get a version as new as your hardware so that the hardware identification features (e.g. reporting what type of CPU and RAM you have) work correctly.
  • Windows: SiSoftware Sandra
  • UNIX: sysbench
  • Linux: hdparm (only sometimes!)
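
A rough sketch of what a memory bandwidth check can look like; the sysbench option syntax below is from the older 0.4 series, and the block/total sizes and device name are arbitrary example values, not figures from the talk:

 # Sequential memory read test with sysbench (0.4.x option syntax; sizes are examples)
 sysbench --test=memory --memory-block-size=1K --memory-total-size=10G --memory-oper=read run

 # hdparm's cached-read figure gives a very rough memory/bus bandwidth number
 hdparm -T /dev/sda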

Memory

  • DDR2/800 should be 20% faster than DDR2/667
  • A 25% gain is possible because of improved clock multipliers -- you need integer multipliers for this. We really need to understand CPU/RAM multipliers in order to tweak the most performance out of the hardware
  • Slow memory?
    • Only running in single channel instead of dual channel -- you may have put the RAM in the wrong slots (check the output of memtest86+, bottom right)
    • Incorrect SPD? Sometimes the motherboard doesn't read this correctly -- you may need to adjust voltage or timing to match what the RAM is really capable of
    • Bad RAM/CPU multiplier combinations can drop things down
    • Poor quality RAM can cause problems (a 30% performance difference between the 1GB and 4GB configurations, even though they used exactly the same modules)
    • Choosing exactly the same modules from a different vendor showed a 20% difference

CPU

  • PostgreSQL and the CPU
  • (here "CPU" means each core, i.e. how the OS views CPUs)
    • Pg uses only a single CPU per query -- so speed of CPU is pretty important
    • Queries executing against cached data will bottleneck on the CPU -- upgrading CPU without upgrading memory may not provide the same level of improvement
    • COPY is CPU intensive -- it is limited by how fast a single CPU can parse the incoming data


CPU tests

  • Windows: SiSoftware Sandra
  • UNIX: sysbench CPU test
  • Custom test with timing and the generate_series() command (see the example below)
  • pgbench select-only on small database -- something small enough to fit in RAM (limited by how fast the CPU can execute the queries)
    • test each cpu
    • test how system performs when every CPU is saturated
    • Was able to find a kernel scheduler change that affected the results
  • Vary pgbench client count to test single or multiple CPUs
  • Try varying something independently to find out if CPU or memory is the problem
    • Go into the BIOS and throttle the CPU or memory and see if the benchmark changes
  • pgbench
    • Graphs in the slides: SELECT-only transactions/sec vs. scaling factor (database size), and TPS over time
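
A sketch of these tests; the database name, scale factor, and client/transaction counts are arbitrary examples, and the sysbench syntax is the older 0.4 style:

 # sysbench CPU test
 sysbench --test=cpu --cpu-max-prime=20000 run

 # Timing a CPU-bound query with generate_series()
 time psql -d testdb -c "SELECT sum(generate_series) FROM generate_series(1,1000000);"

 # pgbench select-only against a small database that fits in RAM;
 # vary -c to load a single CPU or all of them
 pgbench -i -s 10 testdb
 pgbench -S -c 1 -t 10000 testdb
 pgbench -S -c 8 -t 10000 testdb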

Sources for slow CPU results

  • Slow memory
  • Power management throttling -- turned on and optimized for something other than performance
    • Look at /proc/cpuinfo -- you may find the CPU running at a throttled frequency such as 1000 MHz (a quick check is shown below)
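
A quick way to check for this on Linux; the cpufreq sysfs paths are the usual locations but can vary by kernel and driver:

 # What clock speed does the kernel think each core is running at?
 grep MHz /proc/cpuinfo

 # Current frequency scaling governor and frequency for the first CPU
 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
 cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq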


Disk Tests

  • Sequential writes: INSERT, COPY FROM (when not CPU limited)
  • Sequential read: SELECT * FROM and similar table sequential scans
  • Seeks: SELECT using index, UPDATE
  • Commit fsync rate: INSERT, UPDATE
    • First write to WAL
    • Then write to data store
    • Then indicate WAL was written
    • Caching
      • SATA drive? 8-32MB of write cache; Those aren't really safe :)
      • fsync() -- "I've written the data, and I'd like to wait until it has actually been sent to disk"
      • Drive manufacturers have realized that they need to cheat to get good performance :)
      • Only able to commit as fast as a drive can spin! Take the RPM / 60 -- that is your commit number.
      • 7200 RPM drive - 120 commits per second, and you can't get faster than that!
  • How does the filesystem play into testing?
    • Filesystem policies affect how the blocks are actually written to disk
    • If you configure your FS to *not* send fsyncs through, then you might experience some pain
    • Measure the commit rate, and if it is faster than what is physically possible, look for caches that you can turn off (see the sketch below)
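
One way to measure the commit rate is an insert-only pgbench run with a single client: if the reported tps is much higher than RPM/60 (about 120 for a 7200 RPM drive), something between PostgreSQL and the platters is caching writes. The database, table, and script file names here are arbitrary examples:

 # A trivial table and a one-statement custom pgbench script
 psql -d testdb -c "CREATE TABLE commit_test (n integer);"
 echo "INSERT INTO commit_test VALUES (1);" > insert.sql

 # One client, one INSERT plus commit per transaction
 pgbench -n -f insert.sql -c 1 -t 1000 testdb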

dd test

  • Compute 2x the size of your RAM in 8KB blocks: blocks = 250,000 * gigabytes of RAM
  • Watch vmstat and iostat while the test runs; the bi and bo columns will match the current read/write rate
  • Note the CPU percentage required to reach the peak rate (see the example below)
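
A sketch of the dd test for a machine with 4GB of RAM (so roughly 1,000,000 8KB blocks); the file path is an example and should live on the disk being tested:

 # Write test: ~2x RAM in 8KB blocks
 time sh -c "dd if=/dev/zero of=/path/to/testfile bs=8k count=1000000 && sync"

 # Read test: read the file back
 time dd if=/path/to/testfile of=/dev/null bs=8k

 # In another terminal, watch bi/bo and CPU usage while the tests run
 vmstat 1
 iostat -x 1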

bonnie++

  • Ignore the per-char and create results; run the block output and block input tests
  • It automatically sizes the test file to be 2x the RAM in your system
  • bonnie++ produces CSV output; pipe it through bon_csv2html and the result is a nice HTML page (see the example below)
  • Random Seeks number:
    • It starts up 3 processes at once doing 8000 random seeks; 10% of the time, after reading a block it writes it back out
    • Probably not a good number to compare against other seek tests
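
A minimal invocation along those lines; the test directory is an example, -n 0 skips the file-creation tests, and the exact CSV-to-HTML pipeline can differ a little between bonnie++ versions:

 # Block I/O tests only, results converted to an HTML page
 bonnie++ -d /path/to/test/dir -n 0 | tail -n 1 | bon_csv2html > bonnie-results.html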

bonnie++ ZCAV

  • HD speed varies based on which part of the disk you're on (the outer tracks are much faster than the inner ones -- more bits pass under the head per unit of time on the outside)
  • ./zcav -f/dev/sda > t500
    • You must get a recent version for ZCAV to scale to TB drives; 1.03e works
    • A somewhat broken gnuplot script can be downloaded from coker.com
    • A gnuplot script that works is in the slides (a minimal sketch is also shown below)
    • Almost a 2:1 difference between inner and outer part of the disk in Greg's tests
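
The working script from the slides is not reproduced here; the following is just a minimal sketch of plotting the ZCAV output with gnuplot, assuming the first two columns of the result file are position and MB/s:

 # Plot throughput vs. position across the disk (column layout assumed)
 gnuplot -e "set terminal png; set output 't500.png'; set xlabel 'Position (GB)'; set ylabel 'MB/s'; plot 't500' using 1:2 with lines title '/dev/sda'"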


sysbench

  • seeks/second -- an example configuration is in the slides (a rough sketch is also shown below)
  • Any random read/write test; you can control which section of the disk it uses and how wide an area it seeks across
    • "Short stroking" -- decide to only use the first 30-40 GB of every disk
    • If the maximum possible seek distance between sections of the disk is kept short, overall performance improves
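
A rough sysbench random I/O example (older 0.4.x option syntax); the total file size and run time are arbitrary, and the file set should be larger than RAM so reads actually hit the disk:

 # Create the test files
 sysbench --test=fileio --file-total-size=100G prepare

 # Random reads for 60 seconds; the report includes requests/sec (seeks)
 sysbench --test=fileio --file-total-size=100G --file-test-mode=rndrd --max-time=60 --max-requests=0 run

 # Remove the test files afterwards
 sysbench --test=fileio --file-total-size=100G cleanup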

Customizable seek tests

  • bonnie++ experimental (1.95)
  • A lot more complex:
    • iozone
    • fio
  • Windows: HD Tune
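
For comparison, a small fio random-read job; the job name, file size, block size, and runtime are example values only:

 # 8KB random reads with direct I/O, bypassing the OS cache
 fio --name=randread --rw=randread --bs=8k --size=4G --runtime=60 --direct=1 --ioengine=sync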


Computed values to share with vendors

  • In the presentation, Greg lists out the figures he calculates to share with vendors when there's a performance issue

Write cache

  • A single-disk setup shows two very different commit rates depending on whether the write cache is on; if you turn the write cache off, you need a disk controller with a battery-backed cache to keep commits fast
  • A 3-disk RAID 0 using a 256MB battery-backed cache was used in his tests
  • Quick CPU tests
    • select sum(generate_series) from generate_series(1,1000000); etc.
    • Quick insert/plan test: create a table, fill it with generate_series, then EXPLAIN ANALYZE a count(*) (see the sketch below)
  • David Wheeler's presentation talks about function overhead
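
A sketch of that quick insert/plan test run through psql; the database and table names are arbitrary examples:

 # Create a table, fill it from generate_series, then look at a plan
 psql -d testdb -c "CREATE TABLE test (id integer);"
 time psql -d testdb -c "INSERT INTO test SELECT generate_series(1,1000000);"
 psql -d testdb -c "EXPLAIN ANALYZE SELECT count(*) FROM test;"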


pgbench

  • pgbench has a little scripting language
  • There is a set of standard tests, with examples from Greg in the slides (a basic invocation is sketched below)
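
A basic pgbench run as a reference point, assuming a database already initialized with pgbench -i; the database name, client count, and transaction count are arbitrary examples, and Greg's standard test scripts are in the slides rather than reproduced here:

 # The standard TPC-B-like test with 8 clients
 pgbench -c 8 -t 10000 testdb

 # A custom script written in pgbench's little scripting language
 pgbench -n -f my_test.sql -c 8 -t 10000 testdb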

howto

  • Trust no one
  • don't start benchmarks until you've baselined
  • don't trust vendors
  • use simple, standard tools
  • don't buy any hardware unless you have the option of returning it if it doesn't perform well.