Performance Analysis Tools
This page is focused on tools for collecting data outside of PostgreSQL, in order to learn more about the system as a whole, about PostgreSQL's use of system resources, about things that may be bottlenecks for PostgreSQL's performance, etc.
Most of the time, the tools PostgreSQL provides internally will be more than adequate for your needs. The most important tool in your toolbox is the SQL EXPLAIN command and its EXPLAIN ANALYZE variant. The pg_catalog.pg_stat_activity and pg_catalog.pg_locks views are also vital.
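For example (the database, table and column names here are just placeholders), from a shell you might run:
psql -d mydb -c 'EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;'
psql -d mydb -c 'SELECT * FROM pg_locks WHERE NOT granted;'
The first shows the actual plan and timing for a query; the second lists lock requests that are currently waiting.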
System level tools for I/O, CPU and memory usage investigation
Tools like ps (with the wchan format specifier), vmstat, top, iotop, blktrace + blkparse, btrace, sar, alt-sysrq-t, etc. can help you learn much more about what your system is doing and where things are being delayed.
On supported systems (currently Solaris and FreeBSD), dtrace is also a powerful tool. PostgreSQL has DTrace hooks to allow you to investigate its internal workings and performance as well as that of the operating system it is running on.
The "wchan" option is really useful for seeing what a process that's in 'D' state (uninterruptable sleep in kernel system call) is actually doing. eg:
ps -e -o pid,ppid,wchan:60,cmd | grep post
You'll need to go digging in the kernel sources, use Google, or nut it out yourself to figure out what the value shown in the wchan field means.
vmstat
vmstat can give you useful overview information about CPU, disk and memory activity at a system-wide level. It's most useful when updating continuously, eg "vmstat 1" (the number is the delay between updates, in seconds).
Output looks like:
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   4144 242088 707752 2307192    0    0     0    26   71   73  2  1 97  1
 0  0   4144 242080 707752 2307220    0    0     0     0  944 1917  2  1 97  0
 0  0   4144 241956 707752 2307220    0    0     0    64  772 1579  1  1 98  0
top
Like vmstat, top gives you some system overview information, though it isn't as useful for disk information. It's quite configurable.
Don't be fooled by the "used" and "free" memory values. The real memory in use is actually (roughly) the value in "used" minus the value in "buffers", since "buffers" includes the kernel's disk cache. The kernel will use most of the free memory for disk cache, but will shrink that cache as required to fit other things into memory. After all, truly free memory is wasted memory that does you no good.
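One way to focus top on PostgreSQL (assuming a procps-style top and that the server runs as the "postgres" user) is:
top -c -u postgres
The -c flag shows full command lines, which is handy because PostgreSQL backends update their process titles to show what they're currently doing.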
free
free shows free and used memory, system-wide.
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3960       3726        234          0        721       2252
-/+ buffers/cache:        752       3208
Swap:         2070          4       2066
The "free" value beside "-/+ buffers/cache" is the one you should generally consider the "real" amount of free memory in the system. The "-m" flag asks for values in megabytes.
sa
sa summarizes process accounting data. See the man page.
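A minimal sketch, assuming the acct/psacct package is installed and accounting has been switched on with accton:
sa
lastcomm postgres
sa summarizes resource usage per command, while lastcomm lists recently executed commands by name.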
gdb
Why is a debugger listed among performance analysis tools?
Because sometimes, attaching a debugger to a backend, interrupting the backend periodically to get a backtrace, and then trying to figure out what on earth it's up to is a helpful way to track down an issue.
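For example, to grab a single backtrace from a backend (the PID here is hypothetical; you may need root or ptrace privileges, and debug symbols help a lot):
gdb -p 12345 -batch -ex bt
Repeating this a few times and noting where the backend usually is can serve as a crude but effective profiler.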
Wireshark, tshark, and tcpdump
These tools are for monitoring and analysis of traffic on a network interface. If you can't figure out what's keeping your network interface pegged, they might help you figure out what data is being transmitted/received.
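For example, to capture PostgreSQL traffic (the interface and file names are just examples; 5432 is the default PostgreSQL port):
tcpdump -i eth0 -s 0 -w pg.pcap port 5432
The resulting capture file can then be opened in Wireshark for analysis.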
iftop
iftop is a top-like utility for network interfaces. It doesn't, alas, show process IDs or names, but you can figure those out from the source and destination port information. Handy for tracking down a backend that's flooding the interface with traffic.
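For example (the interface name is just an example):
iftop -i eth0 -P
The -P flag shows port numbers, which is what lets you match traffic back to a particular backend.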
iostat
iostat provides summary information about I/O activity and load on block devices in the system. It doesn't identify which processes are responsible for the load, but produces short, easily read output in real time. Like vmstat, it's most usefully run in continuous mode, eg "iostat 1".
See the man page for details.
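The -x flag, if your version supports it, adds extended per-device statistics such as utilisation and average wait times:
iostat -x 1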
blktrace, blkparse and btrace
These are tools to get very low level information about the activity on a given block device, what process(es) are causing that activity, and what they're doing. They produce a huge amount of information, but it can be filtered somewhat. It takes a bit of thought to interpret, but can be great for those "what the hell is thrashing my disk" moments.
Because of the bgwriter and WAL writer, it's not usually easy to see which PostgreSQL backends are keeping a disk busy with writes. It's still good for tracking down heavy read loads and figuring out which backend (and thus, via the pg_catalog.pg_stat_activity view, which query) is responsible.
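A simple way to run it (the device name is just an example; root is required) is the btrace wrapper, which combines blktrace and blkparse:
btrace /dev/sda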
Output looks like this:
  9,1    0      246    11.339004193  1383  U   N [postgres] 0
  9,1    0      247    11.340029833  1383  A   R 118356256 + 96 <- (252,3) 34469792
  9,1    0      248    11.340030339  1383  Q   R 118356256 + 96 [postgres]
  9,1    0      249    11.340054022  1383  A   R 118356352 + 128 <- (252,3) 34469888
  9,1    0      250    11.340054341  1383  Q   R 118356352 + 128 [postgres]
  9,1    0      251    11.340062014  1383  A   R 118356480 + 32 <- (252,3) 34470016
  9,1    0      252    11.340062367  1383  Q   R 118356480 + 32 [postgres]
... plus a handy summary at the end.
strace
strace is a useful tool that can attach to a process and report all the system calls that process makes, including the arguments to those system calls. It's great for figuring out what files a process opens, whether it's waiting for something, etc.
Useful interpretation requires some C programming knowledge and some idea about POSIX APIs.
(Note that other systems have similar tools with different names - for example, some BSDs have ktrace and truss.)
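Typical invocations (the PID is hypothetical) might be:
strace -p 12345 -tt -T
strace -c -p 12345
The first prints each system call with timestamps and how long it took; the second just counts calls and time per syscall, which is often enough to spot where the time is going.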
ltrace
ltrace is similar to strace, but traces calls to shared library functions rather than system calls. It's probably not useful for general users, but it's a handy tool if you have something within Pg, or a library used by Pg, that's running strangely slowly and you don't know why.
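For example (the PID is hypothetical):
ltrace -c -p 12345
The -c flag prints a summary of library calls and the time spent in them when tracing stops.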
The "magic sysrq key". Mainly useful for tracking down processes mysteriously stuck in 'D' state in the kernel, ie hung in a system call, since it will output a stack trace of the kernel stack for all processes in the system. Mostly used when trying to figure out whether Pg is having problems due to a kernel bug, hardware issue, file system bug, etc.
Must be enabled with 'sysctl -w kernel.sysrq=1' before it can be used.
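If you don't have keyboard access, the same trace can usually be triggered as root via /proc, for example:
echo t > /proc/sysrq-trigger
dmesg
The kernel stack traces end up in the kernel log, so dmesg (or syslog) is where to look for them.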
DTrace
See the separate DTrace page.
Page originally by --Ringerc 07:26, 26 November 2009 (UTC)