Performance Analysis Tools

From PostgreSQL wiki

Jump to: navigation, search

Contents

Performance Analysis Tools

This page is focused on tools for collecting data outside of PostgreSQL, in order to learn more about the system as a whole, about PostgreSQL's use of system resources, about things that may be bottlenecks for PostgreSQL's performance, etc.

Most of the time, the tools PostgreSQL provides internally will be more than adequate for your needs. The most important tool in your toolbox is the SQL EXPLAIN command and its EXPLAIN ANALYZE alternative. The pg_catalog.pg_stat_activity and pg_catalog.pg_locks views are also vital.

You can find a lot of advice about tuning PostgreSQL and the system in the Performance and categories of the wiki, and in the PostgreSQL manual. If you're at a loss, see SlowQueryQuestions.

System level tools for I/O, CPU and memory usage investigation

Tools like `ps' with the `wchan' format specifier, `vmstat', `top', `iotop', `blktrace' + `blkparse', `btrace', `sar', alt-sysrq-t, etc can help you learn much more about what your system is doing and where things are being delayed.

On supported systems (currently Solaris and FreeBSD), `dtrace' is also a powerful tool. PostgreSQL has DTrace hooks to allow you to investigate its internal workings and performance as well as that of the operating system it is running on.

Windows users will want to look into Process Monitor, Process Explorer, and FileMon from the SysInternals suite.


Unix/Linux tools

ps

The "wchan" option is really useful for seeing what a process that's in 'D' state (uninterruptable sleep in kernel system call) is actually doing. eg:

ps -e -o pid,ppid,wchan:60,cmd | grep post

You'll need to go digging in the kernel sources, use Google, or nut it out to figure what the value shown in the wchan field means.

vmstat

vmstat can give you useful overview information about CPU, disk and memory activity at a system wide level. It's most useful when updating continuously, eg:

vmstat 1

or

vmstat 60

(the number is in seconds).

Output looks like:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   4144 242088 707752 2307192    0    0     0    26   71   73  2  1 97  1
 0  0   4144 242080 707752 2307220    0    0     0     0  944 1917  2  1 97  0
 0  0   4144 241956 707752 2307220    0    0     0    64  772 1579  1  1 98  0

top

Like vmstat, gives you some system overview info, though isn't as useful for disk information. Quite configurable.

Don't be fooled by the "used" and "free" memory values. The real memory in use is actually (roughly) the value in "used" minus the value in "buffers", since "buffers" includes the kernel's disk cache. The kernel will use most of the free memory for disk cache, but will shrink that cache as required to fit other things into memory. After all, truly free memory is wasted memory that does you no good.

free

Shows free and used memory, system-wide.

$ free -m
             total       used       free     shared    buffers     cached
Mem:          3960       3726        234          0        721       2252
-/+ buffers/cache:        752       3208
Swap:         2070          4       2066

The "free" value beside "-/+ buffers/cache" is the one you should generally consider the "real" amount of free memory in the system. The "-m" flag asks for values in megabytes.

sar

Process accounting. See the man page.

gdb

Why is a debugger listed in performance analysis tools?

Because sometimes, attaching a debugger to a backend, interrupting the backend periodically to get a backtrace, and then trying to figure out what on earth it's up to is a helpful way to track down an issue.

See Generating a stack trace of a PostgreSQL_backend

Wireshark, tshark, and tcpdump

These tools are for monitoring and analysis of traffic on a network interface. If you can't figure out what's keeping your network interface pegged, they might help you figure out what data is being transmitted/received.

pktstat

A top-like utility for network interfaces. Doesn't, alas, show process IDs or names, but you can figure those out from source and destination port information. Handy for tracking down a backend that's flooding the interface with traffic.

Linux-only tools

iostat

Provides summary information about I/O activity and load on block devices in the system. Doesn't provide information about what processes are responsible for the load, but produces short and easily read output in real time. Like vmstat, most usefully used in continuous mode, eg "iostat 1".

See the man page for details.

blktrace, blkparse and btrace

These are tools to get very low level information about the activity on a given block device, what process(es) are causing that activity, and what they're doing. It produces a huge amount of information, but can be filtered somewhat. It takes a bit of thought to interpret, but can be great for those "what the hell is thrashing my disk" moments.

Because of the bgwriter and wal writer, it's not usually easy to see what PostgreSQL backends are keeping a disk busy with writes. It's still good for tracking down heavy read loads and figuring out what backend (and, thus, query - via the pg_catalog.pg_stat_activity) is responsible.

Output looks like this:

  9,1    0      246    11.339004193  1383  U   N [postgres] 0
  9,1    0      247    11.340029833  1383  A   R 118356256 + 96 <- (252,3) 34469792
  9,1    0      248    11.340030339  1383  Q   R 118356256 + 96 [postgres]
  9,1    0      249    11.340054022  1383  A   R 118356352 + 128 <- (252,3) 34469888
  9,1    0      250    11.340054341  1383  Q   R 118356352 + 128 [postgres]
  9,1    0      251    11.340062014  1383  A   R 118356480 + 32 <- (252,3) 34470016
  9,1    0      252    11.340062367  1383  Q   R 118356480 + 32 [postgres]

... plus a handy summary at the end.

strace

strace is a useful tool that can attach to a process and report all the system calls that process makes, including arguments to those system calls. It's great for figuring out what files a process opens, or if it's waiting for something, etc.

Useful interpretation requires some C programming knowledge and some idea about POSIX APIs.

(Note that other systems have similar tools with different names - for example, some BSDs have truss).

oprofile

Probably not useful for general users, but a handy tool if you have something within Pg or a library used by Pg that's running strangely slowly and you don't know why.

Profiling with OProfile

alt-sysrq-t

The "magic sysrq key". Mainly useful for tracking down processes mysteriously stuck in 'D' state in the kernel, ie hung in a system call, since it will output a stack trace of the kernel stack for all processes in the system. Mostly used when trying to figure out whether Pg is having problems due to a kernel bug, hardware issue, file system bug, etc.

Must be enabled with 'sysctl -w kernel.sysrq=1' before it can be used.

DTrace

See separate DTrace page.

Windows tools

Process Monitor

Process Explorer

FileMon

Page originally by --Ringerc 07:26, 26 November 2009 (UTC)

Personal tools