Getting a stack trace of a running PostgreSQL backend on Linux/BSD
From PostgreSQL wiki
Linux and BSD
(If you want more than just a stack trace, take a look at the Developer FAQ which covers interactive debugging).
Installing External symbols
(BSD users who installed from ports can skip this)
On many Linux systems, debugging info is separated out from program binaries and stored separately. It's often not installed when you install a package, so if you want to debug the program (say, get a stack trace) you will need to install debug info packages. Unfortunately, the names of these packages vary depending on your distro, as does the procedure for installing them.
Some generic instructions (unrelated to PostgreSQL) are maintained on the GNOME Wiki here.
Debian Squeeze (6.x) users will also need to install gdb 7.3 from backports, as the gdb shipped in Squeeze doesn't understand the PIE executables used in newer PostgreSQL builds.
First, follow the instructions on the Ubuntu wiki entry DebuggingProgramCrash.
Once you've finished enabling the use of debug info packages as described, you will need to use the
list-dbgsym-packages.sh script linked to on that wiki article to get a list of debug packages you need. Installing the debug package for postgresql alone is not sufficient.
After following the instructions on the Ubuntu wiki, download the script to your desktop, open a terminal, and run:
$ sudo apt-get install $(sudo bash Desktop/list-dbgsym-packages.sh -t -p $(pidof -s postgres))
All Fedora versions: FedoraProject.org - StackTraces
In general, you need to install at least the debug symbol packages for the PostgreSQL server and client as well as any common package that may exist, and the debug symbol package for libc. It's a good idea to add debug symbols for the other libraries PostgreSQL uses in case the problem you're having arises in or touches on one of those libraries.
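On a Debian-style system, for example, the install might look something like this (the package names are illustrative only and vary by distro release and PostgreSQL version; check your package manager for the exact names):

```
$ sudo apt-get install postgresql-8.4-dbg libc6-dbg
```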
Collecting a stack trace
How to tell if a stack trace is any good
Read this section and keep it in mind as you collect information using the instructions below. Making sure the information you collect is actually useful will save you, and everybody else, time and hassle.
It is vitally important to have debugging symbols available to get a useful stack trace. If you do not have the required symbols installed, backtraces will contain lots of entries like this:
#1  0x00686a3d in ?? ()
#2  0x00d3d406 in ?? ()
#3  0x00bf0ba4 in ?? ()
#4  0x00d3663b in ?? ()
#5  0x00d39782 in ?? ()
... which are completely useless for debugging without access to your system (and almost useless with access). If you see results like the above, you need to install debugging symbol packages, or even re-build PostgreSQL with debugging enabled. Do not bother collecting such backtraces; they are not useful.
Sometimes you'll get backtraces that contain just the function name and the executable it's within, not source code file names and line numbers or parameters. Such output will have lines like this:
#11 0x00d3afbe in PostmasterMain () from /usr/lib/postgresql/8.4/bin/postgres
This isn't ideal, but is a lot better than nothing. Installing debug information packages should give an even more detailed stack trace with line number and argument information, like this:
#9 0xb758d97e in PostmasterMain (argc=5, argv=0xb813a0e8) at postmaster.c:1040
... which is the most useful for tracking down your problem. Note the reference to a source file and line number instead of just an executable name.
Identifying the backend to connect to
You need to know the process ID of the PostgreSQL backend to connect to. If you're interested in a backend that's using lots of CPU, it might show up in top. If you have a current connection to the backend you're interested in, use select pg_backend_pid() to get its process ID. Otherwise, the pg_catalog.pg_stat_activity and pg_catalog.pg_locks views may be useful in identifying the backend of interest; see the "procpid" column in pg_stat_activity and the "pid" column in pg_locks.
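As a sketch, a query like the following lists candidate backends together with what each one is running. It uses the pre-9.2 column names that match the PostgreSQL versions discussed in this article; on 9.2 and later, "procpid" is renamed to "pid" and "current_query" to "query":

```sql
-- List each backend's PID, user, and the query it is currently running.
-- Column names shown are for PostgreSQL < 9.2.
SELECT procpid, usename, current_query
FROM pg_catalog.pg_stat_activity;
```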
Attaching gdb to the backend
Once you know the process ID to connect to, run:
sudo gdb -p pid
where "pid" is the process ID of the backend. GDB will pause the execution of the process you specified and drop you into interactive mode (the
(gdb) prompt) after showing the call the backend is currently executing, e.g.:
0xb7c73424 in __kernel_vsyscall ()
(gdb)
You'll want to tell gdb to save a log of the session to a file, so at the gdb prompt enter:
(gdb) set pagination off
(gdb) set logging file debuglog.txt
(gdb) set logging on
gdb is now saving all input and output to a file,
debuglog.txt, in the directory in which you started gdb.
At this point execution of the backend is still paused. It can even hold up other backends, so I recommend that you tell it to resume executing normally with the "cont" command:
(gdb) cont
Continuing.
The backend is now running normally, as if gdb wasn't connected to it.
Getting the trace
OK, with gdb connected you're ready to get a useful stack trace.
In addition to the instructions below, you can find some useful tips about using gdb with postgresql backends on the Developer FAQ.
Getting representative traces from a running backend
If you're concerned with a case that's taking way too long to execute a query, is using too much CPU, or appears to be in an infinite loop, you'll want to repeatedly interrupt its execution, get a stack trace, and let it resume executing. Having a collection of several stack traces helps provide a better idea of where it's spending its time.
You interrupt the backend and get back to the gdb command line with ^C (control-C). Once at the gdb command line, you use the "bt" command to get a backtrace, then the "cont" command to resume normal backend execution.
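Put together, one sampling cycle at the gdb prompt looks like this (the backtrace output itself is elided here):

```
^C
Program received signal SIGINT, Interrupt.
(gdb) bt
...
(gdb) cont
Continuing.
```

Repeat the cycle a handful of times; a function that appears near the top of most samples is a good candidate for where the time is going.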
Once you've collected a few backtraces, detach then exit gdb at the gdb interactive prompt:
(gdb) detach
Detaching from program: /usr/lib/postgresql/8.3/bin/postgres, process 12912
(gdb) quit
user@host:~$
An alternative approach is to use the
gcore program to save a series of core dumps of the running program without disrupting its execution. Those core dumps may then be examined at your leisure, giving you time to get more than just a backtrace because you're not holding up the backend's execution while you think and type.
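A sketch of that approach, assuming the backend's PID is 12912 and that you want the dumps under /tmp (gcore appends the PID to the prefix you give with -o, and the paths here are examples only):

```
$ sudo gcore -o /tmp/backend-sample-1 12912
$ sudo gcore -o /tmp/backend-sample-2 12912
$ gdb -q /usr/lib/postgresql/8.4/bin/postgres /tmp/backend-sample-1.12912
```

Each gcore run only pauses the backend briefly while the dump is written, so taking several samples this way is much less disruptive than sitting at an interactive gdb prompt.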
Getting a trace from the point of an error report
If you are trying to find out the cause of an unexpected error, the most useful thing to do is to set a breakpoint at errfinish before you let the backend continue:
(gdb) b errfinish
Breakpoint 1 at 0x80ced0: file elog.c, line 414.
(gdb) cont
Continuing.
Now, in your connected psql session, run whatever query is needed to provoke the error. When it happens, the backend will stop execution at errfinish. Collect your backtrace with bt, then quit (or, possibly, cont if you want to do it again).
A breakpoint at errfinish will capture generation of not only ERROR reports, but also NOTICE, LOG, and any other message that isn't suppressed by client_min_messages or log_min_messages. You may want to adjust those settings to avoid having to continue through a bunch of unrelated messages.
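If you'd rather not type bt by hand every time the breakpoint fires, gdb can attach a command list to it. The sketch below assumes errfinish was your first breakpoint (hence breakpoint number 1); with logging enabled as described earlier, every error report then lands in your log file with a backtrace, and execution resumes automatically:

```
(gdb) commands 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
> bt
> cont
> end
(gdb) cont
Continuing.
```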
Getting a trace from a reproducibly crashing backend
GDB will automatically interrupt the execution of a program if it detects a crash. So, once you've attached gdb to the backend you expect to crash, just let it continue executing as normal and do whatever you need to do to make the backend crash.
gdb will drop you into interactive mode as the backend crashes. At the
gdb prompt you can enter the
bt command to get a stack trace of the crash, then
cont to continue execution. When gdb reports that the process has exited, use the quit command to exit gdb.
Alternately, you can collect a core file as explained below, but it's probably more hassle than it's worth if you know which backend to attach gdb to before it crashes.
Getting a trace from a randomly crashing backend
It's a lot harder to get a stack trace from a backend when you don't know what causes the crash, or which backend will crash when. For this, you generally need to enable the generation of core files, which are debuggable dumps of a program's state that are written by the operating system when the program crashes.
Enabling core dumps
On a Linux system you can check to see if core file generation is enabled for a process by examining /proc/$pid/limits, where $pid is the process ID of interest. "Max core file size" should be non-zero.
Generally, adding "ulimit -c unlimited" to the top of the PostgreSQL startup script and restarting postgresql is sufficient to enable core dump collection. Make sure you have plenty of free space in your PostgreSQL data directory, because that's where the core dumps will be written and they can be fairly big due to Pg's use of shared memory.
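A quick way to check from a shell (a sketch; run it as the same user that starts PostgreSQL, since limits are inherited from the starting process):

```shell
# Show the soft core-size limit for this shell; 0 means core dumps are disabled
ulimit -c
# Raise it for anything subsequently started from this shell
ulimit -c unlimited
# What the kernel actually recorded for the current process
grep 'Max core file size' /proc/self/limits
```

This is the same setting that "ulimit -c unlimited" in the startup script changes; checking /proc/$pid/limits for a running backend confirms it took effect.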
On a Linux system it's also worth changing the file name format used for core dumps so that core dumps don't overwrite each other. The
/proc/sys/kernel/core_pattern file controls this. I suggest
core.%p.sig%s.%ts, which will record the process's PID, the signal that killed it, and the timestamp at which the core was generated. See
man 5 core for details. To apply the change, run:
echo 'core.%p.sig%s.%ts' | sudo tee /proc/sys/kernel/core_pattern
Note that "sudo echo ... > /proc/sys/kernel/core_pattern" does not work: the redirection is performed by your unprivileged shell, so it's tee that needs to run with elevated privileges. Don't use tee's -a flag here either, since you want to replace the pattern, not append to it.
You can test whether core dumps are enabled by starting a `psql' session, finding the backend pid for it using the instructions given above, then killing it with "kill -ABRT pidofbackend" (where pidofbackend is the PID of the postgres backend, NOT the pid of psql). You should see a core file appear in your postgresql data directory.
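A hypothetical session (the PID, timestamp, and paths here are examples only; -A and -t make psql print just the bare value):

```
$ psql -At -c 'SELECT pg_backend_pid()'
12912
$ sudo kill -ABRT 12912
$ ls /var/lib/postgresql/8.4/main/core*
/var/lib/postgresql/8.4/main/core.12912.sig6.1271644870s
```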
Debugging the core dump
Once you've enabled core dumps, you need to wait until you see a backend crash. A core dump will be generated by the operating system, and you'll be able to attach gdb to it to collect a stack trace or other information.
You need to tell gdb what executable file generated the core if you want to get useful backtraces and other debugging information. To do this, just specify the postgres executable path then the core file path when invoking gdb, as shown below. If you do not know the location of the postgres executable, you can get it by examining /proc/$pid/exe for a running postgres instance. For example:
$ for f in `pgrep postgres`; do ls -l /proc/$f/exe; done
lrwxrwxrwx 1 postgres postgres 0 2010-04-19 10:30 /proc/10621/exe -> /usr/lib/postgresql/8.4/bin/postgres
lrwxrwxrwx 1 postgres postgres 0 2010-04-19 10:51 /proc/11052/exe -> /usr/lib/postgresql/8.4/bin/postgres
lrwxrwxrwx 1 postgres postgres 0 2010-04-19 10:51 /proc/11053/exe -> /usr/lib/postgresql/8.4/bin/postgres
lrwxrwxrwx 1 postgres postgres 0 2010-04-19 10:51 /proc/11054/exe -> /usr/lib/postgresql/8.4/bin/postgres
lrwxrwxrwx 1 postgres postgres 0 2010-04-19 10:51 /proc/11055/exe -> /usr/lib/postgresql/8.4/bin/postgres
... we can see from the above that the postgres executable on my (Ubuntu) system is /usr/lib/postgresql/8.4/bin/postgres.
Once you know the executable path and the core file location, just run gdb with those as arguments, ie
gdb -q /path/to/postgres /path/to/core. Now you can debug it as if it was a normal running postgres, as discussed in the sections above.
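If all you want is the backtrace, gdb's batch mode will print it and exit without an interactive session (a sketch; substitute your own executable and core file paths):

```
$ gdb -q -batch -ex bt /path/to/postgres /path/to/core
```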
Debugging the core dump - example
For example, having just forced a postgres backend to crash with kill -ABRT, I have a core file named core.10780.sig6.1271644870s in /var/lib/postgresql/8.4/main, which is the data directory on my Ubuntu system. I've used /proc to find out that the executable for postgres on my system is /usr/lib/postgresql/8.4/bin/postgres.
It's now easy to run GDB against it and request a backtrace:
$ sudo -u postgres gdb -q -c /var/lib/postgresql/8.4/main/core.10780.sig6.1271644870s /usr/lib/postgresql/8.4/bin/postgres
Core was generated by `postgres: wal writer process '.
Program terminated with signal 6, Aborted.
#0  0x00a65422 in __kernel_vsyscall ()
(gdb) bt
#0  0x00a65422 in __kernel_vsyscall ()
#1  0x00686a3d in ___newselect_nocancel () from /lib/tls/i686/cmov/libc.so.6
#2  0x00e68d25 in pg_usleep () from /usr/lib/postgresql/8.4/bin/postgres
#3  0x00d3d406 in WalWriterMain () from /usr/lib/postgresql/8.4/bin/postgres
#4  0x00bf0ba4 in AuxiliaryProcessMain () from /usr/lib/postgresql/8.4/bin/postgres
#5  0x00d3663b in ?? () from /usr/lib/postgresql/8.4/bin/postgres
#6  0x00d39782 in ?? () from /usr/lib/postgresql/8.4/bin/postgres
#7  <signal handler called>
#8  0x00a65422 in __kernel_vsyscall ()
#9  0x00686a3d in ___newselect_nocancel () from /lib/tls/i686/cmov/libc.so.6
#10 0x00d37bee in ?? () from /usr/lib/postgresql/8.4/bin/postgres
#11 0x00d3afbe in PostmasterMain () from /usr/lib/postgresql/8.4/bin/postgres
#12 0x00cdc0dc in main () from /usr/lib/postgresql/8.4/bin/postgres
If you don't have proper symbols installed, specify the wrong executable to gdb, or fail to specify an executable at all, you'll see a useless backtrace like the following one:
$ sudo -u postgres gdb -q -c /var/lib/postgresql/8.4/main/core.10780.sig6.1271644870s
Core was generated by `postgres: wal writer process '.
Program terminated with signal 6, Aborted.
#0  0x00a65422 in __kernel_vsyscall ()
(gdb) bt
#0  0x00a65422 in __kernel_vsyscall ()
#1  0x00686a3d in ?? ()
#2  0x00d3d406 in ?? ()
#3  0x00bf0ba4 in ?? ()
#4  0x00d3663b in ?? ()
#5  0x00d39782 in ?? ()
#6  <signal handler called>
#7  0x00a65422 in __kernel_vsyscall ()
#8  0x00686a3d in ?? ()
#9  0x00d3afbe in ?? ()
#10 0x00cdc0dc in ?? ()
#11 0x005d7b56 in ?? ()
#12 0x00b8fad1 in ?? ()
If you get something like that, don't bother sending it in. If you didn't just get the executable path wrong, you'll probably need to install debugging symbols for PostgreSQL (or even re-build PostgreSQL with debugging enabled) and try again.
Tracing problems when creating a cluster
If you're running into a crash while trying to create a database cluster using initdb, that may leave behind a core dump that you can analyze with gdb as described above. This should be the case if there's an assertion failure for example. You will probably need to give the --no-clean option to initdb to keep it from deleting the new data directory and the core file along with it.
Another technique for finding bootstrap-time bugs is to manually feed the bootstrapping commands into bootstrap mode or single-user mode, with a data directory left over from initdb --no-clean. This can help if there has been no PANIC that leaves a core dump, but just a FATAL or ERROR, for example. It's easy to attach GDB to such a backend.
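For example, single-user mode against a data directory preserved with --no-clean can be started like this (a sketch; the paths are examples, and template1 is just the usual database to connect to at bootstrap time):

```
$ postgres --single -D /path/to/data template1
```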
Starting Postgres under GDB
Debugging multi-process applications like PostgreSQL has historically been very painful with GDB. Thankfully with recent 7.x releases, this has been improved greatly by "inferiors" (GDB's term for multiple debugged processes).
NB! This is still quite fragile, so don't expect to be able to do this in production.
# Stop server
pg_ctl -D /path/to/data stop -m fast
# Launch postgres via gdb
gdb --args postgres -D /path/to/data
Now, in the GDB shell, use these commands to set up an environment:
# We have scroll bars in the year 2012!
set pagination off
# Attach to both parent and child on fork
set detach-on-fork off
# Stop/resume all processes
set schedule-multiple on
# Usually don't care about these signals
handle SIGUSR1 noprint nostop
handle SIGUSR2 noprint nostop
# Ugly hack so we don't break on process exit
python gdb.events.exited.connect(lambda x: [gdb.execute('inferior 1'), gdb.post_event(lambda: gdb.execute('continue'))])
# Phew! Run it.
run
To get a list of processes, run info inferiors. To switch to another process, run inferior N, where N is the number shown in the info inferiors listing.