Getting a stack trace of a running PostgreSQL backend on Windows

From PostgreSQL wiki

Before you start

There are a few things you need to be aware of to get a useful stack trace:

  • To get a useful stack trace, your symbol path must be configured to include the debugging symbol (.pdb) files of the programs you want to debug. See the section on the symbol path below.
  • It appears that debugging a 32-bit PostgreSQL on 64-bit Windows produces useless results, at least when the 64-bit debugging tools are used.

For more information, see the section on how to verify that you have a useful stack trace, below.

Methods for getting a stack trace

On Windows there are a few different methods for getting a stack trace of a backend that crashes while doing something, and a few different tools to do it with.

Methods:

  • Get the process ID of a running backend, attach a debugger to it, and trigger a crash;
  • Set your debugger up as the post-mortem debugger and trigger a crash; or
  • Enable crash dumps, trigger a crash, and debug the resulting crash dump.

Direct attach is probably easier, but could potentially interfere with production operations. It's also not always possible, particularly when the crashes are random and hard to reproduce. Conversely, you can't use a crash dump if the process is stuck (using 100% CPU, or just not responding and not progressing); in that case you have to attach a debugger directly.

Setting up the post-mortem debugger can work well, but it's an intrusive system-wide change. (On Windows XP, run drwtsn32 -i to restore Dr. Watson as the default crash handler afterwards.) Not all Visual Studio editions support being used as the post-mortem debugger, and it won't work if PostgreSQL is running as a service in a separate user account.

Crash dumps (minidumps) work well for intermittent, hard to reproduce crashes. You can also use crash dumps if you need to send the debuggable crash dump to somebody else for examination. Unlike direct attach, crash dumps are supported by the free Express editions of Visual Studio. Do not just send a crash dump to someone and assume they can do something with it; you should normally extract and send a stack trace. Only send a crash dump if asked.


Tools:

  • Microsoft Visual Studio Professional - supports both crash dumps and direct attach, very easy, but costs $
  • Microsoft Visual Studio Express 2010 - supports both crash dumps and direct attach after enabling "Expert Settings"
  • Microsoft Visual Studio Express 2012 for Windows Desktop - supports both crash dumps and direct attach, very easy
  • Process Explorer - only useful for getting the status of a backend without interrupting it, for tracking down apparent infinite loops etc.
  • windbg from Debugging Tools for Windows - supports both crash dumps and direct attach, somewhat less than user friendly

If you have Microsoft Visual Studio, use that. It's the easiest and best option.

If you can't use Visual Studio, use Process Explorer for stack traces of running processes. For all other work, install Visual Studio. windbg is not a tool suitable for normal humans.

Installing Debugging Tools for Windows and Process Explorer

Microsoft Windows does not come with built-in debugging tools. You will need to install the Debugging Tools for Windows. These are no longer available as stand-alone packages; you must install them using the Windows SDK installer. If you like, you can disable installation of everything except the debugging tools. See also here.

Once you have installed the Debugging Tools for Windows, you'll need to do a little bit of setup so you can get useful stack traces.

Note: The instructions below assume your copy of Windows is installed on c: so you may have to adjust them if this is not the case.

You might also want Process Explorer from the Sysinternals suite. Sysinternals is useful, Process Explorer particularly so. Process Explorer can be used to get stack traces showing a snapshot of the state of a running PostgreSQL process for when you're tracking down apparently "stuck" queries.

Configuring the symbol path

First, you need to set the symbol path. Before you can set it you have to figure out what it is, since it depends on the version of PostgreSQL you've installed, the language of your version of Windows, and where PostgreSQL was installed to. What you need to find is the location of the symbols folder inside the PostgreSQL install directory. If you've used the EnterpriseDB 1-click installer for PostgreSQL 9.4 and you're using an English version of Windows, it'll be:

C:\Program Files\PostgreSQL\9.4\symbols

I'll assume that's the case in the following instructions, so adjust the path shown to match your system if required.

To set the symbol path, you'll need to use the System control panel. You can get to this quickly and conveniently with all versions of Windows by holding down the Windows key and pressing the BREAK key (it's often under the Pause key). You can also find it in the Control Panel; you may need to search for "system". If you can't find it, open the Start menu and click "Run" then enter "sysdm.cpl" and press enter.

In the System control panel, click on the Advanced tab and then click "Environment Variables". An "Environment Variables" dialog will appear. In that dialog, under the label "User Variables for <username>" you'll see a list. What you need to do is click the "New" button under that list.

In the dialog that appears, enter the variable name exactly as:

_NT_SYMBOL_PATH

Note the leading underscore ("_") and the fact that the variable name is ALL UPPER CASE with UNDERSCORES_BETWEEN_WORDS.

The value is in two parts. The first part depends on where PostgreSQL is installed, as described above. The second part tells debugging tools to use a Microsoft server (the Symbol Server) to find out about parts of Windows itself.

Set the value like this, changing the path to the PostgreSQL symbols directory to match your PostgreSQL install location and version. If in doubt, look in Program Files and check. If you set it wrong there will be no error; you'll just get useless backtraces.

Example value, adjust to suit your system:

C:\Program Files\PostgreSQL\9.4\symbols;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;
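The two-part structure of the value can be sketched in a few lines of Python. This is only an illustration of how the string is put together, not part of any PostgreSQL or Microsoft tool; the function name build_symbol_path and its parameter names are made up for this example, and the defaults simply mirror the example value above.

```python
def build_symbol_path(pg_symbols_dir,
                      cache_dir=r"c:\localsymbols",
                      server_url="http://msdl.microsoft.com/download/symbols"):
    """Compose an _NT_SYMBOL_PATH value (hypothetical helper, for illustration)."""
    # Part 1: the local PostgreSQL symbols directory holding the .pdb files.
    # Part 2: SRV*<local cache dir>*<symbol server URL>, used for Windows itself.
    return f"{pg_symbols_dir};SRV*{cache_dir}*{server_url};"


# Adjust the directory to match your own install location and version.
print(build_symbol_path(r"C:\Program Files\PostgreSQL\9.4\symbols"))
```

The printed string is what you would paste into the variable value field, after checking that the first part really points at your install's symbols folder.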

If you don't have an active Internet connection you can download the Windows symbol package for your version of Windows and install it, but be aware that this will be a BIG download. You'll then need to adjust the symbol path shown above to point to it.

Getting the stack trace

Phew. You now have things configured and can actually start collecting information.

Important: Don't waste your time generating stack traces that are useless. Please read "How to make sure a stack trace is useful" before spending lots of time collecting information or posting to the mailing list. Your stack trace needs to contain "!SymbolicNames" as well as "+offsets" to be of much use.

If you are collecting information about a PostgreSQL backend that appears to be in an infinite loop or using too much CPU, you will need to take more than one stack trace. You should try to get quite a few over a short period of time so that developers can tell where it's spending its time. One stack trace alone doesn't really tell you a lot about what it's doing overall.
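To see why several samples are more informative than one, here is a rough Python sketch that tallies the innermost postgres.exe frame across a set of saved traces. It assumes each trace is plain text with one modulename!SymbolName+offset frame per line, innermost frame first, as Process Explorer copies them; the function name hottest_frames and the sample frames are invented for illustration.

```python
from collections import Counter


def hottest_frames(traces, module="postgres.exe", depth=1):
    """Count the innermost `depth` frames from `module` across several traces."""
    counts = Counter()
    for trace in traces:
        frames = [line.strip() for line in trace.splitlines() if line.strip()]
        # Keep only frames from the module of interest and drop the "+0x..." offset
        # so that samples taken at different points in one function group together.
        own = [f.split("+")[0] for f in frames if f.startswith(module + "!")]
        counts.update(own[:depth])
    return counts.most_common()


# Made-up sample data: three traces taken a few seconds apart.
samples = [
    "ntdll.dll!KiFastSystemCallRet\npostgres.exe!pgwin32_recv+0x90\npostgres.exe!secure_read+0x17d",
    "postgres.exe!pgwin32_recv+0x12\npostgres.exe!secure_read+0x17d",
    "postgres.exe!ExecScan+0x40\npostgres.exe!ExecProcNode+0x10",
]
for frame, count in hottest_frames(samples):
    print(count, frame)
```

A function that keeps showing up at the top of the tally is a reasonable first guess at where the backend is spending its time, but the full traces are still what developers will want to see.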

Getting the process ID of the PostgreSQL backend

Identifying the process ID of the PostgreSQL process you're interested in isn't always easy. If you have an active connection to the backend you want to look at, you can run select pg_backend_pid(); to get its process ID. Sometimes it's obvious - it's the one using constantly high amounts of CPU as shown in Process Explorer, for example. At other times, though, you'll need to use psql or pgAdmin III to look in the pg_catalog.pg_stat_activity or pg_catalog.pg_locks tables/views to find out the process ID (the "pid" column, or "procpid" in releases before PostgreSQL 9.2) of the process you're interested in.
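Because the column names in pg_stat_activity changed in PostgreSQL 9.2 (procpid became pid, and current_query became query), the query you type depends on your server version. The Python sketch below only builds the SQL text so you can paste it into psql; the function name activity_query is made up for this example, and it assumes you pass the integer reported by SHOW server_version_num.

```python
def activity_query(server_version_num):
    """Return SQL listing backends, using the column names for that server version.

    server_version_num is the integer from SHOW server_version_num,
    for example 90400 for PostgreSQL 9.4.0.
    """
    if server_version_num >= 90200:
        pid_col, query_col = "pid", "query"
    else:
        # Releases before 9.2 used the older column names.
        pid_col, query_col = "procpid", "current_query"
    return (f"SELECT {pid_col}, usename, datname, {query_col} "
            "FROM pg_catalog.pg_stat_activity;")


print(activity_query(90400))
print(activity_query(90100))
```

Match the process ID from the query output against the PID column in Process Explorer or the debugger's process list.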

Get Process ID

Debugging using Microsoft Visual Studio or VS Express

Windbg is free, quick and easy to install, and supports remote debugging, but it is a real pain to use. You'll want something much nicer to use if you're trying to do more than just get a stack trace - and for some problems, you will indeed need more than a stack trace to know what's going wrong. If you intend to step through the execution of PostgreSQL's code and examine the value of variables, using Microsoft Visual C++ will make it a lot easier.

Microsoft Visual C++ has a much nicer debugger that I highly recommend using in favour of windbg.exe - if you have it available.

The free Visual Studio 2008 Express Edition can be used for many debugging tasks; it lacks remote debugging support and just-in-time postmortem debugging (which the full, paid edition has) but otherwise does a very good job as a user-friendly debugger.

Visual Studio Express 2010 hides, by default, key features like the ability to attach to already-running processes. Select Tools -> Settings -> Expert Settings to make that and other features visible in the usual menus. Visual Studio Express 2012 for Windows Desktop does not require such a configuration change.

Attaching

Once you've installed and configured VC++, launch it and use "Attach to Process" in the Tools menu to connect to the PostgreSQL backend you want. Make sure "show processes from all users" is checked, select the postgres.exe process you want to debug, and click "Attach". You'll get a security warning, which you should read and understand before accepting.

Attaching VC++ 2008 to a process

VC++ will load symbols for PostgreSQL, which may take a few minutes if you haven't used VC++ to debug PostgreSQL before (it'll have to fetch them from the symbol server). Once VC++ finishes loading symbols, execution of the Pg backend will resume and your database system will be working normally.

Breaking into execution

You can now break into the execution of the postgres.exe backend with the break button on the toolbar. It looks like a pause button. If you're debugging a runaway backend, you'll want to break into its execution now. Alternatively, if you're debugging a crash, you should trigger the crash now.

If VC++ detects a crash it'll display a dialog like this to give you the chance to break into the process's execution:

VC++ crash notice

If you break into a process's execution while it's running in the kernel you'll see a dialog like this:

Kernel or deadlock warning

... which is common for PostgreSQL since it'll often be waiting for network or disk activity. You can generally ignore it.

Starting debugging

You'll get a dialog telling you that "there is no source code available for your current location". Click OK. Now, on the bottom right (by default) you'll see a set of tabs named "call stack", "breakpoints", and "output". Click on "call stack".

Debug without source

You can see the stack of function calls that were in progress in Pg's main thread at the time you interrupted PostgreSQL. (If you want or need to change threads, use the "threads" tab on the bottom left - double click on the thread of interest to debug it - though you usually want the main thread, which is active by default.)

In the call stack you can double-click on an entry to see where in the executable the function call arose. If you don't have source code, it'll offer to show you a disassembly, but you have source code for PostgreSQL so you can do better than that. (If you don't have source code for Pg, download and extract the sources for your version now. If you can't open the .tar.gz file, install 7-Zip from www.7-zip.org.) Double-click on the topmost line in the stack trace that starts with "postgres.exe" and a dialog will appear asking for a .c file:

Find source dialog

It'll note the original location of the file at the top of the dialog. In my case, it wants socket.c from src\backend\port\win32\socket.c, so I find the PostgreSQL sources (on my machine: c:\developer\postgresql-8.4.0) and from there open src, then backend, then port, then win32. Visual Studio notices "socket.c" within the directory and opens it automatically. From now on, during this debugging session, VS should find all other PostgreSQL source files too.

VC++ debug with source

Debugging tips

The Visual Studio debugger is far too capable to fully describe here. I'll just give a few hints - and suggest that you use the documentation.

Now that you have sources displayed, you can step through the code line-by-line, run until the end of functions, step into functions, and otherwise control the execution of the PostgreSQL process in detail using the buttons on the toolbar. You can also examine local variables at the stack frame you're on using the "locals" tab on the left, set breakpoints in functions using the "breakpoints" tab on the right, etc.

Adding a breakpoint

For example, if I wanted to pause execution the next time PostgreSQL tried to receive data from the network I might set a breakpoint on the pgwin32_recv function and resume execution:

Adding a breakpoint dialog

If I then issue a query on the backend I'm debugging, Visual Studio will shortly break back into the debugger at the start of the pgwin32_recv function. I can now step through the function's execution to see what functions are called, what values variables take, etc. I can add and remove breakpoints by clicking in the margin on the left of the source file display.

Note that you must be careful not to change the values of any variables or cause execution to skip lines using the "set next statement" option. If you're not careful you could crash the server or possibly severely corrupt your database.

Using windbg

Using windbg.exe to debug PostgreSQL

Using crash dumps to debug random and unpredictable backend crashes

If the crashes appear to be random and you don't know how to trigger them, it's hard to connect a debugger to the problem postgres.exe before it crashes.

Setting your debugger as the JIT (just-in-time) or post-mortem debugger won't help you, because PostgreSQL generally runs as a service under a different user account that cannot interact with the desktop. You could always initdb a new cluster under your normal user account and use pg_ctl to start the postmaster with that cluster manually, so you can JIT debug under your own user account where Pg can interact with the desktop. This isn't suitable for production use, though, and you might not be able to reproduce the problem that way.

In PostgreSQL 9.0 and above there is a crash dump handler included in PostgreSQL. To use it:

  • Create a directory named crashdumps (all lower case) in the PostgreSQL data directory (as shown by SHOW data_directory; in psql)
  • Give the PostgreSQL user (postgres by default) "full control" of it in the security tab of the folder properties
  • Run the problem code. You don't need to restart Pg or change any settings.
  • When a backend crashes, a Windows minidump should be created in the crashdumps directory.
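The directory-creation step can be scripted if you prefer. The following is a minimal Python sketch, assuming you pass in the path reported by SHOW data_directory; the function name ensure_crashdumps_dir is made up for this example. It does not set any permissions, so you still need to give the PostgreSQL service user "full control" of the folder as described in the list above.

```python
import os


def ensure_crashdumps_dir(data_directory):
    """Create <data_directory>/crashdumps if it is missing and return its path."""
    # The directory name must be exactly "crashdumps", all lower case.
    path = os.path.join(data_directory, "crashdumps")
    os.makedirs(path, exist_ok=True)
    return path


# Example call; replace the argument with the output of SHOW data_directory:
# ensure_crashdumps_dir(r"C:\Program Files\PostgreSQL\9.4\data")
```

Run it from an account that is allowed to write inside the data directory.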

You can debug this file with Microsoft Visual Studio or with windbg, using much the same procedures you use when attaching to a process ID as explained above. Unfortunately the more recent Express editions of Visual Studio do not have the ability to debug minidumps.

Minidumps can also be sent to somebody else to debug. For this to work you must tell the other person the *exact* PostgreSQL version you are using, because the debug symbols vary from minor release to minor release. You can't debug a 9.1.0 minidump with debug symbols taken from a 9.1.1 install. It's often easiest to just send them the .pdb files from your install alongside the minidump.

Debugging crashes during postmaster start

This article discusses debugging of crashes during service startup.

Using Process Explorer to see what running backends are doing

Process Explorer is only useful for getting stack traces from running backends. It is not useful for debugging crashes.

If you intend to use Process Explorer to get a basic stack trace of a running (non-crashed) backend, you need to apply some settings first. You can skip this section if you're going to use Visual Studio (recommended if you have it) or windbg from Debugging Tools for Windows.

Initial Process Explorer configuration

Open Process Explorer from the Start menu, or run "procexp.exe" from the Start->Run dialog. Read and accept the license agreement if prompted.

Now, in the "options" menu choose "configure symbols". First, note the "symbols path" entry, which should be the same as the value you entered for _NT_SYMBOL_PATH in the System control panel. If it isn't, you've made a mistake setting up the environment variable for the symbol path in the System control panel.

There is another entry in the "configure symbols" dialog labelled "dbghelp.dll path". If this begins with "c:\windows" you'll need to change it to the alternative provided by the Debugging Tools for Windows. The exact location varies depending on your debugging tools version. Use the "..." button, type %ProgramFiles% into the "file name" field and press enter. Now find the debugging tools for windows folder, open it, and find dbghelp.dll. Double click on it. The dbghelp.dll path should've changed to something like:

C:\Program Files\Debugging Tools for Windows (x86)\dbghelp.dll

You can now click "OK".

Checking your Process Explorer setup

You should now be able to look at the stack trace of a commonplace process that's pre-installed on Windows systems. I suggest starting Notepad (start->run->notepad.exe), then launching Process Explorer, double-clicking on "notepad.exe" in the process list, switching to the "threads" tab, and double-clicking on the only entry in the list shown there. The displayed stack trace should look like this (on Windows XP SP3 at least):

ntkrnlpa.exe!KiSwapThread+0x46
ntkrnlpa.exe!KeWaitForSingleObject+0x1c2
win32k.sys!xxxSleepThread+0x192
win32k.sys!xxxRealInternalGetMessage+0x418
win32k.sys!NtUserGetMessage+0x27
ntkrnlpa.exe!KiFastCallEntry+0xf8
ntdll.dll!KiFastSystemCallRet
USER32.dll!NtUserGetMessage+0xc
notepad.exe!WinMain+0xe5
notepad.exe!WinMainCRTStartup+0x174
kernel32.dll!BaseProcessStart+0x23

If it looks like this instead (lots of lines with just "modulename+offset" not "modulename!symbolname+offset"):

ntkrnlpa.exe+0x6a822
ntkrnlpa.exe!ZwYieldExecution+0x1938
ntkrnlpa.exe!ZwYieldExecution+0x19a4
win32k.sys+0x2f35
win32k.sys+0x1b0d
win32k.sys!EngQueryPerformanceCounter+0x5af
ntkrnlpa.exe!KeReleaseInStackQueuedSpinLockFromDpcLevel+0xb14
ntdll.dll!KiFastSystemCallRet
notepad.exe+0x2a1b
notepad.exe+0x7511
kernel32.dll!RegisterWaitForInputIdle+0x49

... then your symbol path is set up wrong or there's something interfering with access to the symbol server. See the section "No symbolic names? Why?" below.

You might also see a backtrace like this:


00000000`0013e6b8 00000000`73d5282c wow64cpu!CpupSyscallStub+0x9
00000000`0013e6c0 00000000`73dcd07e wow64cpu!WaitForMultipleObjects32+0x32
00000000`0013e780 00000000`73dcc549 wow64!RunCpuSimulation+0xa
00000000`0013e7d0 00000000`779884c8 wow64!Wow64LdrpInitialize+0x429
00000000`0013ed20 00000000`77987623 ntdll!LdrpInitializeProcess+0x17e2
00000000`0013f220 00000000`7797308e ntdll! ?? ::FNODOBFM::`string'+0x2bea0
00000000`0013f290 00000000`00000000 ntdll!LdrInitializeThunk+0xe

with lines about wow64. This backtrace is useless for debugging. You're debugging a 32-bit binary on 64-bit Windows with 64-bit debugging tools. You might be able to get a useful trace by installing and using the 32-bit debugging tools, but I haven't verified that.
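If you are collecting traces in bulk, a quick check for this situation can be sketched in Python. The function name looks_like_wow64_trace is invented for this example, and the rule it applies (any frame whose module name starts with "wow64") is only a rough heuristic based on the trace shown above.

```python
def looks_like_wow64_trace(trace):
    """Return True if any frame in the trace belongs to a wow64* module."""
    for line in trace.splitlines():
        for token in line.split():
            # Frames look like module!Symbol+0xoffset; test the module name only.
            if "!" in token and token.lower().startswith("wow64"):
                return True
    return False


sample = (
    "00000000`0013e6b8 00000000`73d5282c wow64cpu!CpupSyscallStub+0x9\n"
    "00000000`0013e780 00000000`73dcc549 wow64!RunCpuSimulation+0xa\n"
)
print(looks_like_wow64_trace(sample))
```

A trace flagged this way is worth retaking with different tooling before you spend time on it.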

Getting a stack trace using Process Explorer

Launch Process Explorer from the Start menu, or by running "procexp.exe".

You'll need to find the postgres.exe instance you're interested in. It might be one showing consistently high CPU usage, but if not, see the instructions above for finding it by process ID.

Once you know which postgres.exe backend you want to examine, double-click on it in the list in Process Explorer. In the dialog that appears, open the "Threads" tab. You'll now see a list of threads. In most cases you're interested in the first thread in the list. Select it by clicking on it and click the "stack" button. After a short delay a list will appear in a new dialog, looking something like this:

ntkrnlpa.exe!KiSwapContext+0x2f
ntkrnlpa.exe!KiSwapThread+0x8a
ntkrnlpa.exe!KeWaitForSingleObject+0x1c2
ntkrnlpa.exe!KiSuspendThread+0x18
ntkrnlpa.exe!KiDeliverApc+0x124
ntkrnlpa.exe!KiSwapThread+0xa8
ntkrnlpa.exe!KeWaitForMultipleObjects+0x284
ntkrnlpa.exe!NtWaitForMultipleObjects+0x297
ntkrnlpa.exe!KiFastCallEntry+0xfc
ntdll.dll!KiFastSystemCallRet
ntdll.dll!ZwWaitForMultipleObjects+0xc
kernel32.dll!WaitForMultipleObjectsEx+0x12c
postgres.exe!pgwin32_waitforsinglesocket+0x1f0
postgres.exe!pgwin32_recv+0x90
postgres.exe!secure_read+0x17d
postgres.exe!pq_recvbuf+0x71
postgres.exe!pq_getbyte+0x15
postgres.exe!SocketBackend+0x6
postgres.exe!PostgresMain+0xbe8
postgres.exe!BackendRun+0x204
postgres.exe!SubPostmasterMain+0x224
postgres.exe!main+0x177
postgres.exe!__tmainCRTStartup+0x10f
kernel32.dll!BaseProcessStart+0x23

See how there's different text after the exclamation mark on each line? If that's not the case, your symbol path is probably set up wrong, and Process Explorer can't figure out what the program is doing. Without the text after the exclamation marks the stack trace is mostly useless, so you'll need to collect that information.

You need to copy the whole list. Click on the first entry, hold down the shift key, and press the "end" key (or scroll to the bottom and click the bottom entry), then click the "copy" button.

You can now paste the stack trace into your email program to send it to the PostgreSQL mailing list. By itself it is not useful, so you MUST provide the usual debugging information too - your OS version, PostgreSQL version, what's going wrong, what you're doing when it goes wrong, etc. See Guide to reporting problems.

How to make sure a stack trace is useful

A stack trace isn't always trustworthy or useful. This is particularly the case on Windows, where programs generally contain no debugging information and proper stack traces can't reliably be created without it.

If you're going to post a stack trace as part of a bug report or request for help, it's really important that it contains symbolic information about the call path, like the example posted above. If, instead, it contains just module names and numeric offsets within the modules, it doesn't tell the reader anything useful unless they know exactly what versions of the modules you have. Even then, it's difficult and cumbersome to read. So, make sure your stack trace contains lines showing modulename!SymbolicName+offset, like this:

ntkrnlpa.exe!KiSwapThread+0x8a

not just modulename+offset, like this:

ntkrnlpa.exe+0x8dafe
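This check is easy to automate before you post a trace. The Python sketch below lists the frames that lack a !SymbolicName part; it assumes one frame per line in the Process Explorer format shown in this article, and the function name unsymbolized_frames is made up for this example.

```python
import re

# Matches modulename!SymbolicName, with or without a trailing +0xoffset.
SYMBOLIC = re.compile(r"^[^\s!]+!\S+$")


def unsymbolized_frames(trace):
    """Return the frames in a trace that have no !SymbolicName part."""
    frames = [line.strip() for line in trace.splitlines() if line.strip()]
    return [f for f in frames if not SYMBOLIC.match(f)]


trace = "ntkrnlpa.exe!KiSwapThread+0x8a\nntkrnlpa.exe+0x8dafe\nhal.dll+0x6199"
for frame in unsymbolized_frames(trace):
    print("no symbol:", frame)
```

If more than a handful of frames are reported, fix the symbol path before collecting anything else.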

Take this trace, for example.

ntkrnlpa.exe+0x8dafe
ntkrnlpa.exe+0x29a82
ntkrnlpa.exe+0x33198
hal.dll+0x6199
hal.dll+0x63d9
hal.dll+0x6577
hal.dll+0x3902
postgres.exe!process_implied_equality+0x18d50e

The C function process_implied_equality(...) from backend/optimizer/plan/initsplan.c is deep in the query planner, and is (according to the comments) "currently used only when an EquivalenceClass is found to contain pseudoconstants". It's not going to be calling into hal.dll - not, at least, without a fairly long call chain between it and hal.dll. So we can't trust that postgres.exe was even in the process_implied_equality function when it called into hal.dll, and if it was, we don't have any idea how it got there. The call trace is basically useless.
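One hint that a frame like this is not trustworthy is its very large offset: 0x18d50e is over 1.5 MB past the start of the named function, which usually means the debugger fell back to the nearest symbol it happened to know about. The Python sketch below flags such frames. The 0x10000 threshold is an arbitrary assumption chosen for illustration, not an established rule, and the function name suspicious_frames is invented for this example.

```python
import re

FRAME = re.compile(r"^(?P<module>[^\s!]+)!(?P<symbol>\S+?)\+0x(?P<offset>[0-9a-fA-F]+)$")


def suspicious_frames(trace, max_offset=0x10000):
    """Return frames whose offset from the named symbol exceeds max_offset."""
    flagged = []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        # Assumption: few real functions are larger than max_offset bytes.
        if match and int(match.group("offset"), 16) > max_offset:
            flagged.append(line.strip())
    return flagged


trace = "postgres.exe!process_implied_equality+0x18d50e\npostgres.exe!PostgresMain+0xbe8"
print(suspicious_frames(trace))
```

Treat a flagged frame as a prompt to recheck the symbol path, not as proof that the trace is wrong.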

Compare it to the example stack trace shown earlier in the article, where we can clearly see that PostgreSQL has called WaitForMultipleObjectsEx (to wait until data is delivered to a socket) via its win32 wrapper layer from the pq_getbyte function ... and so on.

...
kernel32.dll!WaitForMultipleObjectsEx+0x12c
postgres.exe!pgwin32_waitforsinglesocket+0x1f0
postgres.exe!pgwin32_recv+0x90
postgres.exe!secure_read+0x17d
postgres.exe!pq_recvbuf+0x71
postgres.exe!pq_getbyte+0x15
postgres.exe!SocketBackend+0x6
postgres.exe!PostgresMain+0xbe8
...


No symbolic names? Why?

Your stack traces aren't showing !SymbolicNames, and you don't know why. Here are a few possibilities.

No symbols for the module

You might simply not have any way to get symbols for the module you're debugging. If you're finding that most of the stack trace (at least the parts in the system DLLs and NT kernel) has symbolic names shown, but other parts don't, it's quite likely you just have no debugging information (pdb files) for those parts.

This shouldn't be the case for PostgreSQL if you've set up the symbol path to point to the symbols for your version of PostgreSQL as well as the Microsoft Symbol Server, though.

_NT_SYMBOL_PATH not set

Are you sure you set the _NT_SYMBOL_PATH environment variable?

In the Start menu choose Run, then enter "cmd" and press enter.

In the resulting window, type in exactly:

set | findstr /i symbol

You should get the following result, allowing for differences in PostgreSQL versions, drive letters, etc:

_NT_SYMBOL_PATH=c:\program files\postgresql\8.4\symbols;SRV*c:\localsymbols*http://msdl.microsoft.com/download/symbols;

Note that _NT_SYMBOL_PATH is in capitals, with underscores between words and a leading underscore.

The same string should show up in Process Explorer, in Configure Symbols under the Options menu.
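If you want to go one step further than eyeballing the value, the Python sketch below reads _NT_SYMBOL_PATH from the environment, splits it into local directories and symbol-server entries, and reports whether each local directory exists. It is an illustration only; the function name parse_symbol_path is made up for this example, and it only understands the plain directory and SRV*cache*url forms used in this article.

```python
import os


def parse_symbol_path(value):
    """Split an _NT_SYMBOL_PATH value into local directories and SRV entries."""
    local_dirs, servers = [], []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        if entry.upper().startswith("SRV*"):
            # SRV*<local cache dir>*<server URL> -> [cache dir, URL]
            servers.append(entry.split("*")[1:])
        else:
            local_dirs.append(entry)
    return local_dirs, servers


value = os.environ.get("_NT_SYMBOL_PATH", "")
if not value:
    print("_NT_SYMBOL_PATH is not set in this environment")
dirs, srv_entries = parse_symbol_path(value)
for d in dirs:
    print(d, "exists" if os.path.isdir(d) else "MISSING")
for entry in srv_entries:
    print("symbol server entry:", entry)
```

Run it from a freshly opened command prompt, since windows that were already open will not see a newly added variable.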

Add-in software firewall

Do you have an add-in software firewall installed? The firewall built in to Windows will not cause problems as it only affects listening sockets and incoming connections, but 3rd party firewalls often block outbound connections too. This may prevent access to the symbol server from working.

Make sure your software firewall allows access to msdl.microsoft.com on port 80 (http), and/or add exemptions for procexp.exe and windbg.exe.
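A quick way to test outbound access from the affected machine is to try opening a TCP connection yourself. This Python sketch uses only the standard library; the function name can_connect is made up for this example. A successful connection only shows that port 80 is reachable from the Python process, so a firewall with per-program rules could still block procexp.exe or windbg.exe.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if an outbound TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

Calling can_connect("msdl.microsoft.com", 80) on the machine in question should return True if nothing is blocking outbound HTTP to the symbol server.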


Initially written by Craig Ringer --Ringerc 10:06, 29 July 2009 (UTC)
Sections on remote debugging, software firewall, troubleshooting bad stack traces, identifying a useful stack trace, and debugging with Visual Studio added --Ringerc 06:25, 22 August 2009 (UTC)