Valgrind


Valgrind and Postgres

Postgres supports Valgrind Memcheck directly: "client requests" can be included in the memory allocator, which lets Memcheck detect many additional memory errors that would otherwise go unnoticed in vanilla builds. See src/include/pg_config_manual.h for full details of how to build Postgres with support for Valgrind Memcheck instrumentation.

You should normally use MEMORY_CONTEXT_CHECKING with USE_VALGRIND; instrumentation of repalloc() is inferior without it.

Known Bugs

If you observe core dumps in autovacuum while running under Valgrind on x86_64 hardware, it's probably a known bug in Valgrind 3.8.1 and earlier; see https://bugs.kde.org/show_bug.cgi?id=280114. If you're prepared to recompile Valgrind, apply the one-line patch shown there. Otherwise, the simplest answer is to set autovacuum = off in postgresql.conf while using Valgrind. However, it's not clear that this "fix" will hide all instances of the issue.
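To decide whether the workaround is worth applying, the installed Valgrind version can be compared against 3.8.1. The following is only a sketch: the function name is made up for illustration, it relies on sort -V (GNU coreutils), and the commented usage assumes that valgrind --version prints a string such as "valgrind-3.8.1" and that PGDATA points at the test cluster.

```shell
# Return success if VERSION (e.g. "3.8.1") is at or below 3.8.1, the newest
# release named above as affected. Hypothetical helper; needs `sort -V`.
valgrind_is_affected() {
    newest=$(printf '%s\n' "$1" 3.8.1 | sort -V | tail -n 1)
    [ "$newest" = 3.8.1 ]
}

# Example use (assumptions: output format of `valgrind --version`, and PGDATA
# being set to the data directory of the cluster under test):
#   v=$(valgrind --version); v=${v#valgrind-}
#   if valgrind_is_affected "$v"; then
#       echo "autovacuum = off" >> "$PGDATA/postgresql.conf"
#   fi
```

The setting takes effect the next time the server is started or reloaded.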

Using Valgrind

One effective approach to gathering Valgrind Memcheck reports while running the regression tests is outlined here. A binary built with debug symbols lets Valgrind include source file names and line numbers in its reports.

A Memcheck-hosted postgres can be started like this:

  valgrind --leak-check=no --gen-suppressions=all \
    --suppressions=src/tools/valgrind.supp --time-stamp=yes \
    --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
    --log-file=$HOME/pg-valgrind/%p.log --trace-children=yes \
    postgres --log_line_prefix="%m %p " \
    --log_statement=all --shared_buffers=64MB 2>&1 | tee $HOME/pg-valgrind/postmaster.log

Run the regression tests:

  make installcheck
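Because the command above sets --error-markers, each report in the per-process logs is preceded by a VALGRINDERROR-BEGIN line, so one quick way to see whether the run produced anything is to search for that marker. A minimal sketch, assuming the logs are in $HOME/pg-valgrind as above (the function name is made up for illustration):

```shell
# List the Valgrind log files in DIR (default: $HOME/pg-valgrind) that contain
# at least one error report, identified by the VALGRINDERROR-BEGIN marker.
# Prints nothing and returns non-zero when no log contains the marker.
valgrind_logs_with_errors() {
    grep -l 'VALGRINDERROR-BEGIN' "${1:-$HOME/pg-valgrind}"/*.log 2>/dev/null
}

# Example use:
#   valgrind_logs_with_errors "$HOME/pg-valgrind"
```

Each log is named after the process ID (%p), which can be matched against the %p field that log_line_prefix puts in postmaster.log to find the statements that backend was running.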

If that detected an error and more detail is needed for debugging, rerun a smaller test case with the --track-origins=yes and --read-var-info=yes flags added. This slows things down noticeably but gives more specific messages. For more information, see the Valgrind documentation.
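To avoid maintaining two copies of the long command line, the Valgrind options can be generated by a small helper that appends the two extra flags on request. This is only a sketch: the function name and its two parameters are made up for illustration, and the option list simply repeats the one used earlier on this page.

```shell
# Print the Memcheck options used above, one per line.
#   $1: log directory (default: $HOME/pg-valgrind)
#   $2: "yes" to append the slower, more detailed origin-tracking flags
pg_valgrind_args() {
    logdir=${1:-$HOME/pg-valgrind}
    detail=${2:-no}
    set -- --leak-check=no --gen-suppressions=all \
        --suppressions=src/tools/valgrind.supp --time-stamp=yes \
        --error-markers=VALGRINDERROR-BEGIN,VALGRINDERROR-END \
        "--log-file=$logdir/%p.log" --trace-children=yes
    if [ "$detail" = yes ]; then
        set -- "$@" --track-origins=yes --read-var-info=yes
    fi
    printf '%s\n' "$@"
}

# Example use (relies on shell word splitting, so the log directory must not
# contain whitespace):
#   valgrind $(pg_valgrind_args "$HOME/pg-valgrind" yes) \
#       postgres --log_line_prefix="%m %p " --log_statement=all
```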

Run individual subsets of the regression tests, to limit the duration of testing:

  make installcheck-tests TESTS="json combocid"

Not all regression tests can be run on their own like this; you may wish to verify first that each test passes without Postgres running under Valgrind. (The file src/test/regress/parallel_schedule should give you some idea of the dependencies that the test of interest may have on other tests.)
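One rough way to read the schedule is to list every group of tests that runs before the test of interest, since those are the candidates for its prerequisites. The sketch below assumes the schedule file consists of lines of the form "test: name1 name2 ..."; the function name is made up for illustration, and the output is a hint rather than a precise dependency list.

```shell
# Print every "test:" group from SCHEDULE up to and including the group that
# runs TEST. Earlier groups are possible prerequisites, not proven ones.
#   $1: test name
#   $2: schedule file (default: src/test/regress/parallel_schedule)
schedule_prefix_for() {
    awk -v t="$1" '
        /^test:/ {
            print
            for (i = 2; i <= NF; i++) if ($i == t) exit
        }' "${2:-src/test/regress/parallel_schedule}"
}

# Example use:
#   schedule_prefix_for combocid
```

Any prerequisite found this way can be listed ahead of the test of interest in TESTS.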

Additional suppressions

You may need additional local suppressions. If you get complaints about wcstombs and related functions, consider this addition:

# wcsrtombs uses some clever optimizations internally, which to valgrind
# may look like access to uninitialized data. For example AVX2 instructions
# load data in 256-bit chunks, irrespective of wchar length. gconv does
# something similar by loading data in 32-bit chunks and then shifting the
# data internally. Neither of those actually uses the uninitialized part
# of the buffer, as far as we know.
#
# https://www.postgresql.org/message-id/90ac0452-e907-e7a4-b3c8-15bd33780e62@2ndquadrant.com

{
   wcsnlen_optimized
   Memcheck:Cond
   fun:__wcsnlen_avx2
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
}

{
   wcsnlen_optimized_addr32
   Memcheck:Addr32
   fun:__wcsnlen_avx2
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
}

{
   gconv_transform_internal
   Memcheck:Cond
   fun:__gconv_transform_internal_utf8
   fun:wcsrtombs
   fun:wcstombs
   fun:wchar2char
}

(On my system this is not enough, because wchar2char seems to be inlined, so I get additional reports about wcstombs being called by the functions that call wchar2char. You might therefore need to remove the last line, fun:wchar2char, from each proposed suppression.)
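If the suppressions above are kept in a separate local file, the wchar2char frame can be stripped mechanically rather than by hand. A minimal sketch, in which the function name and the file names local.supp and local-inlined.supp are made up for illustration:

```shell
# Print suppression file FILE with every "fun:wchar2char" frame removed, so
# the entries also match when wchar2char has been inlined into its callers.
drop_wchar2char_frame() {
    sed '/^[[:space:]]*fun:wchar2char[[:space:]]*$/d' "$1"
}

# Example use (hypothetical file names):
#   drop_wchar2char_frame local.supp > local-inlined.supp
# The result can be passed to Valgrind with a second --suppressions option,
# next to --suppressions=src/tools/valgrind.supp.
```

Shortening the call chain makes each suppression broader, so it may also hide reports from other callers of wcstombs.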

Troubleshooting

If you get an error like

   valgrind: mmap(0x58000000, 2347008) failed in UME with error 22 (Invalid argument).
   valgrind: this can be caused by executables with very large text, data or bss segments.

the most likely culprit is an attempt to run Valgrind under Valgrind, probably via a wrapper script. If you're using a wrapper script for pg_regress, for example, make sure you use which -a or change the PATH to find the next postgres, rather than re-executing your wrapper script.
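A wrapper script named postgres can use which -a to skip itself and pick the next postgres on the PATH. The following is only a sketch: the function name is made up for illustration, and it assumes the wrapper's own path is given exactly as which -a prints it (no symlinks or differently spelled directories).

```shell
# Print the first executable called NAME on PATH whose path is not SELF.
#   $1: program name, e.g. postgres
#   $2: full path of the wrapper script itself, as `which -a` would print it
next_in_path() {
    which -a "$1" 2>/dev/null | grep -vxF "$2" | head -n 1
}

# Example use inside a wrapper installed as /some/dir/postgres (hypothetical
# location; other Valgrind options omitted for brevity):
#   real=$(next_in_path postgres /some/dir/postgres)
#   exec valgrind --trace-children=yes "$real" "$@"
```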