From PostgreSQL wiki




Should \dx show the highest available version, in addition to the installed version?

overload to allow @> to have a scalar on the right-hand side, rather than having to wrap your scalar in a degenerate single-element array. (tstzrange is already overloaded this way)
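A minimal illustration of the current situation, assuming a stock installation (range containment already accepts a bare scalar, array containment does not):

```sql
-- Works today: range @> scalar is defined for range types.
SELECT tstzrange(now() - interval '1 hour', now()) @> now();

-- For arrays, the scalar must be wrapped in a degenerate one-element array:
SELECT ARRAY[1, 2, 3] @> ARRAY[2];   -- works
-- SELECT ARRAY[1, 2, 3] @> 2;       -- fails: operator does not exist
```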

shouldn't \df+ show the procost?



since 9.4, xmin no longer tells you whether a tuple is frozen; you have to use pageinspect:

SELECT (to_hex(t_infomask & x'300'::int) = '300') as frozen FROM heap_page_items(get_raw_page('foo', 0));


vacuuming doesn't try to advance the frozenxid until it has completely vacuumed the entire database. If vacuum is persistently interrupted (e.g. by a weekly cold backup), it might never complete the entire database in one sitting.
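One way to watch whether the database-wide frozen xid is actually advancing is a standard catalog query:

```sql
-- If vacuum never completes a full pass over a database, its age keeps
-- growing toward autovacuum_freeze_max_age.
SELECT datname, datfrozenxid, age(datfrozenxid)
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```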



checkpoint_sync_time is the time spent inside sync calls.

checkpoint_write_time is the time spent during the write phase; most of that time is probably spent sleeping between writes, so there is no way from this view to know how hard it is working to get the job done.
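These are cumulative columns (in milliseconds) of the pg_stat_bgwriter view in releases of this era; something like:

```sql
-- Milliseconds accumulated since the last stats reset; divide by
-- checkpoints_timed + checkpoints_req for a rough per-checkpoint figure.
SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time
FROM pg_stat_bgwriter;
```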


This sometimes doesn't seem to do what it should. If many buffers marked BM_CHECKPOINT_NEEDED get written out by the bgwriter or the backends, they don't get counted as written by the checkpointer, so it always thinks it is behind and works too quickly. It would be nice if outsiders would decrement num_to_write when they write a BM_CHECKPOINT_NEEDED buffer, but there is no easy way to accomplish this. If most of the writing is being done by others, there is probably no harm in having the checkpointer finish early, but it is confusing when looking at the log files under log_checkpoints=on.

(Also, it always writes 10 buffers per second, even if that means it will finish early)


Since the fsync queue compaction logic went in, the background writer seems rather useless. The problem is that when writes are free-flowing (the kernel absorbs them without blocking), they are so fast that offloading them from the backends is pointless, and when IO is congested there is no hope of the background writer keeping up. If the backends were changed to skip dirty pages in the clocksweep, rather than cleaning them themselves (and thus blocking), perhaps the bgwriter would be useful again.

gin indexes

initial build

When building a gin index, it doesn't use a tuplesort on all the key-TID pairs; rather, it does semi-bulk insertions one maintenance_work_mem chunk at a time. This is better than doing retail insertion, but probably not as good as doing a tape sort. Why not use tuplesort?

WAL logging initial build

Can WAL logging of the initial build be skipped and replaced by fsyncing the index files when wal_level = minimal, as is currently done for btree? I don't see why not, but btree builds the index outside of shared_buffers, so unless gin switches to doing that, at the end it would have to scour shared_buffers before the fsync.


if the estimated plan cost is > some threshold, increase join_collapse_limit and try again? If the execution is going to be expensive, why not spend more time planning?
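Today this has to be done by hand; a sketch of the manual version (the limit values here are arbitrary, the default is 8):

```sql
EXPLAIN SELECT ... ;            -- note the estimated total cost

SET join_collapse_limit = 20;
SET from_collapse_limit = 20;
EXPLAIN SELECT ... ;            -- replan: did the cost drop enough to pay
                                -- for the extra planning time?
```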


explain analyze

on a line with multiple loops, which things get amortized and which things get summed?


statement text

If someone prepares a bunch of statements by sending a semicolon-separated block of prepares to the server in a single PQexec, then pg_stat_statements reports the entire block of prepares as the query text for each one, and it is hard to figure out which statement was the one actually being executed for that time report. (The query reported by pg_stat_activity has the same problem.) How hard would it be to fix this? osm2pgsql is a good example of something doing this.

COPY timing

It seems that the total execution time for copy statements includes the time spent waiting for the client to send more data.

unlogged tables

A transaction which only changes unlogged tables still generates WAL. That is because the commit itself generates a WAL record (only with wal_level > minimal?). Fortunately this bare commit record is not synchronously committed, but it is still enough to trigger log switches on otherwise idle systems that use archive_timeout. Which is unfortunate, because sometimes that is exactly the thing I wanted to avoid by using unlogged tables.
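A way to observe this, assuming the PG 10+ function name (pg_current_xlog_location() on older releases) and a hypothetical unlogged table u:

```sql
CREATE UNLOGGED TABLE u (i int);
SELECT pg_current_wal_lsn();   -- note the position
INSERT INTO u VALUES (1);      -- the heap change itself is not WAL-logged...
SELECT pg_current_wal_lsn();   -- ...but the commit record should advance it
```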


Where it says "Decrypting bytea data with pgp_pub_decrypt is disallowed", what it means is decrypting data encrypted with the pgp binary flag (i.e. without the gpg --textmode flag), even if the data happens to have no weird characters in it. Calling this bytea is unfortunate, as that is a postgresql term, not a pgp term. If you try to do this anyway, the error message is rather unhelpful (even if you pass debug=1 in the options argument).

There is no way to decrypt a signed-and-encrypted message without verifying the signature. pgp_sym_decrypt fails for an unknown reason, and pgp_sym_decrypt_verify fails if not given the public key.

Can't verify signatures made with subkeys, or encrypt to public keys which have signing subkeys. This is a substantial limitation. I wouldn't import my private key into postgresql, but rather make an app-specific one which just doesn't use signing subkeys. But for public keys, you are at the mercy of the other person, who probably won't like being told not to use a signing subkey.


~/.gdbinit, for when looking at something not compiled with -ggdb3

set history filename ~/.gdb_history
set history save on
macro define T_NAME(lock) (LWLockTrancheArray[(lock)->tranche]->name)
macro define T_ID(lock) ((int) ((((char *) lock) - ((char *) LWLockTrancheArray[(lock)->tranche]->array_base)) / LWLockTrancheArray[(lock)->tranche]->array_stride))

dynamic stack traces

To get a stack trace at a certain point in the code, without needing to catch it via `gdb` at that exact point, add something like:

#include <execinfo.h>

void
print_trace(void)
{
    void   *array[100];
    size_t  size;
    char  **strings;
    size_t  i;

    size = backtrace(array, 100);
    strings = backtrace_symbols(array, size);
    elog(LOG, "JJ Obtained %zu stack frames.", size);
    for (i = 0; i < size; i++)
        elog(LOG, "%s", strings[i]);
    free(strings);
}