From PostgreSQL wiki
vacuuming doesn't try to advance the frozenxid until it was completely vacuumed the entire database. If vacuum is persistently interrupted (i.e. weekly cold backup), it might never complete the entire database in one sitting.
checkpoint_sync_time is the time it spent inside sync calls.
checkpoint_write_time is the time it spent during the write phase, most of that time is probably spent sleeping between writes, so there is no way from this view to know how hard it is working to get the job done.
This sometimes doesn't seem to do what it should. If many buffers marked BM_CHECKPOINT_NEEDED get written out by the bgwriter or the backends, they don't get count as written by the checkpointer, so it always thinks it is behind, and works too quickly. It would nice if outsiders would decrement num_to_write when they write a BM_CHECKPOINT_NEEDED buffer, but there is no easy way to accomplish this. If most of the writing is being done by others, there is probably no harm in having the checkpointer finish early, but it is confusing when looking at the log files under log_checkpoints=on.
(Also, it always writes 10 buffers per second, even if that means it will finish early)
Since the fsync queue compaction logic went it, the background writer seems rather useless. The problem is that when writes are free flowing (the kernel absorbs them without blocking), they are so fast that offloading them from the backends is pointless, and when IO is congested there is no hope of the background writer keeping up. If the backends were changed to skip dirty pages in the clocksweep, rather than clean them themselves (and thus block), perhaps bgwriter would be useful again.
When building a gin index, it doesn't use a tuplesort on all the key-tid pairs, rather it does semi-bulk insertions one maintenance_work_mem chunk at a time. This is better than doing retail insertion, but probably not as good as doing tapesort. Why not use tuplesort?
WAL logging initial build
Can wal logging the initial build be skipped and replaced by fsyncing the index files when wal_level = minimal, as is currently done for btree? I don't see why not. but btree builds the index outside of shared_buffers, so unless gin switches to do that, then at the end it would have to scour shared_buffers before the fsync.
if the estimated plan cost is > some threshold, increase join_collapse_limit and try again? If the execution is going to be expensive, why not spend more time planning?
If someone prepares a bunch of statements by sending semicolon separated block of prepares to the server in a single PGexec, then pg_stat_statements reports the entire block of prepares text as the query for each one, and it is hard to figure out which statement was the one actually being executed for that time report. (The query reported by pg_stat_activity has the same problem). How hard would it be to fix this? osm2pgsql is good example of something doing this.
It seems that the total execution time for copy statements includes the time spent waiting for the client to send more data.
A transaction which only changes unlogged tables still generates WAL. That is because the commit itself generates a WAL message (only with wal_level > minimal ?). Fortunately this bare commit message it not synchronously committed, but it still is enough to trigger log swaps on otherwise idle systems that use archive_timeout. Which is unfortunate because sometimes that is exactly the thing I wanted to avoid by using unlogged tables.