VacuumHeadaches

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(CREATE TABLE: Add note about temporary tables never being autoanalyzed.)
(VACUUM & Autovacuum: Add "Disk Writes Multiplied" section.)
Line 52: Line 52:
  
 
Problem: automated vacuum and non-blocking vacuum has no way to shrink files which have become severely bloated.  VACUUM FULL, which does so, takes an ACCESS EXCLUSIVE lock for the duration of the operation and so cannot be automated.  pg_repack, which is more background-friendly, is an extension.
 
Problem: automated vacuum and non-blocking vacuum has no way to shrink files which have become severely bloated.  VACUUM FULL, which does so, takes an ACCESS EXCLUSIVE lock for the duration of the operation and so cannot be automated.  pg_repack, which is more background-friendly, is an extension.
 +
 +
=== Disk Writes Multiplied ===
 +
 +
Unless inserted as part of a bulk load, rows which are not updated or deleted for long enough to be frozen, are written at least five times:
 +
# the tuple is WAL-logged as "transaction pending"
 +
# the tuple is written to the heap
 +
# the tuple is updated in place to set "committed" hint bit
 +
# the tuple is WAL-logged as "frozen"
 +
# the tuple is updated in place as "frozen"
 +
This does not count the writes required to maintain the visibility map or "all visible" page flag, nor the increased writes needed for some WAL archiving and backup techniques.  If an "update in place" step occurs soon enough after the previous write of a tuple, the actual disk I/O may be absorbed by a cache, eliminating actual disk I/O; but no effort is made to make that happen.
  
 
== ANALYZE & Autoanalyze ==
 
== ANALYZE & Autoanalyze ==

Revision as of 11:09, 2 June 2013

Contents

VACUUM & ANALYZE Headaches

This page is here to collect all of the problems, and possible solutions, for the reasons VACUUM and ANALYZE are still a headache. Issues around PostgreSQL's use of statistics, while substantial, are specifically excluded from this page in order to manage the scope of work.

VACUUM & Autovacuum

FREEZE

FREEZE Cost

Problem: thanks to the visibility map and other optimizations, data pages which are infrequently (or never) updated or deleted from will not be vacuum frozen until they reach autovacuum_max_freeze_age. This causes a huge resource hit on the database server as the entire table, which might not be in cache, is read and re-written.

Possible Solutions and Mitigations:

FREEZE rescanning

Problem: vacuum currently has no knowledge of which data pages may already have all rows frozen. This means that the entire relation must be rescanned each time freeze runs, even if the vast majority of tuples are already frozen.

FREEZE WAL volume

Problem: when a tuple is frozen, the entire tuple is WAL-logged (possibly causing a full-page write), even though the tuple may have seen no activity for a long time. The large volume of WAL which can be generated can overwhelm write caches, causing latency spkes, and can cause gluts in WAL-based replication streams, leading to confusing short term lag increases.

FREEZE base backup impact

When a technique (such as rsync) is used to transmit only changed relation files, or changed portions of relation files, over a slow communicatioins path, data is sent an extra time after a FREEZE. This potentially doubles the bandwidth required to support capture of a base backup.

CLOG contention

Until transactions covered by a CLOG segment are known to be frozen, the segment cannot be freed. With enough segments active, contention for CLOG cache space can become a performance bottleneck.

Visibility Vacuuming and Index-Only Scan

Problem: As of 9.2, there's a significant benefit to adding pages to the Visibility Map so that index-only scan is enabled. However, this only gets done for pages which were being vacuumed anyway, so pages with no updates or deletes on them don't get set allvisible unless the user calls a manual VACUUM. This makes index-only scan a lot less useful than it could be.

Batch Processing

Problem: Autovacuum does not work well for workloads which involve a lot of batch processing. Autovacuum tends to kick in in the middle of the batch (assuming it's not a single transaction), and/or to not notice when a batch is finished. As a result, users with batch-load databases currently turn off autovacuum and have to do manual vacuuming.

Autovacuuming and Server Load

Problem: Autovacuum currently has no information on the current server "load". As a result, it tends to kick in at peak load times (since that's when the most rows a changing), making bottlenecking worse on user applications. It is also unable to take advantage of low traffic periods to get caught up on a long pending queue.

Long-Running Transactions

Problem: Users accidentally leave transactions open for hours or days, which prevents autovacuum from vacuuming the tables touched by the open transaction. When the transaction is closed, it's often after bloat has occurred. It also prevents marking XIDs as all-visible and shortening HOT chains.

Difficult Tuning, No Visibility

Problem: tuning autovacuum's behavior is arcane and poorly understood, and there's not good statistics to check if your tuning actually improved things. Setting autovacuum tuning for specific tables is even more problematic, especially since it requires an ACCESS EXCLUSIVE lock on the tables themselves to tweak.

Shrinking Files

Problem: automated vacuum and non-blocking vacuum has no way to shrink files which have become severely bloated. VACUUM FULL, which does so, takes an ACCESS EXCLUSIVE lock for the duration of the operation and so cannot be automated. pg_repack, which is more background-friendly, is an extension.

Disk Writes Multiplied

Unless inserted as part of a bulk load, rows which are not updated or deleted for long enough to be frozen, are written at least five times:

  1. the tuple is WAL-logged as "transaction pending"
  2. the tuple is written to the heap
  3. the tuple is updated in place to set "committed" hint bit
  4. the tuple is WAL-logged as "frozen"
  5. the tuple is updated in place as "frozen"

This does not count the writes required to maintain the visibility map or "all visible" page flag, nor the increased writes needed for some WAL archiving and backup techniques. If an "update in place" step occurs soon enough after the previous write of a tuple, the actual disk I/O may be absorbed by a cache, eliminating actual disk I/O; but no effort is made to make that happen.

ANALYZE & Autoanalyze

Difficult Tuning, No Visibility

Problem: Same as Autovacuum.

Autoanalyze Frequency

Problem: Autoanalyze is governed by the same threshold+percentage logic as autovacuum. This tends to result in very small tables never being analyzed at all, and large tables being analyzed too infrequently. This forces users to do manual analyzes and/or per-table tuning.

Sample Sizes not Scaled To Table

Problem: Our sample size, statstics_target, is set at a uniform size for the whole database. This forces a user to choose between: to many samples for small tables, too few for large tables, or doing a lot of per-table tuning.

CREATE TABLE

Problem: if you CREATE and populate a table in the same transaction, there are no statistics on it until Autoanalyze gets around to it. This means PostgreSQL estimates it at 1000 rows, often creating quite bad query plans. Since users frequently want to use the new table right after creation, this is a frequent issue not just for regular tables but also temporary tables which are never autoanalysed during their lifetime and therefore always need to be manually analysed to avoid undesirable plans.

Batch Processing

Problem: Same as autovacuum.

Personal tools