- 1 VACUUM & ANALYZE Headaches
- 1.1 VACUUM & Autovacuum
- 1.1.1 FREEZE
- 1.1.2 Visibility Vacuuming and Index-Only Scan
- 1.1.3 Batch Processing
- 1.1.4 Autovacuuming and Server Load
- 1.1.5 Long-Running Transactions
- 1.1.6 Difficult Tuning, No Visibility
- 1.1.7 Perverse Feedback
- 1.1.8 Shrinking Files
- 1.1.9 Disk Writes Multiplied
- 1.1.10 Read-Only Access to Cached Data I/O Bound
- 1.2 ANALYZE & Autoanalyze
- 1.1 VACUUM & Autovacuum
VACUUM & ANALYZE Headaches
This page is here to collect all of the problems, and possible solutions, for the reasons VACUUM and ANALYZE are still a headache. Issues around PostgreSQL's use of statistics, while substantial, are specifically excluded from this page in order to manage the scope of work.
VACUUM & Autovacuum
Problem: thanks to the visibility map and other optimizations, data pages which are infrequently (or never) updated or deleted from will not be vacuum frozen until they reach autovacuum_max_freeze_age. This causes a huge resource hit on the database server as the entire table, which might not be in cache, is read and re-written.
Possible Solutions and Mitigations:
Problem: vacuum currently has no knowledge of which data pages may already have all rows frozen. This means that the entire relation must be rescanned each time freeze runs, even if the vast majority of tuples are already frozen.
FREEZE WAL volume
Problem: when a tuple is frozen, the entire tuple is WAL-logged (possibly causing a full-page write), even though the tuple may have seen no activity for a long time. The large volume of WAL which can be generated can overwhelm write caches, causing latency spkes, and can cause gluts in WAL-based replication streams, leading to confusing short term lag increases.
FREEZE base backup impact
When a technique (such as rsync) is used to transmit only changed relation files, or changed portions of relation files, over a slow communicatioins path, data is sent an extra time after a FREEZE. This potentially doubles the bandwidth required to support capture of a base backup.
Until transactions covered by a CLOG segment are known to be frozen, the segment cannot be freed. With enough segments active, contention for CLOG cache space can become a performance bottleneck.
Visibility Vacuuming and Index-Only Scan
Problem: As of 9.2, there's a significant benefit to adding pages to the Visibility Map so that index-only scan is enabled. However, this only gets done for pages which were being vacuumed anyway, so pages with no updates or deletes on them don't get set allvisible unless the user calls a manual VACUUM. This makes index-only scan a lot less useful than it could be.
Problem: Autovacuum does not work well for workloads which involve a lot of batch processing. Autovacuum tends to kick in in the middle of the batch (assuming it's not a single transaction), and/or to not notice when a batch is finished. As a result, users with batch-load databases currently turn off autovacuum and have to do manual vacuuming.
Autovacuuming and Server Load
Problem: Autovacuum currently has no information on the current server "load". As a result, it tends to kick in at peak load times (since that's when the most rows a changing), making bottlenecking worse on user applications. It is also unable to take advantage of low traffic periods to get caught up on a long pending queue.
Problem: Users accidentally leave transactions open for hours or days, which prevents autovacuum from vacuuming the tables touched by the open transaction. When the transaction is closed, it's often after bloat has occurred. It also prevents marking XIDs as all-visible and shortening HOT chains.
Difficult Tuning, No Visibility
Problem: tuning autovacuum's behavior is arcane and poorly understood, and there's not good statistics to check if your tuning actually improved things. Setting autovacuum tuning for specific tables is even more problematic, especially since it requires an ACCESS EXCLUSIVE lock on the tables themselves to tweak.
If autovacuum is not keeping up, the database may bloat, leading to poor performance. When users look at what is using resources, they see autovacuum struggling to improve things, blame its load for their predicament, and make it less aggressive -- or even turn it off. This gives some immediate relief, so the user doen't know that they have made the fundamental problem worse, and won't associate the even poorer performance they see later as being caused by this action.
Problem: automated vacuum and non-blocking vacuum has no way to shrink files which have become severely bloated. VACUUM FULL, which does so, takes an ACCESS EXCLUSIVE lock for the duration of the operation and so cannot be automated. pg_repack, which is more background-friendly, is an extension.
Disk Writes Multiplied
Unless inserted as part of a bulk load, rows which are not updated or deleted for long enough to be frozen are written at least five times:
- the tuple is WAL-logged as "transaction pending"
- the tuple is written to the heap
- the tuple is updated in place to set "committed" hint bit
- the tuple is WAL-logged as "frozen"
- the tuple is updated in place as "frozen"
This does not count the writes required to maintain the visibility map or "all visible" page flag, nor the increased writes needed for some WAL archiving and backup techniques. If an "update in place" step occurs soon enough after the previous write of a tuple, the actual disk I/O may be absorbed by a cache, eliminating actual disk I/O; but no effort is made to make that happen.
Read-Only Access to Cached Data I/O Bound
It can confuse users when a SELECT of cached data causes a spike in disk writes for the querying process, due to hint bit setting.
ANALYZE & Autoanalyze
Difficult Tuning, No Visibility
Problem: Same as Autovacuum.
Problem: Autoanalyze is governed by the same threshold+percentage logic as autovacuum. This tends to result in very small tables never being analyzed at all, and large tables being analyzed too infrequently. This forces users to do manual analyzes and/or per-table tuning.
Sample Sizes not Scaled To Table
Problem: Our sample size, statstics_target, is set at a uniform size for the whole database. This forces a user to choose between: to many samples for small tables, too few for large tables, or doing a lot of per-table tuning.
Problem: if you CREATE and populate a table in the same transaction, there are no statistics on it until Autoanalyze gets around to it. This means PostgreSQL estimates it at 1000 rows, often creating quite bad query plans. Since users frequently want to use the new table right after creation, this is a frequent issue not just for regular tables but also temporary tables which are never autoanalysed during their lifetime and therefore always need to be manually analysed to avoid undesirable plans.
Problem: Same as autovacuum.