MultiXact Bugs
From PostgreSQL wiki
Outstanding Issues
- if the members (or offsets?) SLRU is completely empty and some other conditions hold, we might truncate nothing instead of nearly everything
- Might have been fixed by making truncation separately WAL-logged; need to check details (email from Thomas Munro).
- bad arithmetic if no offsets files are found
- SetMultiXactIdLimit advances the stop/warn/wrap limits but we really shouldn't advance those until we truncate
- Might need to update shared memory limits in the same critical section that performs the truncation.
- after Andres's patch, PerformMembersTruncation happens just after SetMultiXactIdLimit, but is that good enough?
- Andres's patch logs which members should be removed rather than, as we do for other SLRUs, what should be kept.
- Since we use all the members space, we can't rely on comparisons that assume a circular numbering space to give us correct results.
- GetMultiXactMembers can't handle zero offsets resulting from crashes
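The point above about circular comparisons in the members space can be illustrated with a minimal sketch. It assumes 32-bit member offsets and the usual signed-difference idiom for wraparound-aware ordering; the function name circular_precedes is hypothetical and is not taken from the PostgreSQL source. The idiom only gives the right answer while the two values are less than 2^31 apart, so it breaks down once the live range of members covers more than half of the 2^32 space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: wraparound-aware "a precedes b" test for a 32-bit
 * circular counter. The unsigned difference is reinterpreted as signed,
 * which assumes a two's-complement platform.
 */
bool
circular_precedes(uint32_t a, uint32_t b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}

/*
 * Illustration (values chosen for this sketch only):
 *
 *   oldest = 100, next = 1100
 *     circular_precedes(oldest, next) is true, as expected.
 *
 *   oldest = 100, next = 3000000100 (more than 2^31 members in use)
 *     circular_precedes(oldest, next) is false, and
 *     circular_precedes(next, oldest) is true: the ordering is inverted.
 */
```

If the members space really can be more than half full, any check built on this kind of comparison would have to be replaced by one that knows the actual oldest and next offsets, rather than inferring order from the distance alone.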
Other Ideas
- Reduce or eliminate the 10 million limit on autovacuum_multixact_freeze_max_age.
- This would make testing easier.
- It might actually be useful to people with really large MultiXacts.
- Add a burn_slru module to facilitate testing.
- Consume XIDs, MXIDs, etc. at accelerated rates.
- Or maybe just "burn" so we can burn OIDs etc. too?
- Document the new truncation model
Fixed Issues
- emergency autovacuum may kick in too aggressively when find_multixact_start fails after having previously succeeded
- emergency autovacuum may not kick in aggressively enough because the postmaster is kicked only every 64k multixacts
- This is a problem if the typical multixact has many members.
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=667912aee649c3608e003568e4b47d95251b1c8c
- spurious errors 'LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound' (also same for pg_subtrans)
- In-memory pg_multixact/members buffers can be pointing to already-truncated pages
- This is simple to fix, and currently hard to hit. But if truncation is moved to happen at times other than checkpoints, it becomes much easier to hit.
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4f627f897367f15702d59973f75f6391d5d3e06f
- find_multixact_start may try to consult on-disk state that hasn't been written yet
- We can fix this either by flushing the SLRU pages to disk before calling SlruDoesPhysicalPageExist, or by checking whether the page is in memory in addition to checking whether it is on disk
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4f627f897367f15702d59973f75f6391d5d3e06f
- pg_multixact SLRUs may be consulted during recovery prior to consistency
- Using WAL for truncation implicitly fixes this.
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4f627f897367f15702d59973f75f6391d5d3e06f
- when restarting recovery from an earlier checkpoint, we may truncate data "from the future"
- Using WAL for truncation implicitly fixes this.
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4f627f897367f15702d59973f75f6391d5d3e06f
- we'd like to be able to advance the limits without needing a checkpoint
- one problem with needing a checkpoint: if we're triggering emergency autovacuums, we'll keep triggering them uselessly even when there's nothing more for vacuum to clean up
- Using WAL for truncation implicitly fixes this.
- http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4f627f897367f15702d59973f75f6391d5d3e06f
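As a rough illustration of the fixed issue above about kicking the postmaster only every 64k multixacts, the following sketch estimates how much members space can be consumed between two kicks. It assumes a 2^32-entry members space and a fixed average number of members per multixact; the macro and function names are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* "every 64k multixacts", as described in the list above */
#define MXACTS_PER_SIGNAL 65536u

/* Members consumed between two postmaster kicks, for a given average size. */
uint64_t
members_between_signals(uint32_t avg_members)
{
    return (uint64_t) MXACTS_PER_SIGNAL * avg_members;
}

/* Same quantity as a fraction of an assumed 2^32-entry members space. */
double
fraction_of_member_space(uint32_t avg_members)
{
    return (double) members_between_signals(avg_members) / 4294967296.0;
}
```

With an average of 2 members per multixact this comes to 131072 members per interval, which is negligible. With an average of 1000 members it comes to 65536000, roughly 1.5% of the assumed space per interval, which is why a multixact-count-based trigger alone could react too slowly.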