Zheap

Overview

The purpose of this page is to track the pending code items and open issues in zheap. It also lists some points that need to be considered when integrating with the Pluggable Storage API.

Pending Items

Delete marking in indexes: This will allow in-place updates even when index columns are updated. It will also avoid the need for a dedicated vacuum process to perform retail deletes.

Page wise undo: To save locking and repeated writing of the same page, we want to collect all the undo records that belong to the same page and then apply them together. Currently, we collect all consecutive records that apply to the same page and then apply them in one shot. This works well when most of the changes to a heap page are performed together, but it does not help if the changes are randomly distributed across the undo of a transaction. So, we want an efficient mechanism to collect all the undo records that belong to a page.
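The grouping described above can be illustrated with a small model. This is only a sketch in Python, not zheap code: the `UndoRecord` type, its fields, and the `group_undo_by_page` and `apply_undo` helpers are hypothetical names introduced for illustration, and it assumes that undo records for different pages can be applied independently of each other.

```python
from collections import defaultdict
from typing import NamedTuple


class UndoRecord(NamedTuple):
    # Hypothetical, simplified undo record; real zheap records carry far more state.
    seq: int        # position in the transaction's undo log (higher = newer)
    block: int      # heap block (page) number the record applies to
    payload: str    # stand-in for the data needed to reverse the change


def group_undo_by_page(records):
    """Bucket a transaction's undo records by page, newest record first within each page."""
    by_page = defaultdict(list)
    for rec in records:
        by_page[rec.block].append(rec)
    for recs in by_page.values():
        # Undo is applied in reverse order of the original changes.
        recs.sort(key=lambda r: r.seq, reverse=True)
    return dict(by_page)


def apply_undo(records, apply_page):
    """Call apply_page(block, recs) once per page, so each page is visited only once."""
    for block, recs in sorted(group_undo_by_page(records).items()):
        apply_page(block, recs)
```

With this shape, records for the same page are applied together even when they are interleaved with records for other pages in the undo log. The cost is that all of a transaction's records have to be collected before any of them is applied.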

We have to perform additional buffer locking to allocate the tuple before calling zheap_lock_tuple. This can be fixed if we change that API; however, we want to stay compatible with the existing heap_lock_tuple. During integration with the storage API, we need some work to define a standard API, so that we can avoid the additional allocation and locking.

Open Issues

More testing is needed for recovery and rollbacks. We will not be surprised if we find some issues in that area.

Integration with Pluggable Storage API

Currently, we don't have a nice way to handle different types of heaps in the backend. So, we have used a storage engine option to define a different code path for zheap. To integrate it with the storage API, we need to check all the places where RelationStorageIsZHeap is used.

HeapTuple is used widely in the backend, so instead of changing all such places, we have written the converter functions zheap_to_heap and heap_to_zheap, which convert tuples from one format to the other. To integrate it with the storage API, we need to check all the places where these APIs are used.

Currently, we store ZHeapTuple in TupleTableSlot as a separate variable, which doesn't appear to be the best way. We would like to integrate it with the storage API. Andres has proposed an idea for this.
- TupleTableSlot abstraction

Snapshot satisfies APIs - The snapshot mechanism works differently in zheap, as we need to traverse the undo chain to check if a prior version of the tuple is visible to a snapshot.
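The undo chain traversal can be pictured with a toy model. This sketch is not the zheap visibility code: `TupleVersion`, `visible_version`, and the idea of representing a snapshot as a plain set of transaction IDs are simplifying assumptions made for illustration only.

```python
from typing import NamedTuple, Optional


class TupleVersion(NamedTuple):
    # Hypothetical model: each version records the transaction that wrote it
    # and a link to the prior version, as it would be reconstructed from undo.
    xid: int
    value: str
    prior: Optional["TupleVersion"]


def visible_version(latest, committed_before_snapshot):
    """Walk from the newest version back through the chain; return the first visible one."""
    version = latest
    while version is not None:
        if version.xid in committed_before_snapshot:
            return version
        version = version.prior
    return None
```

In this model the newest version is checked first, and older versions are examined only when the newer ones were written by transactions the snapshot cannot see.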

Code

The original implementation (developed by EDB): [1]
Cybertec fork, currently merged into PostgreSQL 14.1: [2]

It includes the following changes:

- The new undo log infrastructure has been incorporated. The original version was designed by EDB. The current version includes the following changes:
  - checkpoint does not include the undo request metadata. Instead, the undo log itself is used to find the data changes that need to be rolled back during server restart.
  - preprocessor constant (UNDO_DEBUG) added. If it's defined during the build, the undo record set chunks are much smaller so that various corner cases can be tested.
  - temporary undo is discarded (and the undo segments unlinked) by the backend that wrote the undo records.
  - test framework for the undo log reader (undoread.c) added
  - draft of the pg_undodump tool added, as well as views to read the undo contents from memory (pg_stat_undo_chunks, pg_stat_undo_records)
  - draft of the undo discard worker.
- Implemented logical decoding for zheap tables.
