Zheap

From PostgreSQL wiki

Overview

This page tracks the pending code items and open issues in zheap. It also lists some points that need to be considered when integrating with the pluggable storage API.

Pending Items

  • Currently, each page has a fixed number of transaction slots, so multiple transactions operating on the same pages can deadlock. In the simplest case, transaction T-1 acquires a slot on page P-1 and waits to acquire a slot on page P-2, while transaction T-2 has acquired a slot on P-2 and waits to acquire one on P-1. Each transaction then waits for the other, which results in a deadlock. We are planning to add a mechanism that allows the array of transaction slots to continue on a separate overflow page. We also need such a mechanism to support cases where a large number of transactions acquire SHARE or KEY SHARE locks on a single page.
  • Delete marking in indexes: This will allow in-place updates even when index columns are updated. In addition, with this we can avoid the need for a dedicated vacuum process to perform retail deletes.
  • Free Space Map: In the current heap, vacuum takes care of updating the free space map. In zheap, individual operations will optimistically update the free space map when they remove tuples from a page, in the hope that most of the transactions will eventually commit and the space will become available.
  • Alignment padding: We would like to eliminate most of the alignment padding in the tuple. Currently, the code has a very crude implementation of this that allows 1-byte, 4-byte or 8-byte padding depending on a GUC variable. However, we want it to work without any GUC, so that padding is done only for columns that are varlenas with 4-byte headers or fixed-length pass-by-reference types (e.g. interval, box).
  • Page-wise undo: To save locking and repeated writing of the same page, we want to collect all the undo records that belong to the same page and apply them together. Currently, we collect all consecutive records that apply to the same page and apply them in one shot. This is fine when most of the changes to a heap page are performed together, but it won't work if the changes are randomly distributed across the undo of a transaction. So, we want an efficient mechanism to collect all the undo records that belong to a page.
  • It is unclear at this stage how visibility maps will work with zheap, but we have kept some code related to visibility maps in the zheap APIs in the hope that we will need it for index-only scans of indexes that don't support delete marking. We expect to get more clarity once we implement two-pass vacuum.
  • We have to perform additional buffer locking to allocate the tuple before calling zheap_lock_tuple, which could be fixed by changing that API. However, we want to remain compatible with the existing heap_lock_tuple. During integration with the storage API, some work is needed to define a standard API so that we can avoid the additional allocation and locking.
  • RLS: We have not yet investigated whether any changes are required to make it work with zheap. We expect that the changes required to support it, if any, will be some sort of tuple conversion work.
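
The page-wise undo item above can be illustrated with a small model. This is only a sketch, not zheap code: the undo log is a hypothetical list of (block number, payload) pairs, and both function names are made up for illustration. It contrasts batching consecutive records for the same page with collecting every record of a page into one batch.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical undo log of one transaction: (block_number, payload),
# listed in the order in which the undo records would be applied.
undo_log = [(1, "a"), (1, "b"), (2, "c"), (1, "d"), (2, "e"), (1, "f")]


def batch_consecutive(records):
    """Batch only runs of adjacent records that touch the same page."""
    return [(blk, [payload for _, payload in run])
            for blk, run in groupby(records, key=lambda r: r[0])]


def batch_by_page(records):
    """Collect every record of a page into one batch, keeping per-page order."""
    batches = defaultdict(list)
    for blk, payload in records:
        batches[blk].append(payload)
    return dict(batches)


consecutive = batch_consecutive(undo_log)
per_page = batch_by_page(undo_log)
print(len(consecutive), "page visits with consecutive batching")
print(len(per_page), "page visits with per-page batching")
```

When the changes to a page are interleaved with changes to other pages, the consecutive approach visits the same page several times, while the per-page approach locks and writes each page once. The sketch keeps everything in memory; an efficient mechanism for a large undo log is exactly the open item.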

Open Issues

  • We are locking the undo buffers inside a critical section during InsertPreparedUndo. As per the locking protocol, this should be done outside the critical section: locking a buffer can raise an error, so we shouldn't do it inside the critical section.
  • While replaying the WAL for transaction slots that got reused, we need to ensure that, in hot standby mode, there are no running queries that can see that transaction. The mechanism works in general, but we might need some handling for transaction wraparound cases. See zheap_xlog_freeze_xact_slot.
  • Rollbacks for crashed transactions are performed after recovery, and currently we can only roll back transactions that happened in the postgres database. The undo worker is always connected to the postgres database, so that database can't be dropped.
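
The first issue above is about ordering: in PostgreSQL, an error raised inside a critical section is escalated to a PANIC, so anything that can fail should happen before the section starts. The following is a toy model of that rule, not the actual InsertPreparedUndo code; `critical_section`, `lock_buffer` and the two `insert_undo_*` functions are hypothetical names used only for illustration.

```python
from contextlib import contextmanager


class LockError(Exception):
    """Stands in for an ordinary, recoverable ERROR."""


class Panic(Exception):
    """Stands in for a PANIC, which takes the whole server down."""


@contextmanager
def critical_section():
    # Any error raised inside the section is escalated to a Panic.
    try:
        yield
    except Exception as exc:
        raise Panic("error inside critical section") from exc


def lock_buffer(name, fail=False):
    # Hypothetical lock call that may fail.
    if fail:
        raise LockError(f"could not lock {name}")
    return name


def insert_undo_lock_inside(fail):
    # Problematic order: the lock is taken inside the critical section.
    with critical_section():
        buf = lock_buffer("undo buffer", fail)
        return f"wrote undo via {buf}"


def insert_undo_lock_outside(fail):
    # Intended order: take the lock first, then enter the section.
    buf = lock_buffer("undo buffer", fail)
    with critical_section():
        return f"wrote undo via {buf}"


def outcome(fn, fail):
    try:
        return fn(fail)
    except Panic:
        return "PANIC"
    except LockError:
        return "ERROR"


print(outcome(insert_undo_lock_inside, True))
print(outcome(insert_undo_lock_outside, True))
```

With the lock taken outside the section, a lock failure stays an ordinary error that the transaction can handle.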

Integration with Pluggable Storage API

  • Currently, we don't have a nice way to handle different types of heaps in the backend, so we have used the storage engine option to define a different code path for zheap. To integrate it with the storage API, we need to check all the places where RelationStorageIsZHeap is used.
  • HeapTuple is used widely in the backend, so instead of changing all such places, we have written the converter functions zheap_to_heap and heap_to_zheap, which convert tuples from one format to the other. To integrate with the storage API, we need to check all the places where these APIs are used.
  • Currently, we store ZHeapTuple in TupleTableSlot as a separate variable, which doesn't appear to be the best way. We would like to integrate it with the storage API. Andres has proposed an idea for this.
  • Snapshot satisfies APIs: The snapshot mechanism works differently in zheap, as we need to traverse the undo chain to check whether a prior version of the tuple is visible to a snapshot.
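
The undo chain traversal mentioned in the last item can be sketched with a deliberately simplified model. It assumes that each tuple version records the transaction that produced it and a link to the prior version reachable through undo, and that a snapshot is just a set of visible transaction ids. The `Version` class and `visible_version` function are hypothetical and do not reflect zheap's real data structures or visibility rules.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Version:
    value: str
    xid: int                           # transaction that produced this version
    prior: Optional["Version"] = None  # older version reachable through undo


def visible_version(latest: Version, visible_xids: Set[int]) -> Optional[Version]:
    """Walk from the newest version back through the undo chain and
    return the first version whose producing transaction is visible."""
    version = latest
    while version is not None:
        if version.xid in visible_xids:
            return version
        version = version.prior
    return None  # no version of the tuple is visible to this snapshot


# The page holds the newest version; the older one lives in undo.
old = Version("old", xid=10)
new = Version("new", xid=20, prior=old)

print(visible_version(new, {10}).value)      # snapshot that cannot see xid 20
print(visible_version(new, {10, 20}).value)  # snapshot that sees both
```

In the current heap, old versions stay in the table itself, which is why the snapshot-satisfies functions need separate handling in zheap.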