Pgcon2014unconferenceBitmapIndexOnlyScan

From PostgreSQL wiki

Revision as of 13:08, 25 May 2014 by Markuswinand (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Bitmap Index Only Scan

Topic of this session was how to avoid unnecessary heap fetches from a Bitmap Heap Scan when an Index Only Scan is possible too (see slides).

Irrespective of the question whether the proposed approach is favorable or not, it seems to have show-stopper: when the "lossy" bitmap kicks in, there is no way of telling the Bitmap Heap Scan node not to emit rows that were already emitted by the Bitmap Index Only Scan (if "all-visible"). No promising solution to this problem was found in this session. Deciding to not use the lossy algorithm in advance during planning can't be relied on because the estimates might just be wrong. "Flushing" the bitmap and starting with a fresh bitmap again should work, but might lead to double page access which is pretty much what we try to avoid in the first place (it would be hard to prove that this approach is an improvement).

The concern was raised that it might be best to avoid Bitmap Index/Heap Scan entirely in case an Index Only Scan is possible. That lead to the question if it could just be solved by reducing the cost of an Index Only Scan.

In the very superficial benchmark shown in the session (see below), it was also visible that the Index Only Scan plan had a faster execution time as the corresponding Bitmap Index/Heap Scan plan (disproportionately higher as compared to the IO difference in an all-cached scenario). That lead to a discussion that ultimately ended up way beyond the original goal of this session: different cost for different operator classes, collecting anonymized "explain analyze" from the field...

Sumary

Proposed approach seems to be impossible due to lossy bitmaps. Enhancing the costing would be great, but no specific action was identified in this session. Avoiding Bitmap Index+Heap Scan if Index Only Scan is possible was not ruled out.


Benchmark

The following figures show explain analyze upon decreasing level of "all-visible" in the visibility map. At some point, the planner prefers Bitmap Index+Heap Scan over Index Only Scan. Forcing an Index Only Scan does however still result in about 20% less IO. Further the execution time (all in buffers) is lower.

Test data: Sales table from Use The Index, Luke. Requires creating the employees table first.

Index Only Scan using sales_sub_eur on sales  (cost=0.43..505.02 rows=13748 width=8) (actual time=0.062..11.098 rows=14539 loops=1)
  Index Cond: (subsidiary_id = 20::numeric)
  Heap Fetches: 0
  Buffers: shared hit=70
Total runtime: 13.472 ms


update sales set sale_id = sale_id where sale_id %100 = 0; UPDATE 12129

analyze sales;

Index Only Scan using sales_sub_eur on sales  (cost=0.43..10101.44 rows=14816 width=8) (actual time=0.048..12.894 rows=14539 loops=1)
  Index Cond: (subsidiary_id = 20::numeric)
  Heap Fetches: 2820
  Buffers: shared hit=2890
Total runtime: 14.828 ms


pgcon=# update sales set sale_id = sale_id where sale_id %100 = 50;UPDATE 12130pgcon=# analyze sales;


Index Only Scan using sales_sub_eur on sales  (cost=0.43..15555.23 rows=14200 width=8) (actual time=0.051..15.796 rows=14539 loops=1)
  Index Cond: (subsidiary_id = 20::numeric)
  Heap Fetches: 4524
  Buffers: shared hit=4594
Total runtime: 18.501 ms


pgcon=# update sales set sale_id = sale_id where sale_id %100 = 25;UPDATE 12130pgcon=# analyze sales;


Index Only Scan using sales_sub_eur on sales  (cost=0.43..25644.45 rows=14276 width=8) (actual time=0.060..21.812 rows=14539 loops=1)
  Index Cond: (subsidiary_id = 20::numeric)
  Heap Fetches: 7471
  Buffers: shared hit=7541
Total runtime: 23.975 ms


Bitmap Heap Scan on sales  (cost=404.69..32171.49 rows=15002 width=8) (actual time=15.835..40.853 rows=14539 loops=1)
  Recheck Cond: (subsidiary_id = 20::numeric)
  Buffers: shared hit=12675
  ->  Bitmap Index Scan on sales_sub_eur  (cost=0.00..400.94 rows=15002 width=0) (actual time=10.645..10.645 rows=14566 loops=1)
        Index Cond: (subsidiary_id = 20::numeric)
        Buffers: shared hit=69
Total runtime: 42.862 ms

set enable_bitmapscan = off;


Index Only Scan using sales_sub_eur on sales  (cost=0.43..36972.55 rows=15002 width=8) (actual time=0.066..25.680 rows=14539 loops=1)
  Index Cond: (subsidiary_id = 20::numeric)
  Heap Fetches: 10359
  Buffers: shared hit=10425
Total runtime: 28.209 ms

Personal tools