Loose indexscan

Works with PostgreSQL

8.4

Written in

SQL

Depends on

Nothing

"Loose indexscan" in MySQL and Cockroach, and "Hybrid Scan" in YugabyteDB are names used for an operation that finds distinct values of the leading columns of a btree index efficiently; rather than scanning all equal values of a key, as soon as a new value is found, restart the search by looking for a larger value. This is much faster when the index has many equal keys.

Loose Indexscan versus Index Skip Scan

Although they apply similar techniques, Loose Indexscans and Index Skip Scans describe two different optimizations. Below a comparison table.

Comparison between Loose Indexscan and Index Skip Scan
	Loose Indexscan	Index Skip Scan
Returns	Results for grouping sets, often DISTINCT values	All matching rows for a predicate, can easily return more than one rows with same value
Limitations	Requires the use of a prefix of columns in the index as grouping key	Works on any selection of columns in the index, with filters on any subset of columns of the index
Information for skipping values must be provided	During scan execution, or (with limitations) before query execution	Before scan execution
Example	SELECT DISTINCT user_id FROM orders; -- common SELECT user_id FROM orders GROUP BY user_id HAVING count(*) > 10; -- Uncommon / nonexistent	CREATE INDEX ON orderlines (order_id, product_id); SELECT order_id, product_option FROM orderlines WHERE product_id IN (2, 11, 23);

Vendor support for index scan type
DB Engine	Loose Indexscan	Index Skip Scan
PostgreSQL	No	No
MySQL	Yes, with limitations	Yes
Oracle		Yes
DB2		Yes
SQLite		Yes
CockroachDB	Yes, with limitations	No
YugabyteDB	Yes, with limitations	Yes

Selecting Distinct Values

Postgres does not support Loose Index Scan natively (though plans are afoot), but they can be emulated using a recursive CTE as follows:

WITH RECURSIVE t AS (
   SELECT min(col) AS col FROM tbl
   UNION ALL
   SELECT (SELECT min(col) FROM tbl WHERE col > t.col)
   FROM t WHERE t.col IS NOT NULL
   )
SELECT col FROM t WHERE col IS NOT NULL
UNION ALL
SELECT null WHERE EXISTS(SELECT 1 FROM tbl WHERE col IS NULL);

The recursive part of the query processes one new row in each pass, finding the smallest column value greater than that on the previous pass; the main query adds in a single null result if needed. The above query is therefore equivalent to

SELECT DISTINCT col FROM tbl;

but runs orders of magnitude faster when "col" has only a proportionally small number of distinct values.

If `col` IS NOT NULL

If col is defined NOT NULL the query can be simplified.
Also using alternative syntax with ORDER BY / LIMIT 1, which typically results in a simpler (and faster) query plan:

WITH RECURSIVE t AS (
   (SELECT col FROM tbl ORDER BY col LIMIT 1)  -- parentheses required
   UNION ALL
   SELECT (SELECT col FROM tbl WHERE col > t.col ORDER BY col LIMIT 1)
   FROM t
   WHERE t.col IS NOT NULL
   )
SELECT col FROM t WHERE col IS NOT NULL;

SQL fiddle (Postgres 9.3) demonstrating both.

Making use of a non-leading column of an index

The same technique can be used to allow a query to benefit from an index which has a leading column with few distinct values (and which would not naturally be specified in the query), and a 2nd column which is highly specific and is used by the query. The leading column can be introduced into the query in a way which does not alter the meaning.

To set up the example:

CREATE TABLE tbl (col integer NOT NULL, col2 integer NOT NULL);
INSERT INTO tbl
SELECT trunc(random()*5) AS col, (random()*1000000)::int AS col2 FROM generate_series(1,10000000);
CREATE INDEX ON tbl (col, col2);

The natural specification of the query, which typically doesn't use the index takes 1758 ms:

SELECT count(*) FROM tbl WHERE col2 = 9854;

The modified version can benefit from the index and runs in 2 ms:

SELECT count(*) FROM tbl WHERE col2 = 9854 AND col IN 
(
  WITH RECURSIVE
     t AS (SELECT min(col) AS col FROM tbl
           UNION ALL
           SELECT (SELECT min(col) FROM tbl WHERE col > t.col) FROM t WHERE t.col IS NOT NULL)
  SELECT col FROM t
);

A correct version of this query would be much more complex if col could be NULL.

Loose indexscan

Contents

Loose Indexscan versus Index Skip Scan

Selecting Distinct Values

If `col` IS NOT NULL

Making use of a non-leading column of an index

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

Tools

Loose indexscan

Loose Indexscan versus Index Skip Scan

Selecting Distinct Values

If col IS NOT NULL

Making use of a non-leading column of an index

Navigation menu

Search

If `col` IS NOT NULL