From PostgreSQL wiki
Selecting Distinct Values
The term "loose indexscan" is used in some other databases for the operation of using a btree index to retrieve the distinct values of a column efficiently; rather than scanning all equal values of a key, as soon as a new value is found, restart the search by looking for a larger value. This is much faster when the index has many equal keys.
Postgres does not support loose indexscans natively, but they can be emulated using a recursive CTE as follows:
WITH RECURSIVE t AS (SELECT MIN(col) AS col FROM tablename UNION ALL SELECT (SELECT MIN(col) FROM tablename WHERE col > t.col) FROM t WHERE t.col IS NOT NULL) SELECT col FROM t WHERE col IS NOT NULL UNION ALL SELECT NULL WHERE EXISTS(SELECT * FROM tablename WHERE col IS NULL);
The recursive part of the query processes one new row in each pass, finding the smallest column value greater than that on the previous pass; the main query adds in a single null result if needed. The above query is therefore equivalent to
SELECT DISTINCT col FROM tablename;
but runs orders of magnitude faster when "col" has only a proportionally small number of distinct values.
Making use of a non-leading column of an index
The same technique can be used to allow a query to benefit from an index which has a leading column with few distinct values (and which would not naturally be specified in the query), and a 2nd column which is highly specific and is used by the query. The leading column can be introduced into the query in a way which does not alter the meaning.
To set up the example:
CREATE TABLE tablename (col INTEGER NOT NULL, col2 INTEGER NOT NULL); INSERT INTO tablename SELECT FLOOR(random()*5) AS col, (random()*1000000)::INTEGER AS col2 FROM generate_series(1,10000000); CREATE INDEX ON tablename (col,col2 );
The natural specification of the query, which cannot use the index and takes 1758 ms:
SELECT COUNT(*) FROM tablename WHERE col2=9854 ;
The modified version which can benefit from the index, and runs in 2 ms:
SELECT COUNT(*) FROM tablename WHERE col2=9854 AND col IN ( WITH RECURSIVE t AS (SELECT MIN(col) AS col FROM tablename UNION ALL SELECT (SELECT MIN(col) FROM tablename WHERE col > t.col) FROM t WHERE t.col IS NOT NULL) SELECT col FROM t );
Writing a correct version of this query would be much more complex if col could be NULL