Slow Counting

From PostgreSQL wiki
Jump to navigationJump to search

Index-only scans have been implemented in Postgres 9.2. providing some performance improvements where the visibility map of the table allows it.

A full count of rows in a table can be comparatively slow in PostgreSQL:

SELECT count(*) FROM tbl;

The reason is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table. PostgreSQL must walk through all rows to determine visibility. This normally results in a sequential scan reading information about every row in the table. EXPLAIN ANALYZE reveals what's going on:

postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4499.00..4499.01 rows=1 width=0) (actual time=465.588..465.591 rows=1 loops=1)
   ->  Seq Scan on tbl  (cost=0.00..4249.00 rows=100000 width=0) (actual time=0.011..239.212 rows=100000 loops=1)
 Total runtime: 465.642 ms
(3 rows)

Worth noting that only the full count must be so pessimistic; if augmented with a "WHERE" clause like:

SELECT COUNT(*) FROM tbl WHERE status = 'something';

PostgreSQL will take advantage of available indexes against the restricted field(s) to limit how many records must be counted. This can greatly accelerate such queries. PostgreSQL will still need to read the resulting rows to verify that they exist. Other database systems may only need to reference the index in this situation.

Estimating the row count

PostgreSQL can query "estimates" or "cached" values for a table's size (much faster). See Count estimate for details.