Count estimate

From PostgreSQL wiki

Jump to: navigation, search

Snippets

count_estimate function

Works with PostgreSQL

Any version

Written in

sql

Depends on

Nothing

Authors: Michael Fuhr, Erwin Brandstetter
Comment by: Emanuel Calvo Franco


The basic SQL standard query to count the rows in a table is:

SELECT COUNT(*) FROM tbl;

This can be rather slow because PostgreSQL has to check visibility for all rows, due to the MVCC model. (There have been improvements in PostgreSQL 9.2.)


If you don't need an exact count, the current statistic from the catalog table pg_class might be good enough and is much faster to retrieve for big tables.

SELECT reltuples AS approximate_row_count FROM pg_class WHERE relname = 'tbl';
 
 approximate_row_count
-----------
       100

Tables named "tbl" can live in multiple schemas of a database, in which case you get multiple rows for this query. To avoid ambiguity:

SELECT * FROM pg_class WHERE oid = 'schema_name.tbl'::regclass;

Also if you have more than 2B rows you'll need to cast it, like this, or results will be weirdly truncated (numbers like 1.8263)

SELECT reltuples::BIGINT AS estimate FROM pg_class WHERE relname='schema_name.tbl';

Consider the manual of PostgreSQL 9.2 on pg_class.reltuples:

Number of rows in the table.
This is only an estimate used by the planner. It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.

If you didn't ANALYZE recently (after last changes), the estimates can be off more or less.
If you are running the autovacuum daemon as is the default for modern PostgreSQL, ANALYZE is run automatically, too (except for temporary tables which need manual attention). So the estimates should be good unless you had major changes very recently.


For more sophisticated queries (other than counting all rows from a table), or if you cannot SELECT from the catalog table pg_class (which every user can by default), consider this plpgsql function by Michael Fuhr that collects information from EXPLAIN output:

CREATE FUNCTION count_estimate(query text) RETURNS INTEGER AS
$func$
DECLARE
    rec   record;
    ROWS  INTEGER;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        ROWS := SUBSTRING(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN ROWS IS NOT NULL;
    END LOOP;
 
    RETURN ROWS;
END
$func$ LANGUAGE plpgsql;

Demo:

CREATE TEMP TABLE tbl AS SELECT * FROM generate_series(1, 1000) AS t;
ANALYZE tbl;

SELECT count_estimate('SELECT * FROM tbl WHERE t < 100');

 count_estimate 
----------------
            100

EXPLAIN SELECT * FROM tbl WHERE t < 100;

                      QUERY PLAN
------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..35.00 rows=100 width=4)
   Filter: (t < 100)

As you can see, it's an estimate - actual count would be 99.


Another popular approach is to use a trigger-based mechanism to count the rows in the table. One or both of these techniques are covered in the following:

Other ways exist to speed up count distinct, etc. see stackoverflow

Personal tools