Slow Counting

From PostgreSQL wiki

(Difference between revisions)
Jump to: navigation, search
(Note about index-only scans in 9.2)
(use the same neutral table name in all examples, add query for reltuples that avoids ambiguity, add info and link about autovacuum)
Line 5: Line 5:
  
 
<code><pre>
 
<code><pre>
SELECT COUNT(*) FROM table
+
SELECT COUNT(*) FROM tbl
 
</pre></code>
 
</pre></code>
  
Line 11: Line 11:
  
 
<code><pre>
 
<code><pre>
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM accounts;
+
postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl;
 
                                                       QUERY PLAN                                                       
 
                                                       QUERY PLAN                                                       
 
-----------------------------------------------------------------------------------------------------------------------
 
-----------------------------------------------------------------------------------------------------------------------
 
  Aggregate  (cost=4499.00..4499.01 rows=1 width=0) (actual time=465.588..465.591 rows=1 loops=1)
 
  Aggregate  (cost=4499.00..4499.01 rows=1 width=0) (actual time=465.588..465.591 rows=1 loops=1)
   ->  Seq Scan on accounts (cost=0.00..4249.00 rows=100000 width=0) (actual time=0.011..239.212 rows=100000 loops=1)
+
   ->  Seq Scan on tbl (cost=0.00..4249.00 rows=100000 width=0) (actual time=0.011..239.212 rows=100000 loops=1)
 
  Total runtime: 465.642 ms
 
  Total runtime: 465.642 ms
 
(3 rows)
 
(3 rows)
Line 23: Line 23:
  
 
<code><pre>
 
<code><pre>
SELECT COUNT(*) FROM table WHERE status = 'something'
+
SELECT COUNT(*) FROM tbl WHERE status = 'something'
 
</pre></code>
 
</pre></code>
  
Line 33: Line 33:
  
 
<code><pre>
 
<code><pre>
pgbench=# select reltuples from pg_class where relname='tellers';
+
pgbench=# select reltuples from pg_class where relname = 'tbl';
 
  reltuples  
 
  reltuples  
 
-----------
 
-----------
Line 39: Line 39:
 
</pre></code>
 
</pre></code>
  
This presumes you have been running ANALYZE on the table enough to keep these statistics up to date.
+
Or, to avoid ambiguity, since tables with the same name can exist in multiple schemas in a database:
 +
<code><pre>select reltuples from pg_class where oid = 'my_schema.tbl'::regclass;</pre></code>
 +
 
 +
This presumes you have been running ANALYZE on the table enough to keep these statistics up to date. If you have [http://www.postgresql.org/docs/current/interactive/routine-vacuuming.html#AUTOVACUUM autovacuum] on like you probably should, ANALYZE is run automatically, too.
  
 
Another popular approach is to use a trigger-based mechanism to count the rows in the table. One or both of these techniques are covered in the following:
 
Another popular approach is to use a trigger-based mechanism to count the rows in the table. One or both of these techniques are covered in the following:

Revision as of 22:06, 28 January 2013

Note that the following article only applies to versions of PostgreSQL prior to 9.2. Index-only scans are now implemented.

One operation that PostgreSQL is known to be slow performing is doing a full count of rows in a table, typically using this SQL:

SELECT COUNT(*) FROM tbl

The reason why this is slow is related to the MVCC implementation in PostgreSQL. The fact that multiple transactions can see different states of the data means that there can be no straightforward way for "COUNT(*)" to summarize data across the whole table; PostgreSQL must walk through all rows, in some sense. This normally results in a sequential scan reading information about every row in the table. A good way to see what is going on with your query is to use EXPLAIN ANALYZE:

postgres=# EXPLAIN ANALYZE SELECT COUNT(*) FROM tbl;
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4499.00..4499.01 rows=1 width=0) (actual time=465.588..465.591 rows=1 loops=1)
   ->  Seq Scan on tbl  (cost=0.00..4249.00 rows=100000 width=0) (actual time=0.011..239.212 rows=100000 loops=1)
 Total runtime: 465.642 ms
(3 rows)

It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with a "WHERE" clause like

SELECT COUNT(*) FROM tbl WHERE status = 'something'

PostgreSQL will take advantage of available indexes against the restricted field(s) to limit how many records must be counted, which can greatly accelerate such queries. PostgreSQL will still need to read the resulting rows to verify that they exist; other database systems may only need to reference the index in this situation.

Estimating the row count

One PostgreSQL alternative when only an approximate count is needed is to use the reltuples field from the pg_class catalog table:

pgbench=# select reltuples from pg_class where relname = 'tbl';
 reltuples 
-----------
       250

Or, to avoid ambiguity, since tables with the same name can exist in multiple schemas in a database:

select reltuples from pg_class where oid = 'my_schema.tbl'::regclass;

This presumes you have been running ANALYZE on the table enough to keep these statistics up to date. If you have autovacuum on like you probably should, ANALYZE is run automatically, too.

Another popular approach is to use a trigger-based mechanism to count the rows in the table. One or both of these techniques are covered in the following:

Personal tools