Find recent activity

From PostgreSQL wiki

Revision as of 07:19, 17 March 2012 by Rhodiumtoad (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

Performance Snippets

Find recent activity

Works with PostgreSQL

>=8.4

Written in

SQL

Depends on

Nothing


A request seen occasionally is: given a large table representing changes (whether bug comments, stock ticker prices, whatever), return the few most recently active affected objects and their latest few changes.

Obviously if you are tracking the last change time on a per-object basis there are ways to do this quickly, but at the expense of a non-HOT update of the object row for each insert on the changes table. This page shows how to get recent changes efficiently (single-digit milliseconds) from the changes table alone.

This specific example was written for the following requirement: "Given a (large) table prices(stock,price,updated), find the most recent 3 prices for each of the 10 most recently updated stocks". The presence of appropriate indexes is assumed.

WITH recursive
  t1 AS ( (SELECT *, array[stock] AS seen
             FROM prices
            ORDER BY updated DESC LIMIT 1)
          UNION ALL 
          (SELECT (p).*, s.seen || (p).stock
             FROM (SELECT (SELECT p FROM prices p
                            WHERE p.stock <> ALL (t.seen)
                            ORDER BY p.updated DESC LIMIT 1) AS p,
                          t1.seen
                     FROM t1
                    WHERE array_upper(t1.seen,1) < 10 offset 0) s
          )
        ),
  t2 AS ( SELECT stock, price, updated, 1 AS n
            FROM t1
          UNION ALL
          (SELECT (p).stock, (p).price, (p).updated, s.n+1
             FROM (SELECT (SELECT p FROM prices p
                            WHERE p.stock=t2.stock AND p.updated < t2.updated
                            ORDER BY p.updated DESC LIMIT 1) AS p,
                          t2.n
                     FROM t2
                    WHERE t2.n < 3 offset 0) s
          )
        )
SELECT * FROM t2;

The idea here is to make use of recursion to fetch single rows in the most efficient way available, and then stop when the desired number has been reached. The assumption is that the source table is large compared to the number of rows fetched, and also that the number of distinct stocks is also reasonably large compared to the number fetched, or at least that at least 10 different stocks have updates in the most recent small fraction of the table.

The rather ugly subquery usage turns out to be required to prevent a full scan of the prices table on the final iteration of each recursive branch.

The following variation is for a "return the most recent 5 posts for each topic" query:

WITH recursive
  rp AS (SELECT (p).*, 1 AS rcount
           FROM (SELECT (SELECT p FROM posts p
                          WHERE p.topic_id=t.topic_id
                          ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p
                   FROM topics t offset 0) s
                  WHERE (p).post_id IS NOT NULL
         UNION ALL
         SELECT (p).*, s.rcount + 1
           FROM (SELECT (SELECT p FROM posts p
                          WHERE p.topic_id=rp.topic_id
                            AND (p.post_ts,p.post_id) < (rp.post_ts,rp.post_id)
                          ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p,
                        rp.rcount
                   FROM rp
                  WHERE rp.rcount < 5 offset 0) s
          WHERE (p).post_id IS NOT NULL)
SELECT * FROM rp;
Personal tools