Find recent activity
From PostgreSQL wiki
A request seen occasionally is: given a large table representing changes (whether bug comments, stock ticker prices, whatever), return the few most recently active affected objects and their latest few changes.
Obviously if you are tracking the last change time on a per-object basis there are ways to do this quickly, but at the expense of a non-HOT update of the object row for each insert on the changes table. This page shows how to get recent changes efficiently (single-digit milliseconds) from the changes table alone.
This specific example was written for the following requirement: "Given a (large) table prices(stock,price,updated), find the most recent 3 prices for each of the 10 most recently updated stocks". The presence of appropriate indexes is assumed.
WITH recursive t1 AS ( (SELECT *, array[stock] AS seen FROM prices ORDER BY updated DESC LIMIT 1) UNION ALL (SELECT (p).*, s.seen || (p).stock FROM (SELECT (SELECT p FROM prices p WHERE p.stock <> ALL (t.seen) ORDER BY p.updated DESC LIMIT 1) AS p, t1.seen FROM t1 WHERE array_upper(t1.seen,1) < 10 offset 0) s ) ), t2 AS ( SELECT stock, price, updated, 1 AS n FROM t1 UNION ALL (SELECT (p).stock, (p).price, (p).updated, s.n+1 FROM (SELECT (SELECT p FROM prices p WHERE p.stock=t2.stock AND p.updated < t2.updated ORDER BY p.updated DESC LIMIT 1) AS p, t2.n FROM t2 WHERE t2.n < 3 offset 0) s ) ) SELECT * FROM t2;
The idea here is to make use of recursion to fetch single rows in the most efficient way available, and then stop when the desired number has been reached. The assumption is that the source table is large compared to the number of rows fetched, and also that the number of distinct stocks is also reasonably large compared to the number fetched, or at least that at least 10 different stocks have updates in the most recent small fraction of the table.
The rather ugly subquery usage turns out to be required to prevent a full scan of the prices table on the final iteration of each recursive branch.
The following variation is for a "return the most recent 5 posts for each topic" query:
WITH recursive rp AS (SELECT (p).*, 1 AS rcount FROM (SELECT (SELECT p FROM posts p WHERE p.topic_id=t.topic_id ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p FROM topics t offset 0) s WHERE (p).post_id IS NOT NULL UNION ALL SELECT (p).*, s.rcount + 1 FROM (SELECT (SELECT p FROM posts p WHERE p.topic_id=rp.topic_id AND (p.post_ts,p.post_id) < (rp.post_ts,rp.post_id) ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p, rp.rcount FROM rp WHERE rp.rcount < 5 offset 0) s WHERE (p).post_id IS NOT NULL) SELECT * FROM rp;