Find recent activity
From PostgreSQL wiki
A request seen occasionally is: given a large table representing changes (whether bug comments, stock ticker prices, whatever), return the few most recently active affected objects and their latest few changes.
Obviously if you are tracking the last change time on a per-object basis there are ways to do this quickly, but at the expense of a non-HOT update of the object row for each insert on the changes table. This page shows how to get recent changes efficiently (single-digit milliseconds) from the changes table alone.
This specific example was written for the following requirement: "Given a (large) table prices(stock,price,updated), find the most recent 3 prices for each of the 10 most recently updated stocks". The presence of appropriate indexes (specifically on (updated) and (stock,updated)) is assumed.
This version is suitable for 9.3 onwards:
WITH recursive t1 AS ( (SELECT stock, array[stock] AS seen FROM prices ORDER BY updated DESC LIMIT 1) UNION ALL (SELECT p.stock, t1.seen || p.stock FROM t1, LATERAL (SELECT p1.stock FROM prices p1 WHERE p1.stock <> ALL (t1.seen) ORDER BY p1.updated DESC LIMIT 1) AS p WHERE array_upper(t1.seen,1) < 10) ) SELECT p.* FROM t1, LATERAL (SELECT * FROM prices p2 WHERE p2.stock=t1.stock ORDER BY updated DESC LIMIT 3) p;
This is the original version written for 8.4 onwards:
WITH recursive t1 AS ( (SELECT *, array[stock] AS seen FROM prices ORDER BY updated DESC LIMIT 1) UNION ALL (SELECT (p).*, s.seen || (p).stock FROM (SELECT (SELECT p FROM prices p WHERE p.stock <> ALL (t.seen) ORDER BY p.updated DESC LIMIT 1) AS p, t1.seen FROM t1 WHERE array_upper(t1.seen,1) < 10 offset 0) s ) ), t2 AS ( SELECT stock, price, updated, 1 AS n FROM t1 UNION ALL (SELECT (p).stock, (p).price, (p).updated, s.n+1 FROM (SELECT (SELECT p FROM prices p WHERE p.stock=t2.stock AND p.updated < t2.updated ORDER BY p.updated DESC LIMIT 1) AS p, t2.n FROM t2 WHERE t2.n < 3 offset 0) s ) ) SELECT * FROM t2;
The idea here is to make use of recursion to fetch single rows in the most efficient way available, and then stop when the desired number has been reached. The assumption is that the source table is large compared to the number of rows fetched, and also that the number of distinct stocks is also reasonably large compared to the number fetched, or at least that at least 10 different stocks have updates in the most recent small fraction of the table.
The rather ugly subquery usage turns out to be required to prevent a full scan of the prices table on the final iteration of each recursive branch.
The following variation is for a "return the most recent 5 posts for each topic" query (however in 9.3+ there is a much simpler LATERAL solution without recursion):
WITH recursive rp AS (SELECT (p).*, 1 AS rcount FROM (SELECT (SELECT p FROM posts p WHERE p.topic_id=t.topic_id ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p FROM topics t offset 0) s WHERE (p).post_id IS NOT NULL UNION ALL SELECT (p).*, s.rcount + 1 FROM (SELECT (SELECT p FROM posts p WHERE p.topic_id=rp.topic_id AND (p.post_ts,p.post_id) < (rp.post_ts,rp.post_id) ORDER BY p.post_ts DESC, p.post_id DESC LIMIT 1) AS p, rp.rcount FROM rp WHERE rp.rcount < 5 offset 0) s WHERE (p).post_id IS NOT NULL) SELECT * FROM rp;