User:Rhodiumtoad/CTE

From PostgreSQL wiki
Jump to navigationJump to search

Scratch page for random scribbling.

CTE bugs

none known

CTE documentation

this is just my best guess at what's going on

There are two aspects to this patch: the WITH syntax, and the actual use of recursion in WITH RECURSIVE queries.

WITH syntax

Currently, the patch appears to simply convert

WITH cte(cols) AS (query) SELECT ... FROM cte

into

SELECT ... FROM (query) AS cte

without making any other changes to the query. Specifically, the CTE is substituted inline every time it appears.

WITH RECURSIVE

Given

WITH RECURSIVE cte(cols) AS (query)

the table name "cte" is in scope for the query body, and therefore the query can recursively refer to its own output.

The patch as it stands supports this only for queries of the following form:

WITH RECURSIVE cte(cols) AS (non_recursive_query UNION ALL recursive_query)

where non_recursive_query is any query (including other set operations) that does not refer to "cte", and recursive_query is a query (not including set operations) that refers to "cte".

There are a number of limitations on the form of recursive_query:

  • references to "cte" must not appear on the nullable side of an outer join
  • no references to recursive CTEs other than the current one (i.e. no mutual recursion)

Planning and execution of recursive queries

The execution of recursive queries involves two new plan nodes:

  • Recursion (displayed as "Recursion on cte")
  • RecursiveScan (displayed as "Recursive Scan on cte")

A Recursion node marks the top of a recursive subplan. It has one child node, which is always an Append node; all subplans of the Append except the last are non-recursive, while the last subplan of the Append is the recursive part.

The RecursiveScan node can only appear inside the recursive subplan of a Recursion node. The fact that mutual recursion is not permitted means that any RecursiveScan node always refers to the nearest parent Recursion node

Execution of the Recursion node proceeds as follows. Initially, tuples are returned from the non-recursive subplans in order. These tuples, in addition to being returned to the caller, are stored in a working tuplestore. Once the non-recursive subplans are exhausted, the recursive subplan is run, with the result tuples stored in an intermediate tuplestore; if the recursive subplan returns any tuples, then the working and intermediate tuplestore are swapped, the new intermediate tuplestore is cleared, and the recursive subplan run again. This continues until no new tuples are returned.

Looked at another way, at a given recursion depth N, the working tuplestore contains those tuples produced by recursion level N-1, and the intermediate tuplestore contains those tuples produced so far by recursion level N.

Within the recursive subplan, a RecursiveScan node returns the result of scanning the current working tuplestore, i.e. it returns all tuples produced by the previous recursion level.