Path Keys Tracking

From PostgreSQL wiki
Jump to: navigation, search

This page tracks a new concept of tracking and using an existing order that is present in a table, materialized view or set returning function.

Existing Order:

Consider a set returning function like generate_series. The function has an implicit order in which the result is returned.

Consider the following query:

SELECT generate_series(1,10);

generate_series

              1
              2
              3
              4
              5
              6
              7
              8
              9
             10

(10 rows)

Now, consider the following query:

SELECT generate_series(1,10) AS c ORDER BY c;

c

 1
 2
 3
 4
 5
 6
 7
 8
 9
10

(10 rows)

Both the results are same since generate_series returns result implicitly ordered by its only result column.

The query plan for the second query is as following:

EXPLAIN ANALYZE SELECT generate_series(1,100) AS c ORDER BY c;

                                         QUERY PLAN

Sort  (cost=54.84..57.34 rows=1000 width=0) (actual time=0.076..0.081 rows=100 loops=1)
  Sort Key: (generate_series(1, 100))
  Sort Method: quicksort  Memory: 29kB
  ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.016..0.026 rows=100 loops=1)
Planning time: 0.177 ms
Execution time: 0.149 ms

(6 rows)

A Sort operation is present in the query plan for the ORDER BY execution.

Consider the following query:

SELECT * FROM vec;

id  |  a  |  b

+-----+-----

 10 |  15 |  17
100 |  90 |  95
250 | 450 | 750
 95 |  24 |  50
 83 |  87 |  95
 75 |  77 | 205
 64 |  78 | 405
 54 |  23 |  88
 45 |  95 |  87
 98 |  95 |  82

(10 rows)

CREATE MATERIALIZED VIEW test_matview2 AS SELECT id,a,b FROM vec ORDER BY b; SELECT 10

SELECT * FROM test_matview2;

id  |  a  |  b

+-----+-----

 10 |  15 |  17
 95 |  24 |  50
 98 |  95 |  82
 45 |  95 |  87
 54 |  23 |  88
 83 |  87 |  95
100 |  90 |  95
 75 |  77 | 205
 64 |  78 | 405
250 | 450 | 750

(10 rows)

Now consider the following query:

SELECT * FROM test_matview2 ORDER BY b;

id  |  a  |  b

+-----+-----

 10 |  15 |  17
 95 |  24 |  50
 98 |  95 |  82
 45 |  95 |  87
 54 |  23 |  88
 83 |  87 |  95
100 |  90 |  95
 75 |  77 | 205
 64 |  78 | 405
250 | 450 | 750

(10 rows)

The query plan for the above query is:

EXPLAIN ANALYZE SELECT * FROM test_matview2 ORDER BY b;

                                                   QUERY PLAN

Sort  (cost=135.34..140.19 rows=1940 width=12) (actual time=0.016..0.017 rows=10 loops=1)
  Sort Key: b
  Sort Method: quicksort  Memory: 25kB
  ->  Seq Scan on test_matview2  (cost=0.00..29.40 rows=1940 width=12) (actual time=0.006..0.007 rows=10 loops=1)
Planning time: 0.044 ms
Execution time: 0.034 ms

(6 rows)

A Sort operation is present in query plan for ORDER BY execution.

The Sort operation can be eliminated if there is functionality for tracking pathkeys and order already present in the entity and using them in the query planner. For eg, we can track that test_matview2's source query has ORDER BY b hence the data input into test_matview2 materialized view is sorted on b. Hence a SELECT * on the materialized view which has ORDER BY b or an operation which involves a Sort operation including b (ORDER BY b, c, d or a GroupAgg involving b) can avoid sorting on b and assume that it is pre sorted.

The same is applicable for SRFs having an implicit order in return order like generate_series. If we know the preorder for the result, we can avoid the ORDER BY as given in above example queries.

Implementation Methods:

The approaches that can be taken for this implementation are:

1) Parser approach:

The first step shall be to add a column in pg_proc and pg_class which maintains the column(s) which have a preorder present in either the underlying data or the return order of a SRF. The populating of the column shall be when a materialized view is created (by using the AS query given). For SRF we can define it implicitly or give it at time of definition of SRF.

The parser approach involves seeing which columns are present in the sortclause of the query and seeing if any of them is present in preorder keys of the matview/SRF. If this is true, then we can remove the column from the parse tree itself and behave as if it never existed.

2) Query Planner approach:

The approach shall have the catalog changes as mentioned above. This approach shall involve looking up preorder keys in parser stage and populating them in parse tree and when planning, match sortclause with preorder keys and not add TLE resjunk entries for them (this may involve not having a Sort plan node at all as in the above example).