Crosstabview

From PostgreSQL wiki
Jump to navigationJump to search

Introduction

\crosstabview is a psql command included in PostgreSQL 9.6.
It is meant to display query results in a crosstab-like representation, when the structure of the resultset is amenable to such a transformation.

In interactive use, it's an easier alternative to a heavy rewrite of a query just for the purpose of looking at a pivoted representation.
The main advantage of creating the crosstab representation at the display level is that the number and positions of resulting columns in the horizontal header do not have to be pre-computed and listed in the query: they are automatically deduced from the resultset already fetched by psql.

In non-interactive use, the primary choice for producing pivots should probably be based on dynamically generated SQL and the server-side crosstab() function from the tablefunc extension.


The patch and associated discussion can be found at https://commitfest.postgresql.org/8/372/ and https://commitfest.postgresql.org/9/521/

Description

The display is formed by following these rules:

  • the distinct values in a given column (colH) are interpreted as a horizontal header
  • the distinct values in a second given column colV form the vertical header
  • another specified column is projected into cells of the 2D grid formed by the headers: for each tuple `(v,h,C)` present in the resultset, C goes into the cell at the intersection of row "v" and column "h". In the case of several `C` corresponding to one `(v,h)` couple, an error is emitted.

The typical case when this representation is helpful is with 3 output columns: 2 dimensions, and a value that is a function of the two dimensions.


Below we're using a sample view with a few rows and columns to illustrate different uses and options of the command:

create view v_data as 
select * from ( values
   ('v1','h2','foo', '2015-04-01'::date),
   ('v2','h1','bar', '2015-01-02'),
   ('v1','h0','baz', '2015-07-12'),
   ('v0','h4','qux', '2015-07-15')
 ) as l(v,h,c,d);

The simplest possible example on this data would be

 select v,h,c from v_data \crosstabview

which produces:

 v  | h2  | h1  | h0  | h4  
----+-----+-----+-----+-----
 v1 | foo |     | baz | 
 v2 |     | bar |     | 
 v0 |     |     |     | qux


The command displays "v" in the vertical header, "h" in the horizontal header, and "c" into the cells, without any particular sort order.

Many options are available to alter this output, they're described step by step in the next paragraphs.

Effect of the ORDER BY clause

The order of the results are taken into account: for a given source column used as a header, if a value A comes in the results before B, then the header will display A before B.

This is true for both the vertical and horizontal header.

Example 1

select v,h,c from v_data order by 1;

 v  | h  |  c  
----+----+-----
 v0 | h4 | qux
 v1 | h2 | foo
 v1 | h0 | baz
 v2 | h1 | bar

 \crosstabview v h c

 v  | h4  | h2  | h0  | h1  
----+-----+-----+-----+-----
 v0 | qux |     |     | 
 v1 |     | foo | baz | 
 v2 |     |     |     | bar

(In crosstabview arguments, column numbers could also be used instead of column names)

The `v` column appears sorted in the crosstab representation, as it is in the resultset. The order in the horizontal header (considered from left to right) is also the same as in the resultset (from top to bottom)

The next example shows the same data but with a descending order in the query:

Example 2

select v,h,c from v_data order by 1 desc;
 v  | h  |  c  
----+----+-----
 v2 | h1 | bar
 v1 | h2 | foo
 v1 | h0 | baz
 v0 | h4 | qux

\crosstabview v h c
 v  | h1  | h2  | h0  | h4  
----+-----+-----+-----+-----
 v2 | bar |     |     | 
 v1 |     | foo | baz | 
 v0 |     |     |     | qux

The `v` column is displayed sorted again, this time in descending order, as in the resultset. The order in the horizontal header also matches the order of `h` of the resultset.

To have the horizontal header sorted, we can assign the ORDER BY to column `h`, as in the next example.

Example 3

select v,h,c from v_data order by 2 ;
 v  | h  |  c  
----+----+-----
 v1 | h0 | baz
 v2 | h1 | bar
 v1 | h2 | foo
 v0 | h4 | qux

 \crosstabview v h c
 v  | h0  | h1  | h2  | h4  
----+-----+-----+-----+-----
 v1 | baz |     | foo | 
 v2 |     | bar |     | 
 v0 |     |     |     | qux

Now the horizontal header is sorted, but the vertical header is not.

Both headers cannot be each sorted independantly with only one ORDER BY clause.

To achieve independant sort of the horizontal header, the sort information must be added in the query results, in the form of an integer giving the relative position of the value in the horizontal header. Typically this is obtained with a window function such as row_number().

The next paragraph illustrates this.

Independant sort of the horizontal header, with additional column

When the query's ORDER BY is already dedicated to sort on the leftmost column (vertical header), an additional sort column can be added for the horizontal header \crosstabview

Example 4

 select v,h,c,row_number() over(order by h) as hsort from v_data order by 1;

 v  | h  |  c  | hsort 
----+----+-----+-------
 v0 | h4 | qux |     4
 v1 | h0 | baz |     1
 v1 | h2 | foo |     3
 v2 | h1 | bar |     2

\crosstabview v h c hsort

 v  | h0  | h1  | h2  | h4  
----+-----+-----+-----+-----
 v0 |     |     |     | qux
 v1 | baz |     | foo | 
 v2 |     | bar |     | 

Here the vertical header is sorted by way of the ORDER BY clause, and the horizontal header is sorted independantly, so we get the desired result of having both headers sorted.


A reverse sort could be obtained with OVER(ORDER BY h DESC), or by negating the value of the column.

Example 5

select v,h,c,row_number() over(order by h desc) as hsort  from v_data order by 1
  \crosstabview v h c hsort

 v  | h4  | h2  | h1  | h0  
----+-----+-----+-----+-----
 v0 | qux |     |     | 
 v1 |     | foo |     | baz
 v2 |     |     | bar | 

Independant sort on the horizontal header, with a non-synthetic column

It might be that the desired horizontal order is already present in the query, without having to resort to a window function. It is typically the case for months, that we generally want to display by name but sorted by their number in the year. The next example shows the base output that we'd like to improve by sorting months.

Example 6

select v,to_char(d,'Mon') as m, c  from v_data order by 1 
 \crosstabview v m c

 v  | Jul | Apr | Jan 
----+-----+-----+-----
 v0 | qux |     | 
 v1 | baz | foo | 
 v2 |     |     | bar

The desired order would be Jan, Apr, Jul. To obtain it, it makes sense to ORDER BY date, but then we're loosing the vertical sort, as shown in the next example.

Example 7

select v,to_char(d,'Mon') as m, c  from v_data order by d
 \crosstabview v m c

 v  | Jan | Apr | Jul 
----+-----+-----+-----
 v2 | bar |     | 
 v1 |     | foo | baz
 v0 |     |     | qux

To sort in both directions as desired, we may just the month number as the sort value for the horizontal header, and let the primary order being driven by v

Example 8

select v,to_char(d,'Mon') as m, c, extract(month from d) as mnum  from v_data order by v
 \crosstabview v m c mnum

 v  | Jan | Apr | Jul 
----+-----+-----+-----
 v0 |     |     | qux
 v1 |     | foo | baz
 v2 | bar |     | 

Just like previously, it's also easy to sort horizontally in reverse order:

Example 9

select v,to_char(d,'Mon') as m, c, -1*extract(month from d) as revnum from v_data order by v
 \crosstabview v m c revnum

 v  | Jul | Apr | Jan 
----+-----+-----+-----
 v0 | qux |     | 
 v1 | baz | foo | 
 v2 |     |     | bar

crosstabview with only 2 output columns

Sometimes it makes sense to visualize in a 2D grid what are the relationships between all values in the first column and all values in the second column, when the meaning of a (V,H) tuple is that V and H are related.

In this case, we need to add an 'X' or a similar character (unicode checkmarks can look nicer) as a 3rd column.

For instance, this sample query helps visualizing the user/group memberships in postgres roles.

Example 10

=# SELECT r.rolname as username,r1.rolname as groupname, 'X'
FROM pg_catalog.pg_roles r LEFT JOIN pg_catalog.pg_auth_members m
ON (m.member = r.oid)
LEFT JOIN pg_roles r1 ON (m.roleid=r1.oid)
WHERE r.rolcanlogin
ORDER BY 1;
  username  | groupname | ?column? 
------------+-----------+----------
 FooBar     | mailusers | X
 daniel     | mailusers | X
 drupal     |           | X
 dv         | admin     | X
 dv         | common    | X
 extc       | readonly  | X
 extu       |           | X
 joel       |           | X
 mailreader | readonly  | X
 manitou    | mailusers | X
 manitou    | admin     | X
 postgres   |           | X
 roundcube  |           | X
 u1         | common    | X
 u2         | mailusers | X
 zaz        | mailusers | X
(16 rows)


# \crosstabview
  username  | mailusers |   | admin | common | readonly 
------------+-----------+---+-------+--------+----------
 FooBar     | X         |   |       |        | 
 daniel     | X         |   |       |        | 
 drupal     |           | X |       |        | 
 dv         |           |   | X     | X      | 
 extc       |           |   |       |        | X
 extu       |           | X |       |        | 
 joel       |           | X |       |        | 
 mailreader |           |   |       |        | X
 manitou    | X         |   | X     |        | 
 postgres   |           | X |       |        | 
 roundcube  |           | X |       |        | 
 u1         |           |   |       | X      | 
 u2         | X         |   |       |        | 
 zaz        | X         |   |       |        | 
(14 rows)

An empty column is present in the horizontal header to represent the NULL values for the "group" column, corresponding to users not being member of any group.