Aggregate Histogram

From PostgreSQL wiki

Jump to: navigation, search

A histogram represents the distribution of a set of values.

Contents

histogram()

Snippets

Aggregate Histogram

Works with PostgreSQL

Any version

Written in

PLPGSQL

Depends on

Nothing

We write a function which runs width_bucket on each value to be aggregated and uses that to increment the corresponding bucket. The state is stored as an array of integers.

CREATE OR REPLACE FUNCTION hist_sfunc (state INTEGER[], val REAL, min REAL, max REAL, nbuckets INTEGER) RETURNS INTEGER[] AS $$
DECLARE
  bucket INTEGER;
  i INTEGER;
BEGIN
  -- width_bucket uses nbuckets + 1 (!) and starts at 1.
  bucket := width_bucket(val, min, max, nbuckets - 1) - 1;
 
  -- Init the array with the correct number of 0's so the caller doesn't see NULLs
  IF state[0] IS NULL THEN
    FOR i IN SELECT * FROM generate_series(0, nbuckets - 1) LOOP
      state[i] := 0;
    END LOOP;
  END IF;
 
  state[bucket] = state[bucket] + 1;
 
  RETURN state;
END;
$$ LANGUAGE plpgsql IMMUTABLE;
 
-- Tell Postgres how to use the new function
DROP AGGREGATE IF EXISTS histogram (REAL, REAL, REAL, INTEGER);
CREATE AGGREGATE histogram (REAL, REAL, REAL, INTEGER) (
       SFUNC = hist_sfunc,
       STYPE = INTEGER[]
);

Usage

Create a 20 bucket histogram of phases from each host, since the data is in degrees we use 0-360 as the limits.

SELECT host, count(*), histogram(phase, 0.0, 360.0, 20) FROM t GROUP BY host ORDER BY host;

Caution

Returns error if the argument is a column of NULL values. Only tested with Postgres 9.2.

See also

Personal tools