# Aggregate Histogram

### From PostgreSQL wiki

A histogram represents the distribution of a set of values.

## Contents |

## histogram()

Aggregate Histogram

Works with PostgreSQL

Any version

Written in

PLPGSQL

Depends on

Nothing

We write a function which runs width_bucket on each value to be aggregated and uses that to increment the corresponding bucket. The state is stored as an array of integers.

CREATE OR REPLACE FUNCTION hist_sfunc (state INTEGER[], val REAL, min REAL, max REAL, nbuckets INTEGER) RETURNS INTEGER[] AS $$ DECLARE bucket INTEGER; i INTEGER; BEGIN -- width_bucket uses nbuckets + 1 (!) and starts at 1. bucket := width_bucket(val, min, max, nbuckets - 1) - 1; -- Init the array with the correct number of 0's so the caller doesn't see NULLs IF state[0] IS NULL THEN FOR i IN SELECT * FROM generate_series(0, nbuckets - 1) LOOP state[i] := 0; END LOOP; END IF; state[bucket] = state[bucket] + 1; RETURN state; END; $$ LANGUAGE plpgsql IMMUTABLE; -- Tell Postgres how to use the new function DROP AGGREGATE IF EXISTS histogram (REAL, REAL, REAL, INTEGER); CREATE AGGREGATE histogram (REAL, REAL, REAL, INTEGER) ( SFUNC = hist_sfunc, STYPE = INTEGER[] );

### Usage

Create a 20 bucket histogram of phases from each host, since the data is in degrees we use 0-360 as the limits.

SELECT host, count(*), histogram(phase, 0.0, 360.0, 20) FROM t GROUP BY host ORDER BY host;

### Caution

Returns error if the argument is a column of NULL values. Only tested with Postgres 9.2.