# Aggregate Median

### From PostgreSQL wiki

The statistical median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. A median is only defined on ordered one-dimensional data, and is independent of any distance metric.

## Contents |

## median(numeric)

This snippet is also part of the ulib_agg user-defined library.

CREATE OR REPLACE FUNCTION _final_median(NUMERIC[]) RETURNS NUMERIC AS $$ SELECT AVG(val) FROM ( SELECT val FROM unnest($1) val ORDER BY 1 LIMIT 2 - MOD(array_upper($1, 1), 2) OFFSET CEIL(array_upper($1, 1) / 2.0) - 1 ) sub; $$ LANGUAGE 'sql' IMMUTABLE; CREATE AGGREGATE median(NUMERIC) ( SFUNC=array_append, STYPE=NUMERIC[], FINALFUNC=_final_median, INITCOND='{}' );

### Usage

SELECT median(num_value) AS median_value FROM t;

### Caution

Unlike `mode()` and `range()`, this one will only work with PostgreSQL 8.4 and higher due to the dynamic limit and offset.

NOTE: This version does not take NULL values into consideration, e.g., median('{1,NULL,NULL}') is NULL.

This implementation is also quite slow. On tables of ~500K elements *orafce* implementation is ~400 times faster.

## median(anyelement)

Here's a refinement of the above with two innovations: it takes any numerical value via anyelement. It also culls nulls which requires an extra pass so will be a bit slower. Also, I switched it to return float8 which is mainly a cosmetic preference. This snippet is also part of the ulib_agg user-defined library.

CREATE FUNCTION _final_median(anyarray) RETURNS float8 AS $$ WITH q AS ( SELECT val FROM unnest($1) val WHERE VAL IS NOT NULL ORDER BY 1 ), cnt AS ( SELECT COUNT(*) AS c FROM q ) SELECT AVG(val)::float8 FROM ( SELECT val FROM q LIMIT 2 - MOD((SELECT c FROM cnt), 2) OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1,0) ) q2; $$ LANGUAGE SQL IMMUTABLE; CREATE AGGREGATE median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=_final_median, INITCOND='{}' );

### Usage

SELECT median(VALUE) AS median_value FROM t;

### Caution

If you are stuck with PostgreSQL 8.3 or lower the following modification to the above median aggregate will work (note you first have to define unnest and create the PL/pgsql language (CREATE LANGUAGE plpgsql;) if you haven't already. Also note more efficient median functions are possible on unsorted arrays, e.g., in O(N) using a selection algorithm vs this O(N lg N) that first comparison sorts the array.

CREATE OR REPLACE FUNCTION final_median(anyarray) RETURNS float8 AS $$ DECLARE cnt INTEGER; BEGIN cnt := (SELECT COUNT(*) FROM unnest($1) val WHERE val IS NOT NULL); RETURN (SELECT avg(tmp.val)::float8 FROM (SELECT val FROM unnest($1) val WHERE val IS NOT NULL ORDER BY 1 LIMIT 2 - MOD(cnt, 2) OFFSET CEIL(cnt/ 2.0) - 1 ) AS tmp ); END $$ LANGUAGE plpgsql; CREATE AGGREGATE median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=final_median, INITCOND='{}' );

## External implementations

There are many compiled versions for "external" *median* calculus:

- C code for libpq:
- PL/R (the
*R*Language is also a statistical package) - orafce has quite fast median implementation. It's packages are also available in Debian/Ubuntu repositories as
*postgresql-x.x-orafce*. - ...

## See Also

Aggregate Mode
**Aggregate Median**