Aggregate Median
From PostgreSQL wiki
The statistical median is the numerical value separating the higher half of a data sample, a population, or a probability distribution, from the lower half. A median is only defined on ordered one-dimensional data, and is independent of any distance metric.
Contents |
median(numeric)
This snippet is also part of the ulib_agg user-defined library.
CREATE OR REPLACE FUNCTION _final_median(numeric[]) RETURNS numeric AS $$ SELECT AVG(val) FROM ( SELECT val FROM unnest($1) val ORDER BY 1 LIMIT 2 - MOD(array_upper($1, 1), 2) OFFSET CEIL(array_upper($1, 1) / 2.0) - 1 ) sub; $$ LANGUAGE 'sql' IMMUTABLE; CREATE AGGREGATE median(numeric) ( SFUNC=array_append, STYPE=numeric[], FINALFUNC=_final_median, INITCOND='{}' );
Usage
SELECT median(num_value) AS median_value FROM t;
Caution
Unlike mode() and range(), this one will only work with PostgreSQL 8.4 and higher due to the dynamic limit and offset.
NOTE: This version does not take NULL values into consideration, e.g., median('{1,NULL,NULL}') is NULL.
median(anyelement)
Here's a refinement of the above with two innovations: it takes any numerical value via anyelement. It also culls nulls which requires an extra pass so will be a bit slower. Also, I switched it to return float8 which is mainly a cosmetic preference. This snippet is also part of the ulib_agg user-defined library.
CREATE FUNCTION _final_median(anyarray) RETURNS float8 AS $$ WITH q AS ( SELECT val FROM unnest($1) val WHERE VAL IS NOT NULL ORDER BY 1 ), cnt AS ( SELECT COUNT(*) AS c FROM q ) SELECT AVG(val)::float8 FROM ( SELECT val FROM q LIMIT 2 - MOD((SELECT c FROM cnt), 2) OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1,0) ) q2; $$ LANGUAGE sql IMMUTABLE; CREATE AGGREGATE median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=_final_median, INITCOND='{}' );
Usage
SELECT median(value) AS median_value FROM t;
Caution
If you are stuck with PostgreSQL 8.3 or lower the following modification to the above median aggregate will work (note you first have to define unnest and create the PL/pgsql language (CREATE LANGUAGE plpgsql;) if you haven't already. Also note more efficient median functions are possible on unsorted arrays, e.g., in O(N) using a selection algorithm vs this O(N lg N) that first comparison sorts the array.
CREATE OR REPLACE FUNCTION final_median(anyarray) RETURNS float8 AS $$ DECLARE cnt INTEGER; BEGIN cnt := (SELECT count(*) FROM unnest($1) val WHERE val IS NOT NULL); RETURN (SELECT avg(tmp.val)::float8 FROM (SELECT val FROM unnest($1) val WHERE val IS NOT NULL ORDER BY 1 LIMIT 2 - MOD(cnt, 2) OFFSET CEIL(cnt/ 2.0) - 1 ) AS tmp ); END $$ LANGUAGE plpgsql; CREATE AGGREGATE median(anyelement) ( SFUNC=array_append, STYPE=anyarray, FINALFUNC=final_median, INITCOND='{}' );
External implementations
There are many compiled versions for "external" median calculus:
See Also
Aggregate Mode Aggregate Median
