Aggregate Mode
The statistical mode is the value that appears most often in a set of values.
Postgres 9.4 has built-in aggregate function mode()
The new ordered-set aggregate function supersedes the custom aggregate function below. More importantly, it also conflicts with it. To install it anyway, use a different name. Functionality is (almost) identical, but the new built-in function is much faster.
Subtle differences:
- Like most built-in aggregate functions, NULL values are ignored. If the most common value is NULL, built-in mode() returns the second most common value.
- The built-in mode() does not return an error if the expression is NULL in all rows. Returns NULL instead.
Usage
New syntax is different:
SELECT mode() WITHIN GROUP (ORDER BY some_value) AS modal_value FROM tbl;
See also
- Ordered-set aggregate functions in the current manual (including mode())
- Depesz review of ordered-set aggregate functions
- Stackoverflow post demonstrating confict
mode() for Postgres 9.3 or earlier (superseded in 9.4)
PostgreSQL makes it easy to add custom aggregate functions. This snippet is also part of the ulib_agg user-defined library.
Conceptually the process will be to gather each value into an array. Then once all values are in the array, we will run a function to find the most common value in our array. Then you need to create a function to find the most common value in the array.
CREATE OR REPLACE FUNCTION _final_mode(anyarray)
RETURNS anyelement AS
$BODY$
SELECT a
FROM unnest($1) a
GROUP BY 1
ORDER BY COUNT(1) DESC, 1
LIMIT 1;
$BODY$
LANGUAGE sql IMMUTABLE;
-- Tell Postgres how to use our aggregate
CREATE AGGREGATE mode(anyelement) (
SFUNC=array_append, --Function to call for each row. Just builds the array
STYPE=anyarray,
FINALFUNC=_final_mode, --Function to call after everything has been added to array
INITCOND='{}' --Initialize an empty array when starting
);
Usage
SELECT mode(some_value) AS modal_value FROM t;
Caution
Returns error if the argument is a column of NULL values.
If you are on PostgreSQL 8.3 or below you will need to add the unnest() function to convert an array to a set of rows.
See also
- Aggregate Range
- Aggregate Median
- most_common_vals() pg_stats function