MEDIAN
Definition
There is no MEDIAN
function in BigQuery, but it can be calculated using the PERCENTILE_CONT
function.
Syntax
PERCENTILE_CONT (value_expression, percentile [{RESPECT | IGNORE} NULLS])
Using this function, you can calculate the median with the following:
SELECT
PERCENTILE_CONT(x, 0.5) OVER () AS median
FROM
UNNEST([1, 2, 4, 5, NULL]) AS x
LIMIT
1
By default NULL
values are ignored in the calculation but they can be included with the key words RESPECT NULLS
. If you include NULL
values then the following behaviour is important to know.
NULL
values returns NULL
.NULL
and a non-NULL
value returns the non-NULL
value. (NULL
is not just simply treated as 0).In the example below the results are different because we now take the NULL
value in the list of numbers into account, shifting the result to 2.
SELECT
PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER () AS median
FROM
UNNEST([1, 2, 4, 5, NULL]) AS x
LIMIT
1
Practical Info
PERCENTILE_CONT
is a window function. You can learn more about window functions here.value_expression
must be either NUMERIC
, BIGNUMERIC
, FLOAT64
.percentile
must be a literal between 0 and 1.percentile
to 0 or 1 you can use percentile_cont
to calculate the minimum and maximum values respectively.PERCENTILE_DISC
has the same arguments as PERCENTILE_CONT
but provides a percentile value for a discrete set of values including strings and any data type that can be ordered.Common Questions
Why is there no MEDIAN
function in BigQuery?
There's no official explanation why MEDIAN
function isn't supported. BigQuery has a number of approximate aggregate functions which are designed to give approximate results but are much less computationally expensive across big data sets.
Troubleshooting Common Errors
The query could not be executed in the allotted memory. OVER() operator used too much memory
This error occurs when the table you're analysing is too big for BiqQuery to fit in memory. This can happen with big data tables and particularly if the window function has no partition and it is therefore running the calculation across the whole table. Instead you can use the approx_quantiles function to give you an estimated value.