SQL Resources/BigQuery/MEDIAN

MEDIAN

Definition

There is no MEDIAN function in BigQuery, but it can be calculated using the PERCENTILE_CONT function.

Syntax

PERCENTILE_CONT (value_expression, percentile [{RESPECT | IGNORE} NULLS])

Using this function, you can calculate the median with the following:

SELECT
  PERCENTILE_CONT(x, 0.5) OVER () AS median
FROM
  UNNEST([1, 2, 4, 5, NULL]) AS x
LIMIT
  1
median
3

By default NULL values are ignored in the calculation but they can be included with the key words RESPECT NULLS. If you include NULL values then the following behaviour is important to know.

  • Any interpolation between two NULL values returns NULL.
  • An interpolation between a NULL and a non-NULL value returns the non-NULL value. (NULL is not just simply treated as 0).

In the example below the results are different because we now take the NULL value in the list of numbers into account, shifting the result to 2.

SELECT
  PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER () AS median
FROM
  UNNEST([1, 2, 4, 5, NULL]) AS x
LIMIT
  1
median
2

Practical Info

  • PERCENTILE_CONT is a window function. You can learn more about window functions here.
  • value_expression must be either NUMERIC, BIGNUMERIC, FLOAT64.
  • percentile must be a literal between 0 and 1.
  • If you set percentile to 0 or 1 you can use percentile_cont to calculate the minimum and maximum values respectively.
  • The function PERCENTILE_DISC has the same arguments as PERCENTILE_CONT but provides a percentile value for a discrete set of values including strings and any data type that can be ordered.

Common Questions

Why is there no MEDIAN function in BigQuery?

There's no official explanation why MEDIAN function isn't supported. BigQuery has a number of approximate aggregate functions which are designed to give approximate results but are much less computationally expensive across big data sets.

Troubleshooting Common Errors

The query could not be executed in the allotted memory. OVER() operator used too much memory

This error occurs when the table you're analysing is too big for BiqQuery to fit in memory. This can happen with big data tables and particularly if the window function has no partition and it is therefore running the calculation across the whole table. Instead you can use the approx_quantiles function to give you an estimated value.

Related Pages

Become a SQL Expert
Get five free SQL lessons from Count's Head of Data, Taylor Brownlow.
Email
Title