Arrays Explained
Introduction
Arrays are ordered lists in BigQuery. They are very powerful once you know how to use them, and can help with:
The Basics
ARRAY
s are their own Data Type in BigQuery. They can be comprised of any data type EXCEPT for ARRAY
s. So no ARRAY
of ARRAY
s, which is a relief, honestly. To do that kind of logic you'll need to use STRUCT
s.
Otherwise, the elements in an ARRAY
must all be of the same Supertype. For example:
SELECT
[1,2,3] int_array,
[1.5, cast(2 as INT64), 6.7] mixed_numerical_array,
['2020-01-01' , '2020-01-01 12:01:00'] mixed_date_array
In the query above there is an ARRAY
of INT64
, an ARRAY
of INT64
and FLOAT64
, and an ARRAY
with a DATE
and a DATETIME
.
You can also explicitly define the data types in an ARRAY
using ARRAY<DATA TYPE>
before the ARRAY
is specified:
SELECT ARRAY<FLOAT64>[1,2,3] float_array
Let's get into the good stuff now!
Creating Arrays
Generating an array using literals
As in the examples above the simplest way to generate an array is using Literals, or using brackets [
, ]
to build an ARRAY
.
SELECT
['hello', ' ', 'there'] AS welcome_array
Generating an array using GENERATE functions
Using Literals is fine for a few arrays, but doesn't work when you're trying to do something with more than a few elements and a few rows. For faster and more scalable ARRAY
creation you can use one of the GENERATE
functions:
GENERATE_ARRAY(start_expression, end_expression[, step_expression])
GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])
GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL step_expression date_part)
SELECT
GENERATE_ARRAY(0, 10, 2) AS even_numbers,
GENERATE_DATE_ARRAY('2020-01-01', '2020-03-01', INTERVAL 1 MONTH) AS month_starts
Turn rows to arrays using ARRAY_AGG
To create an array from data in your table, you can use ARRAY_AGG
:
ARRAY_AGG(
[DISTINCT] expression [{IGNORE|RESPECT} NULLS] [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n]
) [OVER (...)]
You can read more about ARRAY_AGG
here.
SELECT
ARRAY_AGG(fruit) AS fruit_basket
FROM
(
SELECT
"apple" AS fruit
UNION ALL
( SELECT
"pear" AS fruit)
UNION ALL
( SELECT
"banana" AS fruit)
) AS table_3
How to combine columns into an array?
To use different columns of data into an ARRAY, we can do the following:
SELECT
id,
ARRAY_AGG(STRUCT(value1, value2)) arr
FROM (
SELECT 1 id, 'a' value1, 'x' value2
UNION ALL (
SELECT 1 , 'b' value1, 'y' value2
) UNION ALL (
SELECT 2 id, 'c' value1, 'z' value2
)
)
GROUP BY id
Accessing Array Elements
Indexing an array
To access particular elements in an ARRAY
we can use either OFFSET
(0-based) or ORDINAL
(1-based).
SELECT
interests[OFFSET(1)] AS offset_1,
interests[ORDINAL(1)] AS ordinal_1
FROM
(
SELECT
['football', 'basketball', 'music'] AS interests
) AS table_1
There's no advantage to using either OFFSET
or ORDINAL
, it simply comes down to personal preference.
Finding the length of an array
Sometimes to do more dynamic indexing of Arrays, you may want to know the total number of elements, or length, of an array. To do that, you can use ARRAY_LENGTH
:
ARRAY_LENGTH(array_expression)
SELECT
ARRAY_LENGTH([1, 2, NULL])
Reversing arrays
It's also handy to reverse Arrays if you want to work with elements at the end of your ARRAY
. To do that you can use ARRAY_REVERSE
:
SELECT
ARRAY_REVERSE(['a', 'b', 'c']) AS reversed
How to get the last element of an array?
There are 2 options here:
OFFSET
/ORDINAL
with ARRAY_LENGTH
OFFSET
/ORDINAL
with ARRAY_REVERSE
SELECT
interests[ORDINAL(ARRAY_LENGTH(interests))] AS last_with_array_length,
ARRAY_REVERSE(interests)[ORDINAL(1)] AS last_with_reverse_array
FROM
(
SELECT
['football', 'basketball', 'music'] AS interests
) AS table_1
Transforming Arrays
Flattening an array
To convert an array to a set of rows we can use UNNEST
:
SELECT
*
FROM
UNNEST(['A', 'B', 'C']) AS element WITH OFFSET AS `offset`
How to create a column with every day in a year?
To do this we'll make use to UNNEST
and GENERATE_DATE_ARRAY
SELECT
*
FROM
UNNEST(GENERATE_DATE_ARRAY(CAST('2020-01-01' AS DATE), CAST('2020-12-31' AS DATE))) AS date_in_2020
There are times when you want to unnest an ARRAY without affecting other columns in your data. To do that we can use UNNEST
in combination with a CROSS JOIN
:
SELECT
id,
flattened_numbers
FROM
(
SELECT
1 AS id,
[0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL
( SELECT
2 AS id,
[2, 4, 8, 16, 32] AS some_numbers)
UNION ALL
( SELECT
3 AS id,
[5, 10] AS some_numbers)
) AS sequences
CROSS JOIN UNNEST(sequences.some_numbers) AS flattened_numbers
Here, we've replicated our id for each flattened ARRAY
row.
Filtering an array
To filter an ARRAY, we can make use of the ARRAY function:
ARRAY(subquery)
In the subquery
argument, we can put in an entire SELECT...
statement with our own criteria including:
DISTINCT
WHERE
SELECT ARRAY(
SELECT DISTINCT x
FROM UNNEST((
SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
)) AS x WHERE x <= 3
) AS unique_numbers
Here, we've filtered our ARRAY
[0,1,1,2,3,5]
to only unique elements that are less than or equal to 3.
Aggregating Arrays
Many times you want to consolidate an array - either by aggregating them into a single value or string, or by combining it with other arrays.
ARRAY to STRING
To convert an ARRAY
of STRING
s to a single STRING
, you can use ARRAY_TO_STRING
:
ARRAY_TO_STRING(array_expression, delimiter[, null_text])
SELECT
ARRAY_TO_STRING(greeting, " ") AS greetings
FROM
(
SELECT
['Hello', 'World'] AS greeting
) AS table_1
ARRAY to VALUE
To turn an ARRAY of numerical values to a single value, you can use any Aggregation function with the UNNEST function:
SELECT
SUM(x) AS total,
AVG(x) AS avg,
MAX(x) AS max
FROM
UNNEST([1, 2, 3]) AS x
Aggregate arrays across rows
To combine ARRAY
s in several rows to a single ARRAY
, you can use ARRAY_CONCAT_AGG
:
ARRAY_CONCAT_AGG(expression [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n])
SELECT
`group`,
ARRAY_CONCAT_AGG(numbers) AS count_to_six_agg
FROM
(
SELECT
'A' AS `group`,
[1, 2] AS numbers
UNION ALL
( SELECT
'A' AS `group`,
[3, 4] AS numbers)
UNION ALL
( SELECT
'B' AS `group`,
[5, 6] AS numbers)
) AS table_3
GROUP BY
`group`
Combining arrays
To merge several ARRAY
s into a single ARRAY
, you can use ARRAY_CONCAT
:
ARRAY_CONCAT(array_expression_1 [, array_expression_n])
SELECT
ARRAY_CONCAT([1, 2], [6, 4]) AS count_to_six_agg