Arrays are ordered lists in BigQuery. They are very powerful once you know how to use them, and can help with:
SELECT
[1,2,3] int_array,
[1.5, cast(2 as INT64), 6.7] mixed_numerical_array,
['2020-01-01' , '2020-01-01 12:01:00'] mixed_date_arrayARRAYs are their own Data Type in BigQuery. They can be comprised of any data type EXCEPT for ARRAYs. So no ARRAY of ARRAYs, which is a relief, honestly. To do that kind of logic you'll need to use STRUCTs.
Otherwise, the elements in an ARRAY must all be of the same Supertype. For example:
SELECT ARRAY<FLOAT64>[1,2,3] float_array| int_array | mixed_numerical_array | mixed_date_array |
|---|---|---|
| [1,2,3] | [1.5,2,6.7] | ["2020-01-01","2020-01-01 12:01:00"] |
In the query above there is an ARRAY of INT64, an ARRAY of INT64 and FLOAT64, and an ARRAY with a DATE and a DATETIME.
You can also explicitly define the data types in an ARRAY using ARRAY<DATA TYPE> before the ARRAY is specified:
SELECT
['hello', ' ', 'there'] AS welcome_array| float_array |
|---|
| [1,2,3] |
Let's get into the good stuff now!
As in the examples above the simplest way to generate an array is using Literals, or using brackets [ , ] to build an ARRAY.
GENERATE_ARRAY(start_expression, end_expression[, step_expression])| welcome_array |
|---|
| ["hello"," ","there"] |
Using Literals is fine for a few arrays, but doesn't work when you're trying to do something with more than a few elements and a few rows. For faster and more scalable ARRAY creation you can use one of the GENERATE functions:
GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])| even_numbers | month_starts |
|---|---|
| [0,2,4,6,8,10] | ["2020-01-01","2020-02-01","2020-03-01"] |
ARRAY_AGGTo create an array from data in your table, you can use ARRAY_AGG:
GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL step_expression date_part)You can read more about ARRAY_AGGhere.
| fruit_basket |
|---|
| ["apple","pear","banana"] |
To use different columns of data into an ARRAY, we can do the following:
SELECT
GENERATE_ARRAY(0, 10, 2) AS even_numbers,
GENERATE_DATE_ARRAY('2020-01-01', '2020-03-01', INTERVAL 1 MONTH) AS month_starts| id | arr |
|---|---|
| 1 | [{"value1":"a","value2":"x"},{"value1":"b","value2":"y"}] |
| 2 | [{"value1":"c","value2":"z"}] |
To access particular elements in an ARRAY we can use either OFFSET (0-based) or ORDINAL (1-based).
| offset_1 | ordinal_1 |
|---|---|
| basketball | football |
There's no advantage to using either OFFSET or ORDINAL, it simply comes down to personal preference.
Sometimes to do more dynamic indexing of Arrays, you may want to know the total number of elements, or length, of an array. To do that, you can use ARRAY_LENGTH:
ARRAY_AGG(
[DISTINCT] expression [{IGNORE|RESPECT} NULLS] [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n]
) [OVER (...)]ARRAY_LENGTH(array_expression)It's also handy to reverse Arrays if you want to work with elements at the end of your ARRAY. To do that you can use ARRAY_REVERSE:
SELECT
ARRAY_AGG(fruit) AS fruit_basket
FROM
(
SELECT
"apple" AS fruit
UNION ALL
( SELECT
"pear" AS fruit)
UNION ALL
( SELECT
"banana" AS fruit)
) AS table_3| reversed |
|---|
| ["c","b","a"] |
There are 2 options here:
SELECT
id,
ARRAY_AGG(STRUCT(value1, value2)) arr
FROM (
SELECT 1 id, 'a' value1, 'x' value2
UNION ALL (
SELECT 1 , 'b' value1, 'y' value2
) UNION ALL (
SELECT 2 id, 'c' value1, 'z' value2
)
)
GROUP BY idOFFSET/ORDINAL with ARRAY_LENGTHOFFSET/ORDINAL with ARRAY_REVERSE| last_with_array_length | last_with_reverse_array |
|---|---|
| music | music |
To convert an array to a set of rows we can use UNNEST:
SELECT
interests[OFFSET(1)] AS offset_1,
interests[ORDINAL(1)] AS ordinal_1
FROM
(
SELECT
['football', 'basketball', 'music'] AS interests
) AS table_1| element | offset |
|---|---|
| A | 0 |
| B | 1 |
| C | 2 |
To do this we'll make use to UNNEST and GENERATE_DATE_ARRAY
| date_in_2020 |
|---|
| 2020-01-01 |
| 2020-01-02 |
| 2020-01-03 |
| 2020-01-04 |
| 2020-01-05 |
| 2020-01-06 |
| 2020-01-07 |
| 2020-01-08 |
| 2020-01-09 |
| 2020-01-10 |
| 2020-01-11 |
| 2020-01-12 |
| 2020-01-13 |
| 2020-01-14 |
| 2020-01-15 |
| 2020-01-16 |
| 2020-01-17 |
| 2020-01-18 |
| 2020-01-19 |
| 2020-01-20 |
| 2020-01-21 |
| 2020-01-22 |
| 2020-01-23 |
| 2020-01-24 |
| 2020-01-25 |
| 2020-01-26 |
| 2020-01-27 |
| 2020-01-28 |
| 2020-01-29 |
| 2020-01-30 |
| 2020-01-31 |
| 2020-02-01 |
| 2020-02-02 |
| 2020-02-03 |
| 2020-02-04 |
| 2020-02-05 |
| 2020-02-06 |
| 2020-02-07 |
| 2020-02-08 |
| 2020-02-09 |
| 2020-02-10 |
| 2020-02-11 |
| 2020-02-12 |
| 2020-02-13 |
| 2020-02-14 |
| 2020-02-15 |
| 2020-02-16 |
| 2020-02-17 |
| 2020-02-18 |
| 2020-02-19 |
| 2020-02-20 |
| 2020-02-21 |
| 2020-02-22 |
| 2020-02-23 |
| 2020-02-24 |
| 2020-02-25 |
| 2020-02-26 |
| 2020-02-27 |
| 2020-02-28 |
| 2020-02-29 |
| 2020-03-01 |
| 2020-03-02 |
| 2020-03-03 |
| 2020-03-04 |
| 2020-03-05 |
| 2020-03-06 |
| 2020-03-07 |
| 2020-03-08 |
| 2020-03-09 |
| 2020-03-10 |
| 2020-03-11 |
| 2020-03-12 |
| 2020-03-13 |
| 2020-03-14 |
| 2020-03-15 |
| 2020-03-16 |
| 2020-03-17 |
| 2020-03-18 |
| 2020-03-19 |
| 2020-03-20 |
| 2020-03-21 |
| 2020-03-22 |
| 2020-03-23 |
| 2020-03-24 |
| 2020-03-25 |
| 2020-03-26 |
| 2020-03-27 |
| 2020-03-28 |
| 2020-03-29 |
| 2020-03-30 |
| 2020-03-31 |
| 2020-04-01 |
| 2020-04-02 |
| 2020-04-03 |
| 2020-04-04 |
| 2020-04-05 |
| 2020-04-06 |
| 2020-04-07 |
| 2020-04-08 |
| 2020-04-09 |
There are times when you want to unnest an ARRAY without affecting other columns in your data. To do that we can use UNNEST in combination with a CROSS JOIN:
SELECT
ARRAY_LENGTH([1, 2, NULL])| id | flattened_numbers |
|---|---|
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 5 |
| 2 | 2 |
| 2 | 4 |
| 2 | 8 |
| 2 | 16 |
| 2 | 32 |
| 3 | 5 |
| 3 | 10 |
Here, we've replicated our id for each flattened ARRAY row.
To filter an ARRAY, we can make use of the ARRAY function:
SELECT
ARRAY_REVERSE(['a', 'b', 'c']) AS reversedARRAY(subquery)In the subquery argument, we can put in an entire SELECT... statement with our own criteria including:
SELECT
interests[ORDINAL(ARRAY_LENGTH(interests))] AS last_with_array_length,
ARRAY_REVERSE(interests)[ORDINAL(1)] AS last_with_reverse_array
FROM
(
SELECT
['football', 'basketball', 'music'] AS interests
) AS table_1| unique_numbers |
|---|
| [0,1,2,3] |
Here, we've filtered our ARRAY[0,1,1,2,3,5] to only unique elements that are less than or equal to 3.
Many times you want to consolidate an array - either by aggregating them into a single value or string, or by combining it with other arrays.
To convert an ARRAY of STRINGs to a single STRING, you can use ARRAY_TO_STRING:
SELECT
*
FROM
UNNEST(['A', 'B', 'C']) AS element WITH OFFSET AS `offset`ARRAY_TO_STRING(array_expression, delimiter[, null_text])| greetings |
|---|
| Hello World |
To turn an ARRAY of numerical values to a single value, you can use any Aggregation function with the UNNEST function:
SELECT
*
FROM
UNNEST(GENERATE_DATE_ARRAY(CAST('2020-01-01' AS DATE), CAST('2020-12-31' AS DATE))) AS date_in_2020| total | avg | max |
|---|---|---|
| 6 | 2 | 3 |
To combine ARRAYs in several rows to a single ARRAY, you can use ARRAY_CONCAT_AGG:
SELECT
id,
flattened_numbers
FROM
(
SELECT
1 AS id,
[0, 1, 1, 2, 3, 5] AS some_numbers
UNION ALL
( SELECT
2 AS id,
[2, 4, 8, 16, 32] AS some_numbers)
UNION ALL
( SELECT
3 AS id,
[5, 10] AS some_numbers)
) AS sequences
CROSS JOIN UNNEST(sequences.some_numbers) AS flattened_numbers| group | count_to_six_agg |
|---|---|
| A | [1,2,3,4] |
| B | [5,6] |
To merge several ARRAYs into a single ARRAY, you can use ARRAY_CONCAT:
SELECT ARRAY(
SELECT DISTINCT x
FROM UNNEST((
SELECT [0, 1, 1, 2, 3, 5] AS some_numbers
)) AS x WHERE x <= 3
) AS unique_numbers| count_to_six_agg |
|---|
| [1,2,6,4] |
SELECT
ARRAY_TO_STRING(greeting, " ") AS greetings
FROM
(
SELECT
['Hello', 'World'] AS greeting
) AS table_1SELECT
SUM(x) AS total,
AVG(x) AS avg,
MAX(x) AS max
FROM
UNNEST([1, 2, 3]) AS xARRAY_CONCAT_AGG(expression [ORDER BY key [{ASC|DESC}] [, ... ]] [LIMIT n])SELECT
`group`,
ARRAY_CONCAT_AGG(numbers) AS count_to_six_agg
FROM
(
SELECT
'A' AS `group`,
[1, 2] AS numbers
UNION ALL
( SELECT
'A' AS `group`,
[3, 4] AS numbers)
UNION ALL
( SELECT
'B' AS `group`,
[5, 6] AS numbers)
) AS table_3
GROUP BY
`group`ARRAY_CONCAT(array_expression_1 [, array_expression_n])SELECT
ARRAY_CONCAT([1, 2], [6, 4]) AS count_to_six_agg