SELECT knowledge FROM sql_resources WHERE category='bigquery-standard-sql' AND slug='arrays-explained'

Arrays Explained

Introduction

Arrays are ordered lists in BigQuery. They are very powerful once you know how to use them, and can help with:

Loading code...
  • Transforming non-normalized data (e.g. Survey responses)
  • Advanced string manipulation
  • Optimising storage and performance

The Basics

ARRAYs are their own Data Type in BigQuery. They can be comprised of any data type EXCEPT for ARRAYs. So no ARRAY of ARRAYs, which is a relief, honestly. To do that kind of logic you'll need to use STRUCTs.

Otherwise, the elements in an ARRAY must all be of the same Supertype. For example:

Loading code...
int_arraymixed_numerical_arraymixed_date_array
[1,2,3][1.5,2,6.7]["2020-01-01","2020-01-01 12:01:00"]

In the query above there is an ARRAY of INT64, an ARRAY of INT64 and FLOAT64, and an ARRAY with a DATE and a DATETIME.

You can also explicitly define the data types in an ARRAY using ARRAY<DATA TYPE> before the ARRAY is specified:

Loading code...
float_array
[1,2,3]

Let's get into the good stuff now!

Creating Arrays

Generating an array using literals

As in the examples above the simplest way to generate an array is using Literals, or using brackets [ , ] to build an ARRAY.

GENERATE_ARRAY(start_expression, end_expression[, step_expression])
welcome_array
["hello"," ","there"]

Generating an array using GENERATE functions

Using Literals is fine for a few arrays, but doesn't work when you're trying to do something with more than a few elements and a few rows. For faster and more scalable ARRAY creation you can use one of the GENERATE functions:

GENERATE_DATE_ARRAY(start_date, end_date[, INTERVAL INT64_expr date_part])
even_numbersmonth_starts
[0,2,4,6,8,10]["2020-01-01","2020-02-01","2020-03-01"]

Turn rows to arrays using ARRAY_AGG

To create an array from data in your table, you can use ARRAY_AGG:

GENERATE_TIMESTAMP_ARRAY(start_timestamp, end_timestamp, INTERVAL step_expression date_part)

You can read more about ARRAY_AGGhere.

fruit_basket
["apple","pear","banana"]

How to combine columns into an array?

To use different columns of data into an ARRAY, we can do the following:

Loading code...
idarr
1[{"value1":"a","value2":"x"},{"value1":"b","value2":"y"}]
2[{"value1":"c","value2":"z"}]

Accessing Array Elements

Indexing an array

To access particular elements in an ARRAY we can use either OFFSET (0-based) or ORDINAL (1-based).

offset_1ordinal_1
basketballfootball

There's no advantage to using either OFFSET or ORDINAL, it simply comes down to personal preference.

Finding the length of an array

Sometimes to do more dynamic indexing of Arrays, you may want to know the total number of elements, or length, of an array. To do that, you can use ARRAY_LENGTH:

Loading code...
  • ARRAY_LENGTH(array_expression)

Reversing arrays

It's also handy to reverse Arrays if you want to work with elements at the end of your ARRAY. To do that you can use ARRAY_REVERSE:

Loading code...
reversed
["c","b","a"]

How to get the last element of an array?

There are 2 options here:

Loading code...
  • Use OFFSET/ORDINAL with ARRAY_LENGTH
  • Use OFFSET/ORDINAL with ARRAY_REVERSE
last_with_array_lengthlast_with_reverse_array
musicmusic

Transforming Arrays

Flattening an array

To convert an array to a set of rows we can use UNNEST:

Loading code...
elementoffset
A0
B1
C2

How to create a column with every day in a year?

To do this we'll make use to UNNEST and GENERATE_DATE_ARRAY

date_in_2020
2020-01-01
2020-01-02
2020-01-03
2020-01-04
2020-01-05
2020-01-06
2020-01-07
2020-01-08
2020-01-09
2020-01-10
2020-01-11
2020-01-12
2020-01-13
2020-01-14
2020-01-15
2020-01-16
2020-01-17
2020-01-18
2020-01-19
2020-01-20
2020-01-21
2020-01-22
2020-01-23
2020-01-24
2020-01-25
2020-01-26
2020-01-27
2020-01-28
2020-01-29
2020-01-30
2020-01-31
2020-02-01
2020-02-02
2020-02-03
2020-02-04
2020-02-05
2020-02-06
2020-02-07
2020-02-08
2020-02-09
2020-02-10
2020-02-11
2020-02-12
2020-02-13
2020-02-14
2020-02-15
2020-02-16
2020-02-17
2020-02-18
2020-02-19
2020-02-20
2020-02-21
2020-02-22
2020-02-23
2020-02-24
2020-02-25
2020-02-26
2020-02-27
2020-02-28
2020-02-29
2020-03-01
2020-03-02
2020-03-03
2020-03-04
2020-03-05
2020-03-06
2020-03-07
2020-03-08
2020-03-09
2020-03-10
2020-03-11
2020-03-12
2020-03-13
2020-03-14
2020-03-15
2020-03-16
2020-03-17
2020-03-18
2020-03-19
2020-03-20
2020-03-21
2020-03-22
2020-03-23
2020-03-24
2020-03-25
2020-03-26
2020-03-27
2020-03-28
2020-03-29
2020-03-30
2020-03-31
2020-04-01
2020-04-02
2020-04-03
2020-04-04
2020-04-05
2020-04-06
2020-04-07
2020-04-08
2020-04-09

There are times when you want to unnest an ARRAY without affecting other columns in your data. To do that we can use UNNEST in combination with a CROSS JOIN:

SELECT
  ARRAY_LENGTH([1, 2, NULL])
idflattened_numbers
10
11
11
12
13
15
22
24
28
216
232
35
310

Here, we've replicated our id for each flattened ARRAY row.

Filtering an array

To filter an ARRAY, we can make use of the ARRAY function:

Loading code...
  • ARRAY(subquery)

In the subquery argument, we can put in an entire SELECT... statement with our own criteria including:

Loading code...
unique_numbers
[0,1,2,3]

Here, we've filtered our ARRAY[0,1,1,2,3,5] to only unique elements that are less than or equal to 3.

Aggregating Arrays

Many times you want to consolidate an array - either by aggregating them into a single value or string, or by combining it with other arrays.

ARRAY to STRING

To convert an ARRAY of STRINGs to a single STRING, you can use ARRAY_TO_STRING:

Loading code...
  • ARRAY_TO_STRING(array_expression, delimiter[, null_text])
greetings
Hello World

ARRAY to VALUE

To turn an ARRAY of numerical values to a single value, you can use any Aggregation function with the UNNEST function:

Loading code...
totalavgmax
623

Aggregate arrays across rows

To combine ARRAYs in several rows to a single ARRAY, you can use ARRAY_CONCAT_AGG:

Loading code...
groupcount_to_six_agg
A[1,2,3,4]
B[5,6]

Combining arrays

To merge several ARRAYs into a single ARRAY, you can use ARRAY_CONCAT:

Loading code...
count_to_six_agg
[1,2,6,4]
Loading code...
Loading code...
ARRAY_CONCAT_AGG(expression  [ORDER BY key [{ASC|DESC}] [, ... ]]  [LIMIT n])
Loading code...
ARRAY_CONCAT(array_expression_1 [, array_expression_n])
Loading code...