SQL Resources/BigQuery/Data Types

Data Types

BigQuery supports several data types, some of which are standard (e.g. STRING, INT64), and others that are more complex (e.g. ARRAY, STRUCT).

In general though they easily fall into the following categories or Supertypes:

  1. Numeric
  1. INT64
  1. FLOAT64
  1. NUMERIC & BIGNUMERIC
  1. Boolean
  1. BOOL
  1. String
  1. STRING
  1. Bytes
  1. BYTES
  1. Date/Time
  1. DATE
  1. DATETIME
  1. TIMESTAMP
  1. TIME
  1. Array
  1. ARRAY
  1. Struct
  1. STRUCT
  1. Geography
  1. GEOGRAPHY

INT64

SELECT cast('12' as INT64) int
int
12

About

  • Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other INT64s
  • CAST to: BOOL, INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*: NUMERIC, BIGNUMERIC, FLOAT64

FLOAT64

SELECT cast('12.44' as FLOAT64) float
float
12.44

About

  • There are non-numeric values to consider: NaN and +/-inf
  • The order of values from least to greatest is:
  • NULL
  • NaN
  • -inf
  • Negative numbers
  • 0
  • Positive numbers
  • +inf

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, but NOT PARTITION BY
  • ✅ Can be compared to other FLOAT64s
  • CAST to: INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*: N/A

NUMERIC and BIGNUMERIC

SELECT cast('12.4E17' as NUMERIC) numeric, cast('12.4E37' as BIGNUMERIC) bignumeric
numeric
1240000000000000000
bignumeric
124000000000000000000000000000000000000

About

  • Often used for financial calculations
  • NUMERIC
  • Precision: 38
  • Scale: 0
  • Min : -9.9999999999999999999999999999999999999E+29
  • Max: 9.9999999999999999999999999999999999999E+29
  • BIGNUMERIC
  • Precision: 76.76 (the 77th digit is partial)
  • Scale: 38
  • Min:-5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38
  • Max:5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other NUMERICSs and BIGNUMERICs
  • CAST to: INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*:BIGNUMERIC, FLOAT64

BOOL

SELECT
  CAST('True' AS BOOL) AS boolean
boolean
TRUE

About

  • Represented by keywords: TRUE and FALSE
  • They are case insensitive
  • Boolean values are sorted from least to greatest:
  • NULL
  • FALSE
  • TRUE

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other BOOLs
  • CAST to: INT64, BOOL, STRING
  • Coercion To*: N/A

STRING

SELECT
  CAST(123 AS STRING) AS string
string
123

About

  • Uses UTF-8 encoding
  • All STRING functions operate on Unicode characters rather than bytes
  • Most STRING functions are also defined on BYTES
  • Casting from STRING-> BYTES does UTF-8 encoding and decoding
  • BYTES -> STRING will return in error if BYTES are not valid UTF-8

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other STRINGs
  • CAST to: BOOL, INT64, NUMERIC, BIGNUMERIC, FLOAT64,STRING, BYTES, DATE, DATETIME, TIME, TIMESTAMP
  • Coercion To*: N/A

BYTES

SELECT cast('hi' as BYTES) bytes
bytes
[104, 105]

About

  • Most STRING functions are also defined on BYTES
  • Casting from STRING-> BYTES does UTF-8 encoding and decoding
  • BYTES -> STRING will return in error if BYTES are not valid UTF-8

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other BYTEs
  • CAST to: STRING, BYTES
  • Coercion To*: N/A

DATE

SELECT
  CAST('2020-08-24' AS DATE) AS date
date
2020-08-24

About

  • Independent of time zone.
  • Represents a calendar day, not a specific 24 hour window.
  • Range: 0001-01-01 to 9999-12-31.
  • Format: YYYY:[M]M:[D]D or 4-digit year : one or two digit month : one or two digit day

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other DATEs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP
  • Coercion To*: N/A

DATETIME

SELECT
  CAST('2020-08-24 13:11:05' AS DATETIME) AS datetime
datetime
2020-08-24T13:11:05

About

  • Independent of time zone.
  • Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
  • Range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999
  • Format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other DATETIMEs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP, TIME
  • Coercion To*: N/A

TIMESTAMP

SELECT
  CAST('2020-08-24 13:11:05+5:00' AS TIMESTAMP) AS timestamp
timestamp
2020-08-24T08:11:05.000Z

About

  • Independent of time zone.
  • Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
  • Range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC
  • Format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)
  • [time zone]: String representing the time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used. See the time zones section for details.

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other TIMESTAMPs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP, TIME
  • Coercion To*: N/A

TIME

SELECT
  CAST('13:11:05' AS TIME) AS time
time
13:11:05

About

  • Independent of a specific date.
  • Independent of a timezone.
  • Range: 00:00:00 to 23:59:59.99999.
  • Format: [H]H:[M]M:[S]S[.DDDDDD]
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other TIME's
  • CAST to: STRING, TIME
  • Coercion To*: N/A

ARRAY

SELECT CAST([1,2,3] as ARRAY<INT64>) array_
array_
[1, 2, 3]

About

  • ARRAYs of ARRAYs are not allowed (use STRUCT instead)
  • There are currently 2 limitations with respect to NULLs and ARRAYs:
  • BigQuery raises an error if a query result contains ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
  • BigQuery translates NULL into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
  • Declaring an ARRAY type: ARRAY<T>, where the type of elements in the ARRAY are contained inside the < and >.
  • ARRAY<INT64> = Simple ARRAY of integers.
  • ARRAY<STRUCT<INT64,INT64>> : ARRAY of STRUCTs that contain 2 integers.

Properties

  • 🚫 Cannot be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🚫 Can be compared to other ARRAYs
  • CAST to: ARRAY
  • Coercion To*: N/A

STRUCT

SELECT CAST((1,'a') as STRUCT<INT64,STRING>) struct_
struct_
{"_field_1":1,"_field_2":"a"}

About

  • Declaring a STRUCT type: STRUCT<T>, where the type of elements in the STRUCT are contained inside the < and >.
  • STRUCT<INT64> = Simple STRUCT of integers.
  • STRUCT<x STRUCT<y INT64, z INT64>> : A STRUCT with a nested STRUCT named x inside it. The STRUCT x has two fields, y and z, both of which are 64-bit integers.
  • Constructing a STRUCT: (expr1, expr2 [, ... ])
  • There must be at least 2 expressions specified, otherwise it's indistinguishable from the element that is defined
  • If column names are used(e.g. (x, x+y)), the data types are derived from the column types
  • Comparing STRUCTs:
  • You can use any of the following operators:
  • Equal (=)
  • Not Equal (!= or <>)
  • [NOT] IN
  • These compare pairwise in ordinal order so you cannot be sure that you're comparing the same fields. To do that, compare the fields directly.

Properties

  • ✅ Can be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🟡 LIMITED comparisons to other STRUCTs
  • CAST to: STRUCT
  • Coercion To*: N/A

GEOGRAPHY

SELECT CAST(ST_GEOGFROMTEXT('POINT(1 1)') as GEOGRAPHY) geography
geography
POINT(1 1)

About

  • To compare GEOGRAPHY values, use ST_Equals

Properties

  • ✅ Can be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🟡 LIMITED comparisons to other GEOGRAPHYs

Coercion

Coercion is an implicit type conversion, which BigQuery performs automatically between certain data types.

Try a better way to write SQL in BigQuery
Write SQL together with Count.
Try a better way to write SQL in BigQuery
Write SQL together with Count.