SELECT knowledge FROM sql_resources WHERE category='bigquery-standard-sql' AND slug='data-types-in-bigquery'

Data Types

BigQuery supports several data types, some of which are standard (e.g. STRING, INT64), and others that are more complex (e.g. ARRAY, STRUCT).

In general though they easily fall into the following categories or Supertypes:

SELECT cast('12' as INT64) int
  • NUMERIC & BIGNUMERIC

Integers are always whole numbers; they have no decimals or fractional components.

  • Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other INT64s
  • CAST to: BOOL, INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*: NUMERIC, BIGNUMERIC, FLOAT64

Floating point numbers are approximate numeric values with decimal/fractional components.

float
12.44
  • There are non-numeric values to consider: NaN and +/-inf
  • The order of values from least to greatest is:
  • Negative numbers
  • Positive numbers

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, but NOT PARTITION BY
  • ✅ Can be compared to other FLOAT64s
  • CAST to: INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*: N/A

NUMERIC and BIGNUMERIC

Decimals are numeric values with fixed precision and scale. Precision = the number of digits, and scale = the number of these digits appear after the decimal point.

numericbignumeric
1240000000000000000124000000000000000000000000000000000000
  • Often used for financial calculations
  • Precision: 38
  • Min : -9.9999999999999999999999999999999999999E+29
  • Max: 9.9999999999999999999999999999999999999E+29
  • BIGNUMERIC
  • Precision: 76.76 (the 77th digit is partial)
  • Min:-5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38
  • Max:5.7896044618658097711785492504343953926634992332820282019728792003956564819968E+38

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other NUMERICSs and BIGNUMERICs
  • CAST to: INT64, NUMERIC, BIGNUMERIC, FLOAT64, STRING
  • Coercion To*:BIGNUMERIC, FLOAT64

Booleans are True or False

boolean
true
  • Represented by keywords: TRUE and FALSE
  • They are case insensitive
  • Boolean values are sorted from least to greatest:

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other BOOLs
  • CAST to: INT64, BOOL, STRING
  • Coercion To*: N/A

Variable-length text (unicode) data

  • Uses UTF-8 encoding
  • All STRING functions operate on Unicode characters rather than bytes
  • Most STRING functions are also defined on BYTES
  • Casting from STRING-> BYTES does UTF-8 encoding and decoding
  • BYTES -> STRING will return in error if BYTES are not valid UTF-8

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other STRINGs
  • CAST to: BOOL, INT64, NUMERIC, BIGNUMERIC, FLOAT64,STRING, BYTES, DATE, DATETIME, TIME, TIMESTAMP
  • Coercion To*: N/A

Variable-length binary data

bytes
104,105
  • Most STRING functions are also defined on BYTES
  • Casting from STRING-> BYTES does UTF-8 encoding and decoding
  • BYTES -> STRING will return in error if BYTES are not valid UTF-8

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other BYTEs
  • CAST to: STRING, BYTES
  • Coercion To*: N/A

Represents a logical calendar date (e.g. 2020-01-03)

date
2020-08-24
  • Independent of time zone.
  • Represents a calendar day, not a specific 24 hour window.
  • Range: 0001-01-01 to 9999-12-31.
  • Format: YYYY:[M]M:[D]D or 4-digit year : one or two digit month : one or two digit day

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other DATEs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP
  • Coercion To*: N/A

Represents a logical calendar datetime (e.g. 2020-01-03 12:04:22)

datetime
2020-08-24T13:11:05
  • Independent of time zone.
  • Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
  • Range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999
  • Format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other DATETIMEs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP, TIME
  • Coercion To*: N/A

Represents an absolute point in time (e.g. 2020-01-03 12:04:22 UTC)

timestamp
2020-08-24T08:11:05.000Z
  • Independent of time zone.
  • Represents a date and time as they would be displayed on a calendar or clock, not a specific 24 hour window.
  • Range: 0001-01-01 00:00:00 to 9999-12-31 23:59:59.999999 UTC
  • Format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.DDDDDD]][time zone]
  • YYYY: Four-digit year
  • [M]M: One or two digit month
  • [D]D: One or two digit day
  • ( |T): A space or a `T` separator
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)
  • [time zone]: String representing the time zone. When a time zone is not explicitly specified, the default time zone, UTC, is used. See the time zones section for details.

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other TIMESTAMPs
  • CAST to: STRING, DATE, DATETIME, TIMESTAMP, TIME
  • Coercion To*: N/A

Represents a time as might be displayed on a watch (e.g. 10:14:00)

time
13:11:05
  • Independent of a specific date.
  • Independent of a timezone.
  • Range: 00:00:00 to 23:59:59.99999.
  • Format: [H]H:[M]M:[S]S[.DDDDDD]
  • [H]H: One or two digit hour (valid values from 00 to 23)
  • [M]M: One or two digit minutes (valid values from 00 to 59)
  • [S]S: One or two digit seconds (valid values from 00 to 59)
  • [.DDDDDD]: Up to six fractional digits (microsecond precision)

Properties

  • ✅ Can be NULL
  • ✅ Can be used in an ORDER BY clause
  • ✅ Can be used in a GROUP BY , DISTINCT, PARTITION BY
  • ✅ Can be compared to other TIME's
  • CAST to: STRING, TIME
  • Coercion To*: N/A

Ordered list of 0+ elements of non-ARRAY type

array_
1,2,3
  • ARRAYs of ARRAYs are not allowed (use STRUCT instead)
  • There are currently 2 limitations with respect to NULLs and ARRAYs:
  • BigQuery raises an error if a query result contains ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.
  • BigQuery translates NULL into an empty ARRAY in the query result, although inside the query NULL and empty ARRAYs are two distinct values.
  • Declaring an ARRAY type: ARRAY, where the type of elements in the ARRAY are contained inside the < and >.
  • ARRAY = Simple ARRAY of integers.
  • ARRAY> : ARRAY of STRUCTs that contain 2 integers.

Properties

  • 🚫 Cannot be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🚫 Can be compared to other ARRAYs
  • CAST to: ARRAY
  • Coercion To*: N/A

Container of ordered fields, each with a type (required) and field name (optional).

struct_
[object Object]
  • Declaring a STRUCT type: STRUCT, where the type of elements in the STRUCT are contained inside the < and >.
  • STRUCT = Simple STRUCT of integers.
  • STRUCT> : A STRUCT with a nested STRUCT named x inside it. The STRUCT x has two fields, y and z, both of which are 64-bit integers.
  • Constructing a STRUCT: (expr1, expr2 [, ... ])
  • There must be at least 2 expressions specified, otherwise it's indistinguishable from the element that is defined
  • If column names are used(e.g. (x, x+y)), the data types are derived from the column types
  • Comparing STRUCTs:
  • You can use any of the following operators:
  • Not Equal (!= or <>)
  • These compare pairwise in ordinal order so you cannot be sure that you're comparing the same fields. To do that, compare the fields directly.

Properties

  • ✅ Can be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🟡 LIMITED comparisons to other STRUCTs
  • CAST to: STRUCT
  • Coercion To*: N/A

A collection of points, lines, and polygons, which is represented as a point set or a subset of the surface of the Earth.

geography
POINT(1 1)
  • Based on the OGC Simple Features specification (SFS).
  • A GEOGRAPHY is the result of, or an argument to, a Geography Function.
  • To compare GEOGRAPHY values, use ST_Equals

Properties

  • ✅ Can be NULL
  • 🚫 Cannot be used in an ORDER BY clause
  • 🚫 Cannot be used in a GROUP BY , DISTINCT, PARTITION BY
  • 🟡 LIMITED comparisons to other GEOGRAPHYs

Coercion is an implicit type conversion, which BigQuery performs automatically between certain data types.

SELECT cast('12.44' as FLOAT64) float
Loading code...
Loading code...
SELECT CAST(123 AS STRING) AS string
SELECT cast('hi' as BYTES) bytes
Loading code...
Loading code...
Loading code...
Loading code...
Loading code...
SELECT CAST((1,'a') as STRUCT) struct_
SELECT CAST(ST_GEOGFROMTEXT('POINT(1 1)') as GEOGRAPHY) geography