SQL Resources
SQL
SUBSTR

SUBSTR

Definition

The SUBSTR function allows you to extract a section of a longer string in BigQuery.

Syntax

SUBSTR(value, position[, length])
SELECT
  location,
  SUBSTR(location, (-2), 2) AS state
FROM
  (
    SELECT
      'New York, NY' AS location
    UNION ALL
(    SELECT
      'Miami, FL' AS location)
  ) AS table_2
locationstate
New York, NYNY
Miami, FLFL
The query above gets the last 2 characters of the location string.

Practical Info

  • SUBSTR works with STRING or BYTES data types
  • The position argument is 1-based, so the first character is at position 1
  • You can use -1 to indicate the last character position
  • if length < 0 then the function returns an error
  • Common Questions

    How to dynamically extract sub-strings?

    What if we have a series of email addresses: [email protected], [email protected], etc. and we want to find the domain (e.g. gmail.com, yahoo.com), we can use SUBSTR in combination with STRPOS and LENGTH to dynamically extract everything after the @.

    SELECT
      email,
      SUBSTR(email, (STRPOS(email, '@') + 1), (LENGTH(email) - STRPOS(email, '@'))) AS domain
    FROM
      (
        SELECT
          '[email protected]' AS email
        UNION ALL
    (    SELECT
          '[email protected]' AS email)
      ) AS table_2
    emaildomain
    [email protected]gmail.com
    [email protected]yahoo.com

    Alternatively, you can use REGEXP_EXTRACT to achieve the same result:

    SELECT
      email,
      REGEXP_EXTRACT(email, r'@([a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)') AS domain
    FROM
      (
        SELECT
          '[email protected]' AS email
        UNION ALL
    (    SELECT
          '[email protected]' AS email)
      ) AS table_2
    emaildomain
    [email protected]gmail.com
    [email protected]yahoo.com

    Troubleshooting Common Errors

    It's rare SUBSTR returns an error unless you call it with the wrong data types (e.g. pass a STRING to the position parameter), but it's often that you don't get what you want.

    The best thing to do here is to either use REGEXP functions, or start get your query working with 1-2 example rows before trying to apply it to the entire column.

    Subscribe to newsletter

    Subscribe to receive the latest blog posts to your inbox every week.

    By subscribing you agree to our Privacy Policy.
    Thank you! Your submission has been received!
    Oops! Something went wrong while submitting the form.

    Start solving your organization's biggest problems with Count today.

    Stay up to date with all things count, data and problem solving.