Skip to main content

UNNEST

Some times it is easy and even benefitial to store denormalized data as an array. For example, a user record may store all the roles the user belongs to. In a fully normalized data model this is stored as a separate user_role table. But if this relation only tracks the list of roles and no other information, it could very well be denormalized into the user relation. Some databases such as PostgreSQL and Google BigQuery provide native support for ARRAY fields and also provide the UNNEST operator to deal with a certain class of queries that are otherwise not possible.

Array Fields

JSON

When using JSON, the Array fields can be created using the JavaScript Array notation.

Loading...

CSV

When using CSV, the Array fields can be created by providing additional metadata.

Loading...

Array Filtering Ops

Two new filter operations called containsAll and containsAny are introduced to be able to filter data.

SQL

These operators are not currently supported in SQL generation

containsAll

Loading...

containsAny

Loading...

UNNEST

Single column

Loading...

Multiple columns

It is possible to select multiple array fields. Rows are generated by placing one value from each array field. If an array is smaller than the other, null values are used to generate the additional rows.

Loading...
Matching Data

Usually it may not make sense to unnest multiple fields that are unrelated. This is best useful if two array fields positionally store information that is related.

Mixed Columns

It is possible to mix both array and non-array fields while selecting

Loading...

Data Transformations

The resulting unnested DataFrame can further be used in data transformation pipeline just like any other DataFrame.

Group By

Following provides which roles are most common

Loading...

Following code provides users with maximum roles

Loading...