UNNEST
Some times it is easy and even benefitial to store denormalized data as an array.
For example, a user
record may store all the roles
the user belongs to. In a fully
normalized data model this is stored as a separate user_role
table. But if this
relation only tracks the list of roles and no other information, it could very well be
denormalized into the user
relation. Some databases such as PostgreSQL and Google BigQuery
provide native support for ARRAY fields and also provide the UNNEST
operator to
deal with a certain class of queries that are otherwise not possible.
Array Fields
JSON
When using JSON, the Array fields can be created using the JavaScript Array notation.
CSV
When using CSV, the Array fields can be created by providing additional metadata.
Array Filtering Ops
Two new filter operations called containsAll
and containsAny
are introduced to be able to
filter data.
These operators are not currently supported in SQL generation
containsAll
containsAny
UNNEST
Single column
Multiple columns
It is possible to select multiple array fields. Rows are generated by placing one value from each array field. If an array is smaller than the other, null values are used to generate the additional rows.
Usually it may not make sense to unnest multiple fields that are unrelated. This is best useful if two array fields positionally store information that is related.
Mixed Columns
It is possible to mix both array and non-array fields while selecting
Data Transformations
The resulting unnested DataFrame can further be used in data transformation pipeline just like any other DataFrame.
Group By
Following provides which roles are most common
Following code provides users with maximum roles