Dicing
As per OLAP Cube dicing is an operation that results in creating a subcube of a data cube by picking specific values for multiple dimensions.
In SQL Frames the advanced Group By clauses are used to create a multi-dimensional data cube along with the necessary rollups. Hence, dicing here is described as picking the necessary levels of details using a Group By clause that is a subset of the original Group By used for the data cube. Desired fields can be projected while dicing the data cube and HAVING and/or WHERE clauses can then be applied using the corresponding SQL Frames APIs.
Below are two examples of dicing the data cube of 72 LOD given in the introduction.
The following selects 1*4*3 (12) levels of detail.
GROUP BY country,
CUBE(item type,sales channel),
ROLLUP(year,quarter)
and the one below selects 3*3 (9) levels of detail.
GROUP BY ROLLUP(quarter,month),
ROLLUP(region,country)
First we create a cube using the df.cdf()
API. Note that the Data Cube itself doesn't display any data
and doesn't generate the SQL.
There are two APIs for dicing a Data Cube. cube.gdf()
which is similar to the df.gdf()
API.
The other is cube.dice()
. While SQL Frames embedded engine computes them exactly the same,
the SQL generation differs depending on which API is used. cube.dice()
works even with Analytic
Functions based fields in the Data Cube while cube.gdf()
works only with aggregate fields.
Using cdf.gdf()
The Data Cube can be diced using the same df.gdf()
API used to create a Grouped DataFrame.
When this API is called on a Data Cube, it dices the cube rather than doing any further aggregation.
When only aggregate fields are present in the Data Cube, the SQL is generated efficiently based on what is being diced rather than the entire cube. Analytic fields on the other hand can make it hard to optimize (see below).
Selecting fields while Dicing
It is possible to pick the fields from the cube while dicing.
HAVING clause while Dicing
It is possible to apply a separate HAVING clause while dicing a Data Cube.
Using cdf.dice()
As per Standard SQL, the analytic function fields are evaluated after the HAVING clause. Hence, the Data Cube only evaluates aggregate values and calculated fields based on them while the analytic functions and calculated fields based on them are evaluated at the time of dicing.
Since analytic fields contain partitioning and ordering dimensions that may not exist in
the dicing group by, the generated SQL defines the entire data cube and filters the partitions
needed by the dicing group by. As a result, cube.dice()
generates SQL that builds
the entire data cube and then filters the necessary partitions while cube.gdf()
tries to
directly define a subcube which would only work when there are no analytic functions.
The SQL generating for cube.dice()
is not optimized at this time though that might change in the future.
However, databases might still be optimally evaluating the SQL. Also, the SQL defines the entire data cube
whether or not there are analytic functions.