Rollup
SQL Frames allows creating multi-dimensional data frames along with rollups along one or more dimensions. The advanced GROUP BY clauses ROLLUP and CUBE can be used to specify additional levels of details needed for the data cube. All levels of detail are pre-computed and stored.
Note that the data cube itself doesn't display the data. It needs to be diced by picking the rollups, necessary fields and optionally applying filters. Refer to Dicing for how to dice a data cube.
The GROUP BY clause allows ROLLUP and CUBE. These clauses can be repeated and/or combined. For example two rollups one for geography hierarchy (region, country) and another for time hierarchy (year, quarter, month) and a cube of other dimensions (item type, sales channel and order priority) can be used altogether a total of 8 dimensions. Such a multidimensional cube is created as
GROUP BY ROLLUP(region,country),
ROLLUP(year,quarter,month),
CUBE(item type,sales channel,order priority)
The above GROUP BY results in a total of 3*3*8 (72) levels of detail (LOD).
Example
Below is an example of creating a multi-dimensional data cube with rollups.
Multi-dimensional cubes can be very expensive if there are too many dimensions or there are several high cardinality dimensions. In such cases, it is better to build separate cubes and present smaller sets of data rather than all at once.
Levels of detail
It is possible to get information about the levels of details (LODs) created by a DataCube or a Grouped DataFrame
using the df.lods
API. For example, the above advanced group by clause resulted in 96 LODs.