Skip to main content

Analytic Functions

With ROLLUP and CUBE

Analytic functions such as REPORT_TO_RATIO makes it easy to compare an aggregate against higher level aggregate (grand total or some other level of detail). However, when the GROUP BY generates different levels of details by using advanced GROUP BY clauses such as ROLLUP or CUBE things become complicated. The RATIO_TO_REPORT seems to be incorrect as it doesn't match up to 100 percent. It may seem like it's double counted and so the percent reduces to 50 percent but that's not the case either. It could end up being 25% or something else. It all depends on how complex your GROUP BY clause is. The reason for this complexity is how the analytic functions are evaluated. They are evaluated after creating the rollups but by default these rollup partitions are not treated as separate levels and instead all levels are treated as a single set of rows. That is why when there is a single level of detail everything works as expected but when there are more than one level of detail, the percentage sums to 100/number-of-levels-of-detail.

The way to fix the above behavior is to make use of GROUPING_ID (or GROUP_ID in some databases) which allows to identify which level of detail an aggregate row belongs to. By explicitly partitioning by the GROUPING_ID, it is possible to make analytic functions to operate within each level of detail rather than the entire result set. SQL Frames has taken the approach of computing the analytic functions within the partitions by default (with few exceptions, see within partitions in analytic functions) and also generate SQL appropriately by automatically augmenting the GROUPING_ID when there are multiple levels of detail.

Analytic Functions & Partitions

It is still possible to achieve the standard SQL functionality by simply creating a filtered DataFrame (with no where clause) on top of the Grouped DataFrame and then adding the analytic function to it. Alternatively, it is possible to explicitly ask the computation to be performed across partitions using the partitioned(false) API. All these approaches are shown below.

Loading...

Nested Analytic Functions

In standard SQL it is not possible to nest analytic functions. That is, the following SQL is not valid.

SELECT a,b, max(rank() over(PARTITION BY a)) over ()
FROM table1
GROUP BY a,b

However, SQL Frames supports this just like referring to other aliases in the same context and an appropriate nested query is generated. This lets an analyst focus on solving the problem than worry about the syntax.

Loading...