Analytic Functions
Analytic functions, also referred to as Window functions, provide the ability to compute values for each row that are dependent on other rows in the result set. This is different from aggregate functions which result in merging multiple rows into one and compute an aggregation on all the merged rows.
Say you want to view each sales order but also see the average revenue generated per order by Country
.
One way to do this is using a separate GROUP BY and then joining (as given in the WITH example). Another
approach is to use Analytic functions which are processed more efficiently.
OVER clause
Analytic functions are based on the OVER clause. This clause contains three sub-clauses
- PARTITION BY - to partition the result set into different partitions, say by
Country
, or (Sales Channel
,Order Priority
). - ORDER BY - each partition can be further ordered to pick deterministic set of rows relative to the current row.
- WINDOW - it is possible to further subset the ordered partition by specifying a window of rows. The window of rows can be physical rows, ranges (logical rows based on values) or groups (peer rows with same rank by specified ordering). Windowing allows computing moving averages.
The purpose of these three sub-clauses are explained below.
Full sum
Partitioning allows computing full aggregate values useful for comparison.
Cumulative sum
Ordering allows computing cumulative values. Combined with Partitioning, the below example shows Quarter To Date revenue as each new order comes in.
Moving sum (average)
By specifying a physical window of size 3, the below query computes the average revenue of the 3 last transactions including the current transaction.
All analytic functions are also grouped under SQL.agg
.
Within Partitions
When using a group by that generates multiple partitions (such as ROLLUP
and CUBE
)
SQL standard assumes that the analytic functions are computed across the partitions. However
this results in some wierd results such as the ratio to report being incorrect. Hence,
SQL Frames has an extra option to fine tune this without writing complex logic. By
default most analytic functions are calculated within the group by partitions. Any exceptions
are indicated in the tables below. It is possible to alter this default behavior by using
the API analyticField.partitioned(false)
.
Aggregate Analytic Functions
Following are the aggregate analytic functions currently provided by SQL Frames.
SQL | API | Comments |
---|---|---|
COUNT | counto | |
SUM | sumo | |
AVG | avgo | |
MIN | mino | |
MAX | maxo | |
MEDIAN | mediano | |
VARIANCE | varianceo | |
VAR_SAMP | varSampo varianceSampleo | |
VAR_POP | varPopo variancePopulationo | |
STDDEV | stddevo | |
STDDEV_SAMP | stddevSampo | |
STDDEV_POP | stddevPopo | |
COVAR_POP | covarPopo | |
COVAR_SAMP | covarSampo | |
CORR | corro | |
REGR_* | regro | |
LISTAGG | listaggo | string concatenation |
COLLECT | collecto | list concatenation |
RATIO_TO_REPORT | ratioToReport | |
prodo | analytic version of the prod aggregate function |
Non-Aggregate Analytic Functions
Following are the non-aggregate analytic functions currently provided by SQL Frames.
SQL | API | Comments |
---|---|---|
LAG | lag | |
LEAD | lead | |
FIRST_VALUE | firstValue | * |
LAST_VALUE | lastValue | * |
NTH_VALUE | nthValue | * |
RANK | ranko | |
DENSE_RANK | denseRanko | |
ROW_NUMBER | rowNumber | |
NTILE | ntile | |
CUME_DIST | cumeDisto | |
PERCENT_RANK | percentRanko | |
PERCENTILE_CONT | percentileConto | |
PERCENTILE_DISC | percentileDisco |
* computed across partitions by default