Aggregate Functions
All aggregate functions are grouped together into SQL.agg
. Some functions have multiple API names.
The name of the field is automatically derived by default though it can be customized with an alias.
Alternate API may result in a different default name.
SQL Frames currently provides the following aggregate functions.
SQL | Aggregate Function | Comments |
---|---|---|
ANY_VALUE | anyValue | return any value (non-deterministic) |
COUNT | count | |
SUM | sum | |
AVG | avg mean | |
MIN | min | |
MAX | max | |
MEDIAN | median | |
MODE | mode | returns an array as there can be multiple modes |
PERCENTILE | percentile | |
VARIANCE | variance | |
VAR_SAMP | varSamp varianceSample | |
VAR_POP | varPop variancePopulation | |
STDDEV | stddev | |
STDDEV_SAMP | stddevSamp | |
STDDEV_POP | stddevPop | |
COVAR_POP | covarPop | |
COVAR_SAMP | covarSamp | |
CORR | corr | |
REGR_* | regr regrCount regrSlope regrIntercept | regr returns an object with all the individual components of the linear regression analysis |
LISTAGG | listagg | string concatenation |
COLLECT | collect | list concatenation |
prod | product of values like in Excel |
Distinct Aggregates
Some aggregates operate on distinct values within the aggregating set.
SQL | API |
---|---|
COUNT DISTINCT | countd |
SUM DISTINCT | sumd |
AVG DISTINCT | avgd |
LISTAGG DISTINCT | listd |
COLLECT DISTINCT | collectd |
Ordered Aggregates
Following aggregates compute relative values of ordered data sets.
SQL | API |
---|---|
RANK | rank |
DENSE_RANK | denseRank |
CUME_DIST | cumeDist |
PERCENT_RANK | percentRank |
FIRST & LAST Aggregates
Some aggregates have FIRST and LAST qualifiers. That is, data is first ordered based on a sort criteria and then the FIRST or LAST ranking rows of this ordered set are used to perform the aggregation.
The following functions have corresponding FIRST/LAST implementations.
SQL | API |
---|---|
count | countf, countl |
SUM | sumf, suml |
AVG | avgf, avgl |
MIN | minf, minl |
MAX | maxf, maxl |
VARIANCE | --TODO-- |
STDDEV | --TODO-- |
GROUPING_ID
There may be a need to have different logic for base partition vs rollup partitions.
It is possible to accomplish such requirement using the SQL.groupingId()
API as shown below.
The requirement is to us the Unit Price as per the Item Type but for higher levels compute it using a formula such as Total Revenue/Total Quantity.