Aliases
When projecting columns, it is possible to provide expressions that can be aliased.
For example 100*ratio AS percentage
.
Nested Aliases
In standard SQL it is not possible for one alias to refer to another within the same
select context. For example, the following SQL is not valid as y
can't be used to define z
.
select x, x+1 as y, 2*y as z
from table1
Hence, expressions either have to be duplicated or need to resort to nested queries so the common expressions are in the sub-query and they are referred in the main query.
Authoring complex calculations in SQL should be as simple as doing the same in a spreadsheet. That is why SQL Frames allows referencing the aliases within the same DataFrame as long as there is no circular dependency (which is caught by the engine). This convenience feature makes it easy to author complex calculations and this is transpiled to SQL by flattening the expressions so there is no referencing of aliases.
This feature is also referred to as lateral alias reference by some databases such as Amazon Redshift.
Parent context has precedence
Note that if a field is redefined from parent context, other references to the alias still refer to the field from the parent context. Below, the value of y is not 10*x+1 (as x is redefined to be 10*x) but simply x+1 which is consistent with standard SQL.
Aggregate and non-aggregate contexts
However, there is one difference to the above rule of not being able to access a redefined alias in the same context. First let's see a failing SQL.
SELECT a,SUM(x) AS x,MIN(x) AS minx,2*x AS x2
FROM (SELECT 10*x as x,a
FROM (SELECT * FROM (VALUES (42, 'n')
,(1, 'd')
,(2, 'd')
,(2, 'd')) AS table1(x, a)) table1) table1
GROUP BY a
ORDER BY a ASC;
It is not possible to define x2
as 2*x
in the standard SQL as x
still refers to the
non-aggregate parent context.
In the below code, x
is redefined with an aggregate definition. Since the context switches to
an aggregate context with a GROUP BY, only aggregates refer to the parent context while other
calculations refer to the redefined aggregate.
Analytic Functions
Analytic Functions are evaluated after computing the aggregates. Hence, these expressions have access to the aggregate context and not the non-aggregate parent context.
In the following code, sum(sum()) over(...)
is achieved by referring to the inner sum
via
the sumx
aggregate field.
At present it is not possible to directly create an analytic function on top of an aggregate without selecting it. If only analytic field is desired in the final output, the aggregate can be defined to support the analytic aggregate and can then be projected out.
The local variables experimental feature may be used to work around this.