Nulls in SQL and JavaScript
The semantics of null
values in SQL and JavaScript are quite different.
For example null
+42 in SQL is null
while null
+42 in JavaScript is 42.
This may or may not be a concern depending on how the application is written
or what it is trying to do.
SQL Frames provides scripted fields which are compiled and then evaluated
against each row. SQL Frames also provides the ability to generate
SQL based on the data transformation logic used to create the DataFrame.
If one were to freely move the computation of expressions between server and client
as desired end users would expect the results to be same. That is why the null
semantics matters.
When evaluting a script, it is possible to provide different null
semantics as outlined
below.
-
No check - the values are not explicitly checked and freely used in the computation and it matches the semantics of JavaScript.
-
All check - in this mode, if a script field uses, say, 3 fields, then all three field values are checked (for the given row) and if any of them is null, then that
null
value is returned. The downside of this is, if the user has a conditional logic that avoided accessing thenull
value the result will be something other than anull
. -
Access check - in this mode, a value is checked for
null
only checked when it is accessed and not at the beginning of evaluating the script. Upon access, the entire expression becomesnull
. This provides some of the semantics of SQL but not all. -
Usage based check - While the above avoids returning
null
when a value is not accessed, sometimes even if the value is accessed and determined to be null, user might do something else. For example, usingNVL()
orNULLIF
type of functions it is possible to detect for null and then use something else in place of them.NULLIF(x,42)
for example evalutes to 42 ifx
is null. With the *Access check mode, this will not work because as soon as the value of x is accessed, it evaluates tonull
without regards to the rest of the computation such as theNVL
andNULLIF
.
Usage based check assumes that the user is writing the script explicitly do the check
which actually is same as No check. Of course, the ideal case would be that the script context
provides information on when to make the expression evaluate to null
based on accessing a null
value
vs when to let the script to use the null
value and do something with it.
SQL Frames has experimental feature to be able to compile the scripts to any of the first 3 modes. It is porbably not possible to provide the 4th mode (*Usasge based) without a full interpreter (as opposed to letting the JavaScript engine to run the compiled script).
I don't think this issue of semantics is only specific to SQL Frames or JavaScript. I suspect anyone using Phyton Pandas and trying to scale it using underlying database technologies might also encounter results that don't match when evaluted in pure Python vs the underlying Databases.