Nulls in SQL and JavaScript

September 26, 2022 · 3 min read

Siva Dirisala

Creator of SQL Frames

The semantics of null values in SQL and JavaScript are quite different. For example null+42 in SQL is null while null+42 in JavaScript is 42. This may or may not be a concern depending on how the application is written or what it is trying to do.

SQL Frames provides scripted fields which are compiled and then evaluated against each row. SQL Frames also provides the ability to generate SQL based on the data transformation logic used to create the DataFrame. If one were to freely move the computation of expressions between server and client as desired end users would expect the results to be same. That is why the null semantics matters.

When evaluting a script, it is possible to provide different null semantics as outlined below.

No check - the values are not explicitly checked and freely used in the computation and it matches the semantics of JavaScript.
All check - in this mode, if a script field uses, say, 3 fields, then all three field values are checked (for the given row) and if any of them is null, then that null value is returned. The downside of this is, if the user has a conditional logic that avoided accessing the null value the result will be something other than a null.
Access check - in this mode, a value is checked for null only checked when it is accessed and not at the beginning of evaluating the script. Upon access, the entire expression becomes null. This provides some of the semantics of SQL but not all.
Usage based check - While the above avoids returning null when a value is not accessed, sometimes even if the value is accessed and determined to be null, user might do something else. For example, using NVL() or NULLIF type of functions it is possible to detect for null and then use something else in place of them. NULLIF(x,42) for example evalutes to 42 if x is null. With the *Access check mode, this will not work because as soon as the value of x is accessed, it evaluates to null without regards to the rest of the computation such as the NVL and NULLIF.

Usage based check assumes that the user is writing the script explicitly do the check which actually is same as No check. Of course, the ideal case would be that the script context provides information on when to make the expression evaluate to null based on accessing a null value vs when to let the script to use the null value and do something with it.

SQL Frames has experimental feature to be able to compile the scripts to any of the first 3 modes. It is porbably not possible to provide the 4th mode (*Usasge based) without a full interpreter (as opposed to letting the JavaScript engine to run the compiled script).

I don't think this issue of semantics is only specific to SQL Frames or JavaScript. I suspect anyone using Phyton Pandas and trying to scale it using underlying database technologies might also encounter results that don't match when evaluted in pure Python vs the underlying Databases.