Sometimes user has access to two copies of the data one of which may have been modified with new information and user is interested in seeing what has changed. SQL Frames provides a robust data comparision application where any two dataframes can be compared.
This functionality requires custom table cell renderers which make use of the library Eta. Although Eta is optional, this functionality will make it a required dependency.
The diff API
DataFrame.diff provides the necessary API to visualize the difference between two DataFrames.
The API takes two DataFrames and a list of one or more fields that uniquely identify the rows in the DataFrames.
The two DataFrames are referred as dataset A and dataset B.
This approach of doing comparision based on the unique identifiers is advanced, productive and performant than a simple Excel data comparision based on row and column position.
It is possible to augment the Diff dataframe with a set of actions in the menubar
and contextual actions in the field headers using
The Diff menu allows viewing the following subsets of the data
- All - rows present in either A or B
- Both - rows present in both A and B
- Changed - rows present in both A and B and also there is some difference
- Not in both - rows present in only A or only B
- Only in A - rows present in A but not in B
- Only in B - rows present in B but not in A
ui.diffActions() API also adds column header level context menu to provide different
sort options. The possible options are
Sort by value
- Using value from A and then B if A value is null
- Using value from B and then A if B value is null
Sort by absolute difference (for number fields)
Sort by absolute percent change (for number fields)
The best thing with SQL Frames is that the integrated UI allows viewing data in different ways and features such as this data comparision can work on other types visual representation.
All the options to look at different subsets of the compared data and sort options are still available in the hierarchical view.
Slice & Compare
It is possible to slice and compare specific data. Current slicer API works on a single DataFrame.
This poses a challenge. Also, both datasets being compared may have different list of values for the same field
posing another change. The trick to overcome both these challenges is to use a
udf, slice the
and then filter and extract out both the datasets after slicing and then using them to do the diff.
An example is shown below with much smaller dataset to understand this advanced usage of SQL Frames low-code API.
withRowSetId() API is needed only to generate correct SQL. This may not be necessary in the future.
This call can be omitted if SQL generation is not required without impacting the actual data computation.
Filtering Diff DataFrame
Diff dataframe is a special dataframe that can only support filtering using the set operations provided
by Diff actions. Ad-hoc filtering similar to a FilteredDataFrame is not possible. Data context menu actions of
not equals are supported but no other filters are supported.
The context menu actions infact allow filtering for rows with the same difference.
This functionality assumes the data to be present in both DataFrames as follows
- The unique id(s) are used to match the rows from both the DataFrames. Change to the unique id(s) results in two separate rows one in each DataFrame without a coresponding matching row from the other.
- Comparision is performed only among fields with the same name.
- Order of the fields doesn't matter.
- Each DataFrame can have fields that are only present in it and they are not used for comparision.
- Percent with base value of 0 is undefined. It is considered as
nulland the standard SQL
nulllogic is applied fort sorting.
- Difference and Percent change are availale only when the entire rows of the respective DataFrames have the number values (or nulls). Mixed data doesn't recognize the fields as number fields and hence these sort metrics are not available.
If the same physical column has different names in both the DataFrames, it is still possible to compare them by first projecting with the same name.