The UNION clause allows concatenating the rows of two or more tables into a single table. It is required that the names and order of the fields should be exactly same in all the merged tables.
With UNION ALL duplicate rows are copied over to the resulting DataFrame as separate rows.
SQL Frames supports the UNION ALL operation using
df.unionall(...dfs) API which takes one or
more DataFrames and produces a concatenated DataFrame.
In the below example, we first create temporary DataFrames by filtering the data and then concatenate them (this is just an example, as the same can be done with a more advanced filtering).
UNION without the ALL modifier ensures there are no duplicates. However, this is more expensive to
compute. SQL Frames supports UNION with the
df.union(...dfs) API allowing two more more
DataFrames to be concatenated.
Row Set ID
The UNION ALL DataFrame has a special field called Row Set ID that tracks the underlying data set to which the row belongs to. Coupled with Sort Prefix this field allows implicitly maintaining the order of the sets of DataFrames that are being combined.
Notice how sorting on different columns sorts them within the respective sets of rows identified via the Row Set ID.