SQL Set Operations
UNION and UNION ALL are the two most common set operations used in SQL. However, it is
also possible to perform other types of set operations in SQL notably INTERSECT and EXCEPT
each of which also come with their ALL version suitable for multi-sets.
INTERSECT
INTERSECT operates on the unique rows of each DataFrame.
INTERSECT ALL
INTERSECT ALL operates on the multi-sets and not their unique rows. That is,
if set A has 3 rows of same values and set B has 2 rows of the same values, then
the resulting set contains two rows (while INTERSECT will only produce one row).
In addition, when multiple DataFrames are intersected, the final set will contain the
least number of repeated rows among all the sets for each unique row.
EXCEPT (MINUS)
EXCEPT operates on the unique rows of each DataFrame.
EXCEPT ALL
EXCEPT ALL operates on the multi-sets and not their unique rows. That is,
if set A has 3 rows of same values and set B has 2 rows of the same values, then
the final set contains one row (while EXCEPT will not output this row). In addition,
when multiple DataFrames are provided to EXCEPT, the result is similar to applying
the operator on the first two DataFrames, followed by the result with the third DataFrame
and so on.