SQL Set Operations
UNION
and UNION ALL
are the two most common set operations used in SQL. However, it is
also possible to perform other types of set operations in SQL notably INTERSECT
and EXCEPT
each of which also come with their ALL
version suitable for multi-sets.
INTERSECT
INTERSECT
operates on the unique rows of each DataFrame.
INTERSECT ALL
INTERSECT ALL
operates on the multi-sets and not their unique rows. That is,
if set A has 3 rows of same values and set B has 2 rows of the same values, then
the resulting set contains two rows (while INTERSECT
will only produce one row).
In addition, when multiple DataFrames are intersected, the final set will contain the
least number of repeated rows among all the sets for each unique row.
EXCEPT (MINUS)
EXCEPT
operates on the unique rows of each DataFrame.
EXCEPT ALL
EXCEPT ALL
operates on the multi-sets and not their unique rows. That is,
if set A has 3 rows of same values and set B has 2 rows of the same values, then
the final set contains one row (while EXCEPT
will not output this row). In addition,
when multiple DataFrames are provided to EXCEPT
, the result is similar to applying
the operator on the first two DataFrames, followed by the result with the third DataFrame
and so on.