The FROM clause allows joining multiple tables to get unified and enriched view of the data. SQL Frames supports the from clause using multiple related APIs to support the various types of joins.
When working with a single DataFrame, there is no need to combine the data with other DataFrames. In this case calling any of the SELECT clause APIs on a DataFrame automatically establishes the FROM context.
Sometimes data from multiple DataFrames need to be combined for a unified view. This is accomplished by joining the DataFrames.
Below are the list of join types and the corresponding APIs provided by SQL Frames.
|SQL Join Type
|LEFT OUTER JOIN
|RIGHT OUTER JOIN
|FULL OUTER JOIN
SQL.join() API can be used to specify the first DataFrame in a series of joins that can be chained together.
More than two DataFrames can be joined by chaining the above join APIs.
In the following example, the penguin measurements data provides the name of the island and a separate islands data contains details about the island such as the latitude and longitude. By combining these two it is possible for example to visualize the data on a map.
When both tables are joined on the same column name(s), the join can be simply stated with the USING clause.
While standard SQL expects the
using clause to be used only when the join columns on both sides are the same,
SQL Frames provides additional convinience of leveraging
using when the columns are not the same. This
is done as
using('col',['coll','colr']). Note how the second join condition uses different column names (namely
See the WHERE clause to learn more about the
fdf() API in the
The join condition can be specified using the ON clause when the join keys are not the same or the join criteria is more complex. Further, the join tables can be given alias names which is useful if the same table needs to be referred multiple times.
By default all the fields from all the joined tables are selected. It is possible to project different sets of fields.