Star Schema Join Optimization
One of the common use cases is to analyze certain transactions and these transactions have several FK relations to various dimensions such as time, location, department, project and so on. SQL Frames allows creating very complex joins and there is a generic join engine that can resolve all types of joins. However, star schema joins is a special case and it is possible to compute it more efficiently.
SQL Frames allows specifying a join is such a special case of star schema join by
proving a simple hint jdf.starFact
and indicating which of the DataFrames in the join
acts as the fact table. Note that this optimization has higher impact with larger fact sizes.
Certain assumptions are made when this hint is provided.
- All the join columns to the dimension DataFrames form unique keys.
- The fact DataFrame is the driving DataFrame and for each record in it the final output can have at most one output record.
Even though the hint is provided it may not be used based on the overall join logic because of the above assumptions. For example, if there is non-inner join with one of the dimension DataFrames on the outer-side of the join, then this optimization will not be utilized. However, the class of joins that are not optimized are not common.