Skip to main content

Recursive Hierarchy Flattening

Recursive hierarchies are very common in the enterprise applications. Employee-Manager relationship is the most common recursive hierarchy everyone is familiar with. In addition, Bills of Material for Supply Chain Manufacturing and Territories for Sales are a few other common recursive hierarchies.

Recursive hierarchies pose a challenge for analytics because there are no simple SQL constructs to create hierarchical aggregate rollups. The Oracle CONNECT BY/STARTS WITH clause and the more generic WITH RECURSIVE CTE (common table expression) can process data in a top-down or bottom-up manner. However, to process an aggregate query, the processing of the data needs to identify the top node which needs to be analyzed and its child nodes need to be identified one level at a time from top-to-bottom. Aggregation on the other hand has to happen bottom-up. That is why, most solutions to analyze data using CTE end up at least two recursive CTEs creating a lot of confusion.

Hierarchy Flattening

A common technique to analyze recursive data is hierarchy flattening. That is, the hierarchy is assumed to be at most of a certain depth and those many columns are explicitly created and the entire hierarchy path is denormalized into these columns. Once this is done, then many hierarchy questions can be answered using the rest of the familiar SQL constructs.

There are no direct SQL constructs to help with the hierarchy flattening. However, SQL Frames provides an API to flatten and denormalize the hierarchy.

Let's take a look at the example below using the df.wrangle.flatten() API.

Hierarchy Flattening Requirements

Hierarchy Flattening is a special operation that adds dynamic number of columns to the DataFrame. Hence, there are some limitations

  • Should not be done on DataFrames that only project columns.
  • The parent nodes should be uniquely identifiable.

Hierarchy Aggregation

Now that the table is flattened, let's see how this can then be used for aggregation.


Flattened to Hierarchy

Recursive data can be visualized hierarchically and SQL Frames is smart enough to construct such recursive hierarchies from data transformed using WITH RECURSIVE CTE. But flattened recursive hierarchies can also be visualized hierarchically using the regular fields grouping based hierarchies created using the hdf() API.


Even though non-aggregate hiearchical DataFrames store all the data rows at the leaf level, flattened hierarchies do store them in non-leaf node level as appropriate. If the flattened hierarchy has N flattened columns but the hierarchy is based on M fields (M < N) then only the first M-1 recursive rows are stored in the non-leaf nodes.