Top N and Others Sets
While Top N Sets provide quick access to the top N items, reports and charts are often more useful when they show the top N alongside a single Other bucket that lumps together all the remaining data.
Top N and Others
The simplest way to get the other slice is by calling the
on an existing data set.
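The idea of a top N set plus an Other slice can be sketched in plain Python. This is a minimal illustration of the concept only, not the library's API: the helper `top_n_with_other`, the field names, and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict

def top_n_with_other(rows, dim, metric, n, other_label="Other"):
    """Sum `metric` per `dim` value, keep the n largest, lump the rest into one bucket.

    Hypothetical helper for illustration; not part of any library.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row[metric]
    # Rank members by their aggregated metric, largest first.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    kept, rest = ranked[:n], ranked[n:]
    if rest:
        # Everything outside the top n collapses into a single Other entry.
        kept.append((other_label, sum(value for _, value in rest)))
    return kept

# Made-up sample data.
sales = [
    {"country": "US", "revenue": 50},
    {"country": "IN", "revenue": 30},
    {"country": "DE", "revenue": 20},
    {"country": "FR", "revenue": 10},
    {"country": "US", "revenue": 25},
    {"country": "BR", "revenue": 5},
]
top2 = top_n_with_other(sales, "country", "revenue", n=2)
print(top2)
```

The Other entry is appended last so that it does not get mixed into the ranking of the top members.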
It is possible to find the top N members of each group specified by one or more dimensions. For example, the top 5 countries within each region can be displayed along with an Other bucket for that region (as opposed to the top 5 countries globally, as shown above).
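Per-group ranking can be sketched in plain Python as follows. The helper `top_n_per_group` and the sample rows are hypothetical and only illustrate the concept; the sketch uses N = 2 to keep the data small and does not use the library's API.

```python
from collections import defaultdict

def top_n_per_group(rows, group, dim, metric, n, other_label="Other"):
    """Top n `dim` members within each `group`, plus one Other bucket per group.

    Hypothetical helper for illustration; not part of any library.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row[group]][row[dim]] += row[metric]
    result = {}
    for group_value, members in totals.items():
        # Rank only within the current group, not globally.
        ranked = sorted(members.items(), key=lambda kv: kv[1], reverse=True)
        kept, rest = ranked[:n], ranked[n:]
        if rest:
            kept.append((other_label, sum(value for _, value in rest)))
        result[group_value] = kept
    return result

# Made-up sample data.
sales = [
    {"region": "Europe", "country": "DE", "revenue": 40},
    {"region": "Europe", "country": "FR", "revenue": 30},
    {"region": "Europe", "country": "ES", "revenue": 10},
    {"region": "Asia", "country": "IN", "revenue": 60},
    {"region": "Asia", "country": "JP", "revenue": 50},
    {"region": "Asia", "country": "SG", "revenue": 5},
    {"region": "Asia", "country": "TH", "revenue": 3},
]
per_region = top_n_per_group(sales, "region", "country", "revenue", n=2)
print(per_region)
```

Each region gets its own Other bucket, so the totals per region are preserved.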
The withOthers() API is much more flexible and allows creating complex top N and Other
reports. The API is similar to the
Hence, it is possible to provide a
groupBy (see the Top N trends below for how to use it)
and a set of aggregates, including calculated fields. All of these are
computed for both the set and the Other slice.
It is possible to use one metric for top N and a different metric for the final set & other data set. For example, top N can be based on the average revenue while the display focuses on the total revenue.
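Ranking on one metric while displaying another can be sketched in plain Python. The helper below is hypothetical and does not reflect the library's API: it ranks members by average revenue per row but reports total revenue for the top members and for Other.

```python
from collections import defaultdict

def top_n_rank_vs_display(rows, dim, metric, n, other_label="Other"):
    """Rank by average `metric`, display total `metric` (illustrative helper only)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[dim]] += row[metric]
        counts[row[dim]] += 1
    # Ranking metric: average per row.
    by_average = sorted(sums, key=lambda k: sums[k] / counts[k], reverse=True)
    # Display metric: total, for the top members and for the remainder.
    result = [(k, sums[k]) for k in by_average[:n]]
    if len(by_average) > n:
        result.append((other_label, sum(sums[k] for k in by_average[n:])))
    return result

# Made-up sample data: Paper has the largest total but a modest average.
orders = (
    [{"item": "Laptop", "revenue": 900}, {"item": "Laptop", "revenue": 1100}]
    + [{"item": "Paper", "revenue": 500} for _ in range(5)]
    + [{"item": "Phone", "revenue": 800}]
    + [{"item": "Cable", "revenue": 50}, {"item": "Cable", "revenue": 50}]
)
display = top_n_rank_vs_display(orders, "item", "revenue", n=2)
print(display)
```

With this data, Paper has the highest total revenue yet lands in Other, because the ranking is driven by the average.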
Top N and Others Charts
The Data Set interface provides a getter called
ds which returns a DataFrame. Hence
all the features of a DataFrame such as the ability to create a chart are readily available.
With the ability to specify multiple metrics (as shown above) it is possible to create a chart with cumulative percentage of the contribution of the top N.
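A cumulative percentage series of this kind can be computed from any ordered list of top N values followed by Other. The sketch below is plain Python with made-up numbers; `with_cumulative_share` is a hypothetical helper, not a library function.

```python
def with_cumulative_share(buckets):
    """Add a running percentage of the grand total to each (label, value) pair.

    Assumes `buckets` is ordered largest first with Other last,
    and that the grand total is non-zero.
    """
    grand_total = sum(value for _, value in buckets)
    running = 0.0
    out = []
    for label, value in buckets:
        running += value
        out.append((label, value, round(100 * running / grand_total, 1)))
    return out

# Made-up top 3 plus Other.
buckets = [("US", 75), ("IN", 30), ("DE", 20), ("Other", 25)]
shares = with_cumulative_share(buckets)
for label, value, cumulative in shares:
    print(label, value, cumulative)
```

The cumulative value on the last top N row shows how much of the total the top N accounts for; the Other row always brings it to 100.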
If the distribution of the data is relatively uniform rather than top-heavy, the bar chart may end up dominated by a very large Other bar, which is not ideal. One option is to increase N.
By making use of the dynamic grid charts, it is possible to display the Nested Top N and Other charts.
Top N and Others Trends
The ds.withOthers API takes an optional parameter that provides a
groupBy. The groupBy should include all the dimensions used for the set
and optionally include additional dimensions to get a granular level of detail.
This makes it possible, for example, to plot a trend by adding the time dimension.
overall top N
Note how data is explicitly ordered by Year and Item Type to ensure correct rendering of the trend lines.
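The effect of adding a time dimension to the grouping can be sketched in plain Python. The helper `top_n_trend` is hypothetical and is not the library's API: it picks the top N members over the whole data, then regroups by year and member, folding everything else into Other and ordering the rows by year and label.

```python
from collections import defaultdict

def top_n_trend(rows, dim, time_dim, metric, n, other_label="Other"):
    """Overall top n members, broken down by `time_dim` (illustrative helper only)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row[metric]
    top = set(sorted(totals, key=totals.get, reverse=True)[:n])
    trend = defaultdict(float)
    for row in rows:
        # Members outside the overall top n are relabelled as Other.
        label = row[dim] if row[dim] in top else other_label
        trend[(row[time_dim], label)] += row[metric]
    # Explicit ordering by time, then label, so each line is drawn in sequence.
    return sorted((t, label, value) for (t, label), value in trend.items())

# Made-up sample data.
rows = [
    {"year": 2022, "item": "Clothes", "revenue": 40},
    {"year": 2022, "item": "Snacks", "revenue": 30},
    {"year": 2022, "item": "Fruits", "revenue": 10},
    {"year": 2022, "item": "Toys", "revenue": 5},
    {"year": 2023, "item": "Clothes", "revenue": 50},
    {"year": 2023, "item": "Snacks", "revenue": 20},
    {"year": 2023, "item": "Fruits", "revenue": 15},
    {"year": 2023, "item": "Toys", "revenue": 10},
]
trend = top_n_trend(rows, "item", "year", "revenue", n=2)
for point in trend:
    print(point)
```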
sliced top N
The top N data set need not be based on the same DataFrame as the final set & other data set. For example, it is possible to first get the top 3 item categories for the current year and use them to draw the trend for the last 4 years. The example below in fact uses the DataFrame Slicer to pick the top N based on arbitrary criteria and then show the trend of those categories.
df.sets.setNother takes a Data Set, such as a Top N set, and creates a set and other data set.
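The separation between choosing the members and building the set and other data can be sketched in plain Python. Both helpers below are hypothetical stand-ins written for this illustration and do not reproduce the behaviour or signature of df.sets.setNother: the members are picked from a current-year slice and then applied to all years.

```python
from collections import defaultdict

def top_members(rows, dim, metric, n):
    """Names of the n members with the largest summed metric (illustrative only)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row[metric]
    return sorted(totals, key=totals.get, reverse=True)[:n]

def set_and_other(rows, dim, members, group_by, metric, other_label="Other"):
    """Aggregate `rows`, keeping `members` as-is and folding the rest into Other."""
    keep = set(members)
    out = defaultdict(float)
    for row in rows:
        label = row[dim] if row[dim] in keep else other_label
        key = tuple(row[g] for g in group_by) + (label,)
        out[key] += row[metric]
    return dict(sorted(out.items()))

# Made-up sample data.
rows = [
    {"year": 2022, "item": "Clothes", "revenue": 80},
    {"year": 2022, "item": "Snacks", "revenue": 30},
    {"year": 2022, "item": "Toys", "revenue": 20},
    {"year": 2023, "item": "Toys", "revenue": 60},
    {"year": 2023, "item": "Snacks", "revenue": 40},
    {"year": 2023, "item": "Clothes", "revenue": 10},
]
# Pick the members from the current-year slice only...
current_year = [row for row in rows if row["year"] == 2023]
top = top_members(current_year, "item", "revenue", n=2)
# ...then apply that member list to every year.
trend = set_and_other(rows, "item", top, ["year"], "revenue")
print(top)
print(trend)
```

In this data Clothes has the largest revenue overall but is not in the current-year top 2, so it appears under Other for every year.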
Top N and Others Pivot Tables
overall top N
It is possible to use advanced group by clauses like
with top N API to get sub totals and grand totals.
nested top N
It is possible to get the top 3 Item Types within each Region and pivot the result. In this case the pivoted Item Type dimension will most likely have more than 3 values, because each Region may have its own set of top 3 Item Types.
When there is a blank cell, it is not possible to know whether that is because there is no underlying data for that Item Type and Region combination or if that Item Type is not among the top N within that Region.
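The blank-cell ambiguity is easy to reproduce with a small plain-Python pivot. The helper `nested_top_n_pivot` and the data are hypothetical and only illustrate the behaviour described above; blank cells are represented as `None`.

```python
from collections import defaultdict

def nested_top_n_pivot(rows, row_dim, col_dim, metric, n, other_label="Other"):
    """Pivot of the top n `col_dim` members within each `row_dim` (illustrative only)."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row[row_dim]][row[col_dim]] += row[metric]
    cells = {}
    for r, members in totals.items():
        ranked = sorted(members.items(), key=lambda kv: kv[1], reverse=True)
        cells[r] = dict(ranked[:n])
        if ranked[n:]:
            cells[r][other_label] = sum(value for _, value in ranked[n:])
    # Columns are the union of every row's top n, with Other placed last.
    columns = sorted({c for kept in cells.values() for c in kept if c != other_label})
    columns.append(other_label)
    table = {r: [kept.get(c) for c in columns] for r, kept in cells.items()}
    return columns, table

# Made-up sample data: Asia has a small Toys value but no Fruits at all.
rows = [
    {"region": "Asia", "item": "Clothes", "revenue": 50},
    {"region": "Asia", "item": "Snacks", "revenue": 40},
    {"region": "Asia", "item": "Toys", "revenue": 5},
    {"region": "Europe", "item": "Toys", "revenue": 70},
    {"region": "Europe", "item": "Fruits", "revenue": 30},
    {"region": "Europe", "item": "Clothes", "revenue": 20},
]
columns, table = nested_top_n_pivot(rows, "region", "item", "revenue", n=2)
print(columns)
for region, values in table.items():
    print(region, values)
```

With N = 2 the pivot ends up with four Item Type columns. For Asia, both the Fruits and the Toys cells are blank, the first because no data exists and the second because Toys was folded into Other, and the table alone cannot distinguish the two cases.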
Top M and Others within Top N and Others
Let's say the requirement is to show the top 5 Regions and, for each of them, the top 3 Item Types based on Revenue. There are two types of Others buckets:
- one within each Region for all Item Types not in the top 3 for that Region
- one for all Regions not in the top 5.
At first this may be a bit daunting but it is possible to make use of the
sets.setNother APIs to accomplish such a requirement. Along the way, we will be making use of:
- non-aggregate top N (we aggregate, but then forget that the result is an aggregate and treat it like non-aggregate data)
- double-aggregation (aggregating and aggregating the aggregates)
This report provides the top 3 Item Types within each of the top 5 Regions.
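The shape of such a report can be sketched in plain Python. This is one possible reading of the two-level requirement, written with a hypothetical helper and made-up data rather than the library's APIs. It aggregates per (Region, Item Type) pair first, then aggregates those aggregates per Region, and uses N = 2 and M = 1 to keep the data small.

```python
from collections import defaultdict

def top_m_within_top_n(rows, outer, inner, metric, n, m, other_label="Other"):
    """Top m `inner` members within each of the top n `outer` members (illustrative only)."""
    # First aggregation: metric per (outer, inner) pair.
    pair_totals = defaultdict(float)
    for row in rows:
        pair_totals[(row[outer], row[inner])] += row[metric]
    # Second aggregation over the aggregates: metric per outer member.
    outer_totals = defaultdict(float)
    for (o, _), value in pair_totals.items():
        outer_totals[o] += value
    top_outer = sorted(outer_totals, key=outer_totals.get, reverse=True)[:n]

    report = []
    for o in top_outer:
        ranked = sorted(
            ((i, v) for (oo, i), v in pair_totals.items() if oo == o),
            key=lambda kv: kv[1],
            reverse=True,
        )
        report.extend((o, i, v) for i, v in ranked[:m])
        if ranked[m:]:
            # Others bucket within this outer member.
            report.append((o, other_label, sum(v for _, v in ranked[m:])))
    if len(outer_totals) > n:
        # Others bucket for every outer member outside the top n.
        rest = sum(v for o, v in outer_totals.items() if o not in top_outer)
        report.append((other_label, other_label, rest))
    return report

# Made-up sample data.
rows = [
    {"region": "Asia", "item": "Clothes", "revenue": 50},
    {"region": "Asia", "item": "Snacks", "revenue": 40},
    {"region": "Asia", "item": "Toys", "revenue": 5},
    {"region": "Europe", "item": "Toys", "revenue": 70},
    {"region": "Europe", "item": "Fruits", "revenue": 30},
    {"region": "Africa", "item": "Snacks", "revenue": 20},
    {"region": "Africa", "item": "Toys", "revenue": 10},
]
report = top_m_within_top_n(rows, "region", "item", "revenue", n=2, m=1)
for line in report:
    print(line)
```

The sum of all report rows equals the grand total of the input, which is a quick way to check that neither Others bucket dropped or double-counted anything.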