Top N and Others Sets
While Top N Sets provide quick access to the top N items, it is often useful to view reports and charts that show the top N alongside all the remaining data lumped into a single Other bucket.
Top N and Others
single dimension
The simplest way to get the Other slice is by calling the withOthers() API on an existing data set.
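To make the idea of an Other slice concrete, here is a minimal plain-Python sketch of the concept. It does not use or reproduce the withOthers() API; the function name top_n_with_other, its parameters, and the sample data are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

def top_n_with_other(rows, dimension, metric, n, other_label="Other"):
    """Sum `metric` per `dimension` value, keep the n largest, lump the rest."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    # Rank by total, descending; ties keep first-seen order (sort is stable).
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    result = [{dimension: k, metric: v} for k, v in top]
    if rest:  # only add the Other slice when something is left over
        result.append({dimension: other_label, metric: sum(v for _, v in rest)})
    return result

# Made-up sample data (hypothetical column names)
sales = [
    {"Country": "India", "Revenue": 50.0},
    {"Country": "Brazil", "Revenue": 30.0},
    {"Country": "India", "Revenue": 25.0},
    {"Country": "Kenya", "Revenue": 10.0},
    {"Country": "Chile", "Revenue": 5.0},
]
top2 = top_n_with_other(sales, "Country", "Revenue", n=2)
```

Here top2 holds the two largest countries followed by a single Other row carrying the combined revenue of the remaining countries.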
multiple dimensions
nested
It is possible to find the top N members within each group specified by one or more dimensions. In the following example, the top 5 countries within each region are displayed along with an Other slice for that region (as opposed to the top 5 countries globally, as shown above).
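The difference between a global and a nested top N can be sketched in plain Python as follows. This is an illustration of the concept only, not the library's implementation; nested_top_n_with_other and the sample rows are hypothetical.

```python
from collections import defaultdict

def nested_top_n_with_other(rows, group, dimension, metric, n, other_label="Other"):
    """Rank `dimension` members separately inside every `group` value."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        totals[row[group]][row[dimension]] += row[metric]
    result = []
    for g, members in totals.items():
        ranked = sorted(members.items(), key=lambda kv: kv[1], reverse=True)
        for k, v in ranked[:n]:
            result.append((g, k, v))
        rest = ranked[n:]
        if rest:  # each group gets its own Other slice
            result.append((g, other_label, sum(v for _, v in rest)))
    return result

# Made-up sample data
sales = [
    {"Region": "Asia", "Country": "India", "Revenue": 40.0},
    {"Region": "Asia", "Country": "Japan", "Revenue": 30.0},
    {"Region": "Asia", "Country": "Nepal", "Revenue": 5.0},
    {"Region": "Europe", "Country": "France", "Revenue": 20.0},
    {"Region": "Europe", "Country": "Spain", "Revenue": 10.0},
    {"Region": "Europe", "Country": "Malta", "Revenue": 2.0},
]
nested = nested_top_n_with_other(sales, "Region", "Country", "Revenue", n=2)
```

Note that Spain makes the list for Europe even though Nepal is cut from Asia, because every region is ranked independently of the others.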
multiple metrics
The withOthers() API is much more flexible and allows creating complex topN and other reports. The API is similar to the gdf() API. Hence, it is possible to provide a groupBy (see Top N and Others trends below for how to use it) and a set of aggregates, including calculated fields. All of these are computed for both the set and the Other slice.
It is possible to use one metric for top N and a different metric for the final set & other data set. For example, top N can be based on the average revenue while the display focuses on the total revenue.
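A plain-Python sketch of ranking by one metric while displaying another may help here. It is not the withOthers() API; the function top_n_by_avg_show_total and the sample orders are hypothetical. Membership in the top N is decided by average revenue per order, but the values reported for each slice are total revenue.

```python
from collections import defaultdict

def top_n_by_avg_show_total(rows, dimension, metric, n, other_label="Other"):
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[dimension]] += row[metric]
        counts[row[dimension]] += 1
    # Ranking metric: average per row. Display metric: total.
    ranked = sorted(sums, key=lambda k: sums[k] / counts[k], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    result = [{dimension: k, "total": sums[k]} for k in top]
    if rest:
        result.append({dimension: other_label, "total": sum(sums[k] for k in rest)})
    return result

# Made-up orders: Snacks has the largest total but the smallest average.
orders = [{"Item Type": "Snacks", "Revenue": 300.0} for _ in range(4)]
orders += [
    {"Item Type": "Laptop", "Revenue": 1000.0},
    {"Item Type": "Watch", "Revenue": 500.0},
]
report = top_n_by_avg_show_total(orders, "Item Type", "Revenue", n=2)
```

With this data, Snacks falls into Other despite having the largest total, which is exactly the effect of choosing a different metric for the ranking.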
Top N and Others Charts
pie chart
The Data Set interface provides a getter called ds which returns a DataFrame. Hence all the features of a DataFrame, such as the ability to create a chart, are readily available.
bar chart
With the ability to specify multiple metrics (as shown above) it is possible to create a chart with cumulative percentage of the contribution of the top N.
If the distribution of the data is relatively uniform rather than top-heavy, the bar chart might end up showing a very large Other bar, which may not be ideal. One option is to increase the value of N.
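The cumulative percentage for such a bar chart can be computed as in the sketch below. It is plain Python rather than the library's API, and add_cumulative_share and the input slices are hypothetical. The sketch assumes the slices are already ordered (top N first, Other last) and that the grand total is non-zero.

```python
def add_cumulative_share(slices, metric):
    """Attach the running share of the grand total to each slice, in order."""
    grand_total = sum(s[metric] for s in slices)  # assumed non-zero
    running = 0.0
    out = []
    for s in slices:
        running += s[metric]
        out.append({**s, "cumulative_pct": round(100.0 * running / grand_total, 1)})
    return out

# Made-up top 2 and Other slices
slices = [
    {"Country": "India", "Revenue": 50.0},
    {"Country": "Brazil", "Revenue": 30.0},
    {"Country": "Other", "Revenue": 20.0},
]
with_share = add_cumulative_share(slices, "Revenue")
```

The cumulative value on the last top N row (before Other) shows how much of the total the top N covers; a low value there is a sign that the Other bar will dominate the chart.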
nested chart
By making use of the dynamic grid charts, it is possible to display the Nested Top N and Other charts.
Top N and Others trends
The ds.withOthers API takes an optional parameter which provides a groupBy. The groupBy should include all the dimensions used for the set and can optionally include additional dimensions to get a more granular level of detail. This makes it possible, for example, to plot the trend by adding the time dimension.
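As an illustration of grouping by the set dimension plus a time dimension, here is a plain-Python sketch. It does not call ds.withOthers; top_n_trend and the sample rows are hypothetical. The top N is decided over all years, and the result is then broken down per year, sorted by year and label so that each trend line is drawn in order.

```python
from collections import defaultdict

def top_n_trend(rows, dimension, time_dim, metric, n, other_label="Other"):
    # Step 1: pick the top N members over the whole data.
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    top = set(sorted(totals, key=totals.get, reverse=True)[:n])
    # Step 2: group by (time, member-or-Other).
    trend = defaultdict(float)
    for row in rows:
        label = row[dimension] if row[dimension] in top else other_label
        trend[(row[time_dim], label)] += row[metric]
    # Step 3: order by time, then label, for stable trend lines.
    return [
        {time_dim: t, dimension: label, metric: v}
        for (t, label), v in sorted(trend.items())
    ]

# Made-up sample data
sales = [
    {"Year": 2022, "Item Type": "Fruits", "Revenue": 10.0},
    {"Year": 2022, "Item Type": "Toys", "Revenue": 6.0},
    {"Year": 2022, "Item Type": "Books", "Revenue": 1.0},
    {"Year": 2023, "Item Type": "Fruits", "Revenue": 12.0},
    {"Year": 2023, "Item Type": "Toys", "Revenue": 5.0},
    {"Year": 2023, "Item Type": "Books", "Revenue": 3.0},
]
trend = top_n_trend(sales, "Item Type", "Year", "Revenue", n=1)
```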
overall top N
Note how data is explicitly ordered by Year and Item Type to ensure correct rendering of the trend lines.
sliced top N
The top N data set does not have to be based on the same DataFrame as the final set & other data set. For example, it is possible to first get the top 3 item categories for the current year and use them to draw the trend for the last 4 years. The example below in fact uses the DataFrame Slicer to allow picking the top N based on arbitrary criteria and then shows the trend of those categories.
The API df.sets.setNother takes a Data Set, such as a Top N set, and creates a set and other data set.
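The separation between choosing the set and applying it can be sketched in plain Python as follows. This is not the df.sets.setNother implementation; members_of_top_n, set_and_other, and the sample rows are hypothetical. The members are picked from the current year only and then applied to every year.

```python
from collections import defaultdict

def members_of_top_n(rows, dimension, metric, n):
    """Return the n members of `dimension` with the largest summed metric."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[metric]
    return sorted(totals, key=totals.get, reverse=True)[:n]

def set_and_other(rows, dimension, members, group_by, metric, other_label="Other"):
    """Aggregate rows by `group_by` plus member-or-Other, for a given member set."""
    keep = set(members)
    out = defaultdict(float)
    for row in rows:
        label = row[dimension] if row[dimension] in keep else other_label
        out[tuple(row[g] for g in group_by) + (label,)] += row[metric]
    return dict(sorted(out.items()))

# Made-up sample data
sales = [
    {"Year": 2022, "Item Type": "Fruits", "Revenue": 10.0},
    {"Year": 2022, "Item Type": "Toys", "Revenue": 2.0},
    {"Year": 2022, "Item Type": "Books", "Revenue": 3.0},
    {"Year": 2023, "Item Type": "Toys", "Revenue": 9.0},
    {"Year": 2023, "Item Type": "Fruits", "Revenue": 4.0},
    {"Year": 2023, "Item Type": "Books", "Revenue": 1.0},
]
current_year = [r for r in sales if r["Year"] == 2023]   # the slice used for ranking
members = members_of_top_n(current_year, "Item Type", "Revenue", n=1)
trend = set_and_other(sales, "Item Type", members, ["Year"], "Revenue")
```

Toys is the top item type in 2023, so it is tracked across both years even though it was not the leader in 2022.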
Top N and Others Pivot Tables
overall top N
It is possible to use advanced group by clauses like CUBE and ROLLUP along with the top N API to get sub totals and grand totals.
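For readers unfamiliar with ROLLUP, the following plain-Python sketch emulates what a ROLLUP over two dimensions produces once the rows have already been reduced to top N and Other. It is not the library's API; the rollup function and the input rows are hypothetical, and None is used here to mark a rolled-up column.

```python
from collections import defaultdict

def rollup(rows, dims, metric):
    """Totals for every prefix of `dims`: detail rows, sub totals, grand total."""
    out = defaultdict(float)
    for row in rows:
        for depth in range(len(dims), -1, -1):
            # Keep the first `depth` dimensions; mark the rest as rolled up.
            key = tuple(row[d] for d in dims[:depth]) + (None,) * (len(dims) - depth)
            out[key] += row[metric]
    return dict(out)

# Made-up rows, already reduced to top 1 item type and Other per region
rows = [
    {"Region": "Asia", "Item Type": "Fruits", "Revenue": 10.0},
    {"Region": "Asia", "Item Type": "Other", "Revenue": 4.0},
    {"Region": "Europe", "Item Type": "Fruits", "Revenue": 6.0},
    {"Region": "Europe", "Item Type": "Other", "Revenue": 2.0},
]
totals = rollup(rows, ["Region", "Item Type"], "Revenue")
```

Keys such as ("Asia", None) hold the sub total for a region, and (None, None) holds the grand total.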
nested top N
It is possible to get the top 3 Item Types within each Region and pivot. In this case the Item Type dimension most likely has more than 3 Item Type values because each Region may have its own set of top 3 Item Types.
When there is a blank cell, it is not possible to know whether that is because there is no underlying data for that Item Type and Region combination or if that Item Type is not among the top N within that Region.
Top M and Others within Top N and Others
Let's say the requirement is to show the top 5 Regions and, for each, the top 3 Item Types based on Revenue. There are two types of Others buckets:
- one within each Region for all Item Types not in the top 3 for that Region
- one for all Regions not in the top 5.
At first this may seem a bit daunting, but it is possible to use the sets.topN, ds.withOthers and sets.setNother APIs to accomplish such a requirement. Along the way, we will be making use of:
- non-aggregate topN (we aggregate but forget that it is an aggregate and treat it like a non-aggregate)
- double-aggregation (aggregating and aggregating the aggregates)
This report provides top 3 Item Types of top 5 Regions.
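The shape of such a report can be sketched in plain Python as below. This is only an illustration of the two kinds of Others buckets and does not use sets.topN, ds.withOthers or sets.setNother; top_m_within_top_n and the sample rows are hypothetical. As a simplifying assumption, the Regions outside the top N are collapsed into a single Other row without a further Item Type breakdown.

```python
from collections import defaultdict

def top_m_within_top_n(rows, outer, inner, metric, n, m, other_label="Other"):
    # First aggregation: totals per outer member decide the top N.
    outer_totals = defaultdict(float)
    for row in rows:
        outer_totals[row[outer]] += row[metric]
    top_outer = sorted(outer_totals, key=outer_totals.get, reverse=True)[:n]

    result = []
    for o in top_outer:
        # Second aggregation: rank inner members inside this outer member.
        inner_totals = defaultdict(float)
        for row in rows:
            if row[outer] == o:
                inner_totals[row[inner]] += row[metric]
        ranked = sorted(inner_totals.items(), key=lambda kv: kv[1], reverse=True)
        result += [(o, k, v) for k, v in ranked[:m]]
        if ranked[m:]:  # Other bucket within this outer member
            result.append((o, other_label, sum(v for _, v in ranked[m:])))
    if len(outer_totals) > len(top_outer):  # Other bucket for remaining outer members
        rest = sum(v for o, v in outer_totals.items() if o not in top_outer)
        result.append((other_label, other_label, rest))
    return result

# Made-up sample data
sales = [
    {"Region": "Asia", "Item Type": "Fruits", "Revenue": 10.0},
    {"Region": "Asia", "Item Type": "Toys", "Revenue": 5.0},
    {"Region": "Asia", "Item Type": "Books", "Revenue": 1.0},
    {"Region": "Europe", "Item Type": "Toys", "Revenue": 8.0},
    {"Region": "Europe", "Item Type": "Fruits", "Revenue": 2.0},
    {"Region": "Africa", "Item Type": "Fruits", "Revenue": 3.0},
]
report = top_m_within_top_n(sales, "Region", "Item Type", "Revenue", n=2, m=1)
```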