
2 posts tagged with "tip"


· 5 min read

Knowing which items are frequently bought together is valuable information. For brick-and-mortar companies, it can help decide which items to promote in order to attract more customers without compromising overall profitability. For online stores, it can help with cross-selling items appropriately.

Frequent itemset mining is a well-studied data mining technique. Finding frequent pairs is a more specific but important variation, since higher-cardinality sets can be easily identified using lower-cardinality sets. Depending on the size of the data, finding the frequent pairs may require anything from a cluster of computers to just a single database query! When writing it as a SQL query, care must be taken to avoid performance issues. Below we investigate three different SQL queries that all produce the same results.
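As a rough illustration of the single-query case, here is a minimal sketch that counts item pairs with a self-join, run through Python's built-in `sqlite3` module. The table name `order_items`, its columns, the `min_support` threshold, and the sample data are all assumptions made up for this example, and it assumes each item appears at most once per order. It is not necessarily one of the three queries the post goes on to compare.

```python
import sqlite3

def frequent_pairs(rows, min_support=2):
    """Return (item_a, item_b, count) for pairs that share at least min_support orders."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per (order, item).
    conn.execute("CREATE TABLE order_items (order_id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO order_items VALUES (?, ?)", rows)
    query = """
        SELECT a.item, b.item, COUNT(*) AS pair_count
        FROM order_items AS a
        JOIN order_items AS b
          ON a.order_id = b.order_id
         AND a.item < b.item          -- count each unordered pair once, skip self-pairs
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY pair_count DESC, a.item, b.item
    """
    result = conn.execute(query, (min_support,)).fetchall()
    conn.close()
    return result

# Made-up sample data: four orders over three items.
rows = [
    (1, "bread"), (1, "butter"), (1, "milk"),
    (2, "bread"), (2, "butter"),
    (3, "bread"), (3, "milk"),
    (4, "butter"), (4, "milk"), (4, "bread"),
]
pairs = frequent_pairs(rows)
for item_a, item_b, count in pairs:
    print(item_a, item_b, count)
```

The `a.item < b.item` condition is what keeps the self-join from counting every pair twice. On large tables the join can still produce a very large intermediate result, which is where the performance concerns come in.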

· 5 min read

It is common to pre-aggregate data to improve the performance of report generation. While pre-aggregation is in general a good technique, grouping data by various dimensions and their permutations, and by the time dimension and its various buckets (including sliding windows), can lead to an explosion of pre-aggregated data. Hence the requirements should be carefully evaluated. In some cases the problem may be better solved using a technique called double aggregation, where data is first pre-aggregated to the lowest level and is then aggregated further on demand as needed.
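To make the idea concrete, here is a minimal sketch of double aggregation using Python's built-in `sqlite3` module. The `events` and `daily_sales` tables, their columns, the `rollup` helper, and the sample data are all hypothetical and chosen only for illustration. The first aggregation runs once and stores totals at the lowest level (day, region, product); the second runs on demand against that much smaller table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw fact table.
conn.execute("CREATE TABLE events (day TEXT, region TEXT, product TEXT, amount INTEGER)")
events = [
    ("2024-01-01", "east", "apple", 10),
    ("2024-01-01", "east", "apple", 5),
    ("2024-01-01", "west", "pear", 7),
    ("2024-01-02", "east", "pear", 3),
    ("2024-01-02", "west", "apple", 4),
    ("2024-01-02", "west", "apple", 6),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)

# First aggregation: pre-aggregate once to the lowest level of detail.
conn.execute("""
    CREATE TABLE daily_sales AS
    SELECT day, region, product, SUM(amount) AS total
    FROM events
    GROUP BY day, region, product
""")

def rollup(dimension):
    """Second aggregation: roll the pre-aggregated rows up by one dimension on demand."""
    if dimension not in {"day", "region", "product"}:
        raise ValueError(f"unknown dimension: {dimension}")
    # The dimension name is validated above, so interpolating it is safe here.
    sql = f"SELECT {dimension}, SUM(total) FROM daily_sales GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql).fetchall()

by_region = rollup("region")
by_product = rollup("product")
by_day = rollup("day")
print(by_region, by_product, by_day)
```

This works directly for additive measures such as sums and counts. A measure like an average would need its sum and count stored separately at the lowest level so that it can be recomputed correctly at query time.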