SAAS App Analytics
These days most enterprise applications are delivered on the cloud as SAAS applications. There are also many low-code application development platforms that make it easy to create custom workflows and user experiences. However, even the big players in this space such as Salesforce, seldom provide powerful analytics. That is, we are not talking about simple reports but more advanced reports that actually can provide deeper insights. Wondering why? I will explain it with a few examples.
Persistence API
Most well designed SAAS platforms come with persistence layer as part of their
platform. This can be in the form of an API (like ServiceNow GlideRecord and GlideAggregate)
or a full fledged custom language (like Salesforce SOQL). These platform API do
a fantastic job of taking care of the basic OLTP data management requirements and
also simple reporting requirements. They also make it easy to write complex joins
with a low-code approach such as dot-walking where the end user (typically a developer)
is provided with a logical object that can reference to other objects and the fields of
these referenced objects can be included in the query by simply referring to them as a.b.c
(hence the name 'dot' walking or simplying 'dot' notation).
Useful insights, complex queries
Tableau published an article in 2015 (pre-SalesForce acquisition) called Top 15 LOD Expressions which presents 15 common questions their customers frequently try to answer. By carefully analyzing each of the 15 reports, it is clear that most of them are not possible to be accomplished directly with any of the existing SAAS platforms directly based on the API they provide.
Let's take an example. The very first report in the above LOD expressions article is Customer order frequency. Given an order transaction table, one wants to find out how many customers have placed 1 order, 2 orders, ... N orders. There are at least two ways to write a SQL query for this which are presented below.
Using Window Functions
Here the window function (also called analytic function) is used to get the count of orders by each customer and then the table is grouped on that window function field.
-- auto generated by SQL Frames for PostgreSQL
SELECT "Num Orders"
,COUNT(DISTINCT "Customer ID") AS "Customers"
FROM ( SELECT "Orders".*
,((DENSE_RANK() OVER(PARTITION BY "Customer ID" ORDER BY "Order ID" ASC))+(DENSE_RANK() OVER(PARTITION BY "Customer ID" ORDER BY "Order ID" DESC))-1) AS "Num Orders"
FROM "Orders") t0
GROUP BY "Num Orders"
ORDER BY "Num Orders" ASC;
Using Double Group By
Another approach, also more performant, is to use group by on top of another group by.
-- auto generated by SQL Frames for PostgreSQL
SELECT "Num Orders"
,COUNT(DISTINCT "Customer ID") AS "Customers"
FROM ( SELECT "Customer ID"
,COUNT(DISTINCT "Order ID") AS "Num Orders"
FROM "Orders"
GROUP BY "Customer ID") t0
GROUP BY "Num Orders"
ORDER BY "Num Orders" ASC;
Either of the above queries are not straight-forward.
Note that the above example is one of the easy cases of the above article on the LOD expressions. The 15th example when expressed in an efficient SQL resulted in 3 levels of grouping as shown below.
-- auto generated by SQL Frames for PostgreSQL
SELECT EXTRACT(YEAR FROM "OD Year") AS "OD Year"
,EXTRACT(YEAR FROM "FOD Year") AS "FOD Year"
,"Orders"
,"Cohort Customers"
,COUNT('*') AS "Loyal Customers"
,SUM(COUNT('*')) OVER(PARTITION BY EXTRACT(YEAR FROM "OD Year"), EXTRACT(YEAR FROM "FOD Year") ORDER BY "Orders" DESC) AS "Cumulative Loyal Customers"
,100.0*(SUM(COUNT('*')) OVER(PARTITION BY EXTRACT(YEAR FROM "OD Year"), EXTRACT(YEAR FROM "FOD Year") ORDER BY "Orders" DESC))/("Cohort Customers") AS "% of Total"
FROM ( SELECT "Customer ID"
,"Customer Name"
,CAST(EXTRACT(YEAR FROM "Order Date")||'/01/01' AS DATE) AS "OD Year"
,CAST(EXTRACT(YEAR FROM "First Order Date")||'/01/01' AS DATE) AS "FOD Year"
,SUM("Orders") AS "Orders"
,((DENSE_RANK() OVER(PARTITION BY CAST(EXTRACT(YEAR FROM "First Order Date")||'/01/01' AS DATE) ORDER BY "Customer ID" ASC))+(DENSE_RANK() OVER(PARTITION BY CAST(EXTRACT(YEAR FROM "First Order Date")||'/01/01' AS DATE) ORDER BY "Customer ID" DESC))-1) AS "Cohort Customers"
FROM ( SELECT "Customer ID"
,"Customer Name"
,"Order Date"
,COUNT(DISTINCT "Order ID") AS "Orders"
,MIN("Order Date") OVER(PARTITION BY "Customer ID") AS "First Order Date"
FROM "Orders"
GROUP BY "Customer ID", "Customer Name", "Order Date") t0
GROUP BY "Customer ID", "Customer Name", CAST(EXTRACT(YEAR FROM "Order Date")||'/01/01' AS DATE), CAST(EXTRACT(YEAR FROM "First Order Date")||'/01/01' AS DATE)) t1
GROUP BY "OD Year", "FOD Year", "Orders", "Cohort Customers"
ORDER BY "OD Year" ASC, "FOD Year" ASC, "Orders" ASC, "Cohort Customers" ASC;
The above query is to answer the question What percent of customers from each cohort (year of acquisition) purchased at least 1, 2, 3, N times in 2014.
Most SAAS platforms fall short of providing API for constructing such advanced queries. There are two main reasons for this
- They may try to create a database agnostic API with lowest denominator feature set
- They may be concerned about the performance of their backend services to execute such complex queries
In addition, providing a generic SQL constructing API is also non-trivial. This is probably one of the reasons why SalesForce acquired Tableau.
What then?
The question then is how to rely on these cloud applications and cloud platforms for workflows and applications and yet be able to gather better insights with advanced analytics? In most cases the approach to this will be to extract the data for further analysis into a separate data warehouse and then use a powerful reporting solution that allows creating complex queries.
For many advanced reports that are bound by time such as "Quarter to date sales", the query of the report may me complex but the amount of data to create the report may be much smaller. Most of the SAAS platforms offer REST API to query data in bulk but limit to a few thousand records. Some of them allow querying more records by repeatedly querying the data with a moving range on a cursor. In such cases it will be practical to query the data to the client side and then do the more complex visualization of that data. However, these complex visualizations first require complex data transformations. This can be done ad-hocly or leverage a solution such as SQL Frames that allows creating complex data transformation logic using familiar SQL constructs and execute them within the browser.
SQL Frames also has the ability to generate SQL for the target database so it is possible to query the data from a database using the same logic that can be executed within the browser. This makes it possible to scale to large datasets if needed.
Conclusion
When the departments are using best of the breed SAAS applications to support their business needs, the IT department can add value by providing a unified advanced analytics solution that can be used to bring data from all the various SAAS applications so the business users can get better insights of their data. Without such a data strategy, most organizations will be generating a lot of data but not be utilizing it for driving the business.