Real Time Analytics
The two most popular charts are Top N charts and Trend charts. That is because almost all businesses want to ask questions like:
- Who are my top buyers? (top N)
- Which are the top-selling product categories? (top N)
- Is the services revenue growing quarter over quarter? (trend)
- How did this stock perform in the last two years? (trend)
There is also a third type of analysis that is applicable to certain use cases such as infrastructure monitoring, stock price fluctuations and monitoring of manufacturing processes. This class of analytics is near-realtime in nature.
SQL Frames can support such near-realtime analytics use cases. More importantly, such user experiences can be easily tailored with the SQL Frames low-code API, as shown in the demo below.
Realtime Data Sources
When creating realtime analytics, it is very important to understand how the data moves from the server to the client. This is non-trivial for the reasons below:
- Velocity of data. Rapidly generated data, such as stock price fluctuations, needs to reach the end user as fast as possible.
- Volume of data. An IT dashboard showing the overall health of a datacenter may track many metrics across the entire datacenter, which can lead to a large volume of data. A combination of velocity and volume is even more challenging.
- Number of consumers of data. When a large number of end users consume the data, the high-volume and/or high-velocity data also needs to reach a large number of destinations.
Taken together, these factors mean that, for scalability and for preserving the consistency of data, one might use a solution like Kafka, which can even aggregate the data streams to some extent before sending them off to the clients.
Push vs Pull
The client can pull the data at a fixed frequency, and that works to a certain extent. In particular, if the data streams are cached on the server and the client requests only the data newer than the last timestamp it received, the overall network bandwidth and the server-side load can be drastically reduced.
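A minimal sketch of timestamp-based pulling, under stated assumptions: the `Tick` row shape, the `Poller` class, and the `fetchSince` function are hypothetical names, with `fetchSince` standing in for whatever endpoint the server exposes. The client remembers the newest timestamp it has seen and asks only for newer rows, so each poll transfers just the delta.

```typescript
// Hypothetical row shape; a real feed would carry its own fields.
interface Tick {
  ts: number; // epoch milliseconds assigned by the server
  symbol: string;
  price: number;
}

// Hypothetical stand-in for a server call that returns rows newer than `since`.
type FetchSince = (since: number) => Promise<Tick[]>;

class Poller {
  private lastTs = 0;
  private readonly fetchSince: FetchSince;
  readonly rows: Tick[] = [];

  constructor(fetchSince: FetchSince) {
    this.fetchSince = fetchSince;
  }

  // One poll cycle: request only rows newer than the last one seen.
  // Resolves to the number of new rows received.
  async pollOnce(): Promise<number> {
    const fresh = await this.fetchSince(this.lastTs);
    for (const row of fresh) {
      this.rows.push(row);
      if (row.ts > this.lastTs) this.lastTs = row.ts;
    }
    return fresh.length;
  }
}

// In-memory "server", used only to keep the sketch self-contained.
const serverLog: Tick[] = [
  { ts: 1000, symbol: "ABC", price: 10.0 },
  { ts: 2000, symbol: "ABC", price: 10.5 },
];
const fakeFetch: FetchSince = async (since) =>
  serverLog.filter((t) => t.ts > since);
```

In a browser, `pollOnce` would typically be driven by a timer, for example `setInterval(() => poller.pollOnce(), 10_000)`.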
In some cases, especially when the data is highly time-sensitive, a pull mechanism might not work. In such cases, technologies like WebSockets make it possible to push the data from the server to the client only when the data truly changes.
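The "push only on a true change" idea can be sketched as follows. This is an illustration, not a complete server: `ChangePublisher` is a hypothetical name, and the `Send` callback is an assumption standing in for the send method of each connected socket.

```typescript
// Hypothetical callback standing in for a connected socket's send method.
type Send = (message: string) => void;

class ChangePublisher {
  private readonly clients = new Set<Send>();
  private readonly lastValue = new Map<string, number>();

  // Register a client; the returned function unregisters it.
  subscribe(send: Send): () => void {
    this.clients.add(send);
    return () => {
      this.clients.delete(send);
    };
  }

  // Push to every client only when the metric's value actually changed.
  // Returns true if a message was pushed.
  update(metric: string, value: number): boolean {
    if (this.lastValue.get(metric) === value) return false;
    this.lastValue.set(metric, value);
    const message = JSON.stringify({ metric, value });
    this.clients.forEach((send) => send(message));
    return true;
  }
}
```

Suppressing unchanged values on the server keeps the number of messages proportional to real changes rather than to the sampling rate.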
Change Data Capture
Change Data Capture (CDC) is a popular way to stream continuously changing data. The receiving side can apply these changes incrementally to maintain a local copy of the most recent server-side state. Applying changes on the client side has many benefits. One of the key benefits when it comes to visualization is the ability to show the changes with animation.
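As a rough illustration of applying changes on the receiving side, the sketch below keeps a keyed copy of the data and applies insert, update, and delete events in order. The `Row` and `Change` shapes and the `applyChanges` function are assumptions made for this example; real CDC feeds define their own event formats.

```typescript
// Hypothetical row and change-event shapes for illustration only.
interface Row {
  id: number;
  name: string;
  qty: number;
}

type Change =
  | { op: "insert"; row: Row }
  | { op: "update"; id: number; patch: Partial<Omit<Row, "id">> }
  | { op: "delete"; id: number };

// Apply a batch of changes in order and report which ids were touched,
// so a UI could animate just those rows.
function applyChanges(state: Map<number, Row>, changes: Change[]): Set<number> {
  const touched = new Set<number>();
  for (const change of changes) {
    switch (change.op) {
      case "insert":
        state.set(change.row.id, change.row);
        touched.add(change.row.id);
        break;
      case "update": {
        const current = state.get(change.id);
        if (current) {
          // Merge the changed fields over the existing row.
          state.set(change.id, { ...current, ...change.patch });
          touched.add(change.id);
        }
        break;
      }
      case "delete":
        if (state.delete(change.id)) touched.add(change.id);
        break;
    }
  }
  return touched;
}
```

The returned set of touched ids is what makes animation possible: the renderer knows exactly which rows changed instead of having to compare two full snapshots.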
SQL Frames supports all the key SQL constructs needed to perform complex data transformations. However, at present it doesn't have the capability to apply delta changes and propagate them throughout the data pipeline. This is a complex problem, albeit doable. I hope to make this happen at some point in the future.
Even without the incremental change capability, it is possible to carefully design the data pipeline so that the client transformations complete in under a second. The entire table and/or chart can then be re-rendered in a way that makes the visualization seem animated.
Virtual DOM
One of the benefits of using React-like frameworks is the virtual DOM (VDOM). When the UI is based on a VDOM, the underlying framework works hard to compute the diff and update only the nodes that truly changed. This creates the illusion that the data is being animated within a table, for example. Imagine instead a table being completely re-rendered and replacing an already existing table. In that case, since the entire table is being replaced, the browser itself might take some time to re-render, and the smooth display experience is lost in the process.
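To make the diffing idea concrete, here is a deliberately simplified sketch that compares two versions of a keyed table and emits only the changes. It is not how any particular framework implements its VDOM; the `TableRow` and `Patch` types and the `diffTables` function are hypothetical names for this example.

```typescript
type Cell = string | number;

interface TableRow {
  key: string; // stable row identity, similar in spirit to a `key` prop
  cells: Cell[];
}

type Patch =
  | { kind: "addRow"; key: string }
  | { kind: "removeRow"; key: string }
  | { kind: "setCell"; key: string; col: number; value: Cell };

// Compare the previous and next table and return the minimal cell-level changes.
function diffTables(prev: TableRow[], next: TableRow[]): Patch[] {
  const patches: Patch[] = [];
  const prevByKey = new Map<string, TableRow>();
  for (const row of prev) prevByKey.set(row.key, row);
  const nextKeys = new Set<string>();

  for (const row of next) {
    nextKeys.add(row.key);
    const old = prevByKey.get(row.key);
    if (!old) patches.push({ kind: "addRow", key: row.key });
    row.cells.forEach((value, col) => {
      // A cell is patched if its row is new or its value differs.
      if (!old || old.cells[col] !== value) {
        patches.push({ kind: "setCell", key: row.key, col, value });
      }
    });
  }
  for (const row of prev) {
    if (!nextKeys.has(row.key)) patches.push({ kind: "removeRow", key: row.key });
  }
  return patches;
}
```

When only one cell changes between refreshes, the patch list contains a single `setCell` entry, which is why the rest of the table stays visually stable.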
SQL Frames uses Preact, a fast, lightweight alternative to the popular React framework. This lets SQL Frames tables display data by applying delta changes to the DOM, making them very smooth and amenable to realtime analytics.
ECharts Animations
SQL Frames also uses Apache ECharts, which can merge a new chart configuration with the existing configuration. As mentioned above, at present SQL Frames doesn't support delta changes for the data transformation. As a result, the charts need to be completely re-rendered. However, since the charts are closely integrated with the underlying DataFrames, the same chart object is reused. This results in a smooth animation of data changes.
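A hedged sketch of reusing one chart object across refreshes. To stay runnable without the charting package, it talks to a minimal `ChartLike` interface, which is an assumption of this example; with the real library the object would come from `echarts.init(domElement)`, whose `setOption` merges the new option into the existing one by default. The helper names (`baseOption`, `dataOption`, `refresh`) are hypothetical, and this is not the SQL Frames integration itself.

```typescript
// Minimal structural type so the sketch runs without the echarts package.
interface ChartLike {
  setOption(option: Record<string, unknown>): void;
}

interface Point {
  minute: string;
  value: number;
}

// The static part of the configuration, applied once when the chart is created.
function baseOption(): Record<string, unknown> {
  return {
    xAxis: { type: "category", data: [] },
    yAxis: { type: "value" },
    series: [{ type: "line", data: [] }],
  };
}

// Only the parts that change on each refresh.
function dataOption(points: Point[]): Record<string, unknown> {
  return {
    xAxis: { data: points.map((p) => p.minute) },
    series: [{ data: points.map((p) => p.value) }],
  };
}

// Reuse the same chart object: push just the new data into it.
function refresh(chart: ChartLike, points: Point[]): void {
  chart.setOption(dataOption(points));
}
```

Because the chart instance survives between refreshes, the library can transition from the old data to the new data instead of drawing a fresh chart from scratch.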
While the animation makes the visualization pleasant, there are certain limitations when it comes to realtime analytics based on time series. Ideally, the chart should move from right to left as time progresses. However, that right-to-left shift is not visible at this time.
Demo
In the following example, the data is completely generated on the client-side and not retrieved from any server to make this a standalone example. Data is randomly generated initially and then every 10 seconds. In addition, the data is aggregated every minute. So, make sure to watch the chart for a couple of minutes to see the realtime analytics experience with SQL Frames.
Note that most of the code below exists only to randomly generate data.
Initial data load
We are not returning the DataFrame that loaded the initial data because it is going to get modified continuously. So, everything is displayed together along with the incremental data load shown below.
Incremental data load
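The generate-then-aggregate logic described for this demo can be sketched in plain TypeScript. This is not the SQL Frames demo code and does not use the SQL Frames API: the `Reading` shape, the function names, and the choice of a per-minute average are all assumptions made for illustration. The random source is injected so that runs can be made reproducible.

```typescript
interface Reading {
  ts: number; // epoch milliseconds
  value: number;
}

interface MinuteBucket {
  minuteStart: number; // epoch milliseconds of the start of the minute
  count: number;
  avg: number;
}

// Generate one random reading between 0 and 100.
function makeReading(ts: number, random: () => number = Math.random): Reading {
  return { ts, value: Math.round(random() * 100) };
}

// Group readings into one-minute buckets and average each bucket.
function aggregateByMinute(readings: Reading[]): MinuteBucket[] {
  const buckets = new Map<number, { count: number; sum: number }>();
  for (const r of readings) {
    const minuteStart = Math.floor(r.ts / 60_000) * 60_000;
    const b = buckets.get(minuteStart) ?? { count: 0, sum: 0 };
    b.count += 1;
    b.sum += r.value;
    buckets.set(minuteStart, b);
  }
  const result: MinuteBucket[] = [];
  buckets.forEach((b, minuteStart) => {
    result.push({ minuteStart, count: b.count, avg: b.sum / b.count });
  });
  return result.sort((a, b) => a.minuteStart - b.minuteStart);
}

// Incremental load: append one new reading, then recompute the aggregate in full,
// mirroring the "no delta propagation" approach described earlier.
function appendAndAggregate(
  readings: Reading[],
  ts: number,
  random?: () => number
): MinuteBucket[] {
  readings.push(makeReading(ts, random));
  return aggregateByMinute(readings);
}
```

In a page, `appendAndAggregate` would be called from a timer every 10 seconds with `Date.now()` as the timestamp, and the resulting buckets would feed the table and chart.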
Conclusion
The discussion above makes it clear that realtime analytics is much more complex than regular data analytics. It requires a sophisticated data architecture on the backend to ensure scalability, consistency and reliability. The data transformation and presentation architecture on the frontend is equally important for providing visualizations that do full justice to the temporal changes with suitable animations.
SQL Frames can help with that goal by providing a highly interactive UX based on complex data transformations and a tightly integrated UI for tabular and chart visualizations.