Awesome Semantic Data Grid
Large enterprise application software companies like Oracle and SAP used to have powerful desktop applications. During the Internet boom of the late 90s, these companies tried to provide a class of software called "self-service" where all employees, and not just those working on back-office data could interact with the applications to perform simple tasks. The web UI of that era was very basic and so it wasn't possible to create desktop grade enterprise applications for the power users.
As time passed, concepts such as PPR (partial page rendering) and AJAX have helped to push the sophistication of the UI in these web applications to the extent that these days even the power user applications are completely delivered via the web. However this used to be a dark art until the advent of the frameworks like Angular and React. These days anyone can build extremely complex UI very declaratively.
This complexity on the fontend perhaps pushed the balance of the backend-vs-frontend. While historically backend development used to be hard and meant for advanced developers these days the complexity of the frontend also requires highly trained developers and in some ways simplifies the complexity on the backend. No wonder, a good React developer charges $250/hr or more these days.
In spite of all these advances, I think there is one area where things haven't really caught up enough. That is the ability to deal with semantically displaying large amounts of data. It could be too many rows or too many columns or both. Basically, the lack of an awesome semantic data grid. But SQL Frames wants is making progress to change that.
Use cases
Large data grids are needed in many scenarios. Below are a few examples.
- Financial Forecast Viewing the forecast day for a year at day level granularity requries atleast 365 columns (and more if weekly, monthly and quarterly aggregates are also shown). The same forecast data when visualized by department, location and currency results in several rows.
- Factory monitoring Collecting device data every few seconds and viewing the associated aggregate matrics by minute, 5-minute, 15-minute and so on for multiple products and manufacturing eqipment results in several columns and rows of data. An added challenge to this type of data is the temporal dimension. That is, viewing this data live by continuously streaming the measurements and aggregating. This article is not about this type of data.
Data Grid features
First, let's review some of the key features desired in a Data Grid.
- Ability to fix header rows
- Ability to fix header columns
- Ability to display dynamic height rows
- Ability to display dynamic width columns
- Ability to display hierarchical data
- Ability to display pivoted data
- Ability to display pivoted data hierarchically (both row and column hierarchies)
- Ability to display large volumes of data with infinite scrolling
- Ability to display very wide tables (especially wide pivot tables) without browser freeze
- Ability to automatically provide the right width for each column based on the data
It is important to note, most existing grid systems provide one or more features but there is
hardly any grid that is capable of everything, especially if a restriction is imposed
that the markup should use the semantic table
element instead of the visual div
element.
table
vs div
Most powerful grid systems use div
based markup. This is because the div
element is a
fundamental building block of HTML and is very versatile. Many HTML and CSS specifications
are available for div
and ony subsequently were they added, sometimes with restrictions, on
other elements such as table
.
While HTML in general is a markup for presentation, a few elements such as table
are infact
semantic. The thead
, tbody
, tr
, th
, td
, colgroup
, col
elements provide information
about the underlying data. div
elements on the other hand have absolutely no such concept.
Accessiblity constructs such as aria role can help provide additional context to div
but it is not natural.
Further, I have come across grid systems where the order of the divs in markup doesn't match
the order in the display! By using absolute or relative positioning, some of these grid systems
make it hard to get the meaning of the data without the visualization.
One of the key decisions to make when building a Data Grid is whether to go with table
or div
.
Each has its own challenges but I personally think that using table
is slightly more complex for
developers creating the Data Grid while it is easy for the developers using the Data Grid. Of course,
it is the best option for end users due to the usage of a semantic markup.
SQL Frames data grid is based on table
Fixed Headers
When there is large amount of data, it is important to be able to fix the row and column headers.
Header rows
This is easily accomplished with sticky (css) header row cells and also a z-index of 1 to ensure the rows below gets tucked under the header rows while scrolling.
For header rows, the top
css property doesn't seem to be required when position
is sticky.
Header columns
This is slightly more complex because horizontal sticky position also requires the left
(right
in case of RTL display)
value of the element. If the width of the column is known up front, then this would be easy.
If it is not known up front, then it has to be dynamically updated after rendering the HTML.
They also need to have a z-index of 1 to ensure the columns to the right gets tucked
under the header columns while scrolling.
The corner header is both a row header and a column header. It needs to have a z-index of 2 so that the non-header column header row fields get tucked under the header columns while scrolling horizontally.
Virtualization
In order to display large volumes of data, it is important to make sure only the visible part of the Data Grid is rendered (think of minecraft with unlimited terrain but only the visible part is rendered for better performance).
Row virtualization
To deal with display of large number of rows, the rendering should limit the number of rows
being displayed. One of the ways to achieve this for table
based virtualization is
to have the following markup.
<div><!-- viewport -->
<div><!-- vertical scrollpane -->
<table><!-- limited to visible rows -->
The viewport div
provides the illusion of a small window into the underlying vast data grid.
The vertical scrollpane div
maintains the scroll position for the rows within the larger table
(even though only a small number of rows are displayed).
Column virtualization.
Recently I saw a blog post by a Y-combinator backed startup about how their
journey of grid virtualization improved the performance 500%. During the journey they
indicate that they had to move away from the table
based approach to div
based approach.
Even though SQL Frames had row virtualization for a long time, support for column
virtualization came in only recently (starting version 0.11.12). This is because,
while it is easy to virtualize the rows by adjusting the viewport height to the visible
rows, it is not practical to adjust the viewport width to the visual columns.
Every column virtualization system that has been surveyed assumes that the column widths are known upfront. While this is an easy decision from coding the grid point of view, it adds development work for those using the grid.
The semantic table
automatically calculates the width of a column based on all the cells
within the column and then picks the right value. But by using div
elements, there is no coordination
happening automatically by the browser and hence the need for fixed known widths upfront with div
based solutions.
SQL Frames has a lot of intelligence to identify type of the data and also the display sizes
required for optimal viewing experience (the calculated sizes are only a hint and the browser can adjust as needed).
The goal was to ensure this smart width feature is retained even with virtualization.
However, due to all these complexities, column virtualization was not provided for a long time by SQL Frames.
Then came the great idea that made the column virtualization possbile with table
.
The idea is simple. In case of row virtualization, only the visible rows are displayed and no
other rows (this is actually not 100% true, see the accessibility section below). However,
in case of column virtualization, the markup will be creating a table with the entire width
but the invisible data columns will be collapsed into a single td
! This is done by making
use of colspan
which indicates how many columns a td
should be spanning. For this to work
the header rows still need to be rendered completely (and several thousands seem to be OK, although column resize is slow).
The good thing is, the header rows don't need to be re-rendered every time the underlying data is
scrolled which triggers a rerendering due to row virtualization. So, there is a onetime cost
of rendering the entire header rows (just 1 for regular table, 2 if there are column groupings
and a few more for pivot tables due to the pivoted column headers).
The markup for the column virtualization that works using table
is the following
<tr><td>Fixed</td><td colspan='left-colspan'></td><td>multiple tds</td><td colspan='right-colspan'></td></tr>
This solution requires keeping track of the left and right colspans as the user scrolls horizontally or when a column is resized.
colspan
limitAs per the HTML standard
a colspan
can have a valid value of 1 to 1000. Hence, when the logical colspan is greater than 1000
it requires creating multiple td
elements.
Accessibility
It is important to be able to navigate the virtualized data grid using the keyboard. However, when only the visible parts of the grid are displayed, this becomes tricky. The browsers automatically adjust the focus to the next logical element when using the left/right and top/down arrows. But when the user is already at the edge of the viewport, using these keys will not work because the browser doesn't see any additional elements. The trick here is to create extra elements that are not visible so when the browser is at the edge of the visible area, it still can navigate to the next element that is invisible and bring it to focus.
While focus was fine when navigating past the left edge, it was getting lost when navigating past the right edge.
This seems to be happening because of the virtual dom reconciliation using the node.insertBeore
API which
seems to temporarily remove the node which results in losing focus. This needs to be handled explicity to
ensure keyboard navigation continues to work.
When navigating fast, it is possible the rendering hasn't caught up and the focus can be lost because of that. Also, after focusing on a cell if the user scrolls such that the focused cell is out of the viewport, the corresponding focused cell markup is lost and so is the access. So, accessibility with virtualization is still not perfect. It may be possible to programatically keep track of the focus but that may have some performance overhead.
Weird grids in the wild
As part of figuring out all this, I reseached many grids out in the wild. Below are some interesting implementations
- Two tables to achieve fixing the header columns one sliding on the other. This pontentially has accessibility issues not to mention the extra markup parsing and synchronization issues.
- Three tables, 1 for top-left corner, one for the rest of the headr rows and a 3rd one for all the data.
- React-virtualized has an example of Multigrid
which similarly stitches together multiple
div
s to provide fixed rows and columns.
Conclusion
SQL Frames is creating the best UX to work with data no matter what size and shape it comes in. This requires working on the backend technologies like data transformation using SQL constructs, auto generation of SQL, highly complex charting engine (that sits on top of eCharts) and also a highly performant and semantic enterprise data grid. The low-code API also emphasizes on least amount of coding. Feel free to reach out to me for more details or how your team or organization can make use of such an advanced technology.