Skip to main content

Awesome Semantic Data Grid

· 11 min read
Siva Dirisala
Creator of SQL Frames

Large enterprise application software companies like Oracle and SAP used to have powerful desktop applications. During the Internet boom of the late 90s, these companies tried to provide a class of software called "self-service" where all employees, and not just those working on back-office data could interact with the applications to perform simple tasks. The web UI of that era was very basic and so it wasn't possible to create desktop grade enterprise applications for the power users.

As time passed, concepts such as PPR (partial page rendering) and AJAX have helped to push the sophistication of the UI in these web applications to the extent that these days even the power user applications are completely delivered via the web. However this used to be a dark art until the advent of the frameworks like Angular and React. These days anyone can build extremely complex UI very declaratively.

This complexity on the fontend perhaps pushed the balance of the backend-vs-frontend. While historically backend development used to be hard and meant for advanced developers these days the complexity of the frontend also requires highly trained developers and in some ways simplifies the complexity on the backend. No wonder, a good React developer charges $250/hr or more these days.

In spite of all these advances, I think there is one area where things haven't really caught up enough. That is the ability to deal with semantically displaying large amounts of data. It could be too many rows or too many columns or both. Basically, the lack of an awesome semantic data grid. But SQL Frames wants is making progress to change that.

Use cases

Large data grids are needed in many scenarios. Below are a few examples.

  1. Financial Forecast Viewing the forecast day for a year at day level granularity requries atleast 365 columns (and more if weekly, monthly and quarterly aggregates are also shown). The same forecast data when visualized by department, location and currency results in several rows.
  2. Factory monitoring Collecting device data every few seconds and viewing the associated aggregate matrics by minute, 5-minute, 15-minute and so on for multiple products and manufacturing eqipment results in several columns and rows of data. An added challenge to this type of data is the temporal dimension. That is, viewing this data live by continuously streaming the measurements and aggregating. This article is not about this type of data.

Data Grid features

First, let's review some of the key features desired in a Data Grid.

  • Ability to fix header rows
  • Ability to fix header columns
  • Ability to display dynamic height rows
  • Ability to display dynamic width columns
  • Ability to display hierarchical data
  • Ability to display pivoted data
  • Ability to display pivoted data hierarchically (both row and column hierarchies)
  • Ability to display large volumes of data with infinite scrolling
  • Ability to display very wide tables (especially wide pivot tables) without browser freeze
  • Ability to automatically provide the right width for each column based on the data

It is important to note, most existing grid systems provide one or more features but there is hardly any grid that is capable of everything, especially if a restriction is imposed that the markup should use the semantic table element instead of the visual div element.

table vs div

Most powerful grid systems use div based markup. This is because the div element is a fundamental building block of HTML and is very versatile. Many HTML and CSS specifications are available for div and ony subsequently were they added, sometimes with restrictions, on other elements such as table.

While HTML in general is a markup for presentation, a few elements such as table are infact semantic. The thead, tbody, tr, th, td, colgroup, col elements provide information about the underlying data. div elements on the other hand have absolutely no such concept.

info

Accessiblity constructs such as aria role can help provide additional context to div but it is not natural. Further, I have come across grid systems where the order of the divs in markup doesn't match the order in the display! By using absolute or relative positioning, some of these grid systems make it hard to get the meaning of the data without the visualization.

One of the key decisions to make when building a Data Grid is whether to go with table or div. Each has its own challenges but I personally think that using table is slightly more complex for developers creating the Data Grid while it is easy for the developers using the Data Grid. Of course, it is the best option for end users due to the usage of a semantic markup.

Semantic Data Grid

SQL Frames data grid is based on table

Fixed Headers

When there is large amount of data, it is important to be able to fix the row and column headers.

Header rows

This is easily accomplished with sticky (css) header row cells and also a z-index of 1 to ensure the rows below gets tucked under the header rows while scrolling.

Sticky header rows

For header rows, the top css property doesn't seem to be required when position is sticky.

Header columns

This is slightly more complex because horizontal sticky position also requires the left (right in case of RTL display) value of the element. If the width of the column is known up front, then this would be easy. If it is not known up front, then it has to be dynamically updated after rendering the HTML. They also need to have a z-index of 1 to ensure the columns to the right gets tucked under the header columns while scrolling.

tip

The corner header is both a row header and a column header. It needs to have a z-index of 2 so that the non-header column header row fields get tucked under the header columns while scrolling horizontally.

Virtualization

In order to display large volumes of data, it is important to make sure only the visible part of the Data Grid is rendered (think of minecraft with unlimited terrain but only the visible part is rendered for better performance).

Row virtualization

To deal with display of large number of rows, the rendering should limit the number of rows being displayed. One of the ways to achieve this for table based virtualization is to have the following markup.

<div><!-- viewport -->
<div><!-- vertical scrollpane -->
<table><!-- limited to visible rows -->

The viewport div provides the illusion of a small window into the underlying vast data grid. The vertical scrollpane div maintains the scroll position for the rows within the larger table (even though only a small number of rows are displayed).

Column virtualization.

Recently I saw a blog post by a Y-combinator backed startup about how their journey of grid virtualization improved the performance 500%. During the journey they indicate that they had to move away from the table based approach to div based approach. Even though SQL Frames had row virtualization for a long time, support for column virtualization came in only recently (starting version 0.11.12). This is because, while it is easy to virtualize the rows by adjusting the viewport height to the visible rows, it is not practical to adjust the viewport width to the visual columns.

Every column virtualization system that has been surveyed assumes that the column widths are known upfront. While this is an easy decision from coding the grid point of view, it adds development work for those using the grid.

info

The semantic table automatically calculates the width of a column based on all the cells within the column and then picks the right value. But by using div elements, there is no coordination happening automatically by the browser and hence the need for fixed known widths upfront with div based solutions.

SQL Frames has a lot of intelligence to identify type of the data and also the display sizes required for optimal viewing experience (the calculated sizes are only a hint and the browser can adjust as needed). The goal was to ensure this smart width feature is retained even with virtualization. However, due to all these complexities, column virtualization was not provided for a long time by SQL Frames. Then came the great idea that made the column virtualization possbile with table.

The idea is simple. In case of row virtualization, only the visible rows are displayed and no other rows (this is actually not 100% true, see the accessibility section below). However, in case of column virtualization, the markup will be creating a table with the entire width but the invisible data columns will be collapsed into a single td! This is done by making use of colspan which indicates how many columns a td should be spanning. For this to work the header rows still need to be rendered completely (and several thousands seem to be OK, although column resize is slow). The good thing is, the header rows don't need to be re-rendered every time the underlying data is scrolled which triggers a rerendering due to row virtualization. So, there is a onetime cost of rendering the entire header rows (just 1 for regular table, 2 if there are column groupings and a few more for pivot tables due to the pivoted column headers).

The markup for the column virtualization that works using table is the following

<tr><td>Fixed</td><td colspan='left-colspan'></td><td>multiple tds</td><td colspan='right-colspan'></td></tr>

This solution requires keeping track of the left and right colspans as the user scrolls horizontally or when a column is resized.

colspan limit

As per the HTML standard a colspan can have a valid value of 1 to 1000. Hence, when the logical colspan is greater than 1000 it requires creating multiple td elements.

Accessibility

It is important to be able to navigate the virtualized data grid using the keyboard. However, when only the visible parts of the grid are displayed, this becomes tricky. The browsers automatically adjust the focus to the next logical element when using the left/right and top/down arrows. But when the user is already at the edge of the viewport, using these keys will not work because the browser doesn't see any additional elements. The trick here is to create extra elements that are not visible so when the browser is at the edge of the visible area, it still can navigate to the next element that is invisible and bring it to focus.

preact VDOM Reconciliation

While focus was fine when navigating past the left edge, it was getting lost when navigating past the right edge. This seems to be happening because of the virtual dom reconciliation using the node.insertBeore API which seems to temporarily remove the node which results in losing focus. This needs to be handled explicity to ensure keyboard navigation continues to work.

Focus problems

When navigating fast, it is possible the rendering hasn't caught up and the focus can be lost because of that. Also, after focusing on a cell if the user scrolls such that the focused cell is out of the viewport, the corresponding focused cell markup is lost and so is the access. So, accessibility with virtualization is still not perfect. It may be possible to programatically keep track of the focus but that may have some performance overhead.

Weird grids in the wild

As part of figuring out all this, I reseached many grids out in the wild. Below are some interesting implementations

  1. Two tables to achieve fixing the header columns one sliding on the other. This pontentially has accessibility issues not to mention the extra markup parsing and synchronization issues.
  2. Three tables, 1 for top-left corner, one for the rest of the headr rows and a 3rd one for all the data.
  3. React-virtualized has an example of Multigrid which similarly stitches together multiple divs to provide fixed rows and columns.

Conclusion

SQL Frames is creating the best UX to work with data no matter what size and shape it comes in. This requires working on the backend technologies like data transformation using SQL constructs, auto generation of SQL, highly complex charting engine (that sits on top of eCharts) and also a highly performant and semantic enterprise data grid. The low-code API also emphasizes on least amount of coding. Feel free to reach out to me for more details or how your team or organization can make use of such an advanced technology.