Full Text Search with DataFrames

September 28, 2021 · 5 min read

Siva Dirisala

Creator of SQL Frames

One of the best things about JAMSTACK is that it is possible to provide searching the entire site with the search processing done entirely on the client. This is a big deal as there is no need for expensive servers to drive the search and more importantly it provides milliseconds latency for the users so they love to stay on the website for longer and explore.

Text search is an important part of data analysis and being able to do it entirely on the client is possible by projects like lunr. SQL Frames has integrated text search (thank you lunr). Use cases with examples are discussed below.

Anyone that provides service management be it IT, HR or Customer Support, knows that they deal with a lot of structured and unstructured data. Structured data includes fields such as incident category, priority and product release. Unstructured information includes fields such as short description, long description, incident closing comments, satisfaction survey comments and work notes. When analyzing for trends, it is common for many analytics solutions to provide reports based on structured data. However, there are not that many solutions out there that can provide analytics on unstructured data. Some may provide solutions such as word clouds but those don't tell the whole story without digging further. That digging is only possible if both structured and unstructured data can be combined together in the analysis.

Incidents Data

I was looking for publicly available incidents data and found Oil Spilling and other incidents data provided by IncidentNews @ NOAA.

Simple Text Search

Searched data trend

Rather than just doing text search on the raw data it is possible to aggregate and show the trends. Perform search in the above UI and then look at this chart to see the searched incidents trend compared against the backdrop of the overall incidents.

Use cases

While the examples above is provided in the context of full text search, the searched data trend can be applied to any type of search. Below are some use case of this type of search.

Ability to understand impact of release upgrades by searching for from and to release names in the customer support tickets data.
Ability to search for data spread across different fields and trend it by grouping against structured fields. This is useful since the keywords looking for might be in short description, closing comments or post-incident review notes.
Ability to understand customer feedback (from NPS survey for example) of your SAAS application such as how they feel about your documentation, performance, user experience, on-boarding complexity and trend this information to see whether it is improving or not.

Performance

It is important to realize that the text index can get very large if there is a lot of noise in the data. SQL Frames allows specifying which fields in the DataFrame should be used to create the text index. Not specifying any fields automatically indexes the entire DataFrame. This is usually fine with smaller DataFrames (a few thousand rows). But for larger DataFrames, it is better to selectively index only a few fields such as short description and comments and not all the fields. Further, while it is possible to include unique fields such as transaction number in the text index, this unnecessarily bloats the text index due to a large number of unique words being indexed.

Pluggable plugin

When I was evaluating different JavaScript text search engines, I came across a few. One of them is lunr and the other very promising one is Elasticlunr. Elasticlunr is much faster to index which made me to integrate with it. As I tested the search experience I realized that while it is very fast, it doesn't have the user friendly query syntax that lunr provides. I wanted to focus on the search experience than worry about the performance and so changed the default integration to using lunr. But I do realize there may be cases where fast search indexing might need to be given preference and it should be possible to make the full text search plugin to pick one of the multiple search indexing solutions. When there is a business need, I will get to it.

Incidents Data​

Simple Text Search​

Searched data trend​

Use cases​

Performance​

Pluggable plugin​