Skip to main content

· 4 min read

Any non-trivial data model has relationships and databases and application platforms may or may not support the notion of storing the relationships inline, that is, denormalizing the relationships into the main record. To give a real example, a user may have a set of roles in the application and this user-role relationship can be either stored in a separate table or directly within the user table. However, since the user-role relationship is 1-to-many, storing this directly within the user data model requires storing the relationship as a multi-valued column and arrays are perfect for that.

The challenge that does come with storing the 1-to-many relationsips as arrays is that it then makes it difficult to do certain SQL queries that are easy to do had the relationship been stored explicitly. That is why some databases offer a special operator to deal with such array fields.

· 3 min read

The semantics of null values in SQL and JavaScript are quite different. For example null+42 in SQL is null while null+42 in JavaScript is 42. This may or may not be a concern depending on how the application is written or what it is trying to do.

SQL Frames provides scripted fields which are compiled and then evaluated against each row. SQL Frames also provides the ability to generate SQL based on the data transformation logic used to create the DataFrame. If one were to freely move the computation of expressions between server and client as desired end users would expect the results to be same. That is why the null semantics matters.

· 5 min read

Finding items that are frequently shopped together is an interesting piece of information. For brick-and-mortar companies this can help to decide which items to offer promotions to attract more customers without comprimising the overall profitability. For online stores, it can help with being able to appropriately cross-sell items.

Frequent itemsets is a well studied data mining technique. Finding frequent pairs is a more specific but important variation since higher cardinality sets can be easily identified using lower cardinality sets. Depending on the size of data, finding the frequent pairs may have to be run on a cluster of computers to just as a single database query! When writing it as a SQL query, care must be taken to avoid performance issues. Below we investigate three different SQL queries all providing the same results.

· 4 min read

Last week I talked about opening a very large CSV file within ElectronJS based app. That post talked about how to work around the 500MB limit of a string in JavaScript for Chromium browser. I decided to push the boundary so that even larger CSV files can be opened. It turns out it is possible to open even larger files and the same techniques can be used in both ElectronJS (or NodeJS) but also within the browser.

Eventually, SQL Frames managed to load more than 1GB file with 10 million records first in ElectronJS and then within the standard Chrome browser itself. Here is how it was done.

· 4 min read

CSV files are one of the most common formats used to share data. Yet, opening very large files of this file format can be challenging for many applications. Since CSV formats are commonly opened with Excel on Windows and Numbers on Mac, one would expect these applications to open these files very fast. Surprisingly, they are extremely slow. There are many utility apps out there that provide various levels of functionality to manage large CSV files.

CSV is one of the common file formats for SQL Frames as well. However, since SQL Frames is written in TypeScript (which compiles to JavaScript) and JavaScript has a maximum limit for the size of a string, there can be challenges dealing with large CSV file sizes.