Data.gov Data Preview

· 3 min read
Siva Dirisala
Creator of SQL Frames

CKAN is a popular open source data management system. It is used by Data.Gov and other public data initiatives throughout the world. These systems collect and catalog data in various formats, including CSV, one of the most common data formats. CKAN has extensions that provide data visualization. However, these visualizations do not appear to be widely deployed, perhaps because of the additional server-side compute required to support them.

SQL Frames provides a client-side data, visualization and intelligence platform. It is ideal for working with remote data directly within the browser. Hence, we are making this technology available for previewing datasets from Data.gov.

Schema Detection

Data lakes can store all kinds of data, both structured and semi-structured. Data can be stored with schema validation at write time, or the schema can be derived and validated at read time. Both approaches have advantages and disadvantages. Data catalogs, which provide the metadata for your data, have to work even harder when there is a lot of semi-structured data, because the data sources may not provide an easily consumable schema.
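To make the write-time versus read-time distinction concrete, here is a minimal Python sketch. The `SCHEMA` mapping, the field names, and the list used as a stand-in for storage are all assumptions made for illustration; they do not describe how any particular data lake works.

```python
from typing import Any

# Hypothetical schema for illustration: field name -> expected Python type.
SCHEMA: dict[str, type] = {"id": int, "magnitude": float}


def validate(record: dict[str, Any], schema: dict[str, type]) -> dict[str, Any]:
    """Coerce each field to its declared type; raise ValueError on a mismatch."""
    out = {}
    for name, typ in schema.items():
        if name not in record:
            raise ValueError(f"missing field: {name}")
        try:
            out[name] = typ(record[name])
        except (TypeError, ValueError) as exc:
            raise ValueError(f"bad value for {name}: {record[name]!r}") from exc
    return out


def write_validated(store: list, record: dict[str, Any]) -> None:
    """Schema-on-write: a bad record is rejected before it is stored."""
    store.append(validate(record, SCHEMA))


def read_validated(store: list) -> list:
    """Schema-on-read: anything can be stored; the schema is applied when reading."""
    return [validate(record, SCHEMA) for record in store]
```

With schema-on-write, a malformed record fails at ingestion time. With schema-on-read, the same record is stored without complaint and the failure surfaces later, when a consumer calls `read_validated`.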

Detecting schemas automatically can save a lot of development and manual effort. With the world rapidly adopting AI technologies for complex tasks, it is imperative to have solutions that detect schemas automatically. That is why Snowflake came out with a set of schema detection features back in July.

SQL Frames Schema Detection

SQL Frames has built-in intelligence to automatically derive the data types of the data it loads. It can also recognize common data formats, such as date and datetime formats. Because of this, providing the URL of the data is all that is needed to create a DataFrame.

Check the tutorial that fetches earthquake data from usgs.gov, where it automatically identifies the number and datetime fields with zero configuration. This ability to infer data types and formats automatically is what allows it to consume any dataset from a data lake or data catalog very easily, without writing any code.
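The general idea behind this kind of inference can be sketched in a few lines of Python. This is not the SQL Frames implementation, which the post does not describe; the function names and the short list of candidate formats below are assumptions chosen for illustration. The sketch tries progressively looser interpretations of each CSV column: integer, then number, then a handful of datetime and date formats, falling back to string.

```python
import csv
import io
from datetime import datetime

# Candidate formats are an assumption for this sketch; a real detector would try many more.
DATETIME_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S")
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _all_parse(values, cast):
    """Return True if cast() accepts every value."""
    try:
        for v in values:
            cast(v)
    except ValueError:
        return False
    return True


def _matches_format(values, fmt):
    """Return True if every value parses with the given strptime format."""
    try:
        for v in values:
            datetime.strptime(v, fmt)
    except ValueError:
        return False
    return True


def infer_column_type(values):
    """Return (type_name, format_or_None) for one column of raw strings."""
    non_empty = [v.strip() for v in values if v.strip()]  # ignore blank cells
    if not non_empty:
        return ("string", None)
    if _all_parse(non_empty, int):
        return ("integer", None)
    if _all_parse(non_empty, float):
        return ("number", None)
    for fmt in DATETIME_FORMATS:
        if _matches_format(non_empty, fmt):
            return ("datetime", fmt)
    for fmt in DATE_FORMATS:
        if _matches_format(non_empty, fmt):
            return ("date", fmt)
    return ("string", None)


def infer_schema(csv_text):
    """Infer a type for each column of CSV text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    columns = zip(*data)  # transpose rows into columns
    return {name: infer_column_type(col) for name, col in zip(header, columns)}
```

Calling `infer_schema` on a small extract with `time`, `mag` and `place` columns would classify them as datetime, number and string respectively, provided the timestamps match one of the listed formats. A production detector would also need to handle sampling of large files, locale-specific number formats and ambiguous dates such as 01/02/2023.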

Data.Gov Data Preview

Today we are announcing Data.Gov Data Preview, a browser-based data preview solution for all the datasets at Data.Gov. You just set up a bookmarklet, visit one of the datasets on the Data.Gov website, and then click the bookmark. It takes you to the Data.Gov Data Preview page, which displays the dataset within the browser. From within the rendered DataFrame UI, you can access the DataFrame Explorer, which lets you perform simple tasks such as understanding the data distributions (histograms and binning) of the various fields and plotting a SPLOM (scatter plot matrix) to understand the correlations among the fields in the dataset.
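For readers unfamiliar with the terms, the computations behind a histogram and a correlation view can be sketched in plain Python. This is a generic illustration, not the DataFrame Explorer's code; the function names are hypothetical, and the correlation matrix is shown only as the numeric counterpart of what a SPLOM lets you see visually.

```python
import math


def histogram(values, bins=10):
    """Equal-width binning: return (bin_edges, counts) for a non-empty list of numbers."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # Correlation is undefined when either column is constant.
    return cov / (sx * sy) if sx and sy else float("nan")


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> list of numbers."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Each cell of a SPLOM plots one pair of fields against each other; the matching entry in `correlation_matrix` summarizes how strong the linear relationship in that cell is.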

We hope you enjoy this free service. If you are interested in exploring what SQL Frames can do for your organization, reach out to us.