TABLE

API to create a DataFrame

A DataFrame is the equivalent of a database TABLE. DataFrames can be created either with the static API of the DataFrame class or with member functions of existing DataFrame instances.

Base DataFrames

Base DataFrames are DataFrames that directly represent the underlying data source without any row-level data transformations (such as those possible via data wrangling). These DataFrames are created using the static API and can be thought of as the regular tables in a database. They can then be transformed using instance APIs. These instance-level transformations result in derived DataFrames, which can be thought of as database views that are persisted in memory (that is, materialized views).

| API | Comments |
| --- | --- |
| DataFrame.fromURL(url) | Source is a URL from which a CSV/JSON file can be downloaded |
| DataFrame.fromText(string) | Source is a CSV string |
| DataFrame.fromJSONArray(array) | Source is an array of JavaScript objects |
| DataFrame.fromFile(file) | Source is an HTML File API object |
| DataFrame.fromDataSource() | Source can be one of the above |

Below is an example of creating a source DataFrame from remote data. The remote data can be given an explicit name, which is used in the auto-generated SQL, through the schemaName option.

[Interactive example: creating a source DataFrame from a remote URL with schemaName]
CORS enabled REST API

The REST API providing the data needs to allow CORS if the data source and the analytics UI are on different domains.
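As a rough illustration of when this requirement applies, the sketch below checks whether two URLs differ in origin (scheme, host, and port), which is how browsers decide that a request is cross-origin. The helper `needsCors` and the example URLs are hypothetical and are not part of SQL Frames.

```typescript
// Hypothetical helper (not part of SQL Frames): reports whether a browser
// would treat a request from pageUrl to dataUrl as cross-origin.
function needsCors(pageUrl: string, dataUrl: string): boolean {
  // An origin is the scheme + host + port triple. If any part differs,
  // the request is cross-origin and the data server must allow CORS.
  return new URL(pageUrl).origin !== new URL(dataUrl).origin;
}

// Placeholder URLs for illustration only.
const sameOrigin = needsCors(
  "https://app.example.com/ui",
  "https://app.example.com/data.csv"
);
const crossOrigin = needsCors(
  "https://app.example.com/ui",
  "https://api.example.com/data.csv"
);
console.log({ sameOrigin, crossOrigin });
```

When the check returns true, the data server is expected to answer with an `Access-Control-Allow-Origin` response header that names the UI's origin (or `*`).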

Some points worth noting about the above code:

  1. Integrated UI: SQL Frames comes with integrated data visualizations such as data tables and charts.
  2. Zero configuration: SQL Frames is smart enough to automatically detect the data types and also provide a default table layout with appropriate column widths with zero configuration.
  3. Schema Name: Even though the data is fetched from a remote CSV, an explicit schema name can be given for use in the auto-generated SQL. This makes it possible to prototype with sample data in a CSV file but fall back to executing against a big data backend with SQL if needed.
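To make the idea of automatic type detection concrete, here is a minimal sketch of one way column types could be inferred from CSV text. It is not SQL Frames's actual algorithm: the function `inferColumnType`, the set of type names, and the sample data are all assumptions made for illustration.

```typescript
type ColumnType = "number" | "boolean" | "date" | "string";

// Hypothetical illustration: pick the narrowest type that every
// non-empty value in a column satisfies, falling back to "string".
function inferColumnType(values: string[]): ColumnType {
  const present = values.map(v => v.trim()).filter(v => v !== "");
  if (present.length === 0) return "string";
  if (present.every(v => Number.isFinite(Number(v)))) return "number";
  if (present.every(v => /^(true|false)$/i.test(v))) return "boolean";
  // Only ISO-style dates (YYYY-MM-DD) are recognized in this sketch.
  const isIsoDate = (v: string) =>
    /^\d{4}-\d{2}-\d{2}$/.test(v) && !Number.isNaN(Date.parse(v));
  if (present.every(isIsoDate)) return "date";
  return "string";
}

// Made-up sample data.
const csv =
  "symbol,price,listed,active\n" +
  "AAA,10.5,2020-01-15,true\n" +
  "BBB,7,2021-06-30,false";

const lines = csv.split("\n").map(line => line.split(","));
const header = lines[0] ?? [];
const dataRows = lines.slice(1);

// Map each column name to its inferred type.
const types: Record<string, ColumnType> = {};
header.forEach((name, i) => {
  types[name] = inferColumnType(dataRows.map(r => r[i] ?? ""));
});
console.log(types);
```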
note

SQL Frames is architected for efficient usage of memory with data shared between related DataFrames.

Options

There are many options to fine-tune the creation of a DataFrame. These are documented in the api.d.ts types file. The options object is the optional second parameter of each static API above.

Transformed DataFrames

Below are the instance-level APIs that transform the data in existing DataFrames and create new ones. Each type of DataFrame has a short code, shown below. These short codes are used in the DataFrame Explorer.

| API | DataFrame Class | Short Code | Description |
| --- | --- | --- | --- |
| pdf(field1?,field2?,...) | ProjectedDataFrame | pdf | API to project the desired columns. No fields is the same as projecting all fields. |
| fdf(filter?) | FilteredDataFrame | fdf | API to filter the rows in a DataFrame. No filter is the same as selecting all rows. |
| SQL.join(df,alias?)....fdf(filter?) | JoinedDataFrame | jdf | API to create a joined DataFrame. |
| gdf(groupBy,agg1?,agg2?...) | GroupedByDataFrame | gdf | API to create an aggregated (group by) DataFrame. |
| union(df1,df2?...) | UnionDataFrame | udf | API to create UNION DataFrames. |
| unionall(df1,df2?...) | UnionDataFrame | udf | API to create UNION ALL DataFrames. |
| intersect(df1,df2?...) | IntersectDataFrame | idf | API to create INTERSECT DataFrames. |
| intersectall(df1,df2?...) | IntersectDataFrame | idf | API to create INTERSECT ALL DataFrames. |
| except(df1,df2,...) | ExceptDataFrame | edf | API to create EXCEPT DataFrames. |
| exceptall(df1,df2,...) | ExceptDataFrame | edf | API to create EXCEPT ALL DataFrames. |
| vdf(rowFields,colFields,valFields?) | PivotDataFrame | vdf | API to create a PIVOT DataFrame. No value fields automatically selects all value fields. |
| uvdf() | UnpivotedDataFrame | uvdf | API to create an UnpiVoted DataFrame. |
| hdf(field1,field2?...) | HierarchicalDataFrame | hdf | API to create a hierarchical DataFrame. |
| vdf().hdf(field1,field2?...) | PivotedHierarchicalDataFrame | hdf | API to create a hierarchical DataFrame of a pivoted DataFrame. |
| unnest(field1,field2?...) | UnnestedDataFrame | undf | API to create an unnested DataFrame. |
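The set operations in the table follow the usual SQL meaning of UNION, UNION ALL, INTERSECT, and EXCEPT. The sketch below illustrates those semantics on plain arrays that stand in for single-column tables. It does not use the SQL Frames API, and the sample values are made up.

```typescript
// Plain-array stand-ins for single-column tables; not the SQL Frames API.
const a = [1, 2, 2, 3];
const b = [2, 3, 4];

// UNION ALL keeps every row from both inputs, duplicates included.
const unionAll = [...a, ...b];

// UNION removes duplicate rows from the combined result.
const union = Array.from(new Set(unionAll));

// INTERSECT keeps the distinct rows that appear in both inputs.
const intersect = Array.from(new Set(a)).filter(x => b.includes(x));

// EXCEPT keeps the distinct rows of the first input absent from the second.
const except = Array.from(new Set(a)).filter(x => !b.includes(x));

console.log({ unionAll, union, intersect, except });
```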
note

The short code for source DataFrames created using the static API mentioned above is sdf.

Projected DataFrame

DataFrames used to project a set of columns share their data storage with the DataFrame from which they are projected. Hence, they incur minimal storage and computational cost. However, additional calculated fields can be specified on these DataFrames, and those do incur storage and computational cost.
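The cost difference can be pictured with a small column-store sketch. This is only an illustration under the assumption of one array per column. The `Columns` type and `project` function are hypothetical and say nothing about how SQL Frames stores data internally.

```typescript
// Assumed layout for this sketch: one array per column.
type Columns = Record<string, number[]>;

// Hypothetical projection: references the chosen columns without copying them.
function project(source: Columns, fields: string[]): Columns {
  const view: Columns = {};
  for (const field of fields) {
    const column = source[field];
    if (column === undefined) throw new Error(`Unknown field: ${field}`);
    view[field] = column; // same array object as in the source, no copy
  }
  return view;
}

// Made-up sample data.
const base: Columns = { price: [10, 20], qty: [3, 4], fee: [1, 2] };
const projected = project(base, ["price", "qty"]);

// A calculated field has to allocate and fill a new array,
// which is where the extra storage and computation come from.
const price = projected["price"] ?? [];
const qty = projected["qty"] ?? [];
const withTotal: Columns = {
  ...projected,
  total: price.map((p, i) => p * (qty[i] ?? 0)),
};
console.log(withTotal);
```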

Hierarchical DataFrame

Hierarchical DataFrames are not part of the SQL standard. They are convenience DataFrames for hierarchical representation and visualization of data. Any DataFrame can be converted to a Hierarchical DataFrame using the df.hdf(...fields) API.

[Interactive example: creating a Hierarchical DataFrame with df.hdf(...fields)]

Hierarchical DataFrames contain a root row, leaf rows, and intermediate rows. For non-summary DataFrames, the rows are represented as leaf rows in the corresponding hierarchy. For summary DataFrames, ROLLUP/CUBE group by clauses produce higher levels of detail that are stored in the intermediate rows. See the GROUP BY and PIVOT clauses for hierarchical DataFrames based on aggregated DataFrames.
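For the non-summary case, the root / intermediate / leaf structure can be sketched as a tree built from flat rows. The code below is an illustrative model only: `HNode`, `buildHierarchy`, and the sample rows are hypothetical and do not reflect SQL Frames's internal representation.

```typescript
type Row = Record<string, string | number>;

interface HNode {
  key: string;       // field value at this level ("root" for the root row)
  children: HNode[]; // intermediate rows or leaf rows
  row?: Row;         // set only on leaf rows
}

// Hypothetical sketch: nest rows under one intermediate node per
// distinct value of each hierarchy field, in field order.
function buildHierarchy(rows: Row[], fields: string[]): HNode {
  const root: HNode = { key: "root", children: [] };
  for (const row of rows) {
    let node = root;
    for (const field of fields) {
      const key = String(row[field]);
      let child = node.children.find(c => c.key === key && c.row === undefined);
      if (!child) {
        child = { key, children: [] };
        node.children.push(child);
      }
      node = child;
    }
    // The original data row becomes a leaf under its path.
    node.children.push({ key: "leaf", children: [], row });
  }
  return root;
}

// Made-up sample data.
const tree = buildHierarchy(
  [
    { region: "East", city: "NYC", sales: 10 },
    { region: "East", city: "Boston", sales: 5 },
    { region: "West", city: "LA", sales: 7 },
  ],
  ["region", "city"]
);
console.log(JSON.stringify(tree, null, 2));
```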

Sorting is applied within each hierarchical row: the children of a row are sorted, and the process repeats recursively down to the leaf rows.
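A minimal sketch of this kind of recursive, per-parent sorting is shown below. The `TreeRow` shape, the `sortHierarchy` function, and the choice of descending order by `value` are assumptions for illustration, not the SQL Frames implementation.

```typescript
interface TreeRow {
  label: string;
  value: number;
  children: TreeRow[];
}

// Sorts the children of each row by value (descending) and recurses,
// so a row is only ever reordered among its own siblings.
// Returns a new tree; the input is left unchanged.
function sortHierarchy(row: TreeRow): TreeRow {
  const children = row.children
    .map(child => sortHierarchy(child))
    .sort((x, y) => y.value - x.value);
  return { ...row, children };
}

// Made-up sample data.
const unsorted: TreeRow = {
  label: "root",
  value: 35,
  children: [
    {
      label: "East",
      value: 15,
      children: [
        { label: "Boston", value: 5, children: [] },
        { label: "NYC", value: 10, children: [] },
      ],
    },
    {
      label: "West",
      value: 20,
      children: [
        { label: "LA", value: 7, children: [] },
        { label: "SF", value: 13, children: [] },
      ],
    },
  ],
};

const sorted = sortHierarchy(unsorted);
console.log(JSON.stringify(sorted, null, 2));
```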

SQL Generation

As there is no equivalent SQL construct for Hierarchical DataFrames, the generated SQL is the same as that of the original DataFrame used to create the hierarchy.