My son got selected into ACSL finals which happened on 29th May. During the year he managed to get 40 on 40 which I think is awesome. He gave his best for finals as well but didn't manage to get any of the medals. While he was a bit disappointed, I personally think he did great. But is it me the dad saying this or is there some objectivity to this thinking? The good thing is ACSL finals data is publicly available and that immediately got me into thinking about doing some analysis. Here are my findings for the 2020-2021 Junior division.
But before I could start my analysis, unlike the yearly contests results which displayed the grade of the student, the finals leader board did not have the grade field in the table. Worst case I would have had to do some scraping. However, I used the contact form and requested to add the Grade and promptly it was added (thank you Marc). Immediately I downloaded the data and started my analysis using SQL Frames.
The data contains Student, Gr (Grade), School, ST (State), S (Short Answers Points), P (Programming Points), T (Total Points) and Medal. I wanted to find out some basic questions
- How many are awarded in each medal category of bronze, silver and gold?
- Does the grade of the student matter? (age group analysis)
- Are there any specific states that are doing better? (geography analysis)
- How is a student ranked overall? How about by grade?
Note that ACSL website only has the data in html format and so I had to manually get it into csv format for this analysis.
So we loaded the data and added ranking fields using the handy SQL analytic function, RANK OVER.
Multi-dimensional data cube
For our analysis, we want to create a multi-dimensional cube using Grade, State and Medals.
Visuals with Charts
Medals by Grade and State
Now that the data cube is built, let's visualize them with charts.
Average scores by Grade and State
Count by Points
For each category (short answers and programming and their total), how many students scored certain number of points for each grade? This is a complex statement and not so for SQL Frames. By leveraging the UNPIVOT SQL construct and the dynamic grid charts we will be able to easily do this analysis.
What the data tells us?
The above charts tell us the following
- 4th grade is good enough to learn theory but too early for programming
- 6th graders can compete enough to start getting medals
- 6th graders clearly separated as two clusters in programming, those with 17 and above and the rest.
- overall score gradually increases with grade (obvious but better backed by data)
- majority of the gold medals went to the 9th graders
- disproportionate participation from California and Virginia
This is why, given my son is in 6th grade, I think he has done relatively pretty well that too with a score of 20 on 20 in the programming part of the context.
I wanted to showcase using SQL Frames to analyze publicly available data in a blog and thought this is a great opportunity to do so using my own personal use case.