Hey, we’re building a dashboard. Why?

We, as an organization, have been putting together MOOCs for a few years now, and we’re trying to figure out what we’re learning from all of that work. edX gives us access to MOOC data weekly. Currently, we get at that data through a set of Python scripts that split the large lump of data we receive from edX into clearer, more manageable pieces.

You’ve probably seen some of this data visualized in what we’ve been calling fact sheets; here’s an example:


Looks good, right? Well, every time one of these is produced, a member of our team has to run several scripts, wait a long time, pick out the information that’s actually needed, and then put the fact sheet together. And the requests for these fact sheets almost always come with a tight deadline.

What we’ve been working on is an online dashboard that will make this data more readily available and current. One of the issues that we’ve been facing is automating the process. Currently, here’s how it’s done:

1. Get the event logs (that contain data about student activity) and the database dumps (data about the course and enrollment) from edX’s repository.
2. This gives us data on all of the courses, so we then need to filter this huge dataset down to the course(s) of interest. Often, we’re looking for a specific snapshot (e.g., what did students do in week 3 of a course?).
3. With that snapshot, we then run a bunch of scripts to figure out things like the number of students, the number of certificates, etc.
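To make those three steps a bit more concrete, here’s a minimal Python sketch of steps 2 and 3: filter newline-delimited JSON events down to one course and one course week, then compute two fact-sheet numbers. It’s an illustration under assumptions, not our actual scripts. The sample records, the field names (`username`, `time`, `context.course_id`, and `user`/`status` for certificates), the course ID, and the “week = 7 days from the course start” rule are all hypothetical stand-ins for what the real edX exports contain.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical sample event log: one JSON object per line. The field names
# ("username", "time", "context.course_id") are assumptions for illustration.
SAMPLE_EVENT_LOG = """\
{"username": "alice", "time": "2015-10-05T14:02:00+00:00", "event_type": "play_video", "context": {"course_id": "course-v1:OrgX+CS101+2015_T3"}}
{"username": "bob", "time": "2015-10-06T09:15:00+00:00", "event_type": "problem_check", "context": {"course_id": "course-v1:OrgX+CS101+2015_T3"}}
{"username": "carol", "time": "2015-10-20T11:00:00+00:00", "event_type": "play_video", "context": {"course_id": "course-v1:OrgX+CS101+2015_T3"}}
{"username": "dave", "time": "2015-10-05T16:30:00+00:00", "event_type": "play_video", "context": {"course_id": "course-v1:OrgX+BIO200+2015_T3"}}
"""

# Hypothetical certificate rows standing in for a table from the database dump.
SAMPLE_CERTIFICATES = [
    {"user": "alice", "course_id": "course-v1:OrgX+CS101+2015_T3", "status": "downloadable"},
    {"user": "bob", "course_id": "course-v1:OrgX+CS101+2015_T3", "status": "notpassing"},
    {"user": "dave", "course_id": "course-v1:OrgX+BIO200+2015_T3", "status": "downloadable"},
]


def parse_events(raw_log):
    """Step 1 (after download): parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in raw_log.splitlines() if line.strip()]


def week_window(course_start, week):
    """Return the [start, end) datetimes of a 1-indexed course week (assumed 7-day weeks)."""
    start = course_start + timedelta(weeks=week - 1)
    return start, start + timedelta(weeks=1)


def snapshot(events, course_id, start, end):
    """Step 2: keep only one course's events whose timestamp falls in [start, end)."""
    return [
        e for e in events
        if e.get("context", {}).get("course_id") == course_id
        and start <= datetime.fromisoformat(e["time"]) < end
    ]


def summarize(events, certificates, course_id):
    """Step 3: distinct active students in the snapshot, plus course-wide certificates."""
    active = {e["username"] for e in events}
    certified = {
        c["user"] for c in certificates
        if c["course_id"] == course_id and c["status"] == "downloadable"
    }
    return {"active_students": len(active), "certificates": len(certified)}


if __name__ == "__main__":
    course = "course-v1:OrgX+CS101+2015_T3"
    course_start = datetime(2015, 10, 5, tzinfo=timezone.utc)
    start, end = week_window(course_start, week=1)
    week_events = snapshot(parse_events(SAMPLE_EVENT_LOG), course, start, end)
    print(summarize(week_events, SAMPLE_CERTIFICATES, course))
    # {'active_students': 2, 'certificates': 1}
```

Keeping each step as its own small function is roughly what the dashboard needs: the same pieces can be rerun for any course and week without someone babysitting the scripts.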

This is just one part of it. There’s a ton of activity data, such as how many students participated in an exercise, all of which is included in the event logs.
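As an example of pulling activity numbers out of the event logs, here’s a small Python sketch that counts how many distinct students submitted an answer to each exercise. It assumes that submissions show up as `problem_check` events with the exercise ID at `event.problem_id`. Treat that as an assumption about the log layout, since real logs may encode some events differently. The sample data and exercise IDs are made up.

```python
import json
from collections import defaultdict

# Hypothetical newline-delimited JSON events. The "problem_check" event type and
# the "event.problem_id" field are assumptions about the log layout.
SAMPLE_EVENTS = """\
{"username": "alice", "event_type": "problem_check", "event": {"problem_id": "ex1"}}
{"username": "alice", "event_type": "problem_check", "event": {"problem_id": "ex1"}}
{"username": "bob", "event_type": "problem_check", "event": {"problem_id": "ex1"}}
{"username": "bob", "event_type": "problem_check", "event": {"problem_id": "ex2"}}
{"username": "carol", "event_type": "play_video", "event": {"id": "vid1"}}
"""


def participants_per_exercise(raw_log):
    """Count distinct students who submitted at least once to each exercise."""
    students = defaultdict(set)  # exercise ID -> set of usernames
    for line in raw_log.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("event_type") != "problem_check":
            continue  # ignore video plays, page views, etc.
        students[event["event"]["problem_id"]].add(event["username"])
    # Sets de-duplicate repeat submissions by the same student.
    return {problem: len(users) for problem, users in students.items()}


print(participants_per_exercise(SAMPLE_EVENTS))
# {'ex1': 2, 'ex2': 1}
```

Counting distinct usernames rather than raw events matters here. Students often resubmit, so counting raw events would inflate participation.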

Clearly, this is a lot of work for one person. We’re in the process of creating an app where this process of accessing, filtering and displaying data is automated. Here’s a screenshot to give you an idea:


In today’s hackathon, we wrestled for a while with the question of how to store course metadata (basically, ways of describing a course that fall outside of how edX describes its data; one example would be how we distinguish these courses from one another). A lot of work also went into consolidating the existing scripts to streamline the process.

We started this project during the Fall 2015 semester, but this is the first time we’ve had a chance to work on it since December. If you’ve ever returned to a project after a long break, you know that it takes a while to get back up to speed.

We hope to continue working on this dashboard because it will make MOOC data (enrollment info, demographic breakdown of students, student interactions with the course) available online at any time.

We’re looking forward to getting your feedback on it soon!