Convert College Scorecard Files to Tableau Hyper Format using pypeds

A few weeks back, there was a request to generate Tableau versions of the College Scorecard datasets.

I previously supported the Scorecard datasets in the Python library pypeds via the datasets module, but noticed that this functionality broke when the data moved to data.ed.gov. What a great excuse to update the library and support Jon's request!

Today I am announcing an update to pypeds that supports the current College Scorecard data and also provides a simple way to save those files as Tableau Hyper files.

But wait, I don't know Python! That's OK, I created a Google Colab notebook for you. The link is below:

Click here to generate the Hyper files using Google Colab.

All you have to do is select Runtime > Run All from the menu at the top of the notebook. Google Colab will install the pypeds library for you, download the College Scorecard files, and save the Hyper files to the file browser. No Python coding needed! Once the code has finished, right-click a file in the file browser and choose Download to save the .hyper file to your machine for use in Tableau.

The resulting .hyper files are large (almost 3,000 columns!), so please be mindful that Tableau may slow down a bit as a result. If you are comfortable with Python, note that the notebook stores each dataset under its own key in the scorecard dictionary, and each value is a pandas DataFrame. For those of you wondering how this all works, pypeds uses the excellent pantab library under the hood to write those DataFrames out as Hyper files.
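For readers comfortable with Python, here is a rough sketch of what writing a dictionary of DataFrames out with pantab could look like. It is not the actual pypeds code: the write_hyper_files helper is hypothetical, and it assumes scorecard maps dataset names to pandas DataFrames and that pantab is installed. It calls pantab's frame_to_hyper function to write one .hyper file per key, each holding a single table named after that key. The optional writer argument exists only so the helper can be exercised without the Hyper engine.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def write_hyper_files(
    frames: Dict[str, Any],
    out_dir: str,
    writer: Optional[Callable[..., None]] = None,
) -> List[Path]:
    """Write each DataFrame in `frames` to <out_dir>/<key>.hyper.

    Hypothetical helper for illustration only; not part of pypeds.
    """
    if writer is None:
        # Assumes pantab is installed (pip install pantab).
        import pantab

        writer = pantab.frame_to_hyper

    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    written = []
    for name, df in frames.items():
        path = target / f"{name}.hyper"
        # One .hyper file per dataset, with a single table named after the key.
        writer(df, path, table=name)
        written.append(path)
    return written
```

In the notebook, a call such as write_hyper_files(scorecard, "hyper_files") would place the files in a hyper_files folder in the Colab file browser. If you would rather have everything in one file, pantab also provides frames_to_hyper, which writes a dictionary of DataFrames to a single .hyper file with one table per key.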

That's it! If there are common data cleaning tasks (removing certain variables, recoding values, renaming columns, etc.) that you always perform on the Scorecard data, please let me know, as pypeds aims to standardize the extraction and collection of education datasets. If anything does not behave the way you expect, please don't hesitate to submit bug reports or feature requests on the GitHub repo for pypeds.
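To make those cleaning tasks concrete, here is a minimal pandas sketch of removing variables, recoding values, and renaming columns before export. It is not part of pypeds: the clean_scorecard function and the column choices are hypothetical, and it assumes (as I understand the Scorecard files) that suppressed or missing cells appear as the text PrivacySuppressed or NULL, and that CONTROL uses 1, 2 and 3 for public, private nonprofit and private for-profit institutions. Trimming columns this way would also give Tableau a much lighter file to work with.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaning rules for illustration. UNITID, INSTNM, CONTROL and
# ADM_RATE are College Scorecard variable names; adjust to your own needs.
KEEP = ["UNITID", "INSTNM", "CONTROL", "ADM_RATE"]
RENAME = {
    "UNITID": "unitid",
    "INSTNM": "institution",
    "CONTROL": "control",
    "ADM_RATE": "admission_rate",
}
# Assumed CONTROL coding, per the Scorecard data dictionary.
CONTROL_LABELS = {1: "Public", 2: "Private nonprofit", 3: "Private for-profit"}
# Assumed text flags used for suppressed or missing values.
MISSING_FLAGS = ["PrivacySuppressed", "NULL"]


def clean_scorecard(df: pd.DataFrame) -> pd.DataFrame:
    """Keep a subset of columns, recode flags and codes, then rename."""
    # 1. Remove variables: keep only the listed columns that are present.
    out = df[[col for col in KEEP if col in df.columns]].copy()

    # 2. Recode values: turn text flags into NaN, then make the rate numeric.
    out = out.replace(MISSING_FLAGS, np.nan)
    if "ADM_RATE" in out.columns:
        out["ADM_RATE"] = pd.to_numeric(out["ADM_RATE"], errors="coerce")
    # Map numeric CONTROL codes to readable labels.
    if "CONTROL" in out.columns:
        out["CONTROL"] = out["CONTROL"].map(CONTROL_LABELS)

    # 3. Rename columns to friendlier names for Tableau.
    return out.rename(columns=RENAME)
```

Because the result is still a DataFrame, it could be dropped back into the scorecard dictionary before the Hyper files are written.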

Let me know how it goes!

Brock Tibert - did you remove the scorecard data from the library? When trying to run the notebook, I get an error: 'pypeds.datasets' has no attribute 'scorecard_merged', and checking the GitHub repo, the datasets file doesn't seem to have it. I'm relatively inexperienced in Python, though, so I may be missing it.

Very cool, Brock Tibert. Great use of Colab. Thanks for sharing and for continuing to create data sets that are broadly helpful to the higher ed data community.
