Highlights of DataFest 2018

For the second consecutive year, I've enjoyed being part of the judging team at the annual DataFest event at Chapman University. Led by Dr. Michael Fahy and his team, this year's event attracted 24 teams from universities in the greater SoCal area.

What is DataFest?

DataFest is a weekend-long data hackathon in which teams of 2-5 university students are given a large commercial dataset and a set of questions. The teams then work together, using data science methods, to produce one or more findings. They share their results during the presentation portion of the event.

We judged their work in three categories: Overall Insight, Visualization and Data.

What do Students do with data?

This is such an interesting question for me as an industry person. I was fascinated to see the different tools, approaches and results that the teams produced. There were a couple of observations that particularly interested me:

  • Most teams augmented the data that was provided with one or more sets of public data. Students used open data, particularly government-supplied data.
  • The more types of data the teams used, the more data munging (cleaning) they had to do. Some teams got stuck in this area. I actually thought that was very useful and reflective of real-world work with data.
  • Several teams used data visualizations early in their process, to get a 'view' of the quality or information in the data. One team found a bug in the vendor's website that populated a default which skewed their data!
  • The teams generally focused on using the data to gain insight into questions in one of three areas: a) making more money for the company that provided the dataset, b) providing more useful information for students or c) investigating relationships between socioeconomic factors (poverty, levels of education, rural markets...) and the provided dataset.
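
To make the first two observations concrete, here is a minimal Python sketch of augmenting a provided dataset with a public one, with the join key cleaned first. It is only an illustration of the kind of munging the teams ran into, not any team's actual code. The field names (`state`, `clicks`, `median_income`) and all values are hypothetical and do not come from the DataFest dataset.

```python
# Hypothetical illustration: none of these fields or values come from
# the DataFest dataset or from any team's work.

def clean_key(value):
    """Normalize a join key: trim whitespace and uppercase it.

    Returns None when the key is missing or empty.
    """
    if value is None:
        return None
    key = str(value).strip().upper()
    return key or None


def augment(provided_rows, public_rows, key):
    """Left-join provided_rows with public_rows on a cleaned key.

    Returns (joined_rows, unmatched_count), where unmatched_count is the
    number of provided rows that found no public row to join with.
    """
    # Index the public data by its cleaned key for fast lookup.
    lookup = {}
    for row in public_rows:
        k = clean_key(row.get(key))
        if k is not None:
            lookup[k] = row

    joined, unmatched = [], 0
    for row in provided_rows:
        k = clean_key(row.get(key))
        extra = lookup.get(k)
        if extra is None:
            unmatched += 1
            extra = {}
        merged = {**extra, **row}  # provided values win on conflicts
        merged[key] = k            # store the cleaned key
        joined.append(merged)
    return joined, unmatched


provided = [
    {"state": " ca", "clicks": 120},
    {"state": "TX ", "clicks": 95},
    {"state": "", "clicks": 7},      # dirty row: empty key
]
public = [
    {"state": "CA", "median_income": 70000},
    {"state": "tx", "median_income": 60000},
]

rows, missing = augment(provided, public, "state")
print(rows)
print("rows without a public match:", missing)
```

Counting the unmatched rows is the useful part: it shows how much of the provided data is lost or left unenriched by each extra source, which is where some teams got stuck.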

Which Tools and Languages do Students use?

In this area, it interested me to observe that there seemed to be less use of the R language than in previous years' entries, and more use of Python. The most common algorithm used was logistic regression. A couple of teams built full custom machine learning models.
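
For readers who have not met logistic regression, here is a minimal sketch in plain Python, fitted with batch gradient descent on a tiny made-up dataset. The data, the learning rate and the epoch count are assumptions chosen for illustration; this is not any team's code, and in practice a library such as scikit-learn would be the usual choice.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Estimated probability that feature vector x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    n = len(xs)
    weights = [0.0] * len(xs[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(x, weights, bias) - y  # gradient of log loss
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy data: one feature, label 1 when the feature is large.
xs = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
ys = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print("P(class 1 | x=1.0):", round(predict_proba([1.0], w, b), 3))
print("P(class 1 | x=3.5):", round(predict_proba([3.5], w, b), 3))
```

The model outputs a probability rather than a hard label, which is part of why it suits yes/no questions such as whether a user will take a given action.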

I heard from several teams that they were resource constrained given the size of the dataset -- due to the lack of storage and processing power on their laptops. Being a Cloud Architect, it pains me to hear that students are not using the public cloud in this work. Here's a quote from one team:

It took 2 hours to render these heat maps on our laptop.

An obvious growth area is to include mentorship from one or more of the public cloud vendors at next year's event.

Hoodies and Blankets were on display during the wee hours of the hackathon.

What's Next?

Congratulations to the hardworking hosts at Chapman University and the participating students on a great event. As I did last year, I invited members of the winning team to join me in real-world work. To date, I've hired one person from the winning team of 2017, and he's doing great. The energy, creativity and skills of the students inspire me.

Let's help them grow. To contribute, contact Dr. Fahy via fahy@chapman.edu


Thanks for the recap, Lynn. I couldn’t agree more about providing more real-world (Cloud) tools for the teams. Several asked me how to execute James Peach’s recommendation to load the dataset into MySQL for quick high-level analysis, but their laptops couldn’t handle the larger import. The one team that I know succeeded with the import didn’t get viable results until just before presentation time. A standard set of on-demand Cloud resources, available to all the teams, would have allowed them to get to analysis and discovery more quickly.

Congratulations to Hernan Padilla and his son!

Love the narrative Lynn Langit!

had a good time, just a little under the weather from the event. stayed up all night writing sql. :D

Way to go Ryan! Congrats to all on the UCI team! Go Anteaters!!
