Big Data and Machine Learning Hackathon using Azure ML at Microsoft
credit: Microsoft

It was an awesome experience to participate in the Big Data and Machine Learning Hackathon using Azure ML, Power BI and other tools available on Microsoft Azure.

Our team came out a winner for implementing an end-to-end solution using the Microsoft Azure toolset.

We used a Hadoop cluster in HDInsight, R programming, Azure ML Studio, Service Bus, Stream Analytics, Azure Websites and, finally, Azure Power BI. See below for more details about the solution.

About the team

It was a four-member team: Roy Budiantra and Timmy Liu (who both drove down to Seattle all the way from Vancouver), and Manish Gupta and myself (we met during the week-long data science boot camp). We formed the team on the spot at the venue.

In the picture, from left to right: Roy Budiantara, Manish Gupta, Timmy Liu, Scott Klein (Microsoft Azure Evangelist) and Joyjeet Dey Majumdar.

Prior to this, we spent 56+ hours over one week at a hands-on data science boot camp organized by Raja Iqbal of Data Science Dojo. A lot of credit goes to him for preparing us as data scientists, which helped us identify a good model for the solution. He is a qualified data scientist, now an entrepreneur, who has spent most of his career analyzing data for Microsoft's Bing Ad relevance and data mining team.

Problem Statement

We were asked to pick data from the City of Seattle's data.seattle.gov website that could be used to identify trends and predict behavior using the Azure toolset. As a team, we decided to work on Seattle elementary school data to predict how external factors may affect a school's ranking. These predictions could also help schools prepare for future needs, such as maintaining a healthy teacher-student ratio, focusing on areas that help students prosper, or even reducing costs if needed.

Architecture and Design

The image below illustrates the high-level design.

We used data from two sources:

  1. The City of Seattle website, i.e. data.seattle.gov, and
  2. 10 years' worth of school data for the state of Washington from the School Digger website.

First, we used Azure ML with an R script to clean the data from one of the sources, and then loaded it into a Hive table on a Hadoop cluster in HDInsight.
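The cleaning script for this first step isn't shown here. The following minimal R sketch gives a rough idea of what such a step might look like. The column names SchoolName, Year and Rank and the helpers clean_school_data() and write_for_hive() are hypothetical. The tab-delimited output assumes a Hive table declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'.

```r
# Hypothetical cleaning step for one source before loading it into Hive.
# The column names SchoolName, Year and Rank are assumptions.
clean_school_data <- function(df) {
  # Convert factors to character so string functions behave predictably
  df$SchoolName <- trimws(as.character(df$SchoolName))
  # Collapse runs of internal whitespace into a single space
  df$SchoolName <- gsub("\\s+", " ", df$SchoolName)
  # Drop rows that are missing a school name or a rank
  keep <- !is.na(df$SchoolName) & df$SchoolName != "" & !is.na(df$Rank)
  df <- df[keep, , drop = FALSE]
  # Remove rows that became exact duplicates after cleaning
  df <- unique(df)
  rownames(df) <- NULL
  df
}

# Write tab-delimited text with no header, row names or quotes, a layout
# that a Hive table declared with FIELDS TERMINATED BY '\t' can read
write_for_hive <- function(df, path) {
  write.table(df, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Example usage with made-up rows
raw <- data.frame(
  SchoolName = c("  Adams Elementary ", "Adams Elementary", "", NA),
  Year = c(2014, 2014, 2014, 2014),
  Rank = c(10, 10, 5, 3)
)
cleaned <- clean_school_data(raw)
print(cleaned)
```

The resulting file could then be copied to the cluster's storage and exposed through an external Hive table. That part depends on the cluster setup and is not covered by this sketch.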

Then we used Azure ML and an R script to clean and merge the data from the two disparate sources. Thanks to Neeraj Khanchandani, Principal Group Program Manager at Microsoft, for his help with the R script for the cleansing model below.

For those who are interested, here is the R script we used to merge data from the two disparate sources:

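# Map 1-based optional input ports to variables
dataset1 <- maml.mapInputPort(1) # class: data.frame
dataset2 <- maml.mapInputPort(2) # class: data.frame

# Work with character vectors: assigning a name that is not an existing
# factor level would otherwise turn it into NA
dataset1[, 3] <- as.character(dataset1[, 3])
dataset2[, 2] <- as.character(dataset2[, 2])

# Update the school name column (column 3) of dataset1 so that each
# name matches the spelling used in column 2 of dataset2
for (i in seq_along(dataset1[, 3]))
{
  name <- dataset1[i, 3]
  if (is.na(name) || name == "") next
  # fixed = TRUE matches the name literally, so characters such as
  # "(" or "." in school names are not treated as regex syntax
  x <- grep(name, dataset2[, 2], fixed = TRUE)
  # Keep the original name when dataset2 has no matching school
  if (length(x) > 0) dataset1[i, 3] <- dataset2[x[1], 2]
}

data.set <- data.frame(dataset1)

# Select data.frame to be sent to the output Dataset port
maml.mapOutputPort("data.set");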
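The name-matching loop can also be wrapped in a small function, which makes it easy to try on sample data outside Azure ML. This is only a sketch. align_names() is a hypothetical helper that matches names literally (fixed = TRUE), keeps the first match, and leaves unmatched, missing or empty names unchanged.

```r
# Hypothetical helper: for each entry of school_names, return the first
# entry of reference that contains it as a literal substring; otherwise
# keep the original name.
align_names <- function(school_names, reference) {
  school_names <- as.character(school_names)
  reference <- as.character(reference)
  vapply(school_names, function(n) {
    # Leave missing or empty names untouched ("" would match everything)
    if (is.na(n) || n == "") return(n)
    hits <- grep(n, reference, fixed = TRUE)
    if (length(hits) > 0) reference[hits[1]] else n
  }, character(1), USE.NAMES = FALSE)
}

# Example with made-up names; without fixed = TRUE the parentheses
# would be read as a regex group and the second name would not match
reference <- c("Adams Elementary School", "Cascade (K-8) School")
align_names(c("Adams Elementary", "Cascade (K-8)", "Unknown School"), reference)
```

Inside the Execute R Script module, the loop could then be replaced by `dataset1[, 3] <- align_names(dataset1[, 3], dataset2[, 2])`.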

Finally, we used Azure ML to train the model on 7 years of data and test it on the remaining 3 years, a 70:30 split. We tried various machine learning algorithms, such as Boosted Decision Tree Regression, Decision Forest Regression and Rank Model Temp, but finally settled on the Ranker Final algorithm, as it provided the lowest error variation on the training data.
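We did the split and the model comparison inside Azure ML Studio. The following minimal R sketch makes the 7-year/3-year idea concrete: a chronological split plus root-mean-square error, one common metric for comparing regression models. The Year, Ratio and Rank columns and the helpers split_by_year() and rmse() are assumptions for illustration. The linear model is only a stand-in for the learners we actually compared. With an equal number of rows per year, the split is exactly 70:30.

```r
# Hypothetical chronological split: the earliest n_train_years distinct
# years go to training, all later years go to testing.
split_by_year <- function(df, year_col = "Year", n_train_years = 7) {
  years <- sort(unique(df[[year_col]]))
  train_years <- head(years, n_train_years)
  is_train <- df[[year_col]] %in% train_years
  list(train = df[is_train, , drop = FALSE],
       test  = df[!is_train, , drop = FALSE])
}

# Root-mean-square error between observed and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Example: ten years of made-up data (5 schools per year); lm() stands in
# for the Azure ML learners
set.seed(1)
df <- data.frame(Year = rep(2005:2014, each = 5), Ratio = runif(50, 15, 25))
df$Rank <- 2 * df$Ratio + rnorm(50)
parts <- split_by_year(df)
fit <- lm(Rank ~ Ratio, data = parts$train)
rmse(parts$test$Rank, predict(fit, newdata = parts$test))
```

Splitting by year rather than by random rows keeps every test year strictly later than every training year, which mirrors predicting future rankings.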

We then published the model as an API to be consumed by external applications.

We also published an Azure website that can send live data that may affect a school's ranking through Service Bus and Stream Analytics events, enabling Azure Power BI to show how that data affects the school's ranking. Below is an image of the ranking data as seen on the Power BI dashboard.
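Stream Analytics jobs commonly group incoming events into fixed, non-overlapping time windows before sending results to an output such as Power BI. The TumblingWindow function in its query language does this. The exact query we ran isn't described here, so the following is only a local R simulation of a per-school tumbling-window average. The event columns ts, school and value and the helper tumbling_window_avg() are hypothetical.

```r
# Hypothetical local simulation of a tumbling-window average per school.
# Events are assumed to carry a timestamp in seconds (ts), a school name
# and a numeric value.
tumbling_window_avg <- function(events, window_sec = 60) {
  # Each event falls into exactly one non-overlapping window
  events$window_start <- floor(events$ts / window_sec) * window_sec
  # Average the values per school within each window
  aggregate(value ~ school + window_start, data = events, FUN = mean)
}

# Example with made-up events
events <- data.frame(
  ts     = c(0, 30, 59, 60, 90),
  school = c("A", "A", "B", "A", "A"),
  value  = c(1, 3, 10, 5, 7)
)
tumbling_window_avg(events, window_sec = 60)
```

A real job would run the equivalent grouping continuously on the event stream. This sketch only shows how events are bucketed and averaged.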

All of this was achieved in less than 24 hours.

As a data scientist, I can now apply my knowledge of big data and machine learning at work to build intelligent models and solutions that positively impact customers and information technology operations.

Proud of you, Joy. You have evolved into an entrepreneur and have immersed yourself in a self-learning path, which is so inspirational. Proud to tell people that we worked together at MS.

Awesome, Joyjeet Dey Majumdar. It was a great experience working through problems as a team within 18 hours. Great going!

A very good job Joyjeet Dey Majumdar and Manish Kumar Gupta, great to see learn to live experience... :)

Hey, nice. My team also won third place in Walmart's own Datathon last year; we used SAS and IBM's SPSS. Good to see that a lot of this is now in Azure. Let us sync up sometime. I want to know how much of this is available to me. I have some large sales tax audit analytics use cases.

Way to go, Joy! It's good to see how much you guys were able to accomplish in just a few days! Thanks for sharing.
