Machine Learning Applications on AWS - Lab 2

In the previous lab, we built a simple machine learning application: https://www.garudax.id/pulse/machine-learning-applications-aws-lab-1-john-yeung/

In this lab, we're going to extend that experience and build a custom machine learning model with Amazon SageMaker. It's an AWS managed service that covers the entire machine learning workflow: labelling and preparing data, choosing an algorithm, training it, tuning and optimising it, deploying the model, making predictions, and taking further actions. As a result, our models get to production faster, with much less effort and at lower cost.

Let's recap the core elements of using Machine Learning.

Core Elements

Element I. Problem / Business Need: The bank wants to identify some key regions, as it is going to launch new products and review its existing product positioning.

Element II. Data: In this lab, we are using a dataset summarising the product distribution of each region.

Element III. Data Analyst: Suppose we are taking this role in this lab.

Element IV. Algorithm: We're going to use the K-Means algorithm* on the AWS Cloud Platform.

Let's start it together!

* What's the K-Means algorithm (clustering)? It is an iterative algorithm that partitions a dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), such that each data point belongs to exactly one group.
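To make the idea concrete, here is a minimal NumPy sketch of the K-Means loop itself. This is only an illustration of the algorithm, not SageMaker's implementation, and the data points are made up:

```python
import numpy as np

def simple_kmeans(X, k, iters=20, seed=0):
    """Minimal K-Means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k random points from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious groups of 2-D points (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, _ = simple_kmeans(X, k=2)
print(labels)  # the first two points share one cluster, the last two share the other
```

SageMaker's built-in K-Means does the same partitioning at scale, distributed across training instances.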

Step 1: Download the source file - Customer Data

https://www.dropbox.com/s/ipl52syyp0z9wm4/region_data.csv

This dataset shows the distribution (number of cases) of 4 products, namely Life Insurance, General Insurance, Personal Loan and Home Loan, across 52 regions in Hong Kong. The region cluster (RegionCluster) is the reference value from a previous segmentation of similar regions. We'll re-cluster the whole dataset and assign a new cluster value with the K-Means algorithm.

Step 2: Login to AWS Management Console > Amazon SageMaker

Look up "SageMaker" in the AWS Management Console:

Step 3: Start Amazon SageMaker features

Click "Notebook instances", and then "Create notebook instance".

Notebook instance name: SageMakerDemo

Notebook instance type: ml.t2.medium (or larger instance types)

It takes a few minutes to start the notebook instance. A notebook instance is a managed machine learning (ML) EC2 compute instance running the Jupyter Notebook App. Once it's ready, the status shows as "InService".

Click "Open Jupyter".

Click "New" list-box which shows a list of available file formats. Select "conda_python3".

Step 4: Start writing Python programming

Get the source file into the local directory. In Jupyter, prefixing a line with ! runs it as a shell command. First, we download the source file from the following link and store it in the local directory.

!wget 'https://www.dropbox.com/s/ipl52syyp0z9wm4/region_data.csv'

Read the source file from the local directory (region_data.csv) to the "region" dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
region = pd.read_csv('region_data.csv', header=0)
print(region.head())

Convert the region code to a number to facilitate the clustering. The following function "regionToNumber" converts every character to its integer code point (ASCII value) and sums them together.

def regionToNumber(s):
    l = 0
    for x in s:
        # ord(x) already returns the integer code point, so no hex round-trip is needed
        l = l + ord(x)
    return l
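As a quick sanity check, the function simply sums the code points of the characters. The string "HK" below is purely illustrative; the actual region names in the dataset may differ:

```python
def regionToNumber(s):
    # Same logic as the lab's function: sum the integer code point of every character
    l = 0
    for x in s:
        l = l + ord(x)
    return l

print(regionToNumber("HK"))  # ord('H') + ord('K') = 72 + 75 = 147
```

Note that this mapping is not unique (anagrams collide), but it is sufficient here because the region column is only carried along, not clustered on.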

Keep a copy of the original region names in "xref" for later cross-reference, apply the conversion to the Region column, and print the first rows on screen.

xref = pd.DataFrame(region['Region'])

region['Region']=region['Region'].apply(lambda x: regionToNumber(x))
    
region.head()

Convert the DataFrame to a NumPy array of 32-bit floats. (as_matrix() has been removed in newer pandas versions; to_numpy() is the current equivalent.)

regionArray = region.to_numpy().astype(np.float32)
regionArray.shape

Import the SageMaker KMeans estimator, get the execution role, and specify the instance type for building the machine learning model. Also, specify the S3 locations for the training data and for storing the resulting model artifacts.

from sagemaker import KMeans
from sagemaker import get_execution_role

role = get_execution_role()
print(role)

bucket = "jyeung-labs"   # replace with your own S3 bucket name

data_location = 's3://{}/kmeans_highlevel_example'.format(bucket)
output_location = 's3://{}/kmeans_example'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=1,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)

Select the four product columns (dropping the region code in column 0) and start training the model. Note that %%time must be the first line of its cell. (The variable name "slice" shadows a Python built-in; a name like "trainData" would be safer, but we keep the lab's original naming.)

slice=regionArray[:,1:5]

%%time
kmeans.fit(kmeans.record_set(slice))

The wall time is the duration of building the model. It takes around 3 to 4 minutes to complete. Then, we deploy the model, using the "kmeans.deploy" method.

%%time
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')

It takes around 6 minutes to complete. Then we take all rows of the slice dataset (the four product columns, with the first region-code column dropped) and inspect its shape and contents.

slice=regionArray[:,1:5]
slice.shape
slice

Now, we are able to run the "kmeans_predictor.predict" method and print out the result.

%%time
result = kmeans_predictor.predict(slice)
clusters = [r.label['closest_cluster'].float32_tensor.values[0] for r in result] 
i = 0

for r in result:
    out = {
        "Region" : region['Region'].iloc[i],
        "RegionCode" : xref['Region'].iloc[i],
        "closest_cluster" : r.label['closest_cluster'].float32_tensor.values[0],
        "RegionCluster" : region['RegionCluster'].iloc[i],
        "LifeInsurance" : region['LifeInsurance'].iloc[i],
        "GeneralInsurance" : region['GeneralInsurance'].iloc[i],
        "PersonalLoan" : region['PersonalLoan'].iloc[i],
        "HomeLoan" : region['HomeLoan'].iloc[i] }
    print(out)   # print inside the loop so every region's result is shown
    i = i + 1
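As a variation, the per-row dicts above can be collected into a DataFrame and saved to CSV. The snippet below is only a sketch with hypothetical stand-in values ("RegionA", "RegionB" and the cluster labels are illustrative); in the notebook you would build the columns from xref, region, and the predict results instead:

```python
import pandas as pd

# Hypothetical stand-ins for the lab's variables: in the notebook, the region
# names come from xref and the labels from kmeans_predictor.predict(slice).
xref = pd.DataFrame({"Region": ["RegionA", "RegionB"]})
clusters = [3.0, 7.0]  # illustrative closest_cluster values, one per row

# Collect predictions into a DataFrame instead of printing one dict per row
out = pd.DataFrame({
    "RegionCode": xref["Region"],
    "NewCluster": [int(c) for c in clusters],
})
out.to_csv("region_clusters.csv", index=False)
print(out)
```

Writing the clustered result back to CSV (or S3) makes it easy to hand the new segmentation to the business team for review.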
    

I hope this lab has given you some ideas and concepts for building custom machine learning models on AWS. When you're done, remember to delete the endpoint and stop the notebook instance to avoid ongoing charges.
