Machine Learning Applications on AWS - Lab 2

In the previous lab, we built a simple machine learning application: https://www.garudax.id/pulse/machine-learning-applications-aws-lab-1-john-yeung/

In this lab, we're going to extend that experience and build a custom machine learning model with Amazon SageMaker. It's an AWS managed service that covers the entire machine learning workflow: labelling and preparing data, choosing an algorithm, training it, tuning and optimising it, deploying the model, making predictions, and taking further actions. As a result, our models get to production faster, with much less effort and at lower cost.

Let's recap the core elements of using Machine Learning.

Core Elements

Element I. Problem / Business Need: The bank wants to identify some key regions, as it is going to launch new products and review its existing product positioning.

Element II. Data: In this lab, we are using a dataset summarising the product distribution of each region.

Element III. Data Analyst: Suppose we are taking this role in this lab.

Element IV. Algorithm: We're going to use the K-Means algorithm* on the AWS Cloud Platform.

Let's start it together!

* What's the K-Means algorithm (clustering)? It is an iterative algorithm that partitions a dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), such that each data point belongs to exactly one group.
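To make the idea concrete, here is a minimal NumPy sketch of the K-Means loop itself. This is only an illustration of the algorithm, not SageMaker's implementation, and the data points are made up:

```python
import numpy as np

def simple_kmeans(X, k, iters=20, seed=0):
    """Minimal K-Means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k random points from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious groups of 2-D points (illustrative data)
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, _ = simple_kmeans(X, k=2)
print(labels)  # the first two points share one cluster, the last two share the other
```

SageMaker's built-in K-Means does the same partitioning at scale, distributed across training instances.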

Step 1: Download the source file - Customer Data

https://www.dropbox.com/s/ipl52syyp0z9wm4/region_data.csv

This dataset shows the distribution (number of cases) of 4 products, namely Life Insurance, General Insurance, Personal Loan and Home Loan, across 52 regions in Hong Kong. The region cluster (RegionCluster) is the reference value from a previous segmentation of similar regions. We'll re-cluster the whole dataset and assign a new cluster value with the K-Means algorithm.

Step 2: Login to AWS Management Console > Amazon SageMaker

Look up "SageMaker" in the AWS Management Console:

Step 3: Start Amazon SageMaker features

Click "Notebook instances", and then "Create notebook instance".

Notebook instance name: SageMakerDemo

Notebook instance type: ml.t2.medium (or larger instance types)

It takes a few minutes to start the notebook instance. A notebook instance is a managed machine learning (ML) EC2 compute instance running the Jupyter Notebook App. Once it's ready, the status shows as "InService".

Click "Open Jupyter".

Click "New" list-box which shows a list of available file formats. Select "conda_python3".

Step 4: Start writing Python programming

Get the source file into the local directory. In Jupyter, prefixing a line with ! runs it as a shell command. First, we download the source file from the following link and store it in the local directory.

!wget 'https://www.dropbox.com/s/ipl52syyp0z9wm4/region_data.csv'

Read the source file from the local directory (region_data.csv) to the "region" dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
region = pd.read_csv('region_data.csv', header=0)
print(region.head())

Convert the region code to a number to facilitate the clustering. The following function "regionToNumber" converts every character to its integer code point (ASCII value) and sums them together.

def regionToNumber(s):
    l = 0
    for x in s:
        # ord(x) already returns the integer code point, so no hex round-trip is needed
        l = l + ord(x)
    return l
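As a quick sanity check, the function simply sums the code points of the characters. The string "HK" below is purely illustrative; the actual region names in the dataset may differ:

```python
def regionToNumber(s):
    # Same logic as the lab's function: sum the integer code point of every character
    l = 0
    for x in s:
        l = l + ord(x)
    return l

print(regionToNumber("HK"))  # ord('H') + ord('K') = 72 + 75 = 147
```

Note that this mapping is not unique (anagrams collide), but it is sufficient here because the region column is only carried along, not clustered on.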

Keep a copy of the original region names in "xref" for later cross-reference, apply the conversion to the Region column, and print the first rows on screen.

xref = pd.DataFrame(region['Region'])

region['Region']=region['Region'].apply(lambda x: regionToNumber(x))
    
region.head()

Convert the DataFrame to a NumPy array of 32-bit floats. (as_matrix() has been removed in newer pandas versions; to_numpy() is the current equivalent.)

regionArray = region.to_numpy().astype(np.float32)
regionArray.shape

Import the SageMaker KMeans estimator, get the execution role, and specify the instance type for building the machine learning model. Also, specify the S3 locations for the training data and for storing the resulting model artifacts.

from sagemaker import KMeans
from sagemaker import get_execution_role

role = get_execution_role()
print(role)

bucket = "jyeung-labs"   # replace with your own S3 bucket name

data_location = 's3://{}/kmeans_highlevel_example'.format(bucket)
output_location = 's3://{}/kmeans_example'.format(bucket)

print('training data will be uploaded to: {}'.format(data_location))
print('training artifacts will be uploaded to: {}'.format(output_location))

kmeans = KMeans(role=role,
                train_instance_count=1,
                train_instance_type='ml.c4.8xlarge',
                output_path=output_location,
                k=10,
                data_location=data_location)

Select the four product columns (dropping the region code in column 0) and start training the model. Note that %%time must be the first line of its cell. (The variable name "slice" shadows a Python built-in; a name like "trainData" would be safer, but we keep the lab's original naming.)

slice=regionArray[:,1:5]

%%time
kmeans.fit(kmeans.record_set(slice))

The wall time is the duration of building the model. It takes around 3 to 4 minutes to complete. Then, we deploy the model, using the "kmeans.deploy" method.

%%time
kmeans_predictor = kmeans.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')

It takes around 6 minutes to complete. Then we take all rows of the slice dataset (the four product columns, with the first region-code column dropped) and inspect its shape and contents.

slice=regionArray[:,1:5]
slice.shape
slice

Now, we are able to run the "kmeans_predictor.predict" method and print out the result.

%%time
result = kmeans_predictor.predict(slice)
clusters = [r.label['closest_cluster'].float32_tensor.values[0] for r in result] 
i = 0

for r in result:
    out = {
        "Region" : region['Region'].iloc[i],
        "RegionCode" : xref['Region'].iloc[i],
        "closest_cluster" : r.label['closest_cluster'].float32_tensor.values[0],
        "RegionCluster" : region['RegionCluster'].iloc[i],
        "LifeInsurance" : region['LifeInsurance'].iloc[i],
        "GeneralInsurance" : region['GeneralInsurance'].iloc[i],
        "PersonalLoan" : region['PersonalLoan'].iloc[i],
        "HomeLoan" : region['HomeLoan'].iloc[i] }
    print(out)   # print inside the loop so every region's result is shown
    i = i + 1
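As a variation, the per-row dicts above can be collected into a DataFrame and saved to CSV. The snippet below is only a sketch with hypothetical stand-in values ("RegionA", "RegionB" and the cluster labels are illustrative); in the notebook you would build the columns from xref, region, and the predict results instead:

```python
import pandas as pd

# Hypothetical stand-ins for the lab's variables: in the notebook, the region
# names come from xref and the labels from kmeans_predictor.predict(slice).
xref = pd.DataFrame({"Region": ["RegionA", "RegionB"]})
clusters = [3.0, 7.0]  # illustrative closest_cluster values, one per row

# Collect predictions into a DataFrame instead of printing one dict per row
out = pd.DataFrame({
    "RegionCode": xref["Region"],
    "NewCluster": [int(c) for c in clusters],
})
out.to_csv("region_clusters.csv", index=False)
print(out)
```

Writing the clustered result back to CSV (or S3) makes it easy to hand the new segmentation to the business team for review.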
    

I hope this lab has given you some ideas and concepts for building custom machine learning models on AWS. When you're done, remember to delete the endpoint and stop the notebook instance to avoid ongoing charges.
