A Platform for Machine Learning

The general assumption is that Data Science is all about Algorithm development and Model selection. That is not quite true in today's Big Data world. If you have a neatly packaged CSV file as your data source then yes, you can spend quality time figuring out how different features correlate with each other and extracting patterns using Machine Learning (ML). However, today's Data Scientist usually spends more than 50% of their time dealing with other Concerns (areas of focus) like Data Preparation, Cleansing, Distributed Training, Model Deployment and Scaling. Data in its raw format needs to be explored and converted into Features that can be used for building Models. When you have terabytes of Training data, you need a distributed Training setup to train the Model in parallel in minutes rather than hours. You need the ability to experiment with several Hyper-parameter combinations to find the best ones for your Model. Finally, once the Model is developed and validated, it needs to be integrated with your application Software code and deployed to Production. You really don't want your highly paid Data Scientists struggling with these infrastructure Concerns - trying to move data out of your Systems, figuring out how to distribute Training jobs, or writing Production code to integrate ML Models with Software.

Today, we are seeing an evolution of Data Science or Machine Learning Platforms that try to address these important Concerns for Data Scientists. These could be hosted Platforms like AWS SageMaker, Google Cloud ML or Azure ML Studio. Here, all your data needs to be uploaded to the respective Cloud storage, and the Platform provides components for building an ML pipeline. Cloud vendors like Amazon and Google are continuously adding great new features to their ML solutions in an effort to attract customers to move their Data (along with their Business) to their Platform. Another option is to use an on-premise Platform like H2O.ai or Databricks, which can connect to your SQL and NoSQL data sources.

These Platforms are pretty awesome but tend to be very generic - they aim to solve ML problems in all domains, from Manufacturing to Healthcare to Finance to Gaming. If you want something specific to your problem domain, you can select Components and build your own Data Science Platform. Open Source technologies like Kubeflow and TensorFlow-Serving try to do exactly this. They give you tools to integrate Components addressing specific ML Concerns and run them in Docker Containers on top of an Orchestration layer - that is, Kubernetes.

Kubernetes (K8S) provides many abstractions to address Infrastructure Concerns like Scaling & Load-balancing (Deployments), Networking (Services), Storage (Persistent Volumes) and Security (Secrets). Kubeflow sits on top of Kubernetes and provides components that address specific ML Concerns - JupyterHub for an Analytics UI, TensorFlow-Jobs for distributed Training, TensorFlow-Serving for Model Deployment at scale - with more Components being added as we speak. You can choose either the whole stack or specific components to customize your own ML pipelines on K8S. Let's take a look at one specific Component addressing the Concern of Deployment of ML Models.

I did a quick experiment to convert a Deep Learning Model written in Keras to the TensorFlow-Serving format and "Serve" it as a Microservice. Keras, for those who don't know, is a library running on top of the TensorFlow framework that provides many abstractions to build Deep Learning Models in a few lines of code. Here is a quick example along with my documented code.

First, let's build a simple Keras Model that learns from the MNIST handwritten digits dataset and can predict the digit by reading a 28x28 image. This is the 'Hello World' example of Deep Learning, and Keras makes this Model extremely easy to build.

# Build a simple Keras Model to recognize handwritten digits
import tensorflow as tf
mnist = tf.keras.datasets.mnist

# Train using MNIST dataset which is available in Keras
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build our Deep Learning Model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),  # each MNIST image is 28x28 pixels
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile Model - use Adam optimizer and Cross Entropy loss function
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the Model on MNIST data
model.fit(x_train, y_train, epochs=1)
model.evaluate(x_test, y_test)

# Show Model summary
model.summary()

RESULT

Epoch 1/1
60000/60000 [==============================] - 8s 136us/step - loss: 0.2217 - acc: 0.9347
10000/10000 [==============================] - 0s 33us/step
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_2 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
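
As an aside, Keras can persist the trained Model to disk in HDF5 format - the export snippet below includes a commented-out line that reloads the Model this way:

# Save the trained Model to an HDF5 file so it can be reloaded later
model.save('./model.h5')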

Next, we will convert this Keras Model into the Protocol Buffers (protobuf) format currently used by TF-Serving. This code may not stand the test of time as new TensorFlow and TF-Serving versions are released, but as of TensorFlow 1.9 things should be good.

import tensorflow as tf

# Set the learning phase to 0 (inference mode) so layers like
# Dropout behave correctly when the Model is served
tf.keras.backend.set_learning_phase(0)

# Alternatively, load a previously saved Keras Model from disk
#model = tf.keras.models.load_model('./model.h5')

# The export path contains the name and the version of the Model
export_path = './MNISTModel/1'

# Fetch the Keras session and save the Model
# The signature definition is defined by the input and output tensors
# and stored with the default serving key
with tf.keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={'input_image': model.input},
        outputs={t.name: t for t in model.outputs})

RESULT

INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./MNISTModel/1/saved_model.pb
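
Before deploying, you can sanity-check the exported Model's signature (input and output tensor names and shapes) with the saved_model_cli tool that ships with TensorFlow:

$ saved_model_cli show --dir ./MNISTModel/1 --all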

Finally, a simple command runs a Docker container, passing our Model path as a parameter. If you are deploying this in Kubernetes, you will have to add a PersistentVolume, copy the Model onto it and then mount it into your Pod - a sketch of this follows the Docker command below.

$ docker run -it --rm -p 8501:8501 -v '/Users/dattarajrao/tensorflow/MNISTModel:/models/mnist' -e MODEL_NAME=mnist tensorflow/serving
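
For the Kubernetes case, here is a minimal sketch using the official Kubernetes Python client. The PersistentVolumeClaim name (model-pvc), Pod name and namespace here are illustrative assumptions - in practice you would first copy the exported MNISTModel directory onto the volume backing that claim:

from kubernetes import client, config

# Load kubeconfig credentials (assumes kubectl access is already configured)
config.load_kube_config()

# Pod running the stock tensorflow/serving image, with the Model
# directory mounted from an existing PersistentVolumeClaim
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mnist-serving"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(
            name="tf-serving",
            image="tensorflow/serving",
            env=[client.V1EnvVar(name="MODEL_NAME", value="mnist")],
            ports=[client.V1ContainerPort(container_port=8501)],
            volume_mounts=[client.V1VolumeMount(
                name="model-volume", mount_path="/models/mnist")])],
        volumes=[client.V1Volume(
            name="model-volume",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name="model-pvc"))]))

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Wrapping a Deployment and a Service around this Pod would give you the Scaling and Load-balancing abstractions discussed earlier.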

Now you have the Model deployed as a Service and available at a URL. Let's write a simple client to invoke our Model. We will have this client read two images from disk and pass each one to the Model to predict the digit.

import requests
import json
import numpy as np
from PIL import Image

import matplotlib.pyplot as plt
%matplotlib inline

# Serving URL where the Model is deployed
SERVING_URL = 'http://localhost:8501/v1/models/mnist:predict'
IMAGE_SIZE = (28,28)

# function to make the Prediction by calling the SERVING_URL
def make_Prediction(img_file):    
    # Convert Image to array
    img = Image.open(img_file).convert("L")
    img = img.resize(IMAGE_SIZE)
    arr = np.array(img)

    # Normalize pixel values to the 0-1 range to match the
    # preprocessing applied during Training
    arr = arr / 255.0
    
    # first show the image to process
    plt.axis('off')
    plt.imshow(arr, cmap='gray')
    plt.show()

    # convert the image to HTTP POST payload
    payload = {
      "instances": [{'input_image': arr.tolist()}]
    }

    # make the HTTP POST call
    r = requests.post(SERVING_URL, json=payload)
    result = json.loads(r.content)

    # get the Prediction
    print("Prediction: ", np.argmax(result['predictions'][0]))    
    
make_Prediction('digit_test1.png')
make_Prediction('digit_test2.png')

RESULT

(For each test image, the client displays the image and prints the predicted digit.)

This client has no dependency on TensorFlow. It makes an API call to our TF-Serving Model Server and gets back the prediction in JSON format. You can now call this REST endpoint served by TF-Serving from a web or mobile application, passing it the input image to be processed.
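
Besides the predict endpoint, TF-Serving's REST API also exposes a Model status endpoint that is handy for health checks - a minimal sketch, assuming the default REST port 8501 used above:

import requests

# Query the status of the deployed 'mnist' Model
r = requests.get('http://localhost:8501/v1/models/mnist')
print(r.json())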

This was just a quick and easy demo. Machine Learning Platforms and Model Serving are an extremely active and growing field. Many new Products are making their way into this space and making life easier for Data Scientists. Kubeflow is starting to integrate new serving components like Seldon Core, which supports serving Models developed in R, Spark and H2O. Kubeflow is also adding support for hosting TensorRT from NVIDIA. TensorRT is an inference engine with many optimizations for Deep Models, like Layer fusion and Precision calibration, that improve the inference performance of your Model.

Let's wait and see where this leads and what the Data Science Platform will look like in 2 years. Thoughts?

DISCLAIMER: This post is not a reflection of views of General Electric. This is just a personal hobby post I have written to share knowledge on this topic.
