A Platform for Machine Learning

The general assumption is that Data Science is all about Algorithm development and Model selection. That is not quite true in today's Big Data world. If you have a neatly packaged CSV file as your data source then yes, you can spend quality time figuring out how different features correlate with each other and extracting patterns using Machine Learning (ML). However, today's Data Scientist usually spends more than 50% of their time dealing with other Concerns (areas of focus) like Data Preparation, Cleansing, Distributed Training, Model Deployment and Scaling. Data in its raw format needs to be explored and converted into Features that can be used for building Models. When you have terabytes of Training data, you need a distributed Training setup to train the Model in parallel in minutes rather than hours. You need the ability to experiment with several Hyper-parameter combinations to find the best ones for your Model. Finally, once the Model is developed and validated, it needs to be integrated with your application Software code and deployed to Production. You really don't want your highly paid Data Scientists struggling with these infrastructure Concerns - trying to move data out of your Systems, figuring out how to distribute Training jobs, or writing Production code to integrate ML Models with Software.

Today, we are seeing an evolution of Data Science or Machine Learning Platforms that try to address these important Concerns for Data Scientists. These could be hosted Platforms like AWS SageMaker, Google Cloud ML or Azure ML Studio. Here, all your data needs to be uploaded to the respective Cloud storage, and the Platform provides components for building an ML pipeline. Cloud vendors like Amazon and Google are continuously adding great new features to their ML solutions in an effort to attract customers to move their Data (along with their Business) to their Platform. Another option is to use an on-premise Platform like H2O.ai or Databricks, which can connect to your SQL and NoSQL data sources.

These Platforms are pretty awesome but tend to be very generic - they aim to solve ML problems in all domains, from Manufacturing to Healthcare to Finance to Gaming. If you want something specific to your problem domain, you can select Components and build your own Data Science Platform. Open Source technologies like Kubeflow and TensorFlow-Serving try to do exactly this. They give you tools to integrate Components addressing specific ML Concerns and run them in Docker Containers on top of an Orchestration layer - that is, Kubernetes.

Kubernetes (K8S) provides many abstractions to address Infrastructure Concerns like Scaling & Load-balancing (Deployments), Networking (Services), Storage (Persistent Volumes) and Security (Secrets). Kubeflow sits on top of Kubernetes and provides components that address specific ML Concerns - JupyterHub for an Analytics UI, TensorFlow-Jobs for distributed Training, TensorFlow-Serving for Model Deployment at scale - with more Components being added as we speak. You can choose either the whole stack or specific components to customize your own ML pipelines on K8S. Let's take a look at one specific Component addressing the Concern of Deployment of ML Models.

I did a quick experiment to convert a Deep Learning Model written in Keras to the TensorFlow-Serving format and "Serve" it as a Microservice. Keras, for those who don't know, is a library running on top of the TensorFlow framework that provides many abstractions to build Deep Learning Models in a few lines of code. Here is a quick example along with my documented code.

First, let's build a simple Keras Model that learns from the MNIST handwritten digits dataset and can predict the digit by reading a 28x28 image. This is the 'Hello World' example of Deep Learning, and Keras makes this Model extremely easy to build.

# Build a simple Keras Model to recognize handwritten digits
import tensorflow as tf
mnist = tf.keras.datasets.mnist

# Train using MNIST dataset which is available in Keras
(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Build our Deep Learning Model
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),  # each MNIST image is 28x28 pixels
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Compile Model - use Adam optimizer and Cross Entropy loss function
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the Model on MNIST data
model.fit(x_train, y_train, epochs=1)
model.evaluate(x_test, y_test)

# Show Model summary
model.summary()

RESULT

Epoch 1/1
60000/60000 [==============================] - 8s 136us/step - loss: 0.2217 - acc: 0.9347
10000/10000 [==============================] - 0s 33us/step
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten_2 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 512)               401920    
_________________________________________________________________
dropout_2 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5130      
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
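
As an aside, Keras can persist the trained Model to disk in HDF5 format - the export snippet below includes a commented-out line that reloads the Model this way:

# Save the trained Model to an HDF5 file so it can be reloaded later
model.save('./model.h5')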

Next, we will convert this Keras Model into the Protocol Buffers (protobuf) format currently used by TF-Serving. This code may not stand the test of time as new TensorFlow and TF-Serving versions are released, but as of TensorFlow 1.9 things should be good.

import tensorflow as tf

# Set the learning phase to 0 (inference mode) so layers like
# Dropout behave correctly when the Model is served
tf.keras.backend.set_learning_phase(0)

# Alternatively, load a previously saved Keras Model from disk
#model = tf.keras.models.load_model('./model.h5')

# The export path contains the name and the version of the Model
export_path = './MNISTModel/1'

# Fetch the Keras session and save the Model
# The signature definition is defined by the input and output tensors
# and stored with the default serving key
with tf.keras.backend.get_session() as sess:
    tf.saved_model.simple_save(
        sess,
        export_path,
        inputs={'input_image': model.input},
        outputs={t.name: t for t in model.outputs})

RESULT

INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./MNISTModel/1/saved_model.pb
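
Before deploying, you can sanity-check the exported Model's signature (input and output tensor names and shapes) with the saved_model_cli tool that ships with TensorFlow:

$ saved_model_cli show --dir ./MNISTModel/1 --all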

Finally, a simple command runs a Docker container, passing our Model path as a parameter. If you are deploying this in Kubernetes, you will have to add a PersistentVolume, copy the Model onto it and then mount it into your Pod - a sketch of this follows the Docker command below.

$ docker run -it --rm -p 8501:8501 -v '/Users/dattarajrao/tensorflow/MNISTModel:/models/mnist' -e MODEL_NAME=mnist tensorflow/serving
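
For the Kubernetes case, here is a minimal sketch using the official Kubernetes Python client. The PersistentVolumeClaim name (model-pvc), Pod name and namespace here are illustrative assumptions - in practice you would first copy the exported MNISTModel directory onto the volume backing that claim:

from kubernetes import client, config

# Load kubeconfig credentials (assumes kubectl access is already configured)
config.load_kube_config()

# Pod running the stock tensorflow/serving image, with the Model
# directory mounted from an existing PersistentVolumeClaim
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="mnist-serving"),
    spec=client.V1PodSpec(
        containers=[client.V1Container(
            name="tf-serving",
            image="tensorflow/serving",
            env=[client.V1EnvVar(name="MODEL_NAME", value="mnist")],
            ports=[client.V1ContainerPort(container_port=8501)],
            volume_mounts=[client.V1VolumeMount(
                name="model-volume", mount_path="/models/mnist")])],
        volumes=[client.V1Volume(
            name="model-volume",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name="model-pvc"))]))

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Wrapping a Deployment and a Service around this Pod would give you the Scaling and Load-balancing abstractions discussed earlier.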

Now you have the Model deployed as a Service and available at a URL. Let's write a simple client to invoke our Model. We will have this client read two images from disk and pass each one to the Model to predict the digit.

import requests
import json
import numpy as np
from PIL import Image

import matplotlib.pyplot as plt
%matplotlib inline

# Serving URL where the Model is deployed
SERVING_URL = 'http://localhost:8501/v1/models/mnist:predict'
IMAGE_SIZE = (28,28)

# function to make the Prediction by calling the SERVING_URL
def make_Prediction(img_file):    
    # Convert Image to array
    img = Image.open(img_file).convert("L")
    img = img.resize(IMAGE_SIZE)
    arr = np.array(img)

    # Normalize pixel values to the 0-1 range to match the
    # preprocessing applied during Training
    arr = arr / 255.0
    
    # first show the image to process
    plt.axis('off')
    plt.imshow(arr, cmap='gray')
    plt.show()

    # convert the image to HTTP POST payload
    payload = {
      "instances": [{'input_image': arr.tolist()}]
    }

    # make the HTTP POST call
    r = requests.post(SERVING_URL, json=payload)
    result = json.loads(r.content)

    # get the Prediction
    print("Prediction: ", np.argmax(result['predictions'][0]))    
    
make_Prediction('digit_test1.png')
make_Prediction('digit_test2.png')

RESULT

(For each test image, the client displays the image and prints the predicted digit.)

This client has no dependency on TensorFlow. It makes an API call to our TF-Serving Model Server and gets back the prediction in JSON format. You can now call this REST endpoint served by TF-Serving from a web or mobile application, passing it the input image to be processed.
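
Besides the predict endpoint, TF-Serving's REST API also exposes a Model status endpoint that is handy for health checks - a minimal sketch, assuming the default REST port 8501 used above:

import requests

# Query the status of the deployed 'mnist' Model
r = requests.get('http://localhost:8501/v1/models/mnist')
print(r.json())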

This was just a quick and easy demo. Machine Learning Platforms and Model Serving are an extremely active and growing field. Many new Products are making their way into this space and making life easier for Data Scientists. Kubeflow is starting to integrate new serving components like Seldon Core, which supports serving Models developed in R, Spark and H2O. Kubeflow is also adding support for hosting TensorRT from NVIDIA. TensorRT is an inference engine with many optimizations for Deep Models, like Layer fusion and Precision calibration, that improve the inference performance of your Model.

Let's wait and see where this leads and what the Data Science Platform will look like in 2 years. Thoughts?

DISCLAIMER: This post is not a reflection of views of General Electric. This is just a personal hobby post I have written to share knowledge on this topic.
