Practical - convert Azure BatchAI to Azure Machine Learning services
Azure BatchAI is being replaced by Azure Machine Learning Compute!
So this post is a practical one that gives you tips on how to do the conversion.
Basically, you will need to do 5 things to complete this conversion:
(1) create an Azure Machine Learning workspace (if you don't already have one); otherwise simply load it from the config file you saved last time!
(2) supply the configuration you want for the remote compute target; here we want to convert a BatchAI configuration (with GPU N-series virtual machines) to Azure Machine Learning Compute
(3) create an experiment and supply the Python script you want it to run
(4) specify the estimator (= the deep learning framework you prefer); here I am using Keras on TensorFlow (hence I choose TensorFlow as the estimator), then pip install the rest of the necessary packages. Also don't forget to specify the script params associated with the Python script you want to run inside the estimator
(5) run it and see the output directly from within the notebook
Step 1:
First of all you will need an Azure Machine Learning workspace, which you can create in the Azure portal via this link; otherwise, load it directly from a previously saved config file.
In [1]: # check that your python SDK is up-to-date
import azureml.core
print("SDK version:", azureml.core.VERSION)
SDK version: 1.0.2
In [2]: # load a previously saved workspace config file (credentials) into your workspace
# if you don't have one, create one via this link
from azureml.core.workspace import Workspace
ws = Workspace.from_config()
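If you have no saved config yet, the workspace can also be created from the SDK instead of the portal. A minimal sketch of that one-time setup, assuming you fill in your own subscription ID, resource group, and region (the names below are placeholders):

```python
from azureml.core.workspace import Workspace

# one-time creation; replace the placeholder values with your own
ws = Workspace.create(name='my-aml-workspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='my-resource-group',
                      create_resource_group=True,
                      location='westeurope')

# save the credentials locally so Workspace.from_config() works next time
ws.write_config()
```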
Step 2:
In [5]: # AmlCompute is the compute type that replaces BatchAI compute
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
In [7]: # register your Azure file share (which contains your data) to a Datastore object.
# This is the trickiest part, since you cannot really ''look'' into what is inside this
# registered path; it must be mounted onto your compute target before you can use its
# path for real, but more on that later ...
from azureml.core import Datastore
# only need to do it once
ds2 = Datastore.register_azure_file_share(workspace=ws,
datastore_name='give_a_name',
file_share_name='your_file_share_name',
account_name='your_storage_acc_name',
account_key='your_storage_acc_key',
create_if_not_exists=False)
In [8]: # create your GPU virtual machines
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
# choose a name for your cluster
cluster_name = "gpucluster"
try:
compute_target = ComputeTarget(workspace=ws, name=cluster_name)
print('Found existing compute target')
except ComputeTargetException:
print('Creating a new compute target...')
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
max_nodes=4)
# create the cluster
compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)
# Use the 'status' property to get a detailed status for the current cluster.
print(compute_target.status.serialize())
Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-01-09T13:21:06.234000+00:00', 'creationTime': '2019-01-08T10:23:55.033355+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-01-08T10:25:35.793472+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}
Step 3:
In [20]: # create experiment and give it a name
from azureml.core import Experiment
experiment_name = 'keras-tf-exp'
experiment = Experiment(ws, name=experiment_name)
In [34]: # create a folder
import os
project_folder = './keras-ctscan-folder'
os.makedirs(project_folder, exist_ok=True)
In [35]: # copy the keras_cnn_pydicom.py python script into the project folder you just
# created above; this folder is uploaded to the remote compute target when the experiment runs
import shutil
shutil.copy('keras_cnn_pydicom.py', project_folder)
Examine the python script and make SURE to check out the tips in its inline comments; they are important to make it work!
tip 1 - os.makedirs(args.data, exist_ok=True) # here args.data is equal to ds2.path(), which you will supply when specifying the estimator script_params
tip 2 - the relative mounting path is usable after tip 1; however, if you need to access subfolders within that mounting path, you will have to hard-code the path like I did below
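To make the two tips concrete, here is a minimal sketch of what the argument handling at the top of keras_cnn_pydicom.py could look like. The argument names match the script_params in Step 4, but the defaults and the 'train_images' subfolder are made-up placeholders for your own data layout:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# defaults are for local smoke-testing only; on the cluster these come from script_params
parser.add_argument('--data', type=str, default='./data_mount',
                    help='mounted datastore path (ds2.path())')
parser.add_argument('--epoch', type=int, default=1,
                    help='number of training epochs')
parser.add_argument('--save_model', type=str, default='/outputs',
                    help='folder the trained model is written to')
args, _ = parser.parse_known_args()

# tip 1: this makedirs call is what makes the mounted path usable inside the script
os.makedirs(args.data, exist_ok=True)

# tip 2: subfolders under the mount have to be spelled out explicitly;
# 'train_images' is a hypothetical subfolder name, replace it with your own layout
train_dir = os.path.join(args.data, 'train_images')
```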
Step 4:
In [38]: # pay attention to the script_params as they are closely connected to the python script tips we went through above
from azureml.train.dnn import TensorFlow
script_params={
'--data': ds2.path(),
'--epoch': 1,
'--save_model':'/outputs'
}
estimator = TensorFlow(source_directory=project_folder,
compute_target=compute_target,
entry_script='keras_cnn_pydicom.py',
script_params=script_params,
node_count=1,
process_count_per_node=1,
#distributed_backend='mpi',
pip_packages=['pydicom','keras','scikit-image','scikit-learn','scipy','argparse',
'opencv-contrib-python-headless','pillow','numpy', 'pandas','matplotlib'],
#custom_docker_base_image='zecharpy/tfgpupy3:pydicom',
use_gpu=True)
Step 5:
Run the experiment and sit back; once you reach this message, you know your run succeeded.
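The submission itself is the usual Azure ML pattern; a short sketch, assuming the experiment and estimator objects from the steps above:

```python
# submit the estimator to the experiment and stream the logs into the notebook
run = experiment.submit(estimator)
run.wait_for_completion(show_output=True)

# after completion, files written to /outputs (e.g. the saved model) can be listed
print(run.get_file_names())
```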