Practical - convert Azure BatchAI to Azure Machine Learning services
https://docs.microsoft.com/en-us/azure/machine-learning/service/azure-machine-learning-release-notes

Azure BatchAI is being replaced by Azure Machine Learning Compute!

So this post is a practical one that gives you tips on how to do the conversion.

Basically, you will need to do five things to complete the conversion:

(1) Create an Azure Machine Learning workspace (if you don't already have one); otherwise, simply load it from the config file you saved last time.

(2) Supply the configuration you want for the remote compute target. Here we convert a BatchAI configuration (with GPU N-series virtual machines) to Azure Machine Learning Compute.

(3) Create an experiment and supply the Python script you want it to run.

(4) Specify the estimator (i.e. the deep learning framework you prefer). Here I am using Keras on TensorFlow, hence I choose the TensorFlow estimator, then pip install the rest of the packages I need. Also, don't forget to specify the script_params associated with the Python script you want to run inside the estimator.

(5) Run it and watch the output directly from within the notebook.

Step 1:

First of all, you will need an Azure Machine Learning workspace, which you can create in the Azure portal; otherwise, directly load a previously saved config file into a workspace object.


In [1]: # check that your python SDK is up-to-date

import azureml.core
print("SDK version:", azureml.core.VERSION)
SDK version: 1.0.2

In [2]: # load a previously saved workspace config file (credentials) into your workspace
# if you don't have one yet, create one in the Azure portal first

from azureml.core.workspace import Workspace
ws = Workspace.from_config()
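If you have no saved config yet, the workspace can also be created from the SDK instead of the portal. A minimal sketch, where the subscription ID, resource group, workspace name, and region below are all placeholders you would replace with your own values:

```python
# Hypothetical sketch: create a workspace from the SDK (names are placeholders)
from azureml.core.workspace import Workspace

ws = Workspace.create(name='myworkspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='westeurope')

# save the credentials locally so Workspace.from_config() works next time
ws.write_config()
```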

Step 2:

In [5]: # AmlCompute is the compute type that replaces BatchAI

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

In [7]: # register your Azure file share (the one that contains your data) as a Datastore object.
# This is the trickiest part: you cannot really "look" into what is inside the registered path;
# it must first be mounted onto your compute target before its path can actually be used,
# but more on that later ...

from azureml.core import Datastore
# only need to do it once
ds2 = Datastore.register_azure_file_share(workspace=ws,
                                          datastore_name='give_a_name',
                                          file_share_name='your_file_share_name',
                                          account_name='your_storage_acc_name',
                                          account_key='your_storage_acc_key',
                                          create_if_not_exists=False)
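Since registration only needs to happen once, on later runs you can simply retrieve the datastore by name instead of re-registering it. A short sketch, assuming the same name used above:

```python
# Hypothetical sketch: fetch an already-registered datastore by name
from azureml.core import Datastore

ds2 = Datastore.get(ws, datastore_name='give_a_name')
print(ds2.datastore_type)  # e.g. 'AzureFile' for a file share
```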

In [8]: # create your GPU virtual machines

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpucluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6', 
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())
Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-01-09T13:21:06.234000+00:00', 'creationTime': '2019-01-08T10:23:55.033355+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-01-08T10:25:35.793472+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}


Step 3:

In [20]: # create experiment and give it a name

from azureml.core import Experiment

experiment_name = 'keras-tf-exp'
experiment = Experiment(ws, name=experiment_name)

In [34]: # create a folder

import os

project_folder = './keras-ctscan-folder'
os.makedirs(project_folder, exist_ok=True)

In [35]: # copy the keras_cnn_pydicom.py script from local into the project folder you just created above

import shutil


shutil.copy('keras_cnn_pydicom.py', project_folder)

Examine the Python script and make SURE to check out the tips in its inline comments; they are important to make it work!

tip1 - os.makedirs(args.data, exist_ok=True) # here args.data equals ds2.path(), which you will supply when specifying the estimator's script_params

tip2 - the relative mount path is usable after tip1; however, if you need to access sub-folders within that mount path, you will have to hard-code the path as I did below
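To make the two tips concrete, here is a hypothetical sketch of how the argument handling inside keras_cnn_pydicom.py might look. The argument names match the script_params used in Step 4; the defaults and the 'train' sub-folder are placeholders for illustration:

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--data', type=str, default='./data')          # will receive ds2.path()
parser.add_argument('--epoch', type=int, default=1)
parser.add_argument('--save_model', type=str, default='./outputs')
args = parser.parse_args([])  # empty list so the defaults apply in this sketch

# tip1: create the directory so the mounted datastore path is usable
os.makedirs(args.data, exist_ok=True)

# tip2: sub-folders under the mount point must be joined onto it explicitly
train_dir = os.path.join(args.data, 'train')
print(train_dir)
```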


Step 4:

In [38]: # pay attention to script_params, as it is closely connected to the Python-script tips we went through above

from azureml.train.dnn import TensorFlow 
script_params={
    '--data': ds2.path(),
    '--epoch': 1,
    '--save_model':'/outputs'
}

estimator = TensorFlow(source_directory=project_folder,
                      compute_target=compute_target,
                      entry_script='keras_cnn_pydicom.py',
                      script_params=script_params,
                      node_count=1,
                      process_count_per_node=1,
                      #distributed_backend='mpi',    
                      pip_packages=['pydicom','keras','scikit-image','scikit-learn','scipy','argparse',
                                    'opencv-contrib-python-headless','pillow','numpy', 'pandas','matplotlib'],
                      #custom_docker_base_image='zecharpy/tfgpupy3:pydicom',
                      use_gpu=True)

Step 5:

Run the experiment and sit back; the output streams directly into the notebook, and once you see the final completion message you know the run succeeded.
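No cell for this step is shown above, so here is a minimal sketch of submitting the estimator and streaming its logs, using the experiment and estimator objects defined in Steps 3 and 4:

```python
# Submit the estimator to the experiment created in Step 3 and stream the logs
run = experiment.submit(estimator)
run.wait_for_completion(show_output=True)

# anything the script wrote under /outputs is captured alongside the run
print(run.get_file_names())
```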



More articles by Zenodia Charpy