Practical - convert Azure BatchAI to Azure Machine Learning services
Azure BatchAI is being replaced by Azure Machine Learning Compute!
So this post is a practical one that gives you tips on how to do the conversion.
Basically, you will need to do 5 things to complete this conversion:
(1) create an Azure Machine Learning workspace (if you don't already have one); otherwise simply load it from the config file you saved last time!
(2) supply the configuration you want for the remote compute target; here we want to convert a BatchAI configuration (with GPU N-series virtual machines) to Azure Machine Learning Compute
(3) create an experiment and supply the Python script you want it to run
(4) specify the estimator (= the deep learning framework you prefer); here I am using Keras on TensorFlow (hence I choose TensorFlow as the estimator), then pip install the rest of the necessary packages. Also don't forget to specify the script params associated with the Python script you want to run inside the estimator
(5) run it and see the output directly from within the notebook
Step 1:
First of all you will need an Azure Machine Learning workspace, which you can create in the Azure portal via this link; otherwise, load it directly from a previously saved config file.
In [1]: # check that your python SDK is up-to-date
import azureml.core
print("SDK version:", azureml.core.VERSION)
SDK version: 1.0.2
In [2]: # load a previously saved workspace config file (credentials) into your workspace
# if you don't have one, create one via this link
from azureml.core.workspace import Workspace
ws = Workspace.from_config()
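If you have no saved config yet, the workspace can also be created from the SDK instead of the portal. A minimal sketch of that one-time setup, assuming you fill in your own subscription ID, resource group, and region (the names below are placeholders):

```python
from azureml.core.workspace import Workspace

# one-time creation; replace the placeholder values with your own
ws = Workspace.create(name='my-aml-workspace',
                      subscription_id='<your-subscription-id>',
                      resource_group='my-resource-group',
                      create_resource_group=True,
                      location='westeurope')

# save the credentials locally so Workspace.from_config() works next time
ws.write_config()
```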
Step 2:
In [5]: # AmlCompute is the compute type that replaces BatchAI compute
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
In [7]: # register your Azure file share (which contains your data) to a Datastore object.
# This is the trickiest part, since you cannot really ''look'' into what is inside this
# registered path; it must be mounted onto your compute target before you can use its
# path for real, but more on that later ...
from azureml.core import Datastore
# only need to do it once
ds2 = Datastore.register_azure_file_share(workspace=ws,
datastore_name='give_a_name',
file_share_name='your_file_share_name',
account_name='your_storage_acc_name',
account_key='your_storage_acc_key',
create_if_not_exists=False)
In [8]: # create your GPU virtual machines
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
# choose a name for your cluster
cluster_name = "gpucluster"
try:
compute_target = ComputeTarget(workspace=ws, name=cluster_name)
print('Found existing compute target')
except ComputeTargetException:
print('Creating a new compute target...')
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
max_nodes=4)
# create the cluster
compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)
# Use the 'status' property to get a detailed status for the current cluster.
print(compute_target.status.serialize())
Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-01-09T13:21:06.234000+00:00', 'creationTime': '2019-01-08T10:23:55.033355+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-01-08T10:25:35.793472+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_NC6'}
Step 3:
In [20]: # create experiment and give it a name
from azureml.core import Experiment
experiment_name = 'keras-tf-exp'
experiment = Experiment(ws, name=experiment_name)
In [34]: # create a folder
import os
project_folder = './keras-ctscan-folder'
os.makedirs(project_folder, exist_ok=True)
In [35]: # copy the keras_cnn_pydicom.py python script into the project folder you just
# created above; this folder is uploaded to the remote compute target when the experiment runs
import shutil
shutil.copy('keras_cnn_pydicom.py', project_folder)
Examine the python script and make SURE to check out the tips in its inline comments; they are important to make it work!
tip 1 - os.makedirs(args.data, exist_ok=True) # here args.data is equal to ds2.path(), which you will supply when specifying the estimator script_params
tip 2 - the relative mounting path is usable after tip 1; however, if you need to access subfolders within that mounting path, you will have to hard-code the path like I did below
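To make the two tips concrete, here is a minimal sketch of what the argument handling at the top of keras_cnn_pydicom.py could look like. The argument names match the script_params in Step 4, but the defaults and the 'train_images' subfolder are made-up placeholders for your own data layout:

```python
import argparse
import os

parser = argparse.ArgumentParser()
# defaults are for local smoke-testing only; on the cluster these come from script_params
parser.add_argument('--data', type=str, default='./data_mount',
                    help='mounted datastore path (ds2.path())')
parser.add_argument('--epoch', type=int, default=1,
                    help='number of training epochs')
parser.add_argument('--save_model', type=str, default='/outputs',
                    help='folder the trained model is written to')
args, _ = parser.parse_known_args()

# tip 1: this makedirs call is what makes the mounted path usable inside the script
os.makedirs(args.data, exist_ok=True)

# tip 2: subfolders under the mount have to be spelled out explicitly;
# 'train_images' is a hypothetical subfolder name, replace it with your own layout
train_dir = os.path.join(args.data, 'train_images')
```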
Step 4:
In [38]: # pay attention to the script_params as they are closely connected to the python script tips we went through above
from azureml.train.dnn import TensorFlow
script_params={
'--data': ds2.path(),
'--epoch': 1,
'--save_model':'/outputs'
}
estimator = TensorFlow(source_directory=project_folder,
compute_target=compute_target,
entry_script='keras_cnn_pydicom.py',
script_params=script_params,
node_count=1,
process_count_per_node=1,
#distributed_backend='mpi',
pip_packages=['pydicom','keras','scikit-image','scikit-learn','scipy','argparse',
'opencv-contrib-python-headless','pillow','numpy', 'pandas','matplotlib'],
#custom_docker_base_image='zecharpy/tfgpupy3:pydicom',
use_gpu=True)
Step 5:
Run the experiment and sit back; once you reach this message, you know your run succeeded.
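The submission itself is the usual Azure ML pattern; a short sketch, assuming the experiment and estimator objects from the steps above:

```python
# submit the estimator to the experiment and stream the logs into the notebook
run = experiment.submit(estimator)
run.wait_for_completion(show_output=True)

# after completion, files written to /outputs (e.g. the saved model) can be listed
print(run.get_file_names())
```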