My experience in Machine Learning
I have been thinking of trying ML - Machine Learning - for some time, and in the last two weeks I finally spent time learning it and running a few experiments using AzureML and AWS SageMaker.
In the entire process of ML - data acquisition, data preparation, modelling & deployment - I realized that model training is a very small piece of the whole exercise. Most of the time and effort is really needed for data acquisition and preparation. If the organization already has an analytics practice set up - data assets, a data lake, and data views in place, with domain experts available to support analytics - then data acquisition and preparation won't be such a big hurdle.
Based on what I learned running these experiments, here is what is important.
Data acquisition - How do you make the data available for analysis? Identify the data source and make it accessible to your machine learning platform. AzureML, AWS and Google can all consume CSV files via URLs, so simply exposing your on-premise/cloud data through URLs is sufficient.
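Once the CSV is exposed, loading it is a one-liner on any of these platforms. As a rough sketch in Python (I used R for my experiments; the field names here are made up for illustration), `urllib.request.urlopen(url)` would give you the same file-like stream for a real hosted CSV that the in-memory sample stands in for below:

```python
import csv
import io

def load_rows(fileobj):
    """Parse a CSV stream into a list of dicts keyed by the header row."""
    return list(csv.DictReader(fileobj))

# A real run would use: load_rows(urllib.request.urlopen(url))
sample = io.StringIO("size_sqft,price\n1000,200000\n1500,290000\n")
rows = load_rows(sample)
print(rows[0]["price"])  # -> 200000 (values arrive as strings)
```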
Data preparation - At this step, the key is to identify and prepare the data fields that contribute to the problem being addressed - domain knowledge and an understanding of the data schema are important here. This activity often needs knowledge of additional tools like Excel, cloud tables, SQL or R. I decided to use R; in the near future I'll probably try Python as well.
I started with some simple examples of regression (prediction), like house prices, and then tried my hand at predicting machine failure. For that one I ran into some complexity preparing the data for modelling - I had to learn data analysis techniques like correlations, cross-tabulations, and handling missing values and outliers. After building and testing the model, the individual machine-level failure prediction was accurate to 95%+ for all five algorithms I tried.
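The preparation techniques mentioned above are each only a few lines of code. Here is a minimal Python sketch of the four (I did this in R; these hand-rolled helpers are just illustrations, not the platform's API):

```python
from collections import Counter
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from the definition."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def cross_tab(rows, a, b):
    """Count co-occurrences of two categorical fields."""
    return Counter((r[a], r[b]) for r in rows)

def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def drop_outliers(values, z=3.0):
    """Drop points more than z standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= z * s]

print(pearson([1, 2, 3], [2, 4, 6]))     # perfectly correlated -> 1.0
print(fill_missing([10, 12, None, 11]))  # mean-imputes the gap
```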
The other problem I tried was classification - classifying email as spam or not. Here I was able to hit 80%+ accuracy. One of the real-world use cases I'll probably try next is identifying the support group for a ticket based on the email contents and then automating the assignment.
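To give a feel for what a spam classifier does under the hood, here is a toy multinomial Naive Bayes over bag-of-words features in Python - one common choice for this problem, though not necessarily the algorithms the platforms used in my experiments:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over bag-of-words features."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        scores = {}
        total_docs = sum(self.class_counts.values())
        for c in self.class_counts:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.class_counts[c] / total_docs)  # class prior
            for w in text.lower().split():
                # Laplace smoothing so unseen words don't zero out the score
                score += math.log(
                    (self.word_counts[c][w] + 1) / (total_words + len(self.vocab))
                )
            scores[c] = score
        return max(scores, key=scores.get)

clf = NaiveBayes().fit(
    ["win free money now", "meeting at noon", "free prize claim now", "lunch tomorrow noon"],
    ["spam", "ham", "spam", "ham"],
)
print(clf.predict("free money"))  # -> spam
```

A real mailbox would need far more training data and better tokenization, but the scoring logic is the same idea.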
I am yet to explore the other two types of data problems - clustering & anomaly detection. That's for the next two weeks.
Between the two platforms, AWS and AzureML, I find AzureML easier to start with. Development is visual, and scripts are needed only when one wants to use R features for data preparation. AWS is more script-oriented. Deploying a model as a web service is easy on both platforms. My personal choice is AzureML, simply because it lets you achieve a lot without learning scripting languages like Python or R. But in the long term it is beneficial to learn at least one scripting language.
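Both platforms generate the scoring endpoint for you, but conceptually a deployed model is just an HTTP handler that accepts features and returns a prediction. A bare-bones Python illustration of that idea (the route, field names, and the toy linear "model" are my own stand-ins, not what either platform actually generates):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(size_sqft):
    """Toy linear model standing in for a trained one."""
    return 150.0 * size_sqft + 50_000

class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, e.g. {"size_sqft": 1000}
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["size_sqft"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet
        pass

# To serve for real, uncomment:
# HTTPServer(("127.0.0.1", 8000), ScoreHandler).serve_forever()
```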
Overall, it was a good skill addition and experience.