Cloud and Democratisation of Machine Learning Technology
Machine Learning (ML) is a subfield of the much broader field of Artificial Intelligence (AI). In recent times, the meteoric rise of AI has been driven by breakthroughs in Deep Learning, a subset of ML. Deep learning uses neural network algorithms to solve tricky problems such as image recognition and pattern discovery. AI powered by ML is undoubtedly one of the finest achievements of mankind. Some even call it the final frontier! Thankfully, Skynet is still the stuff of fiction.
Given the potential of ML to touch and enhance every aspect of our lives, it can be argued that the knowledge and tools of ML should be accessible to everyone. Thanks to the open source revolution and the innovative offerings of the leading Cloud Service Providers (CSPs), these aspirations are not completely utopian.
Democratisation of machine learning refers to the universal accessibility of ML technology: it harnesses the ubiquity and cost effectiveness of the public cloud, and it is potentially usable by anybody to improve the quality of life of an individual or a community.
Before we delve into the details of the democratisation of ML as defined above, let's recap the concept of ML itself.
What is a model in Machine Learning?
A model is the representation of what a machine learning system has learned from the training data. Model training is the process in which the weights of the input features are determined iteratively, by trial and error guided by clever algorithms. If we zoom into the training process step, the following picture emerges. It gives a glimpse of what goes on under the hood when an ML model is built.
Let's admit it. ML is complex. The above diagram captures just a fraction of what really goes into building a scalable machine learning model, and the details depend on the algorithm used and the type of learning (supervised, unsupervised or reinforcement).
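To make the idea of "determining weights by trial and error" a little more concrete, here is a minimal sketch in plain Python. It assumes the simplest possible case, a single input feature and a straight-line model y = w*x + b, and uses gradient descent to adjust the weight step by step. The function name, the toy data and the learning rate are illustrative choices, not something prescribed by any particular framework.

```python
# Minimal sketch: learn a weight w and a bias b for y = w*x + b
# by repeatedly nudging them in the direction that reduces the error.
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0  # start from an arbitrary guess
    n = len(xs)
    for _ in range(epochs):
        # How far off is the current guess for each training example?
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Move the parameters a small step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy training data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_linear_model(xs, ys)
print(f"learned weight = {w:.3f}, learned bias = {b:.3f}")
```

Real frameworks do essentially the same thing at a much larger scale, with many features, more elaborate models and smarter optimisers.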
Access to machine learning tools, techniques, frameworks and libraries
Barring some advanced research not yet available in the public domain, the majority of ML frameworks, tools, techniques and libraries are open source and readily available to anybody who cares to use them. Popular open source frameworks and libraries such as Spark MLlib, scikit-learn, XGBoost, PyTorch, Keras, TensorFlow, MXNet, pandas and NumPy can simply be downloaded. All you need is a laptop, (preferably) some Python skills and a Jupyter notebook. The Cloud Service Providers already incorporate some of these frameworks in their service offerings.
Access to cloud-based resources for practice and visualisation
While downloading the libraries locally and using them for learning is a great first step, one needs access to the next level of learning resources to play around and experiment. AWS provides SageMaker Studio Lab (announced at re:Invent 2021), while TensorFlow Playground is a phenomenal resource for experimenting with and visualising how ML models learn.
The free tiers provided by AWS and GCP (the latter with a $300 credit) can be utilised for learning the ML tools and services on the respective platforms. AWS has recently announced wider access to ML for developers, while Google offers AI scholarships through the DeepMind programme.
It is also important to note the steps before and after model training. Preparing data, and deploying and operationalising a model in production, are sometimes more challenging than building the model itself. It is often said that 70% of the time in ML is spent on data ingestion, data preparation and exploratory data analysis. It is hoped that CSPs will make their free tier services for learning ML more expansive.
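As a small illustration of what "data prep" can involve, the sketch below shows two very common steps using only the Python standard library: splitting a dataset into training and test portions, and standardising a numeric feature using statistics computed on the training portion only. The helper names and the toy data are assumptions made for this example; in practice libraries such as pandas and scikit-learn offer equivalent utilities.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def standardise(values, mean, stdev):
    """Rescale values to zero mean and unit variance for the given statistics."""
    return [(v - mean) / stdev for v in values]


# Toy feature column (illustrative data only).
rows = [float(v) for v in range(1, 11)]

train, test = train_test_split(rows)

# Compute the statistics on the training data only, then reuse them for
# the test data so that no information leaks from test into training.
mean = statistics.mean(train)
stdev = statistics.pstdev(train)

train_scaled = standardise(train, mean, stdev)
test_scaled = standardise(test, mean, stdev)

print(len(train), "training rows,", len(test), "test rows")
```

Even this toy version hints at why preparation takes so long: every real dataset adds missing values, inconsistent units and outliers on top of these basic steps.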
Access to cost-effective ML infrastructure and higher-level abstractions and services
Building a model on a laptop can only take you so far! Real model training needs lots of compute power, memory and network bandwidth. CSPs provide this and more, for example specialised hardware such as the Tensor Processing Unit (TPU) from GCP and custom ML chips from AWS, in addition to CPUs and GPUs.
AWS and GCP also provide services which abstract away the complexity of building machine learning models, thereby making ML more accessible. The fundamental principle of democratisation is that ML should not be available only to those who have a PhD or a Master's degree, or to Python experts. It should be available to everyone. Imagine you are an expert in SQL and want to use machine learning for predictive analytics. BigQuery ML (the AWS equivalent is Redshift ML) is an excellent tool for achieving this. The following example shows how easy it is to create a linear regression model using the data from the specified table.
CREATE OR REPLACE MODEL [model name]
OPTIONS
  (model_type='linear_reg',
   input_label_cols=['[label column name]']) AS
SELECT
  *
FROM
  [table name where data is stored]
WHERE
  [include appropriate condition]
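For readers who prefer to drive this from a notebook, the sketch below shows one way the template above could be filled in from Python. It only builds the SQL text; actually running it would require a BigQuery client and a project, which are outside the scope of this example. The function name, the dataset, the table and the column names are all hypothetical and exist purely for illustration.

```python
def build_create_model_sql(model_name, label_column, table_name, condition="TRUE"):
    """Fill in the CREATE MODEL template for a linear regression model.

    Illustration only: the values are inserted with plain string
    formatting, so they must come from a trusted source.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        "OPTIONS\n"
        "  (model_type='linear_reg',\n"
        f"   input_label_cols=['{label_column}']) AS\n"
        "SELECT\n"
        "  *\n"
        "FROM\n"
        f"  `{table_name}`\n"
        "WHERE\n"
        f"  {condition}"
    )


# Hypothetical names, chosen only to make the example readable.
sql = build_create_model_sql(
    model_name="farm_data.milk_yield_model",
    label_column="milk_yield_litres",
    table_name="farm_data.daily_records",
    condition="record_date < '2022-01-01'",
)
print(sql)
```

The point is that the whole "model building" step collapses into one SQL statement, which is exactly what makes this approach approachable for someone who knows SQL but not ML frameworks.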
Some other features of BigQuery ML which can be leveraged using SQL code are as follows
CSPs have actually taken this to the next level of abstraction. They have made pre-trained models available, which are readily usable. Alternatively, you can provide your own labelled data to train your model. GCP calls this service AutoML.
Both AWS and GCP provide ML services for image recognition, natural language processing, sentiment analysis, speech to text, text to speech and translation.
Amazon SageMaker and GCP Vertex AI provide services for the end-to-end life cycle of machine learning data and models. Recently AWS has announced the availability of SageMaker Canvas as a no-code tool for building ML models.
Applying ML in daily life so that it benefits all
The ambitious idea is to make ML so commonplace that almost anyone can use it. Imagine a farmer being able to track all their livestock using facial recognition of animals (for example, using AutoML and deploying the model on an edge device), or being able to use predictive analytics to find the most productive crops to sow at the right time, taking all the variables into account. When it comes to using ML to make our lives better, we are limited only by our imagination.
Summary
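The democratisation of Machine Learning touches upon four important aspects: access to ML tools, techniques, frameworks and libraries; access to cloud-based resources for practice and visualisation; access to cost-effective ML infrastructure and higher-level abstractions and services; and applying ML in daily life so that it benefits all.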
ML literacy of all stakeholders will go a long way in making this a reality.
References
[1] https://playground.tensorflow.org
[2] https://cloud.google.com/bigquery-ml/docs
[3] https://www.un.org/en/chronicle/article/towards-ethics-artificial-intelligence
[4] https://www.ibm.com/blogs/systems/ai-machine-learning-and-deep-learning-whats-the-difference/
[5] https://aws.amazon.com/machine-learning/scholarship/
[6] https://deepmind.com/scholarships
[7] https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/