Machine Learning Using SAS Viya

Machine learning is a branch of artificial intelligence that builds systems that learn iteratively from data and make predictions in an automated fashion with minimal human intervention. The main characteristics of machine learning are automation, customization, and acceleration. Machine learning runs on algorithms that pass through the data, identify patterns, and make predictions based on behaviour while continuously learning and improving.

Businesses face many challenges in storing, analyzing, and making predictions from large volumes of complex data to support business decisions in real time. Effective business decisions therefore require software like SAS Viya for predictive modeling techniques and for managing analytical models for optimal performance. These are a few of the many areas where predictive modeling can be used:

·        Fraud or anomaly detection - identifying patterns in financial or non-financial transactions that fall outside of normal behaviour

·        Target marketing or direct marketing - identifying people who have a higher probability of responding to a marketing action and designing a campaign to target them. Predictive modeling helps answer: Whom should I reach? What can I offer? When and how should I make the offer?

·        Customer churn - identifying which customers are most likely to churn in the coming months or next year. Acquiring a new customer is more expensive than retaining a current one, so the challenge for businesses is how to retain current clients and identify the root cause of churn in advance, before customers leave.

·        Financial risk management - models that help business managers predict credit default, false insurance claims, and the probability of loan default or of failing to comply with regulatory requirements. Adopting various financial risk management models helps businesses approve loans or deals while maintaining proper hedging models using forward contracts, swaps, or options to preserve a healthy liquidity position and capital adequacy on the balance sheet.

·        Process monitoring - applications that identify deviations from the normal business process, such as in manufacturing or a security breach

·        Pattern detection in medical diagnostics

·        Text mining - analyzing customer sentiment based on the words used in customers' comments

·        Recommender systems - making recommendations based on users' ratings of movies and songs or on recent purchasing/search behaviour

·        Predictive asset maintenance - planning the effective retirement of assets

·        Network optimization - models that identify and resolve network bottlenecks to maintain quality of service and optimize the network for the most valuable customers

The analytics life cycle has three phases: data, discovery, and deployment. In the data phase, we explore and prepare data for analysis. In the discovery phase, we detect patterns we did not know existed and build multiple models based on our business objectives. In the deployment phase, we put the model to work: applying the model to new data is called scoring. The value of machine learning is evident across the entire analytics life cycle, which provides actionable insights at each phase.

Predictive modeling, generally known as supervised prediction or supervised learning, starts with a training data set. The variables in the training data include inputs and a target.

In a telecommunication business trying to identify which customers are going to churn, the inputs (also known as predictors, features, explanatory variables, or independent variables) are typically attributes such as contract age, the customer's age, technical-support satisfaction, and device manufacturer.

The target (also known as the response, outcome, or dependent variable) indicates whether the customer churned or not.
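As a concrete illustration of inputs versus a target, a tiny churn training set might look like the following sketch (the column names and values are invented for this example, not taken from any real data set):

```python
import pandas as pd

# Toy training set for the churn example; column names
# (contract_months, cust_age, support_satisfaction, churned)
# are hypothetical, chosen only to illustrate inputs vs. target.
train = pd.DataFrame({
    "contract_months": [24, 3, 12, 36, 6],        # input
    "cust_age": [41, 23, 35, 52, 29],             # input
    "support_satisfaction": [4, 1, 3, 5, 2],      # input (1-5 rating)
    "churned": ["no", "yes", "no", "no", "yes"],  # target
})

X = train.drop(columns="churned")  # the inputs (features)
y = train["churned"]               # the target (response)
print(X.shape, y.shape)            # (5, 3) (5,)
```

Separating the inputs from the target this way is what allows a model-fitting routine to learn the association between them.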

A variable can be numeric (also known as an interval variable). Numeric variables can be classified as:

Continuous (such as income), or

Discrete (counts, such as the number of items purchased).

A variable can also be categorical (qualitative values representing a group or category). Categorical variables can be:

Nominal - categories with no particular order, such as occupation,

Ordinal - categories with an inherent order, for example shoe size, or

Binary (yes or no).
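These measurement levels can be made explicit in code. A minimal pandas sketch (the column names and values are invented for illustration):

```python
import pandas as pd

# One column per measurement level described above.
df = pd.DataFrame({
    "income": [52000.0, 48500.5, 61000.0],          # continuous numeric
    "items_purchased": [3, 1, 7],                   # discrete numeric (counts)
    "occupation": ["nurse", "teacher", "clerk"],    # nominal categorical
    "shoe_size": [7, 9, 8],                         # ordinal categorical
    "responded": ["yes", "no", "yes"],              # binary categorical
})

# Mark the nominal column as an (unordered) category, and the
# ordinal column as an ordered category so comparisons make sense.
df["occupation"] = df["occupation"].astype("category")
df["shoe_size"] = pd.Categorical(df["shoe_size"],
                                 categories=[7, 8, 9], ordered=True)
print(df.dtypes)
```

Declaring the measurement level explicitly matters because many modeling tools treat interval, nominal, and ordinal inputs differently (for example, when creating dummy variables).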

The purpose of the training data is to construct a predictive model based on the associations between the inputs and the target.

A predictive model can generate one of three types of predictions: decisions (classifying a case as churn or no-churn), rankings (ranking high-value cases above low-value cases), and estimates (usable in mathematical expressions, such as when the probability of an event is combined with an estimate of profit or loss).
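All three prediction types can be derived from the same set of estimated probabilities. A small sketch (the probabilities, the loss figure, and the 0.5 cutoff are illustrative assumptions):

```python
# Estimated churn probabilities for four cases (invented numbers).
probs = [0.82, 0.10, 0.55, 0.30]

# Estimates: the probabilities themselves, combined with a loss
# figure to get an expected loss per case.
loss_if_churn = 200.0
expected_loss = [p * loss_if_churn for p in probs]

# Rankings: order case indices from highest to lowest probability.
ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Decisions: apply a cutoff (0.5 here) to classify each case.
decisions = ["churn" if p >= 0.5 else "no-churn" for p in probs]

print(ranking)    # [0, 2, 3, 1]
print(decisions)  # ['churn', 'no-churn', 'churn', 'no-churn']
```

Note that a ranking carries less information than an estimate, and a decision less than a ranking: the cutoff throws away how far each case is from the boundary.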

Data preparation is a messy and time-consuming process because of the high likelihood of encountering inconsistencies, incomplete records, duplication, and merging problems. Good data is essential for good models. Business managers need to make sure that their data is clean, appropriate, and reduced to the optimal size for analysis.

Data preparation steps include data collection, exploration, data division, rare events identification, managing missing values, replacing incorrect values, adding unstructured data, extracting features, managing extreme or unusual values and selecting useful inputs.
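Two of these steps, managing missing values and managing extreme values, can be sketched as follows. Median imputation and a 95th-percentile cap are just one common choice, not the only one, and the income figures are invented:

```python
import pandas as pd

# A column with one missing value and one extreme value.
df = pd.DataFrame({"income": [52000.0, None, 61000.0, 990000.0]})

# Managing missing values: replace missing income with the median.
median = df["income"].median()
df["income"] = df["income"].fillna(median)

# Managing extreme values: cap income at the 95th percentile.
cap = df["income"].quantile(0.95)
df["income"] = df["income"].clip(upper=cap)
print(df)
```

In practice the imputation and capping rules should be derived from the training partition only and then applied unchanged to validation, test, and scoring data.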

A complex model may look flexible, but it runs the risk of overfitting (accommodating nuances of the random noise in the sample). A good model neither overfits nor underfits the data; the right amount of flexibility gives the best generalization. Therefore, the data needs to be partitioned into two or three non-overlapping sets.

The first partition is the training set, which is used to build models. Once a model is built on the training data, you can assess its performance on the second partition of the data, the validation set.

The validation set is used to optimize the complexity of the model and find the sweet spot between bias and variance. You can also tune the models to determine whether additional training data is required.

The test set gives an honest, unbiased estimate of the model's performance. It shows how the model performs on genuinely new data before the model is put into the production environment.
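The three-way partition described above can be sketched as follows. The 60/20/20 split is an assumed proportion for illustration, not a SAS Viya default:

```python
import numpy as np

# Randomly assign 1000 row indices to three non-overlapping sets.
rng = np.random.default_rng(seed=42)
n = 1000
idx = rng.permutation(n)

train_idx = idx[: int(0.6 * n)]               # build the models
valid_idx = idx[int(0.6 * n): int(0.8 * n)]   # tune model complexity
test_idx = idx[int(0.8 * n):]                 # honest final assessment

print(len(train_idx), len(valid_idx), len(test_idx))  # 600 200 200
```

Shuffling before splitting matters: if the rows are ordered (for example by date or by target value), a straight slice would give partitions with different distributions.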

In the discovery phase, a few essential tasks need to be carried out:

Selecting an algorithm

Improving the model

Optimizing the complexity of the model

Regularizing and tuning the hyperparameters of the model

Building an ensemble model

It is crucial to meet the consumer's expectations about how explainable the model should be. If an uninterpretable prediction is acceptable, you can choose a support vector machine, a neural network, or any flavor of ensemble model to achieve a highly accurate and generalizable model. Where interpretability or explainable documentation is important, use decision trees or regression techniques.

Basic template for a logistic regression model with a class target in a SAS Viya pipeline:
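As a rough analogue of such a pipeline node, outside SAS Viya, a logistic regression for a binary class target can be sketched in Python with scikit-learn. The data here are synthetic and illustrative only; this is not the SAS Viya template itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: three interval inputs and a binary target that
# depends (noisily) on the first two inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1]
     + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Fit the logistic regression on the training rows.
model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]    # estimates (event probability)
decisions = (probs >= 0.5).astype(int)  # decisions at a 0.5 cutoff
print(model.score(X, y))                # training accuracy
```

In a real workflow the fit would use only the training partition, with accuracy reported on the validation and test partitions as described earlier.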

More articles by MANOJ PAUL
