Machine Learning Using SAS Viya

Machine learning is a branch of artificial intelligence that builds systems that learn iteratively from data and make predictions in an automated fashion with minimal human intervention. The main characteristics of machine learning are automation, customization, and acceleration. Machine learning runs on algorithms that pass through the data, identify patterns, and make predictions based on behaviour while continuously learning and improving.

Businesses face many challenges in storing, analyzing, and making predictions from large volumes of complex data to support business decisions in real time. Effective business decisions therefore require software like SAS Viya for predictive modeling techniques and for managing analytical models for optimal performance. These are a few of the many areas where predictive modeling can be used:

·        Fraud or anomaly detection - identifying patterns in financial or non-financial transactions that fall outside of normal behaviour

·        Target marketing or direct marketing - identifying people who have a higher probability of responding to a marketing action and designing a campaign to target them. Predictive modeling helps answer: Whom should I reach? What can I offer? When and how should I make the offer?

·        Customer churn - identifying which customers are most likely to churn in the coming months or next year. Acquiring a new customer is more expensive than retaining a current one, so the challenge for businesses is how to retain current clients and identify the root cause of churn in advance, before customers leave.

·        Financial risk management - models that help business managers predict credit default, false insurance claims, and the probability of loan default or of failing to comply with regulatory requirements. Adopting various financial risk management models helps businesses approve loans or deals while maintaining proper hedging models using forward contracts, swaps, or options to preserve a healthy liquidity position and capital adequacy on the balance sheet.

·        Process monitoring - applications that identify deviations from the normal business process, such as in manufacturing or a security breach

·        Pattern detection in medical diagnostics

·        Text mining - analyzing customer sentiment based on the words used in customers' comments

·        Recommender systems - making recommendations based on users' ratings of movies and songs or on recent purchasing/search behaviour

·        Predictive asset maintenance - planning the effective retirement of assets

·        Network optimization - models that identify and resolve network bottlenecks to maintain quality of service and optimize the network for the most valuable customers

The analytics life cycle has three phases: data, discovery, and deployment. In the data phase, we explore and prepare data for analysis. In the discovery phase, we detect patterns we did not know existed and build multiple models based on our business objectives. In the deployment phase, we put the model to work: applying the model to new data is called scoring. The value of machine learning is evident across the entire analytics life cycle, which provides actionable insights at each phase.

Predictive modeling, generally known as supervised prediction or supervised learning, starts with a training data set. The variables in the training data include inputs and a target.

In a telecommunication business trying to identify which customers are going to churn, the inputs (also known as predictors, features, explanatory variables, or independent variables) are typically attributes such as contract age, the customer's age, technical-support satisfaction, and device manufacturer.

The target (also known as the response, outcome, or dependent variable) indicates whether the customer churned or not.
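As a concrete illustration of inputs versus a target, a tiny churn training set might look like the following sketch (the column names and values are invented for this example, not taken from any real data set):

```python
import pandas as pd

# Toy training set for the churn example; column names
# (contract_months, cust_age, support_satisfaction, churned)
# are hypothetical, chosen only to illustrate inputs vs. target.
train = pd.DataFrame({
    "contract_months": [24, 3, 12, 36, 6],        # input
    "cust_age": [41, 23, 35, 52, 29],             # input
    "support_satisfaction": [4, 1, 3, 5, 2],      # input (1-5 rating)
    "churned": ["no", "yes", "no", "no", "yes"],  # target
})

X = train.drop(columns="churned")  # the inputs (features)
y = train["churned"]               # the target (response)
print(X.shape, y.shape)            # (5, 3) (5,)
```

Separating the inputs from the target this way is what allows a model-fitting routine to learn the association between them.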

A variable can be numeric (also known as an interval variable). Numeric variables can be classified as:

Continuous (such as income), or

Discrete (counts, such as the number of items purchased).

A variable can also be categorical (qualitative values representing a group or category). Categorical variables can be:

Nominal - categories with no particular order, such as occupation,

Ordinal - categories with an inherent order, for example shoe size, or

Binary (yes or no).
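These measurement levels can be made explicit in code. A minimal pandas sketch (the column names and values are invented for illustration):

```python
import pandas as pd

# One column per measurement level described above.
df = pd.DataFrame({
    "income": [52000.0, 48500.5, 61000.0],          # continuous numeric
    "items_purchased": [3, 1, 7],                   # discrete numeric (counts)
    "occupation": ["nurse", "teacher", "clerk"],    # nominal categorical
    "shoe_size": [7, 9, 8],                         # ordinal categorical
    "responded": ["yes", "no", "yes"],              # binary categorical
})

# Mark the nominal column as an (unordered) category, and the
# ordinal column as an ordered category so comparisons make sense.
df["occupation"] = df["occupation"].astype("category")
df["shoe_size"] = pd.Categorical(df["shoe_size"],
                                 categories=[7, 8, 9], ordered=True)
print(df.dtypes)
```

Declaring the measurement level explicitly matters because many modeling tools treat interval, nominal, and ordinal inputs differently (for example, when creating dummy variables).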

The purpose of the training data is to construct a predictive model based on the associations between the inputs and the target.

A predictive model can generate one of three types of predictions: decisions (classifying a case as churn or no-churn), rankings (ranking high-value cases above low-value cases), and estimates (usable in mathematical expressions, such as when the probability of an event is combined with an estimate of profit or loss).
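All three prediction types can be derived from the same set of estimated probabilities. A small sketch (the probabilities, the loss figure, and the 0.5 cutoff are illustrative assumptions):

```python
# Estimated churn probabilities for four cases (invented numbers).
probs = [0.82, 0.10, 0.55, 0.30]

# Estimates: the probabilities themselves, combined with a loss
# figure to get an expected loss per case.
loss_if_churn = 200.0
expected_loss = [p * loss_if_churn for p in probs]

# Rankings: order case indices from highest to lowest probability.
ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

# Decisions: apply a cutoff (0.5 here) to classify each case.
decisions = ["churn" if p >= 0.5 else "no-churn" for p in probs]

print(ranking)    # [0, 2, 3, 1]
print(decisions)  # ['churn', 'no-churn', 'churn', 'no-churn']
```

Note that a ranking carries less information than an estimate, and a decision less than a ranking: the cutoff throws away how far each case is from the boundary.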

Data preparation is a messy and time-consuming process because of the high likelihood of encountering inconsistencies, incomplete records, duplication, and merging problems. Good data is essential for good models. Business managers need to make sure that their data is clean, appropriate, and reduced to the optimal size for analysis.

Data preparation steps include data collection, exploration, data division, rare events identification, managing missing values, replacing incorrect values, adding unstructured data, extracting features, managing extreme or unusual values and selecting useful inputs.
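Two of these steps, managing missing values and managing extreme values, can be sketched as follows. Median imputation and a 95th-percentile cap are just one common choice, not the only one, and the income figures are invented:

```python
import pandas as pd

# A column with one missing value and one extreme value.
df = pd.DataFrame({"income": [52000.0, None, 61000.0, 990000.0]})

# Managing missing values: replace missing income with the median.
median = df["income"].median()
df["income"] = df["income"].fillna(median)

# Managing extreme values: cap income at the 95th percentile.
cap = df["income"].quantile(0.95)
df["income"] = df["income"].clip(upper=cap)
print(df)
```

In practice the imputation and capping rules should be derived from the training partition only and then applied unchanged to validation, test, and scoring data.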

A complex model may look flexible, but it runs the risk of overfitting (accommodating nuances of the random noise in the sample). A good model neither overfits nor underfits the data; the right amount of flexibility gives the best generalization. Therefore, the data needs to be partitioned into two or three non-overlapping sets.

The first partition is the training set, which is used to build models. Once a model is built on the training data, you can assess its performance on the second partition of the data, the validation set.

The validation set is used to optimize the complexity of the model and find the sweet spot between bias and variance. You can also tune the models to determine whether additional training data is required.

The test set gives an honest, unbiased estimate of the model's performance. It shows how the model performs on genuinely new data before the model is put into the production environment.
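The three-way partition described above can be sketched as follows. The 60/20/20 split is an assumed proportion for illustration, not a SAS Viya default:

```python
import numpy as np

# Randomly assign 1000 row indices to three non-overlapping sets.
rng = np.random.default_rng(seed=42)
n = 1000
idx = rng.permutation(n)

train_idx = idx[: int(0.6 * n)]               # build the models
valid_idx = idx[int(0.6 * n): int(0.8 * n)]   # tune model complexity
test_idx = idx[int(0.8 * n):]                 # honest final assessment

print(len(train_idx), len(valid_idx), len(test_idx))  # 600 200 200
```

Shuffling before splitting matters: if the rows are ordered (for example by date or by target value), a straight slice would give partitions with different distributions.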

In the discovery phase, a few essential tasks need to be carried out:

Selecting an algorithm

Improving the model

Optimizing the complexity of the model

Regularizing and tuning the hyperparameters of the model

Building an ensemble model

It is crucial to meet the consumer's expectations about how explainable the model should be. If an uninterpretable prediction is acceptable, you can choose a support vector machine, a neural network, or any flavor of ensemble model to achieve a highly accurate and generalizable model. Where interpretability or explainable documentation is important, use decision trees or regression techniques.

Basic template for a logistic regression model with a class target in a SAS Viya pipeline:
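As a rough analogue of such a pipeline node, outside SAS Viya, a logistic regression for a binary class target can be sketched in Python with scikit-learn. The data here are synthetic and illustrative only; this is not the SAS Viya template itself:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: three interval inputs and a binary target that
# depends (noisily) on the first two inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1]
     + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Fit the logistic regression on the training rows.
model = LogisticRegression().fit(X, y)

probs = model.predict_proba(X)[:, 1]    # estimates (event probability)
decisions = (probs >= 0.5).astype(int)  # decisions at a 0.5 cutoff
print(model.score(X, y))                # training accuracy
```

In a real workflow the fit would use only the training partition, with accuracy reported on the validation and test partitions as described earlier.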

More articles by MANOJ PAUL
