Testing ML (Machine Learning) Model Performance and Various Options to Do So

Girish Chiriki V

Published Aug 30, 2017

Once you have defined your problem and prepared your data, you need to apply machine learning algorithms to the data in order to solve your problem. You can spend a lot of time choosing, running and tuning algorithms, so you want to make sure you are using that time effectively to get closer to your goal.

Test Harness

We need to define a test harness. The test harness consists of the data we will train and test an algorithm against and the performance measure we will use to assess it. It is important to define the test harness well so that we can focus on evaluating different algorithms and thinking deeply about the problem.

Goal: The goal of the test harness is to be able to quickly and consistently test algorithms against a fair representation of the problem being solved. The outcome of testing multiple algorithms against the harness is an estimate of how each algorithm performs on the problem under a chosen performance measure. You will then know which algorithms might be worth tuning on the problem and which should not be considered further.
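The idea of running several algorithms against one fixed harness can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the names run_harness, fit_majority and fit_threshold, the toy data, and the choice of accuracy as the performance measure are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A "model" maps one feature row to a predicted label.
Model = Callable[[Sequence[float]], int]
# An "algorithm" takes training rows and labels and returns a fitted model.
Algorithm = Callable[[List[Sequence[float]], List[int]], Model]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Performance measure: fraction of predictions that match the truth."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def run_harness(
    algorithms: Dict[str, Algorithm],
    X_train: List[Sequence[float]],
    y_train: List[int],
    X_test: List[Sequence[float]],
    y_test: List[int],
    metric: Callable[[List[int], List[int]], float] = accuracy,
) -> Dict[str, float]:
    """Train every algorithm on the same data and score it with the same metric."""
    scores = {}
    for name, fit in algorithms.items():
        model = fit(X_train, y_train)
        predictions = [model(row) for row in X_test]
        scores[name] = metric(y_test, predictions)
    return scores


# Two toy algorithms (hypothetical, for illustration only).
def fit_majority(X: List[Sequence[float]], y: List[int]) -> Model:
    """Always predict the most frequent training label."""
    majority = Counter(y).most_common(1)[0][0]
    return lambda row: majority


def fit_threshold(X: List[Sequence[float]], y: List[int]) -> Model:
    """Predict 1 when the first feature exceeds its training mean."""
    cut = sum(row[0] for row in X) / len(X)
    return lambda row: 1 if row[0] > cut else 0


# Made-up data: one feature, binary label.
X_train = [[1.0], [2.0], [3.0], [8.0], [9.0]]
y_train = [0, 0, 0, 1, 1]
X_test = [[0.5], [4.0], [7.0], [10.0]]
y_test = [0, 0, 1, 1]

scores = run_harness(
    {"majority": fit_majority, "threshold": fit_threshold},
    X_train, y_train, X_test, y_test,
)
print(scores)
```

Because every algorithm sees the same data and the same metric, the scores are directly comparable, which is what makes it possible to decide which algorithms deserve further tuning.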

Test and Train Datasets

From the transformed data (after Data Processing), you will need to select a test set and a training set. An algorithm will be trained on the training dataset and will be evaluated against the test set. This may be as simple as selecting a random split of data (66% for training, 34% for testing) or may involve more complicated sampling methods.

The model is not exposed to the test dataset during training, so predictions made on that dataset are indicative of how the model performs in general.
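A simple random split along these lines could look like the sketch below. The function name train_test_split, the fixed seed, and the stand-in data are assumptions for illustration; the 66% training fraction comes from the text above.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    rows: Sequence[T], train_fraction: float = 0.66, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle the rows and cut them into a training part and a test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


data = list(range(100))                    # stand-in for 100 prepared records
train, test = train_test_split(data)
print(len(train), len(test))
```

Fixing the seed means every algorithm is evaluated on the same split, which keeps comparisons between algorithms consistent from run to run.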

Weka: We can use Weka to test the performance of any ML model. With it we can:

Compare the performance of algorithms
Estimate model performance
Produce descriptive statistics and visualizations
Establish baseline performance: when we start evaluating multiple machine learning algorithms on a dataset, we need a baseline for comparison.
A baseline result gives you a point of reference to know whether the results for a given algorithm are good or bad, and by how much.
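One common baseline for classification is to always predict the most frequent class in the training data, which is what Weka's ZeroR classifier does for nominal classes. The sketch below reproduces that idea in plain Python rather than calling Weka; the labels and the candidate model's predictions are made up for illustration.

```python
from collections import Counter
from typing import List


def majority_baseline(train_labels: List[str]) -> str:
    """Return the most frequent label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up labels for illustration.
train_labels = ["no"] * 7 + ["yes"] * 3
test_labels = ["no", "no", "yes", "no", "yes"]

# The baseline predicts the same label for every test row.
baseline_label = majority_baseline(train_labels)
baseline_acc = accuracy(test_labels, [baseline_label] * len(test_labels))

# Hypothetical predictions from some candidate algorithm.
candidate_preds = ["no", "no", "yes", "no", "no"]
candidate_acc = accuracy(test_labels, candidate_preds)

print(f"baseline:  {baseline_acc:.2f}")
print(f"candidate: {candidate_acc:.2f}")
print(f"gain over baseline: {candidate_acc - baseline_acc:+.2f}")
```

An algorithm that cannot beat this kind of baseline is adding nothing, however high its raw accuracy may look on an imbalanced dataset.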

PredicT-ML: An automated version of Weka with added support for automated temporal aggregation. It is mainly used for clinical data.

PredicT-ML performs more tests systematically and can produce models achieving accuracy closer to the theoretical limit.
It aims to automate building machine learning predictive models with big clinical data and to support fast iterative machine learning.
The software will enable healthcare administrators and researchers to rapidly ask a series of what-if questions when probing opportunities to use predictive models to improve outcomes and reduce costs for various diseases and patient populations. Existing machine learning tools cannot do this.
