Random to “Rotation Forest”: A new classifier ensemble method

Saikat Chakraborty

Published May 12, 2017

Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to encourage individual accuracy and diversity within the ensemble at the same time. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen as base classifiers because they are sensitive to rotation of the feature axes, hence the name "forest". Accuracy is sought by keeping all principal components and by using the whole data set to train each base classifier.

Rodríguez JJ, Kuncheva LI, and Alonso CJ examined the Rotation Forest ensemble on a random selection of 33 benchmark data sets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results were largely favourable to Rotation Forest. So, it is worth a try!
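To make the rotation step concrete, here is a minimal sketch in base R of how the new features for one base classifier could be built. It is a simplification, not the package's code: it only covers the random feature split and the per-subset PCA described above, and the helper name build_rotation() and the use of the built-in iris data are my own assumptions for illustration.

```r
# Simplified sketch of one Rotation Forest rotation (base R only).
# Assumptions: build_rotation() is a hypothetical helper, and only the
# feature split and per-subset PCA described in the text are implemented.
build_rotation <- function(x, K) {
  x <- as.matrix(x)
  p <- ncol(x)
  # Randomly assign the p feature indices to K roughly equal subsets.
  groups <- split(sample(p), rep_len(seq_len(K), p))
  rotation <- matrix(0, nrow = p, ncol = p)
  for (idx in groups) {
    # PCA on this feature subset; all principal components are kept.
    pca <- prcomp(x[, idx, drop = FALSE], center = TRUE, scale. = FALSE)
    # Place the loadings in the block that belongs to these features.
    rotation[idx, idx] <- pca$rotation
  }
  rotation
}

set.seed(1)
x <- iris[, 1:4]                        # four numeric predictors
rot <- build_rotation(x, K = 2)         # 4 x 4 rotation matrix
rotated <- as.matrix(x) %*% rot         # new features for one base classifier
```

Because every component is kept, the matrix is orthogonal: the data are only rotated, nothing is thrown away. Each base classifier would draw its own random split and therefore see a differently rotated copy of the same data.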

It has been implemented in R through the package “rotationForest”. The data I have used here is a set of around 30,000 observations from a credit default data set taken from https://archive.ics.uci.edu/ml/datasets.html

Using it is very easy and super-fast! The model is fitted with the following command in R.

rotationForest(x, y, K = 3, L = 10)

Arguments:

1) x: A data frame of predictors (numeric or integer). Categorical variables need to be transformed to indicator (dummy) variables.

2) y: A factor containing the response vector. Only {0,1} is allowed.

3) K: The number of variable subsets.

4) L: The number of base classifiers. The default is 10.

Once the model is fitted, the S3 method "predict" is used to generate predictions.

predict(object, newdata)
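Putting the two commands together, a minimal end-to-end sketch might look like the one below. It assumes the rotationForest package is installed from CRAN. The credit default data is not reproduced here, so a two-class subset of the built-in iris data stands in for it; the 70/30 split and the 0.5 cutoff are likewise my own assumptions for illustration.

```r
# install.packages("rotationForest")   # assumed to be installed from CRAN
library(rotationForest)

# Assumption: the first 100 rows of iris (two species) stand in for the
# credit default data used in the article.
set.seed(42)
x <- iris[1:100, -5]                                   # numeric predictors
y <- as.factor(ifelse(iris$Species[1:100] == "setosa", 0, 1))

train <- sample(nrow(x), 70)                           # assumed 70/30 split
fit <- rotationForest(x[train, ], y[train], K = 2, L = 10)

# predict() returns one score per row; 0.5 is an assumed cutoff.
p <- predict(object = fit, newdata = x[-train, ])
pred <- as.integer(p > 0.5)
accuracy <- mean(pred == as.integer(as.character(y[-train])))
```

The value of accuracy is the share of held-out rows classified correctly. On a real data set the cutoff would need to be chosen with care, especially when the two classes are unbalanced.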

Below is a screenshot from RStudio. The accuracy I got on the first run was roughly 85%. Not bad!

Adrián Arnaiz Rodríguez

Is it implemented in a default library of R, in a CRAN library, or in a custom library? A good dataset for testing Rotation Forest is the spiral dataset. On this dataset the default random forest cannot make good partitions using only vertical and horizontal cuts. With Rotation Forest, both the accuracy and the partitions improve considerably. So interesting!
