From Random Forest to “Rotation Forest”: A new classifier ensemble method
Rotation Forest is a method for generating classifier ensembles based on feature extraction. To create the training data for a base classifier, the feature set is randomly split into K subsets (K is a parameter of the algorithm) and Principal Component Analysis (PCA) is applied to each subset. All principal components are retained in order to preserve the variability information in the data. Thus, K axis rotations take place to form the new features for a base classifier. The idea of the rotation approach is to simultaneously encourage individual accuracy and diversity within the ensemble. Diversity is promoted through the feature extraction for each base classifier. Decision trees were chosen as base classifiers because they are sensitive to rotation of the feature axes, hence the name "forest." Accuracy is sought by keeping all principal components and by using the whole data set to train each base classifier. Rodríguez JJ, Kuncheva LI, and Alonso CJ examined the Rotation Forest ensemble on a random selection of 33 benchmark datasets from the UCI repository and compared it with Bagging, AdaBoost, and Random Forest. The results largely favoured Rotation Forest. So, it is worth a try!
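To make the rotation step concrete, here is a minimal base-R sketch (my own illustration, not code from the paper or from the rotationForest package) of how the rotation matrix for one base classifier can be built: shuffle the feature indices, split them into K groups, run prcomp on each group, and place each group's full loading matrix in the matching block of a p × p matrix. The helper name rotation_matrix is hypothetical, and as a simplification PCA is run on all rows (the paper's extra sampling step before PCA is omitted).

```r
# Sketch: build one Rotation Forest rotation matrix (hypothetical helper name)
rotation_matrix <- function(X, K) {
  X <- as.matrix(X)
  p <- ncol(X)
  stopifnot(K >= 1, K <= p)
  # Randomly assign the p feature indices to K subsets of near-equal size
  groups <- split(sample(p), rep_len(seq_len(K), p))
  R <- matrix(0, nrow = p, ncol = p)
  for (idx in groups) {
    # PCA on this feature subset; keep ALL components (full loading matrix)
    loadings <- prcomp(X[, idx, drop = FALSE])$rotation
    # Place the loadings in the rows/columns of this subset (block structure)
    R[idx, idx] <- loadings
  }
  R
}

set.seed(1)
X <- as.matrix(iris[, 1:4])
R <- rotation_matrix(X, K = 2)
Z <- X %*% R  # rotated features that one base tree would be trained on
```

Because every block is an orthogonal PCA loading matrix, R itself is orthogonal, so X %*% R is a pure rotation of the feature axes: nothing is discarded, which is the point of keeping all principal components. Calling rotation_matrix again with a different seed gives a different split and hence a different rotation, which is where the ensemble's diversity comes from.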
It has been implemented in R in the package “rotationForest”. The data I have used here is a credit default data set of around 30,000 observations, taken from https://archive.ics.uci.edu/ml/datasets.html
The implementation is very easy and super-fast! The model is fitted with the following command in R:
rotationForest(x, y, K = 3, L = 10)
Arguments:
1) x: A data frame of predictors (numeric or integer). Categorical variables need to be transformed to indicator (dummy) variables.
2) y: A factor containing the response vector. Only {0,1} is allowed.
3) K: The number of variable subsets.
4) L: The number of base classifiers. The default is 10.
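Since x must be entirely numeric and y must be a {0,1} factor, the inputs usually need a little preparation first. Below is a minimal base-R sketch on a hypothetical toy data frame (the columns limit, sex and default are made up for illustration and are not the UCI file's column names): model.matrix expands each factor into 0/1 indicator columns, and factor(..., levels = c(0, 1)) builds the response.

```r
# Hypothetical toy data standing in for a credit data set
df <- data.frame(
  limit   = c(20000, 120000, 90000, 50000),
  sex     = factor(c("male", "female", "female", "male")),
  default = c(1, 0, 0, 1)
)

# Expand factors into 0/1 indicator columns; drop the intercept column
x <- as.data.frame(model.matrix(~ . - default, data = df)[, -1, drop = FALSE])

# Response as a factor restricted to the levels 0 and 1
y <- factor(df$default, levels = c(0, 1))

str(x)  # limit (numeric) and sexmale (0/1 indicator)
```

With treatment contrasts, a factor with m levels becomes m - 1 indicator columns, so the first level (here female) is the baseline.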
This is followed by the S3 method "predict" to generate predictions:
predict(object, newdata)
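To see roughly what K, L and predict do behind the scenes, here is a from-scratch sketch in R. It is a simplified illustration of the technique, not the rotationForest package's own code: each of the L base classifiers is an rpart classification tree (rpart ships with standard R installations) trained on the data rotated by its own random block-diagonal PCA matrix, and prediction averages the trees' probabilities for class "1". The function names rotation_forest_fit and rotation_forest_predict are hypothetical, and the 0.5 cut-off is an assumption.

```r
library(rpart)  # decision trees; a recommended package bundled with R

# One random rotation: split features into K subsets, PCA each, keep all PCs
make_rotation <- function(X, K) {
  p <- ncol(X)
  groups <- split(sample(p), rep_len(seq_len(K), p))
  R <- matrix(0, nrow = p, ncol = p)
  for (idx in groups) R[idx, idx] <- prcomp(X[, idx, drop = FALSE])$rotation
  R
}

# Hypothetical fit function: L trees, each on differently rotated data
rotation_forest_fit <- function(x, y, K = 3, L = 10) {
  X <- as.matrix(x)
  lapply(seq_len(L), function(i) {
    R <- make_rotation(X, K)
    train <- data.frame(X %*% R, y = y)  # rotated columns are named X1, X2, ...
    list(R = R, tree = rpart(y ~ ., data = train, method = "class"))
  })
}

# Hypothetical predict function: average P(class "1") over the L trees
rotation_forest_predict <- function(models, newdata) {
  X <- as.matrix(newdata)
  probs <- do.call(cbind, lapply(models, function(m) {
    predict(m$tree, data.frame(X %*% m$R), type = "prob")[, "1"]
  }))
  rowMeans(probs)
}

# Usage on a two-class slice of iris (versicolor = 0, virginica = 1)
set.seed(42)
keep <- iris$Species != "setosa"
x <- iris[keep, 1:4]
y <- factor(as.integer(iris$Species[keep] == "virginica"), levels = c(0, 1))
models <- rotation_forest_fit(x, y, K = 2, L = 10)
prob <- rotation_forest_predict(models, x)
pred <- factor(as.integer(prob > 0.5), levels = c(0, 1))
mean(pred == y)  # training accuracy (optimistic; use held-out data in practice)
```

Each tree stores its own rotation matrix, because new data must be rotated exactly the same way before that tree can score it.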
Below is a screenshot from RStudio. The accuracy I got in the first run was roughly 85%. Not bad!
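Accuracy here means the share of correct predictions, which can be read off a confusion matrix. Below is a minimal base-R sketch with made-up labels and probabilities (not the results of the run above); turning probabilities into classes with a 0.5 cut-off is an assumption you may want to tune. The full matrix also shows how the errors split between missed defaults and false alarms, which matters when defaults are the minority class.

```r
# Made-up example values, for illustration only
actual    <- factor(c(0, 0, 1, 1, 0, 1, 0, 0), levels = c(0, 1))
prob      <- c(0.2, 0.6, 0.8, 0.4, 0.1, 0.9, 0.3, 0.05)
predicted <- factor(as.integer(prob > 0.5), levels = c(0, 1))

# Rows = predicted class, columns = actual class
cm <- table(Predicted = predicted, Actual = actual)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
accuracy  # 6 of 8 correct -> 0.75
```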
Is it implemented in a default R library, in a CRAN package, or in a custom library? A good dataset for testing Rotation Forest is the spiral dataset. On this dataset, the default random forest cannot partition the classes well because it relies only on vertical and horizontal cuts. With Rotation Forest, the accuracy and the partitions improve considerably. So interesting!