Tuning Supervised Machine Learning Models
As a continuation of the previous two posts, I will go into detail on the methods I used to fine-tune my Congress stock-purchasing AI.
I decided to use supervised machine learning models because they can leverage historical data to identify patterns, enabling a more data-driven, systematic decision-making process. By training on the past price movements of politicians' historical trades, the model builds decision trees that classify a given trade. Splitting the large dataset into 85% training and 15% testing data, the model optimizes for accuracy in predicting whether the stock will be up or down after a set period of time.
Choosing the right machine learning model was a crucial decision for this project. I prioritized speed, interpretability, and reliability, and thus landed on a Random Forest Classifier.
For rapid development, I wanted a model that was fast to train and iterate on. Unlike XGBoost or deep learning models, Random Forest trains its decision trees in parallel, which speeds up the training process.
I also wanted to apply my knowledge of fundamental analysis to determine the health of a company. With a model like a neural network, it would be difficult to understand why the model made certain decisions. The Random Forest Classifier gives a clear importance metric for each of my inputs, which helps me understand how my model works.
Other models like Gradient Boosting can overfit without careful tuning, and deep learning models require large datasets to be effective, which is not possible with only a few thousand transactions since 2014. Once again, the Random Forest Classifier was a solid middle ground that balanced accuracy, stability, and low maintenance.
Once the model was selected, the next step was to tune it with the data collected in my second article. This process was mostly straightforward: I split the dataset's features into categorical and numerical values, then passed them into a ColumnTransformer, which preprocesses each group separately.
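Before the ColumnTransformer can be built, the two transformers themselves need to be defined. A minimal sketch, assuming OneHotEncoder for the categorical features and StandardScaler for the numerical ones (the column names here are hypothetical placeholders, not my actual dataset fields):

```python
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical example columns; the real feature lists come from the dataset
categorical_features = ['Ticker', 'Party']
numerical_features = ['Amount', 'DaysToFile']

# Encode categories as one-hot vectors; scale numerics to zero mean, unit variance
categorical_transformer = OneHotEncoder(handle_unknown='ignore')
numerical_transformer = StandardScaler()
```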
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features),
        ('num', numerical_transformer, numerical_features)
    ]
)
The classification label was computed by checking whether the asset's price ended above (1) or below (0) the stock's price on the date the filing was made.
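That label is a single vectorized comparison. A minimal sketch, assuming hypothetical column names `PriceAtFiling` and `PriceAfter7Days` (the real dataset's price fields may be named differently):

```python
import pandas as pd

# Toy rows standing in for the real trade data (hypothetical columns)
data = pd.DataFrame({
    'PriceAtFiling':   [100.0, 50.0, 75.0],
    'PriceAfter7Days': [105.0, 48.0, 75.0],
})

# 1 if the price 7 days later is above the price on the filing date, else 0
data['Profitable7'] = (data['PriceAfter7Days'] > data['PriceAtFiling']).astype(int)
```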
Next, the training and testing sets were created by splitting the X input (the categorical and numerical data) and the y label (the classifier variable) into the aforementioned 85-15 split.
X = data[categorical_features + numerical_features]
# 1 if profitable after 7 days, 0 otherwise
y = data['Profitable7']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=69)
The preprocessor and the classifier, the Random Forest Classifier, were then combined into a Pipeline.
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_jobs=-1))
])
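Fitting this pipeline directly, before any tuning, gives the baseline accuracy to beat. A self-contained sketch on synthetic stand-in data (the real trade dataset and its columns are not reproduced here):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the trade dataset (hypothetical columns)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    'Ticker': rng.choice(['AAPL', 'MSFT', 'NVDA'], size=200),
    'Amount': rng.uniform(1_000, 50_000, size=200),
})
y = rng.integers(0, 2, size=200)

preprocessor = ColumnTransformer(transformers=[
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['Ticker']),
    ('num', StandardScaler(), ['Amount']),
])
model = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', RandomForestClassifier(n_jobs=-1, random_state=0)),
])

# Same 85-15 split as above, then fit and score the untuned pipeline
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=69)
model.fit(X_train, y_train)
print(f"Baseline accuracy: {model.score(X_test, y_test):.2f}")
```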
Once the model was trained, I used GridSearchCV to determine the optimal hyperparameter values for the classifier. To avoid over-tuning at this initial stage, I only tuned the n_estimators, max_depth, and min_samples_split parameters of the model.
# hyperparameter tuning
param_grid = {
    'classifier__n_estimators': [100, 250, 500],
    'classifier__max_depth': [3, 5, 7, 9, 15, 20],
    'classifier__min_samples_split': [2, 4, 6, 8],
}
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring='accuracy',
    cv=3,
    # with a single scoring metric, refit must be a boolean;
    # True refits the best estimator on the full training set
    refit=True,
    # verbose=2,
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)
print("Best parameters found:")
print(grid_search.best_params_)
print("Best score found:")
print(grid_search.best_score_)
best_model = grid_search.best_estimator_
test_score = best_model.score(X_test, y_test)
print(f"Test score: {test_score}")
Through careful tuning of the parameter grid, I was able to boost the model’s accuracy from 63% to 68%. This improvement highlights the power of fine-tuning and reinforces the value of continuous iteration.