Using simple machine learning approaches to combine trading alphas

Harish Devarajan

Published Jul 12, 2017

You have likely read about how machine learning (ML) is making serious inroads into how Wall Street operates, with firms such as BlackRock, Bridgewater, and D. E. Shaw & Co., among many others, actively using ML and hiring experts in the field. If you haven’t read about it, well, welcome back! How long was your coma?!

In this post we show an approach to using ML to combine trading alphas. If you have seen that sort of thing happen close up, you probably know that combining signals is quite a manual and researcher-heavy undertaking. What we are able to show is that even “out of the can” ML algorithms, used off the shelf, do a decent job of combining signals.

We use scikit-learn, the open-source and widely used machine learning library written in Python, and find that ML isn’t a bad starting point, especially as we don’t muck with the dials and knobs and simply leave them at scikit-learn’s defaults. And remember, that package, like lots of statistical software, is built by persons unaccountably obsessed with iris petal lengths [1] and such. So that is neat and surprising.

What we show is that off-the-shelf machine learning, and support vector machines in particular, is a defensible starting point for automating the combination of alphas, compared with purely linear and simple regression-based approaches.
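To make the comparison concrete, here is a minimal sketch of how one might set a default support vector machine against a plain linear regression when combining alphas. Everything in it is an assumption for illustration: the data are synthetic, the helper names `make_synthetic_alphas` and `fit_and_score` are hypothetical, and the problem is framed as regression on forward returns, which the post itself does not specify. It is not the setup used in the full report, and no conclusion should be drawn from the numbers it prints.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR


def make_synthetic_alphas(n_obs=600, n_alphas=5, seed=0):
    """Hypothetical data: random alpha signals and a noisy forward return."""
    rng = np.random.default_rng(seed)
    alphas = rng.standard_normal((n_obs, n_alphas))
    # Assumed relationship: one linear term plus one interaction term.
    signal = 0.5 * alphas[:, 0] + 0.5 * alphas[:, 1] * alphas[:, 2]
    returns = signal + rng.standard_normal(n_obs)
    return alphas, returns


def fit_and_score(model, alphas, returns, n_train=400):
    """Fit on the earliest rows, then correlate predictions with later returns."""
    model.fit(alphas[:n_train], returns[:n_train])
    predicted = model.predict(alphas[n_train:])
    return float(np.corrcoef(predicted, returns[n_train:])[0, 1])


alphas, returns = make_synthetic_alphas()
scores = {
    "linear": fit_and_score(LinearRegression(), alphas, returns),
    "svr": fit_and_score(SVR(), alphas, returns),  # library defaults, no tuning
}
print(scores)
```

The score here is the out-of-sample correlation between the combined signal and the realized return, chosen only because it is simple to compute; a real study would pick its own evaluation metric.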

Other approaches (clustering and boosting) also show promise, though they need work and calibration to deal with high dimensionality and high noise, respectively.
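As a rough illustration of the noise issue for boosting, the sketch below fits scikit-learn's `AdaBoostRegressor` with its default settings on synthetic alphas at a few noise levels. The data-generating process, the noise scales, and the helper name `out_of_sample_corr` are all assumptions made up for this example, not anything taken from the full report.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor


def out_of_sample_corr(noise_scale, n_obs=600, n_train=400, seed=0):
    """Fit a default AdaBoost regressor and score it on held-out later rows."""
    rng = np.random.default_rng(seed)
    alphas = rng.standard_normal((n_obs, 5))
    # Assumed relationship between alphas and returns (hypothetical).
    signal = 0.5 * alphas[:, 0] + 0.5 * alphas[:, 1] * alphas[:, 2]
    returns = signal + noise_scale * rng.standard_normal(n_obs)

    model = AdaBoostRegressor(random_state=0)  # defaults apart from the seed
    model.fit(alphas[:n_train], returns[:n_train])
    predicted = model.predict(alphas[n_train:])
    return float(np.corrcoef(predicted, returns[n_train:])[0, 1])


# Noise scales are arbitrary illustrative values.
corr_by_noise = {scale: out_of_sample_corr(scale) for scale in (0.5, 2.0, 5.0)}
print(corr_by_noise)
```

Rerunning with different seeds or noise scales gives a quick feel for how sensitive an untuned booster is to the signal-to-noise ratio.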

Overall, ML-based approaches bring both promise and unique challenges. For example, they come with opacity, and the ability of ML models to evolve with the markets can hamper an understanding of what they are doing. The points, to put it somewhat tersely, are that backtests are more path-dependent and so less parallelizable, and that the adaptability that is a strength has a downside in making underperformance attribution harder.
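One way to picture the path dependence is a walk-forward evaluation, where the model is refit as data accumulates and each test window is predicted only from what came before it. The sketch below uses scikit-learn's `TimeSeriesSplit` on synthetic data; the data, the fold count, and the choice of a default `SVR` are assumptions for illustration rather than the procedure in the full report.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.svm import SVR

# Hypothetical data: 300 time-ordered observations of 4 alphas.
rng = np.random.default_rng(1)
alphas = rng.standard_normal((300, 4))
returns = 0.3 * alphas[:, 0] + rng.standard_normal(300)

splitter = TimeSeriesSplit(n_splits=5)
fold_results = []
for train_idx, test_idx in splitter.split(alphas):
    # Each refit sees only the history available before its test window.
    model = SVR().fit(alphas[train_idx], returns[train_idx])
    predicted = model.predict(alphas[test_idx])
    corr = float(np.corrcoef(predicted, returns[test_idx])[0, 1])
    fold_results.append((len(train_idx), len(test_idx), corr))

for n_train, n_test, corr in fold_results:
    print(f"train={n_train:3d} test={n_test:2d} corr={corr:+.3f}")
```

In this simple form each fold can still be refit independently. The parallelization problem bites harder when a model carries state forward from one period to the next, so that later results depend on the exact sequence of earlier updates.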

Here is the link to the full report, which elaborates. As always, questions or comments are welcome, and if you find it helpful and work in the field, drop us a note to say hello!

[1] https://en.wikipedia.org/wiki/Iris_flower_data_set#Use_of_the_data_set

Sean Slotterback 8y

I remember the Quant Research team at Deutsche put out a similar review almost 10 years ago and I believe their preferred method was Adaboost.

Dr. Debashis Dutta 8y

Good use of Scikit Harish!

Jonathan Larkin 8y

Nice write up Harish. You can see a full coded workflow in Python for alpha combination with ensemble learning (in this case, AdaBoost) on Quantopian at https://www.quantopian.com/posts/machine-learning-on-quantopian-part-3-building-an-algorithm .

Indranil Gayen 8y

Great article, Harish. Alongside Python, R may also be used, at least for analysis, as it provides a diverse set of cutting-edge ML packages. Recent developments in R are rapidly gaining popularity.
