Using simple machine learning approaches to combine trading alphas

You have likely read about how machine learning (ML) is making serious inroads into how Wall Street operates, with firms like BlackRock, Bridgewater, and D. E. Shaw & Co., among many others, actively using ML and hiring experts in the field. If you haven’t read about it, well, welcome back! How long was your coma?!

In this post we show an approach to using ML to combine trading alphas. If you have seen that sort of thing up close, you probably know that combining signals is quite a manual, researcher-heavy undertaking. What we are able to show is that even “out of the can,” off-the-shelf ML algorithms do a decent job of combining signals.

We use scikit-learn, the open-source and widely used machine learning library written in Python, and find that ML isn’t a bad starting point, especially as we don’t muck with the dials and knobs that scikit-learn sets by default. And remember, that package, like lots of statistical software, is built by persons unaccountably obsessed with iris petal lengths [1] and such. So that is neat and surprising.

What we show is that off-the-shelf machine learning, and support vector machines in particular, is a defensible starting point for automatically combining alphas, compared with purely linear and simple regression-based approaches.
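To make the comparison concrete, here is a minimal sketch of what "defaults only" might look like in scikit-learn. It is not the code from the report: the data is synthetic, the names (`alphas`, `fwd_ret`, `split`) are made up for illustration, and treating the problem as a regression on forward returns is our assumption. It fits a plain linear regression and a support vector regressor, both with default settings, and scores each combined signal by its out-of-sample correlation with the realized return.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, not from the report):
# 5 alpha signals observed over 600 periods.
n_periods, n_alphas = 600, 5
alphas = rng.standard_normal((n_periods, n_alphas))

# Hypothetical forward return: one linear effect, one nonlinear
# interaction between two alphas, plus noise.
fwd_ret = (
    0.5 * alphas[:, 0]
    + 0.5 * np.tanh(alphas[:, 1] * alphas[:, 2])
    + rng.standard_normal(n_periods)
)

# Chronological split: fit on the first 400 periods, score on the rest.
split = 400
X_train, X_test = alphas[:split], alphas[split:]
y_train, y_test = fwd_ret[:split], fwd_ret[split:]

# Both models are left at scikit-learn's default settings.
models = {"linear": LinearRegression(), "svr": SVR()}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    combined = model.predict(X_test)  # the combined signal
    # Score: correlation between combined signal and realized return.
    results[name] = float(np.corrcoef(combined, y_test)[0, 1])
    print(f"{name}: out-of-sample correlation = {results[name]:.3f}")
```

On toy data like this the numbers say nothing about real markets; the point is only that swapping one combiner for another is a one-line change, which is what makes off-the-shelf ML a cheap starting point.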

Other approaches (clustering and boosting) also show promise, but need work and calibration to deal with high dimensionality and high noise, respectively.

Overall, there are both promise and unique challenges in using ML-based approaches. For example, ML-based approaches come with opacity, and the ability of ML models to evolve with the markets can hamper an understanding of what they are doing. The points, to put it somewhat tersely, are that backtests are more path-dependent and so less parallelizable, and that the adaptability that is a strength has a downside in making underperformance attribution harder.
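A rough illustration of the path-dependence point, under our own assumptions: the sketch below uses scikit-learn's `SGDRegressor` with `partial_fit` as a stand-in for any model that is updated as new data arrives (it is not the model from the report, and the data and block size are invented). Each block of periods is predicted with the model as it stood before seeing that block, and then the model is updated in place.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)

# Synthetic stand-in data (an assumption): 4 alphas over 300 periods.
n_periods, n_alphas, block = 300, 4, 50
X = rng.standard_normal((n_periods, n_alphas))
y = 0.3 * X[:, 0] + rng.standard_normal(n_periods)

model = SGDRegressor(random_state=0)
oos_pred = np.full(n_periods, np.nan)  # NaN marks periods with no prediction

for start in range(0, n_periods, block):
    stop = start + block
    if start > 0:
        # Predict this block using only what the model learned earlier.
        oos_pred[start:stop] = model.predict(X[start:stop])
    # Update the model in place; later predictions depend on this state.
    model.partial_fit(X[start:stop], y[start:stop])

covered = ~np.isnan(oos_pred)
print(f"periods with out-of-sample predictions: {covered.sum()}")
```

Because the model state at block k depends on every update before it, block k cannot be evaluated until blocks 0 through k-1 have run. The same dependence is what makes it harder to say after the fact which update was responsible for a stretch of underperformance.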

Here is the link to the full report, which elaborates. As always, questions or comments are welcome, and if you find it helpful and work in the field, drop us a note to say hello!

[1] https://en.wikipedia.org/wiki/Iris_flower_data_set#Use_of_the_data_set



I remember the Quant Research team at Deutsche put out a similar review almost 10 years ago, and I believe their preferred method was AdaBoost.


Good use of Scikit, Harish!


Nice write-up, Harish. You can see a full coded workflow in Python for alpha combination with ensemble learning (in this case, AdaBoost) on Quantopian at https://www.quantopian.com/posts/machine-learning-on-quantopian-part-3-building-an-algorithm .

Great article, Harish. Alongside Python, R may also be used, at least for analysis, as it provides a diverse set of cutting-edge ML packages. With its recent developments, R has been gaining popularity rapidly.

More articles by Harish Devarajan
