Detecting Commodity Forward Price Anomaly Using Deep Learning Autoencoder
Forward price curves play an important role in commodity trading and risk management. These prices directly impact mark-to-market (MtM) P&L and risk measures such as VaR, which in turn drive trading and management decisions.
And it’s a large dataset. Given the number of commodities, forward months, and time snapshots, the number of forward prices can easily reach tens of thousands per day. Add the historical data needed for analysis and VaR, and it’s not uncommon for a trading firm to use hundreds of thousands of prices (and derived quantities) for daily closing.
Here comes the problem of data validation. The data arrive from various sources through various means, including automated interfaces, Excel uploads, and manual input, and there are many reasons incorrect data end up in your database, ranging from hardware failure to human laziness.
The picture shows four sample curves. The horizontal axis denotes forward months; the vertical axis is price, in a range around $90. Blue is yesterday’s price and red is today’s (reversed in (C)... yes, I’m lazy!). I’ll explain the numbers later. We can see that
- A : normal
- B : far forwards (M > 6) are not updated
- C : point error at M=4
- D : two prices for M=2,3 are not updated
Our current validation logic calculates various measures, such as price level, time spread, and daily change, and compares them with threshold values derived from historical data. If a measure exceeds its threshold, the system flags the curve as an error. This approach has difficulties, however. Each measure is designed to detect one specific pattern of anomaly, but the world is endlessly creative in producing errors: we cannot detect ‘unseen’ error patterns and frequently have to update the validation logic. The threshold parameters also have to be recalculated whenever the market changes. So it’s not very effective, not that intelligent, and we struggle to validate our validation logic.
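For illustration, the conventional threshold approach might look like the following sketch. The measure names and threshold values here are made up for this example; in practice they would be calibrated from historical data.

```python
import numpy as np

def flag_curve(today, yesterday, max_daily_change=5.0, max_time_spread=8.0):
    """Flag a forward curve with simple threshold rules (illustrative only).

    today, yesterday: 1-D arrays of forward prices, one per forward month.
    """
    flags = []
    # Daily change: today's price vs. yesterday's, month by month.
    if np.any(np.abs(today - yesterday) > max_daily_change):
        flags.append("daily_change")
    # Time spread: price difference between adjacent forward months.
    if np.any(np.abs(np.diff(today)) > max_time_spread):
        flags.append("time_spread")
    return flags
```

Each rule targets exactly one error pattern, which is the limitation described above: an error shaped differently from the rules slips through.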
So we would like something more intelligent and automatic. Here comes “Machine Learning (ML)”: let the machine learn by itself how to find anomalies. And here comes the “Autoencoder”, a type of neural network designed to reproduce its input as its output. It sounds trivial, but the catch is this: an autoencoder echoes back the input only when the input follows a familiar pattern (one it was trained on); when it doesn’t, the autoencoder produces a result quite different from its input.
Practically, I follow these steps:
- prepare historical data of forward curves
- design an autoencoder and train it on the historical data
- feed a test forward curve into the autoencoder and observe the output; summarize the difference between input and output as a single number (e.g. the sum of pair-wise differences)
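The steps above can be sketched as follows. To keep the example self-contained, it uses scikit-learn’s `MLPRegressor` trained to reproduce its own input through a small bottleneck layer, standing in for a full deep-learning autoencoder; the simulated curves, network size, and scoring function are my assumptions for illustration, not the author’s actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
months = np.arange(12)

# Step 1: prepare historical curves. Here we simulate upward-sloping
# 12-month curves sharing a common level shift plus small noise;
# real input would be your historical forward curves.
level = rng.normal(0.0, 2.0, size=(500, 1))
history = 90.0 + 0.5 * months + level + rng.normal(0.0, 0.3, size=(500, 12))

# Standardize each forward month before training.
mu, sigma = history.mean(axis=0), history.std(axis=0)
X = (history - mu) / sigma

# Step 2: an "autoencoder" as a regressor that learns X -> X
# through a 4-unit bottleneck hidden layer.
ae = MLPRegressor(hidden_layer_sizes=(4,), activation="tanh",
                  max_iter=3000, random_state=0)
ae.fit(X, X)

# Step 3: score a test curve by its reconstruction error,
# summarized as a single number.
def anomaly_score(curve):
    z = (curve - mu) / sigma
    recon = ae.predict(z.reshape(1, -1))[0]
    return float(np.abs(z - recon).sum())
```

A curve shaped like the training history should score low, while a curve with a point error or a stale segment should score noticeably higher.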
If the forward curve looks normal, the output closely resembles the input, and the difference number is close to zero. If the autoencoder is presented with data it has not seen before (perhaps an anomaly), it fails to reproduce the curve, and the input/output difference is large.
That’s the number in figures (A) to (D). The typical range for historical data is 5~10, and we see much larger numbers for the anomalies in (B) to (D). Note that in case (C) the signal is marginal, but it can be strengthened by changing the neural network architecture. A pattern like (D) is extremely difficult to validate by conventional (statistical) means, yet the autoencoder sorted it out successfully.
We are just starting, so we don’t yet have statistics on its performance. Even at this stage, though, we enjoy clear benefits from the “Machine Learning” approach:
- It can detect ‘unseen’ errors.
- You don’t have to tell it how to detect errors; it learns by itself and evolves over time
- It’s fast, and simple to code and maintain (there are many good ML libraries)
- It requires no specific domain knowledge and can be applied to a wide variety of datasets
- It’s fully scalable
Some comments:
- It can also be used for (blind) error detection in your historical data: dollars instead of cents, zero values, a premium recorded instead of the final price, a barrel price instead of a ton price, the wrong file uploaded, …
- Some of the anomalies detected are real market anomalies (market disruptions), so the same technique can be used to detect market disruption events
- It is best used as one component of your validation toolkit, alongside statistical and human checks
Last comment: I have seen many people talking about ML and Big Data, but most of the time they are interested in price prediction. I can predict it can’t predict. There are, however, many more potential applications of ML, even if less glamorous. So why don’t you start teaching your machine? But be careful, it might take your seat!