Detecting Commodity Forward Price Anomaly Using Deep Learning Autoencoder

Forward price curves play an important role in commodity trading and risk management. These prices directly affect mark-to-market (MtM) P&L and risk measures such as VaR, which in turn form the basis of trading and management decisions.

And it is a large dataset. Counting commodities, forward months, and time snapshots, the number of forward prices can easily climb to tens of thousands a day. Add the historical data needed for analysis and VaR, and it is not uncommon for a trading firm to process hundreds of thousands of prices (and derived quantities) for each daily close.

Here comes the problem of data validation. These data arrive from various sources through various means, including automated interfaces, Excel uploads, and manual input, and there are many reasons incorrect data can end up in your database, ranging from hardware failure to human laziness.

The picture shows four sample curves. The horizontal axis denotes forward months; the vertical axis is price, in a range around $90. Blue is yesterday's price and red is today's (the colors are reversed in (C)... yes, I was lazy!). I will explain the numbers later. We can see that:

- A: normal
- B: far forwards (M > 6) are not updated
- C: a point error at M = 4
- D: two prices at M = 2, 3 are not updated

Our current validation logic calculates various measures, such as price level, time spread, and daily change, and compares them with threshold values derived from historical data. If a measure exceeds its threshold, the system flags the curve as an error. There are difficulties with this approach, however. Each measure is designed to detect a specific pattern of anomaly, but the world is endlessly creative in producing errors: we cannot detect 'unseen' errors and frequently have to update the validation logic. The thresholds also have to be re-calculated and updated whenever the market changes. So it is not very effective, not very intelligent, and we struggle to validate our validation logic.
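To make the rule-based approach concrete, here is a minimal sketch of such threshold checks. The function name, rule set, and threshold values are all hypothetical, purely for illustration, not our production logic:

```python
def validate_curve(today, yesterday, max_daily_change=5.0, max_spread=8.0):
    """Flag a forward curve using simple threshold rules (illustrative only)."""
    flags = []
    # Daily-change check: a price moved too far from yesterday's value.
    for m, (p_new, p_old) in enumerate(zip(today, yesterday), start=1):
        if abs(p_new - p_old) > max_daily_change:
            flags.append(f"M={m}: daily change {abs(p_new - p_old):.2f} exceeds threshold")
    # Time-spread check: adjacent forward months are too far apart.
    for m in range(len(today) - 1):
        spread = abs(today[m + 1] - today[m])
        if spread > max_spread:
            flags.append(f"M={m + 1}/{m + 2}: spread {spread:.2f} exceeds threshold")
    return flags

yesterday = [90.0, 90.2, 90.4, 90.6, 90.8, 91.0]
today = [90.5, 91.0, 120.0, 91.5, 92.0, 92.5]   # point error at M=3
print(validate_curve(today, yesterday))
```

Note how each rule targets one specific error pattern; an error shape nobody anticipated simply passes through, which is exactly the weakness described above.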

So we would like something more intelligent and automatic. Enter machine learning (ML): let the machine learn by itself how to find anomalies. Specifically, enter the autoencoder, a type of neural network designed to reproduce its input as its output. That sounds trivial, but here is the catch: an autoencoder echoes its input back faithfully when the input follows a familiar pattern (one it was trained on); when it does not, the autoencoder produces an output quite different from its input.

In practice, I follow these steps:

- prepare historical forward-curve data
- design an autoencoder and train it on the historical data
- feed a test forward curve into the autoencoder, observe the output, and summarize the difference between input and output as a single number (e.g. the sum of pair-wise squared differences)
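The steps above can be sketched end to end. This is a toy version under stated assumptions, not our production model: the "historical" curves are synthetic smooth shapes around $90, the network is a single-hidden-layer autoencoder implemented in plain NumPy, and all sizes, seeds, and learning rates are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: synthetic "historical" forward curves (12 forward months each),
# standing in for real curve history.
n_curves, n_months = 500, 12
months = np.arange(n_months)
level = rng.uniform(85, 95, size=(n_curves, 1))
slope = rng.uniform(-0.3, 0.3, size=(n_curves, 1))
X = level + slope * months + rng.normal(0, 0.2, size=(n_curves, n_months))
mu, sd = X.mean(), X.std()
Xn = (X - mu) / sd  # normalize before training

# Step 2: a 12 -> 4 -> 12 autoencoder trained with full-batch gradient descent.
h = 4
W1 = rng.normal(0, 0.1, (n_months, h)); b1 = np.zeros(h)
W2 = rng.normal(0, 0.1, (h, n_months)); b2 = np.zeros(n_months)
lr = 0.05
for epoch in range(2000):
    H = np.tanh(Xn @ W1 + b1)        # encoder
    Y = H @ W2 + b2                  # decoder (linear output)
    err = Y - Xn
    # backpropagation of the mean squared reconstruction error
    gW2 = H.T @ err / n_curves; gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1 - H**2)
    gW1 = Xn.T @ dH / n_curves; gb1 = dH.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# Step 3: summarize the input/output difference as a single number.
def anomaly_score(curve):
    """Reconstruction error: distance between a curve and its autoencoder echo."""
    x = (np.asarray(curve, dtype=float) - mu) / sd
    y = np.tanh(x @ W1 + b1) @ W2 + b2
    return float(np.sum((y - x) ** 2))

normal = 90 + 0.1 * months                  # familiar smooth shape
broken = normal.copy(); broken[4] += 8.0    # point error, like case (C)
print(anomaly_score(normal), anomaly_score(broken))
```

The spiked curve scores markedly higher than the smooth one: the bottleneck forces the network to learn only the smooth patterns present in history, so it cannot echo the spike back.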

If the forward curve looks normal, the output closely resembles the input and the difference is close to zero. If the autoencoder is presented with data it has not seen before (perhaps an anomaly), it fails to reproduce the curve and the input/output difference is large.

Those are the numbers in figures (A) to (D). The typical range for historical data is 5–10, and we see much larger numbers for the anomalies in (B) to (D). Note that in case (C) the signal is marginal, but it can be strengthened by changing the network architecture. A pattern like (D) is extremely difficult to validate with conventional (statistical) means, but the autoencoder sorted it out successfully.

We are just getting started, so we do not yet have statistics on its performance. Even at this stage, though, we enjoy several benefits of the machine-learning approach:

- It can detect 'unseen' errors.
- You do not have to tell it how to detect errors; it learns by itself and evolves over time.
- It is fast and simple to code and maintain (there are many good ML libraries).
- It does not require specific domain knowledge and can be applied to a wide variety of data sets.
- It is fully scalable.

Some comments: 

- It can also be used for (blind) error detection in your historical data: dollars instead of cents, zero values, a premium recorded instead of the final price, a per-barrel instead of per-ton price, the wrong file uploaded, …
- Some of the anomalies detected are real market anomalies (market disruptions), so it can also be used to detect market disruption.
- It is best used as one component of your validation toolkit, alongside statistical and human checks.

One last comment: I have seen many people talking about ML and big data, but most of the time they are interested in price prediction. I can predict it can't predict. There are, however, many more potential applications of ML, even if they are not so glamorous. So why not start teaching your machine? But be careful, it might take your seat!

