Self-Organizing Maps

Introduction

The Self-Organizing Map (SOM), developed by Teuvo Kohonen in 1982, exhibits the interesting and non-trivial ability of emergence through self-organization. It is a well-suited tool for clustering and visualizing high-dimensional data. This is particularly useful when some patterns can be recognized effectively only by visual inspection rather than through mathematical descriptions.

What is it used for?

Unlike classical approaches that focus on a univariate series, SOMs are trained on vectors, or n-dimensional arrays. For example, take the S&P 500 constituents as the base data set, where each constituent has a vector whose components are the returns on the Open, High, Low, and Close (OHLC) prices. We can even extend the vector by adding further returns, for example the target return and the 1M, 3M, and 6M returns. We start with feature vectors for all sample data points. These are then grouped together according to how similar their vectors are.
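As a minimal sketch of how such feature vectors could be built, the snippet below turns a small table of OHLC prices into return vectors. It assumes that each component is the simple bar-over-bar return of the corresponding price series; the helper name `ohlc_return_vectors` and the prices are made up for illustration.

```python
import numpy as np

def ohlc_return_vectors(prices):
    """Turn a (T, 4) array of OHLC prices into (T-1, 4) simple returns.

    Assumption: each feature is the bar-over-bar return of one price column.
    """
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0

# Hypothetical OHLC prices for three consecutive bars of one stock.
prices = np.array([
    [100.0, 102.0,  99.0, 101.0],
    [101.0, 104.0, 100.0, 103.0],
    [103.0, 105.0, 101.0, 102.0],
])

# Each row of data_x is one 4-dimensional input vector for the SOM.
data_x = ohlc_return_vectors(prices)
```

Longer-horizon returns (1M, 3M, 6M) could be appended as extra columns in the same way, which only increases the dimension of each input vector.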

The method visualizes the distances between all classes. The map becomes a representation of the raw data set: observations classified in the same unit, or in neighboring units, are expected to share similarities or patterns. The originality of the method lies in organizing the classes on a map according to a notion of neighborhood.

The principal goal of the SOM is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete map, where each cell of the map is a vector.

Step by Step

Note that initially we fix the number of patterns (the map size in the SOM) when implementing the trajectory domain model. However, the number of patterns may differ between markets and should be decided by the market itself. The map size and the map structure are automatically determined.

(1)  The first branch trains the SOM on the in-sample set. In this way, each return bar of the input set is associated with a corresponding pattern, its Best Matching Unit. The map is first initialized with random values:

   map = np.random.random_sample(size=(Rows,Cols,Dim)) 

Next, a random data item is selected and its best matching unit (the closest map node/cell) is determined:

    t = np.random.randint(len(data_x))
    (bmu_row, bmu_col) = closest_node(data_x, t, map, Rows, Cols)

Measure the distance between item n and each of the K cells (groups). The cell k for which the distance is shortest,

min_k |X[n] - V(k)|

is called the Best Matching Unit (BMU) for data point n.
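The article calls `closest_node` without showing its body. The following is one possible sketch of such a helper, assuming Euclidean distance and a map stored as a `(Rows, Cols, Dim)` array; the parameter name `som_map` and the sample numbers are illustrative, not taken from the original code.

```python
import numpy as np

def closest_node(data, t, som_map, rows, cols):
    """Return (row, col) of the cell whose vector is nearest to data[t]."""
    best = (0, 0)
    best_dist = np.inf
    for i in range(rows):
        for j in range(cols):
            # Euclidean distance |X[t] - V(i, j)| (an assumed metric).
            d = np.linalg.norm(data[t] - som_map[i][j])
            if d < best_dist:
                best_dist = d
                best = (i, j)
    return best

# Tiny 2x2 map with 2-dimensional cell vectors (made-up numbers).
som_map = np.array([[[0.0, 0.0], [1.0, 0.0]],
                    [[0.0, 1.0], [1.0, 1.0]]])
data_x = np.array([[0.9, 0.8]])

bmu = closest_node(data_x, 0, som_map, 2, 2)  # cell (1, 1) is the nearest
```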

 Once the convergence threshold is reached, the second part of the procedure activates.

 Machine Learning

Initially we assumed that the group vectors V(k) are known. Now we train the system to find them by arranging the vectors on a square grid and moving them toward each data item, with beta as the learning rate:

V(k) + beta * (X[n] – V(k)) -> V(k)
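A small worked example of this update may help. The numbers and the function name `update_cell` are made up for illustration; only the formula itself comes from the text.

```python
import numpy as np

def update_cell(v, x, beta):
    """Apply V + beta * (X - V): move cell vector v a fraction beta toward x."""
    return v + beta * (x - v)

v = np.array([0.2, 0.4])   # current cell vector V(k)
x = np.array([1.0, 0.0])   # data item X[n]

# With beta = 0.5 the cell moves halfway toward the data item.
v_new = update_cell(v, x, beta=0.5)
```

With beta = 0 the cell does not move at all, and with beta = 1 it jumps onto the data item, so values in between control how strongly a single observation reshapes the map.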

(2)  Next, each node/cell of the SOM is examined, and every cell within the current range of the BMU is updated:

    for i in range(Rows):
      for j in range(Cols):
        if distance(bmu_row, bmu_col, i, j) < curr_range:
          map[i][j] = map[i][j] + learning_rate * (data_x[t] - map[i][j])

The update moves the node vector closer to the current data item. The learning_rate sets the size of the adjustment and slowly decreases with each iteration.
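Putting the steps together, a complete training loop might look like the sketch below. Several details are assumptions rather than the author's settings: the grid distance is taken to be Manhattan distance, both `curr_range` and `learning_rate` decay linearly to zero, and the names `train_som`, `manhattan`, `steps`, and `learn_max` are hypothetical.

```python
import numpy as np

def manhattan(r1, c1, r2, c2):
    """Grid distance between two cells (an assumed choice of metric)."""
    return abs(r1 - r2) + abs(c1 - c2)

def train_som(data_x, rows, cols, steps=1000, learn_max=0.5, seed=0):
    rng = np.random.default_rng(seed)
    dim = data_x.shape[1]
    som_map = rng.random((rows, cols, dim))   # random initial cell vectors
    range_max = rows + cols                   # initial neighborhood radius

    for s in range(steps):
        # Both the radius and the learning rate shrink linearly over time.
        frac_left = 1.0 - s / steps
        curr_range = frac_left * range_max
        learning_rate = frac_left * learn_max

        # Pick a random data item and find its best matching unit.
        t = rng.integers(len(data_x))
        dists = np.linalg.norm(som_map - data_x[t], axis=2)
        bmu_row, bmu_col = np.unravel_index(np.argmin(dists), dists.shape)

        # Pull every cell near the BMU toward the data item.
        for i in range(rows):
            for j in range(cols):
                if manhattan(bmu_row, bmu_col, i, j) < curr_range:
                    som_map[i][j] += learning_rate * (data_x[t] - som_map[i][j])
    return som_map

# Synthetic stand-in for the return vectors: 50 items, 4 features each.
data_x = np.random.default_rng(1).random((50, 4))
som_map = train_som(data_x, rows=3, cols=3, steps=200)
```

Early iterations update almost the whole map, which spreads the cells over the data; later iterations touch only the BMU and its closest neighbors, which fine-tunes individual cells.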

Taking industry classification sectors as an example, we can plot performance patterns and build a map of patterns in which each benchmark node/cell is a multi-dimensional vector.

Each node in this case corresponds to the performance of a benchmark node. The pattern on the map reflects the number and positioning of stocks from each industry classification. Some sectors will be spread out, while others will be concentrated around a specific node.


The data points are mapped to a two-dimensional grid so that we can see which data points have similar characteristics. The final visualization can be clustered by financial sector, where the SOM gives a rough idea of what is going on, especially when it is drawn as a contour map. Some sectors can be spread out, while others are concentrated around a few cells.
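One way to prepare such a sector view is to assign every stock to its best matching cell and count sectors per cell. The sketch below assumes a trained map stored as a `(Rows, Cols, Dim)` array; the helper names, the tiny map, and the sector labels are all hypothetical.

```python
import numpy as np
from collections import Counter, defaultdict

def map_items_to_cells(data_x, som_map):
    """Return the (row, col) of the best matching cell for every data item."""
    rows, cols, _ = som_map.shape
    cells = []
    for x in data_x:
        dists = np.linalg.norm(som_map - x, axis=2)
        r, c = np.unravel_index(np.argmin(dists), (rows, cols))
        cells.append((int(r), int(c)))
    return cells

def sector_counts(cells, sectors):
    """Count how many items of each sector landed in each cell."""
    counts = defaultdict(Counter)
    for cell, sector in zip(cells, sectors):
        counts[cell][sector] += 1
    return counts

# Made-up 1x2 map and three stocks with illustrative sector labels.
som_map = np.array([[[0.0, 0.0], [1.0, 1.0]]])
data_x = np.array([[0.1, 0.0], [0.9, 1.0], [1.0, 0.8]])
sectors = ["Energy", "Tech", "Tech"]

cells = map_items_to_cells(data_x, som_map)
counts = sector_counts(cells, sectors)
```

A sector whose stocks fall into a single cell is concentrated, while a sector whose counts are scattered over many cells is spread out across the map.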

 Self-organization means the ability of a system to adapt its internal structure to structures sensed in the input of the system. This adaptation should be performed in such a way that firstly, no intervention from the environment is necessary (unsupervised learning) and secondly, the internal structure of the self-organizing system represents features of the input-data that are relevant to the system.


