Build an Algorithmic Trading Strategy with the Fama-French 5-Factor Model in Python

Don’t worry about what the markets are going to do, worry about what you are going to do in response to the markets. - Michael Carr

In 1993, Eugene Fama and Kenneth French, professors at the University of Chicago, developed a three-factor model to describe stock returns; in 2015 they extended it to five factors. In 2013, Fama shared the Nobel Memorial Prize in Economic Sciences. This linear five-factor model quantifies the relationship between an asset's return and the risks driving that return. Each factor risk carries a premium, and the expected asset return corresponds to a weighted average of these risk premia. The five factors are (1) market risk, (2) the outperformance of small versus big companies, (3) the outperformance of high versus low book-to-market companies, (4) profitability, and (5) investment.
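As a minimal sketch of that weighted-average idea (the loadings and premia below are hypothetical, for illustration only), an asset's expected excess return is the dot product of its factor loadings and the factor risk premia:

```python
import numpy as np

# Hypothetical factor loadings (betas) for one asset, in factor order:
# Mkt-RF, SMB, HML, RMW, CMA
betas = np.array([1.1, 0.3, -0.2, 0.1, 0.05])

# Hypothetical monthly risk premia (%) for the same five factors
premia = np.array([0.6, 0.2, 0.1, 0.15, 0.1])

# Expected excess return = weighted average (dot product) of the premia
expected_excess_return = betas @ premia  # 0.72% per month
```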

Importantly, the 5-factor model can be used to assess active management of a portfolio versus selectively picking assets and timing the market. If performance can be explained by known return drivers as described by the model, the strategy can be replicated as a low-cost, algorithmic trading strategy.

The Fama-French factors are obtained by sorting stocks into two size groups and into three groups for each of the remaining firm-specific variables. The factors are built from three sets of value-weighted portfolios formed as 2 x 3 sorts on size and book-to-market, size and operating profitability, and size and investment. The risk factor values are computed as the average returns of these portfolios (PF), as outlined in the following table:

[Image: table of factor definitions as average portfolio return differences]

The Fama-French 5 factors are based on the 6 value-weight portfolios formed on size and book-to-market, the 6 value-weight portfolios formed on size and operating profitability, and the 6 value-weight portfolios formed on size and investment.
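To illustrate how the size and value factors fall out of those six portfolios (the portfolio returns below are hypothetical): SMB averages the three small portfolios minus the three big ones, and HML averages the two high book-to-market portfolios minus the two low ones:

```python
import pandas as pd

# Hypothetical monthly returns (%) for the six size/book-to-market portfolios:
# S/B = small/big size; L/M/H = low/medium/high book-to-market
pf = pd.DataFrame({'SL': [1.0], 'SM': [1.2], 'SH': [1.5],
                   'BL': [0.8], 'BM': [0.9], 'BH': [1.1]})

# SMB (size): average small-cap return minus average large-cap return
smb = pf[['SL', 'SM', 'SH']].mean(axis=1) - pf[['BL', 'BM', 'BH']].mean(axis=1)

# HML (value): average high book-to-market minus average low book-to-market
hml = pf[['SH', 'BH']].mean(axis=1) - pf[['SL', 'BL']].mean(axis=1)
```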

Fama and French regularly publish updated risk factor and research portfolio data that can be downloaded from their website for free.

Let's jump into the code. Shout out to my teammates at the Columbia Data Science Bootcamp and to Machine Learning for Algorithmic Trading, from which the code snippets below are generously borrowed.

First, import pandas, numpy, statsmodels, pandas_datareader, linearmodels, matplotlib, and seaborn.

import pandas as pd
import numpy as np
 
from statsmodels.api import OLS, add_constant
import pandas_datareader.data as web
 
from linearmodels.asset_pricing import LinearFactorModel
 
import matplotlib.pyplot as plt
import seaborn as sns

Then, obtain the monthly factor returns for the period 2010 – 2017. The famafrench reader returns a dictionary of DataFrames; index [0] selects the monthly data:

ff_factor = 'F-F_Research_Data_5_Factors_2x3'
ff_factor_data = web.DataReader(ff_factor, 'famafrench', start='2010', end='2017-12')[0]


ff_factor_data.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  96 non-null     float64
 1   SMB     96 non-null     float64
 2   HML     96 non-null     float64
 3   RMW     96 non-null     float64
 4   CMA     96 non-null     float64
 5   RF      96 non-null     float64
dtypes: float64(6)
memory usage: 5.2 KB


Summary statistics for ff_factor_data:

ff_factor_data.describe()

[Image: table of descriptive statistics for the five factors and RF]

Portfolios

Fama and French make available numerous portfolios with which we can illustrate the estimation of factor exposures, as well as the value of the risk premia available in the market over a given period. We use a panel of 17 industry portfolios at monthly frequency.

Subtract the risk-free rate from the returns because the factor model works with excess returns:

ff_portfolio = '17_Industry_Portfolios'
ff_portfolio_data = web.DataReader(ff_portfolio, 'famafrench', start='2010', end='2017-12')[0]
ff_portfolio_data = ff_portfolio_data.sub(ff_factor_data.RF, axis=0)


ff_portfolio_data.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 17 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Food    96 non-null     float64
 1   Mines   96 non-null     float64
 2   Oil     96 non-null     float64
 3   Clths   96 non-null     float64
 4   Durbl   96 non-null     float64
 5   Chems   96 non-null     float64
 6   Cnsum   96 non-null     float64
 7   Cnstr   96 non-null     float64
 8   Steel   96 non-null     float64
 9   FabPr   96 non-null     float64
 10  Machn   96 non-null     float64
 11  Cars    96 non-null     float64
 12  Trans   96 non-null     float64
 13  Utils   96 non-null     float64
 14  Rtail   96 non-null     float64
 15  Finan   96 non-null     float64
 16  Other   96 non-null     float64
dtypes: float64(17)
memory usage: 13.5 KB

Now, obtain US equity data from the Quandl Wiki prices dataset (stored here in a local HDF5 file):

with pd.HDFStore('../data/assets.h5') as store:
    prices = store['/quandl/wiki/prices'].adj_close.unstack().loc['2010':'2017']
    equities = store['/us_equities/stocks'].drop_duplicates()

Map each ticker to its sector and keep only the tickers that have sector information:

sectors = equities.filter(prices.columns, axis=0).sector.to_dict()
prices = prices.filter(sectors.keys()).dropna(how='all', axis=1)

Calculate monthly returns using pct_change and drop NaNs:

returns = prices.resample('M').last().pct_change().mul(100).to_period('M')
returns = returns.dropna(how='all').dropna(axis=1)

Align ff_factor_data and ff_portfolio_data with the returns index using loc:

ff_factor_data = ff_factor_data.loc[returns.index]
ff_portfolio_data = ff_portfolio_data.loc[returns.index]

Compute excess returns by subtracting the risk-free rate, then winsorize at the 1st and 99th percentiles using clip:

excess_returns = returns.sub(ff_factor_data.RF, axis=0)
excess_returns.info()
excess_returns = excess_returns.clip(lower=np.percentile(excess_returns, 1),
                                     upper=np.percentile(excess_returns, 99))

Step 1: Factor Exposures

Implement the first stage to obtain the 17 factor loading estimates as follows:

# The regressors are the five factors; drop the risk-free rate column
ff_factor_data = ff_factor_data.drop('RF', axis=1)

betas = []
for industry in ff_portfolio_data:
    # Time-series regression of each industry's excess returns on the factors
    step1 = OLS(endog=ff_portfolio_data.loc[ff_factor_data.index, industry], 
                exog=add_constant(ff_factor_data)).fit()
    betas.append(step1.params.drop('const'))

betas = pd.DataFrame(betas, 
                     columns=ff_factor_data.columns, 
                     index=ff_portfolio_data.columns)
betas.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17 entries, Food  to Other
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  17 non-null     float64
 1   SMB     17 non-null     float64
 2   HML     17 non-null     float64
 3   RMW     17 non-null     float64
 4   CMA     17 non-null     float64
dtypes: float64(5)
memory usage: 1.4+ KB

Step 2: Risk Premia

Run 95 regressions, one per period, of the cross section of portfolio returns on the factor loadings (the first month is lost to pct_change):

lambdas = []
for period in ff_portfolio_data.index:
    step2 = OLS(endog=ff_portfolio_data.loc[period, betas.index], 
                exog=betas).fit()
    lambdas.append(step2.params)

lambdas = pd.DataFrame(lambdas, 
                       index=ff_portfolio_data.index,
                       columns=betas.columns.tolist())
lambdas.info()

<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 95 entries, 2010-02 to 2017-12
Freq: M
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Mkt-RF  95 non-null     float64
 1   SMB     95 non-null     float64
 2   HML     95 non-null     float64
 3   RMW     95 non-null     float64
 4   CMA     95 non-null     float64
dtypes: float64(5)
memory usage: 10.2 KB


lambdas.mean().sort_values().plot.barh(figsize=(12, 4))
sns.despine()
plt.tight_layout();

[Figure: bar chart of average risk premia per factor]

t = lambdas.mean().div(lambdas.std())

t

Mkt-RF    0.342748
SMB      -0.011482
HML      -0.262373
RMW      -0.046212
CMA      -0.175708
dtype: float64
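Note that the mean divided by the standard deviation understates significance: the conventional Fama-MacBeth t-statistic scales this ratio by the square root of the number of period estimates. A quick sketch using the Mkt-RF ratio from the output above:

```python
import numpy as np

T = 95  # number of monthly lambda estimates
mean_over_std = 0.342748  # Mkt-RF ratio from the output above

# Fama-MacBeth t-statistic: mean(lambda) / (std(lambda) / sqrt(T))
t_stat = mean_over_std * np.sqrt(T)  # roughly 3.34
```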

Results

window = 24  # months
ax1 = plt.subplot2grid((1, 3), (0, 0))
ax2 = plt.subplot2grid((1, 3), (0, 1), colspan=2)
lambdas.mean().sort_values().plot.barh(ax=ax1)
lambdas.rolling(window).mean().dropna().plot(lw=1,
                                             figsize=(14, 5),
                                             sharey=True,
                                             ax=ax2)
sns.despine()
plt.tight_layout()

[Figure: average premia (bar chart) alongside 24-month rolling mean premia (lines)]

window = 24  # months
lambdas.rolling(window).mean().dropna().plot(lw=2,
                                             figsize=(14, 7),
                                             subplots=True,
                                             sharey=True)
sns.despine()
plt.tight_layout()

[Figure: 24-month rolling average risk premia, one subplot per factor]

Fama-MacBeth with the linearmodels library

The linearmodels library extends statsmodels with various models for panel data and also implements the two-stage Fama-MacBeth procedure:

mod = LinearFactorModel(portfolios=ff_portfolio_data, 
                        factors=ff_factor_data)
res = mod.fit()
print(res) 
                      LinearFactorModel Estimation Summary                      
================================================================================
No. Test Portfolios:                 17   R-squared:                      0.6889
No. Factors:                          5   J-statistic:                    17.081
No. Observations:                    95   P-value                         0.1466
Date:                  Wed, Jun 17 2020   Distribution:                 chi2(12)
Time:                          14:12:24                                         
Cov. Estimator:                  robust                                         
                                                                                
                            Risk Premia Estimates                             
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
Mkt-RF         1.2294     0.4076     3.0161     0.0026      0.4305      2.0282
SMB           -0.0452     0.8661    -0.0522     0.9584     -1.7427      1.6524
HML           -1.0782     0.6886    -1.5658     0.1174     -2.4278      0.2714
RMW           -0.1397     0.8304    -0.1682     0.8664     -1.7672      1.4879
CMA           -0.6245     0.5075    -1.2305     0.2185     -1.6193      0.3702
==============================================================================

Covariance estimator:
HeteroskedasticCovariance
See full_summary for complete results
