Build an Algorithmic Trading Strategy with the Fama-French 5-Factor Model in Python
Don’t worry about what the markets are going to do, worry about what you are going to do in response to the markets. - Michael Carr
In 1993, Eugene Fama and Kenneth French, professors at the University of Chicago, developed a 3-factor model to describe stock returns; in 2015 they extended it to five factors. In 2013, Fama shared the Nobel Memorial Prize in Economic Sciences. This linear 5-factor model quantifies the relationship between an asset's return and the riskiness of those returns. Each factor risk carries a premium, and the expected asset return corresponds to a weighted average of these risk premia. The five factors are (1) market risk, (2) the outperformance of small versus big companies (size), (3) the outperformance of high book-to-market versus low book-to-market companies (value), (4) profitability, and (5) investment.
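In regression form, following the standard Fama-French notation, the model describes an asset's excess return as a linear combination of the five factor returns:

```latex
R_{it} - R_{Ft} = a_i + b_i \,(R_{Mt} - R_{Ft}) + s_i \, SMB_t + h_i \, HML_t + r_i \, RMW_t + c_i \, CMA_t + e_{it}
```

Here $R_{it}$ is the asset return, $R_{Ft}$ the risk-free rate, $R_{Mt}$ the market return, and the slopes $b_i, s_i, h_i, r_i, c_i$ are the asset's exposures to the market, size, value, profitability, and investment factors.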
Importantly, the 5-factor model can be used to assess active management of a portfolio versus selectively picking assets and timing the market. If performance can be explained by known return drivers as described by the model, the strategy can be replicated as a low-cost, algorithmic trading strategy.
The Fama-French factors are obtained by sorting stocks into two size groups and then into three groups on each of the remaining firm-specific characteristics. The factors involve three sets of value-weighted portfolios formed as 2 x 3 sorts on size and book-to-market, size and operating profitability, and size and investment. The risk factor values are then computed as averages of the returns of these portfolios.
The Fama-French 5 factors are based on the 6 value-weight portfolios formed on size and book-to-market, the 6 value-weight portfolios formed on size and operating profitability, and the 6 value-weight portfolios formed on size and investment.
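To make the sort-and-average construction concrete, here is a minimal sketch of how SMB and HML would emerge from the six size/book-to-market portfolios. The portfolio returns below are made-up numbers for illustration, not real Fama-French data:

```python
import pandas as pd

# Hypothetical monthly returns (%) for the six size/book-to-market portfolios:
# S = small, B = big; L = low (growth), M = neutral, H = high (value) book-to-market
pf = pd.Series({'SL': 1.2, 'SM': 1.5, 'SH': 1.9,
                'BL': 0.8, 'BM': 1.0, 'BH': 1.3})

# SMB: average of the three small portfolios minus average of the three big ones
smb = pf[['SL', 'SM', 'SH']].mean() - pf[['BL', 'BM', 'BH']].mean()

# HML: average of the two value portfolios minus average of the two growth ones
hml = pf[['SH', 'BH']].mean() - pf[['SL', 'BL']].mean()

print(f'SMB = {smb:.2f}, HML = {hml:.2f}')  # SMB = 0.50, HML = 0.60
```

The profitability (RMW) and investment (CMA) factors are built the same way from their respective 2 x 3 sorts.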
Fama and French regularly publish updated risk factor and research portfolio data that can be downloaded from their website for free.
Let's jump into the code. Shout out to my classmates at the Columbia Data Science Bootcamp and to Machine Learning for Algorithmic Trading, from which the code snippets below are generously borrowed.
First, import pandas, numpy, statsmodels, pandas_datareader, linearmodels, matplotlib, and seaborn.
import pandas as pd
import numpy as np
from statsmodels.api import OLS, add_constant
import pandas_datareader.data as web
from linearmodels.asset_pricing import LinearFactorModel
import matplotlib.pyplot as plt
import seaborn as sns
Then, obtain monthly returns for the period 2010 – 2017 as follows:
ff_factor = 'F-F_Research_Data_5_Factors_2x3'
ff_factor_data = web.DataReader(ff_factor, 'famafrench', start='2010', end='2017-12')[0]
ff_factor_data.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Mkt-RF  96 non-null     float64
 1   SMB     96 non-null     float64
 2   HML     96 non-null     float64
 3   RMW     96 non-null     float64
 4   CMA     96 non-null     float64
 5   RF      96 non-null     float64
dtypes: float64(6)
memory usage: 5.2 KB
Stats from ff_factor_data
ff_factor_data.describe()
Portfolios
Fama and French make available numerous portfolios with which we can illustrate the estimation of the factor exposures, as well as the value of the risk premia available in the market for a given time period. We will use a monthly panel of the 17 industry portfolios.
Subtract the risk-free rate from the returns because the factor model works with excess returns:
ff_portfolio = '17_Industry_Portfolios'
ff_portfolio_data = web.DataReader(ff_portfolio, 'famafrench', start='2010', end='2017-12')[0]
ff_portfolio_data = ff_portfolio_data.sub(ff_factor_data.RF, axis=0)
ff_portfolio_data.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 96 entries, 2010-01 to 2017-12
Freq: M
Data columns (total 17 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Food    96 non-null     float64
 1   Mines   96 non-null     float64
 2   Oil     96 non-null     float64
 3   Clths   96 non-null     float64
 4   Durbl   96 non-null     float64
 5   Chems   96 non-null     float64
 6   Cnsum   96 non-null     float64
 7   Cnstr   96 non-null     float64
 8   Steel   96 non-null     float64
 9   FabPr   96 non-null     float64
 10  Machn   96 non-null     float64
 11  Cars    96 non-null     float64
 12  Trans   96 non-null     float64
 13  Utils   96 non-null     float64
 14  Rtail   96 non-null     float64
 15  Finan   96 non-null     float64
 16  Other   96 non-null     float64
dtypes: float64(17)
memory usage: 13.5 KB
Now, obtain US equity price data from the Quandl Wiki prices dataset.
with pd.HDFStore('../data/assets.h5') as store:
prices = store['/quandl/wiki/prices'].adj_close.unstack().loc['2010':'2017']
equities = store['/us_equities/stocks'].drop_duplicates()
Create a pandas dataframe for equity sectors and prices
sectors = equities.filter(prices.columns, axis=0).sector.to_dict()
prices = prices.filter(sectors.keys()).dropna(how='all', axis=1)
Calculate monthly returns using pct_change and drop missing values.
returns = prices.resample('M').last().pct_change().mul(100).to_period('M')
returns = returns.dropna(how='all').dropna(axis=1)
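To see what the resampling chain does, here is a minimal sketch on synthetic daily prices (made-up tickers and values, not the Quandl data):

```python
import pandas as pd
import numpy as np

# Synthetic daily prices for two hypothetical tickers over three months
idx = pd.date_range('2010-01-01', '2010-03-31', freq='B')
rng = np.random.default_rng(42)
prices = pd.DataFrame(100 * np.cumprod(1 + rng.normal(0, 0.01, (len(idx), 2)), axis=0),
                      index=idx, columns=['AAA', 'BBB'])

# Take the last price of each month, compute percent changes, scale to %,
# and switch to a monthly PeriodIndex to match the Fama-French data
returns = prices.resample('M').last().pct_change().mul(100).to_period('M')
returns = returns.dropna(how='all')  # the first month has no prior price
print(returns)
```

The monthly PeriodIndex is what lets the returns align cleanly with the factor data in the next step.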
Align the data using .loc and reassign to ff_factor_data and ff_portfolio_data.
ff_factor_data = ff_factor_data.loc[returns.index]
ff_portfolio_data = ff_portfolio_data.loc[returns.index]
Compute excess returns by subtracting the risk-free rate from the equity returns, assign the result to the excess_returns dataframe, and clip outliers at the 1st and 99th percentiles.
excess_returns = returns.sub(ff_factor_data.RF, axis=0)
excess_returns.info()
excess_returns = excess_returns.clip(lower=np.percentile(excess_returns, 1),
upper=np.percentile(excess_returns, 99))
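The clip call winsorizes extreme observations so a handful of outliers don't dominate the regressions. A minimal illustration on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic excess returns with a planted extreme outlier
data = pd.DataFrame(rng.normal(0, 2, (1000, 4)))
data.iloc[0, 0] = 50.0

# Clip everything below the 1st and above the 99th percentile
# (np.percentile flattens the dataframe, so the cutoffs are global)
clipped = data.clip(lower=np.percentile(data, 1),
                    upper=np.percentile(data, 99))

# The outlier is pulled in to the 99th-percentile value
print(data.iloc[0, 0], '->', clipped.iloc[0, 0])
```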
Step 1: Factor Exposures
Implement the first stage to obtain the factor loading estimates for the 17 portfolios as follows:
betas = []
for industry in ff_portfolio_data:
step1 = OLS(endog=ff_portfolio_data.loc[ff_factor_data.index, industry],
exog=add_constant(ff_factor_data)).fit()
betas.append(step1.params.drop('const'))
betas = pd.DataFrame(betas,
columns=ff_factor_data.columns,
index=ff_portfolio_data.columns)
betas.info()
<class 'pandas.core.frame.DataFrame'>
Index: 17 entries, Food to Other
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Mkt-RF 17 non-null float64
1 SMB 17 non-null float64
2 HML 17 non-null float64
3 RMW 17 non-null float64
4 CMA 17 non-null float64
dtypes: float64(5)
memory usage: 1.4+ KB
Step 2: Risk Premia
Run one regression per period (95 in total, since pct_change drops the first month): each regresses the cross-section of portfolio returns on the factor loadings.
lambdas = []
for period in ff_portfolio_data.index:
step2 = OLS(endog=ff_portfolio_data.loc[period, betas.index],
exog=betas).fit()
lambdas.append(step2.params)
lambdas = pd.DataFrame(lambdas,
index=ff_portfolio_data.index,
columns=betas.columns.tolist())
lambdas.info()
<class 'pandas.core.frame.DataFrame'>
PeriodIndex: 95 entries, 2010-02 to 2017-12
Freq: M
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Mkt-RF 95 non-null float64
1 SMB 95 non-null float64
2 HML 95 non-null float64
3 RMW 95 non-null float64
4 CMA 95 non-null float64
dtypes: float64(5)
memory usage: 10.2 KB
lambdas.mean().sort_values().plot.barh(figsize=(12, 4))
sns.despine()
plt.tight_layout();
t = lambdas.mean().div(lambdas.std())
t
Mkt-RF    0.342748
SMB      -0.011482
HML      -0.262373
RMW      -0.046212
CMA      -0.175708
dtype: float64
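Note that the ratio above divides the mean by the standard deviation of the period estimates; the conventional Fama-MacBeth t-statistic additionally scales by the square root of the number of periods (since the standard error of the mean is std divided by sqrt(T)). A sketch using hypothetical period estimates as stand-ins for the lambdas dataframe:

```python
import numpy as np
import pandas as pd

# Hypothetical period-by-period premia estimates (stand-ins for `lambdas`)
rng = np.random.default_rng(3)
lambdas = pd.DataFrame(rng.normal(0.5, 1.0, (95, 2)), columns=['Mkt-RF', 'SMB'])

mean_ratio = lambdas.mean().div(lambdas.std())   # the ratio computed above
fm_tstat = mean_ratio * np.sqrt(len(lambdas))    # Fama-MacBeth t-statistic
print(fm_tstat.round(2))
```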
Results
window = 24 # months
ax1 = plt.subplot2grid((1, 3), (0, 0))
ax2 = plt.subplot2grid((1, 3), (0, 1), colspan=2)
lambdas.mean().sort_values().plot.barh(ax=ax1)
lambdas.rolling(window).mean().dropna().plot(lw=1,
figsize=(14, 5),
sharey=True,
ax=ax2)
sns.despine()
plt.tight_layout()
window = 24 # months
lambdas.rolling(window).mean().dropna().plot(lw=2,
figsize=(14, 7),
subplots=True,
sharey=True)
sns.despine()
plt.tight_layout()
Fama-MacBeth with the linearmodels library
The linearmodels library extends statsmodels with various models for panel data and also implements the two-stage Fama-MacBeth procedure:
mod = LinearFactorModel(portfolios=ff_portfolio_data,
factors=ff_factor_data)
res = mod.fit()
print(res)
LinearFactorModel Estimation Summary
================================================================================
No. Test Portfolios: 17 R-squared: 0.6889
No. Factors: 5 J-statistic: 17.081
No. Observations: 95 P-value 0.1466
Date: Wed, Jun 17 2020 Distribution: chi2(12)
Time: 14:12:24
Cov. Estimator: robust
Risk Premia Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
Mkt-RF 1.2294 0.4076 3.0161 0.0026 0.4305 2.0282
SMB -0.0452 0.8661 -0.0522 0.9584 -1.7427 1.6524
HML -1.0782 0.6886 -1.5658 0.1174 -2.4278 0.2714
RMW -0.1397 0.8304 -0.1682 0.8664 -1.7672 1.4879
CMA -0.6245 0.5075 -1.2305 0.2185 -1.6193 0.3702
==============================================================================
Covariance estimator:
HeteroskedasticCovariance
See full_summary for complete results