Real Portfolio Optimization w/ Python, Pandas, DataFrame Concatenation, Excel Data Solver
An optimized portfolio is one in which capital is distributed among the portfolio's assets with the objective of minimum portfolio variance or maximum Sharpe ratio; in other words, the best risk/reward allocation along the efficient frontier of the portfolio's constituents. Portfolio optimization is typically left to academics because it is extremely cumbersome, data-hungry, and the number of calculations grows quadratically with the number of assets. However, if you really did have access to historical data and could automate both the retrieval of the portfolio assets' relevant data and the calculation of a few important metrics, you could actually optimize a real portfolio of assets, even though doing it by hand is practically impossible. That was the goal of my most recent Python project.
The beauty of the module I have coded is that you can augment your intuition with computational brute force. You can input any list of equities that you are considering, whether it be from your magic 8-ball, gut-feeling, equity research, the entire S&P 500, or Jim Cramer, and my module will quickly and effectively optimize your portfolio. You could even use my previous equity risk module to discover the most attractive equities according to their Sharpe ratio and create a list to be optimized. Since much of the logic in this module is transferable from my previous article, I'll try to skate over most of the technical nonsense so your eyes don't glaze over.
After importing a few libraries, the module begins with the assignment of a list of equity securities. This is where you enter the tickers you'd like to have in your portfolio. If you don't have particular stocks in mind, you could even download a JSON of another portfolio, or of the entire S&P 500, and format it as a list. I then format this list into a dictionary and, finally, into a dataframe. Next, I create some empty lists for the expected returns and standard deviations of the equities in your portfolio; we'll need these later when building the correlation matrix.
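Here's a minimal sketch of that setup. The requests import and the example tickers are my assumptions based on the description above, though the variable names follow the ones mentioned in this write-up:

```python
import os
import pandas as pd
import requests  # assumed here for pulling the IEX JSON later on

# The tickers you'd like in your portfolio -- swap in your own list,
# or load one from a JSON of another portfolio or the S&P 500.
stocksymbols = ['AAPL', 'MSFT', 'JNJ', 'XOM', 'JPM']

# The list formatted into a dictionary and then into a dataframe,
# which will grow into the correlation matrix.
dfbig = pd.DataFrame({'Symbols': stocksymbols})

# Empty lists for each symbol's expected return and standard deviation.
returnslist = []
standdevlist = []
```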
The data retrieval and calculations begin here. For each symbol in your list of stock symbols, a dataframe is created from the symbol's IEX 5-year daily data API JSON. The symbol's expected return and variability are calculated as the average and standard deviation of its daily excess returns (the stock's daily return minus the daily risk-free return). These two values are appended to the empty lists created at the beginning.
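A hedged sketch of that per-symbol calculation is below; the legacy IEX chart endpoint, the 'close' field, and the flat daily risk-free rate are stand-ins for whatever the module actually uses:

```python
RISKFREE_DAILY = 0.03 / 252  # assumed: ~3% annual T-bill yield spread over trading days

def excess_return_stats(symbol):
    """Return the mean, standard deviation, and full series of a symbol's
    daily excess returns, built from IEX 5-year daily data."""
    url = f'https://api.iextrading.com/1.0/stock/{symbol}/chart/5y'
    prices = pd.DataFrame(requests.get(url).json())
    excess = prices['close'].pct_change().dropna() - RISKFREE_DAILY
    return excess.mean(), excess.std(), excess
```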
Then, still within this symbol's for-loop, another for-loop begins. Its purpose is to calculate the symbol's correlation with each of the other symbols in the portfolio, so I create an empty correlation list to be filled by this loop. For each symbol in the subordinate stocksymbols list, a dataframe is created from its IEX 5-year daily data API JSON, and its expected return and standard deviation are calculated. Next, the covariance of the symbol and the subordinate symbol is calculated as the average of the product of their demeaned daily returns (each series of daily returns minus its own average), and the correlation is that covariance divided by the product of the two series' standard deviations. The correlation is appended to the correlationlist. This inner loop continues until the symbol's correlation with every subordinate symbol has been calculated, at which point the correlationlist is converted to a dataframe and concatenated to the bigdataframe, which is becoming our correlation matrix; then the outer for-loop starts everything over with the next symbol.
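Put together, the nested loops look roughly like this, reusing the hypothetical excess_return_stats helper from the previous snippet:

```python
for symbol in stocksymbols:
    expret, stddev, returns = excess_return_stats(symbol)
    returnslist.append(expret)
    standdevlist.append(stddev)

    correlationlist = []
    for subsymbol in stocksymbols:
        _, _, subreturns = excess_return_stats(subsymbol)
        # Covariance: average of the product of the two demeaned daily-return
        # series (assumes both series cover the same trading days).
        covariance = ((returns - returns.mean()) * (subreturns - subreturns.mean())).mean()
        # Correlation: covariance divided by the product of the standard deviations.
        correlationlist.append(covariance / (returns.std() * subreturns.std()))

    # Concatenate this symbol's column of correlations onto the growing matrix.
    dfbig = pd.concat([dfbig, pd.DataFrame({symbol: correlationlist})], axis=1)
```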
After all the correlations between each symbol and the other symbols in stocksymbols have been concatenated into the bigdataframe, our returnslist and standdevlist from the outer for-loop are converted to dataframes and concatenated to the bigdataframe as well. I rename the columns of the bigdataframe using a blank (for the ticker column), the stocksymbols list, "Returns", and "Standard Deviation". Then, dfbig.to_excel('CorrelationMatrix.xlsx') writes the whole thing to an Excel spreadsheet, and os.startfile opens it up for me.
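Again as a sketch, the final assembly step looks something like this:

```python
# Bolt the returns and standard-deviation columns onto the correlation matrix.
dfbig = pd.concat([dfbig,
                   pd.DataFrame({'Returns': returnslist}),
                   pd.DataFrame({'Standard Deviation': standdevlist})], axis=1)

# A blank label for the ticker column, then one column per symbol,
# then the two summary columns.
dfbig.columns = [''] + stocksymbols + ['Returns', 'Standard Deviation']

dfbig.to_excel('CorrelationMatrix.xlsx')
os.startfile('CorrelationMatrix.xlsx')  # Windows-only convenience to pop it open
```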
Now, depending on the size of your portfolio, the module does all of this in very little time. However, like I said earlier, because this involves a for-loop within a for-loop to build a correlation matrix of all the symbols in your list, the work grows quadratically and could quickly get out of hand. But for any reasonably diversified portfolio, you wouldn't have to wait too long. The 8,725-symbol for-loop from my last project took about an hour and a half, so if you nested that same loop inside each of those 8,725 iterations, the roughly 76 million data pulls would take on the order of 8,725 × 1.5 hours, about 13,000 hours or a year and a half, for my computer to construct the correlation matrix.
But anyways, now that you've got the correlation matrix, standard deviations, and expected returns of the symbols in your portfolio in an Excel spreadsheet, you can calculate the optimal portfolio or minimum-variance portfolio using Excel's Solver add-in. Here are the parameters for Solver: the objective is to minimize the portfolio variance, the transposed weight vector times the covariance matrix times the weight vector (where the covariance matrix is just the correlation matrix with each entry multiplied by the two assets' standard deviations), by changing the weights subject to the constraint that they sum to one, i.e., 100% of your portfolio. You could even invest on margin and relax this constraint to match your margin requirement if an even more optimal portfolio demanded more exposure. But that's really it. Solver fills in the weights so you know what proportion of your capital to invest in each asset, giving you either the lowest variance (by minimizing variance) or the most return relative to your risk (by maximizing the Sharpe ratio). And now that I've spent the time coding the module, the next time I have a portfolio idea, I can simply insert my list, click once to run the module, spend 30 seconds writing in my optimization parameters, and I've got an optimized portfolio!
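If you'd rather not leave Python at all, here's an illustrative sketch of the same minimum-variance optimization done with numpy and scipy instead of Solver. This is my own addition, not part of the original module, and the long-only bounds are an assumption you'd relax for margin or short positions:

```python
import numpy as np
from scipy.optimize import minimize

corr = dfbig[stocksymbols].to_numpy()          # correlation matrix built above
stddevs = np.array(standdevlist)
cov = corr * np.outer(stddevs, stddevs)        # scale into a covariance matrix

def portfolio_variance(weights):
    return weights @ cov @ weights             # w' * Sigma * w

n = len(stocksymbols)
result = minimize(portfolio_variance,
                  x0=np.full(n, 1.0 / n),      # start equally weighted
                  bounds=[(0.0, 1.0)] * n,     # long-only; relax for margin
                  constraints=[{'type': 'eq',
                                'fun': lambda w: w.sum() - 1.0}])  # fully invested
optimal_weights = result.x                     # proportion of capital per asset
```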
Just as a bonus, I think this could be a great idea for a smart-beta, optimization-weighted exchange-traded fund, maybe using the S&P 500 companies. It would be attractive to investors because there would be no speculative adjustment, just an empirically observed optimal allocation of capital exposure to the broad market. You could have the fund rebalance according to the optimization maybe once a month. Assuming history is indicative of the future, but without putting all your eggs in one basket, you sort of get the best of both worlds, and I bet it'd outperform a basic market-capitalization-weighted ETF of the S&P 500.