Web Scraping with Python for Yield Arbitrage
If you've ever tried to look up a corporate bond quote on the fly without a broker or a Bloomberg terminal, you know how frustrating and difficult it is. Even if you have those resources available, you probably still find it tedious to analyze the data the way you'd like. I'd often wondered why bond quotes weren't accessible to the general public the way stock quotes are.
When I first became interested in financial markets, one of my favorite websites was finviz.com. Finviz is an incredible free resource that lets you filter and analyze equities with seemingly endless criteria to choose from. When I became interested in exploring credit, however, I found no equivalent. I couldn't even find yields for corporate bonds.
After scouring the internet for some time, I found a lead at finra.org, the Financial Industry Regulatory Authority's website. FINRA is an organization authorized by Congress to regulate brokerages and exchanges. It operates TRACE, the Trade Reporting and Compliance Engine, created in 2002. TRACE supposedly enhances the integrity of the market by giving individuals access to fixed-income transaction history, but you'd never find it unless you knew exactly what you were looking for. It is also painfully tedious to use and nearly worthless for learning anything about the market, although it did have the raw data I wanted.
Recalling some of the Python I picked up this past summer playing around with simple-moving-average and sentiment-based stock-trading algorithms, I got to work on a new module. I needed just a few libraries and a webdriver to start: pandas, BeautifulSoup, and Selenium. Selenium let my code open and operate a new web browser.
I had webdriver.Chrome open FINRA's bond center in the browser. Then I used time.sleep(5) to have the module wait five seconds before each step, both to hide the fact that the browser was being operated by code rather than a human and to make it seem as though the module were reading the user agreement it was about to violate. According to that agreement, FINRA may track my use and could terminate my access to the database, which wouldn't be the end of the world. Then, using a few driver.find_element_by_xpath lines, I had the module click Agree, navigate to the advanced search form, select corporate bonds, select the ratings I desired, and click Search to return the results, with time.sleeps all throughout.
The results for Moody's A1-Aaa with S&P's AAA to A- came to 5,071 bonds, or 254 pages. So I wrote a few more lines instructing the module to collect the raw data for export to Excel while sifting through each page slowly, like a human. The logic behind this step was: wait two seconds, parse the HTML for each container of bond information, append the data, click the Next button, and repeat until the Next button can no longer be clicked. Each container of bond data was then organized by its HTML headers: issuer name, symbol, callable, sub-product type, coupon, maturity, ratings, price, and yield. Thanks to openpyxl, df.to_excel('data.xlsx') wrote the data to an Excel spreadsheet titled data, and os.startfile('data.xlsx') opened it up for me.
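The parse-and-append step for a single page can be sketched like this with BeautifulSoup and pandas. The class names (`bond-row`, `bond-cell`) and the sample HTML are made up for illustration; the real page's markup would have to be inspected to find the actual containers.

```python
# A minimal sketch of parsing one page of results, assuming each bond is
# rendered as a container with one cell per column. Class names are hypothetical.
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Issuer Name", "Symbol", "Callable", "Sub-Product Type",
           "Coupon", "Maturity", "Moody's", "S&P", "Price", "Yield"]

sample_page = """
<div class="bond-row">
  <span class="bond-cell">ACME CORP</span>
  <span class="bond-cell">ACME1234</span>
  <span class="bond-cell">Yes</span>
  <span class="bond-cell">Corporate Bond</span>
  <span class="bond-cell">3.25</span>
  <span class="bond-cell">06/15/2030</span>
  <span class="bond-cell">Aa2</span>
  <span class="bond-cell">AA</span>
  <span class="bond-cell">101.5</span>
  <span class="bond-cell">3.05</span>
</div>
"""

rows = []
soup = BeautifulSoup(sample_page, "html.parser")
for container in soup.find_all("div", class_="bond-row"):
    cells = [c.get_text(strip=True)
             for c in container.find_all("span", class_="bond-cell")]
    rows.append(dict(zip(COLUMNS, cells)))  # one dict per bond

df = pd.DataFrame(rows, columns=COLUMNS)
# df.to_excel("data.xlsx")       # requires openpyxl
# os.startfile("data.xlsx")      # Windows-only convenience
```

In the real scraper this parsing would run inside the pagination loop, with the accumulated `rows` converted to a DataFrame once the Next button can no longer be clicked.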
In theory, bonds with similar duration and ratings should trade at relatively similar yields. Sorting and filtering my new data lets me find outlying, excessive yields as possible arbitrage opportunities, and knowing the typical yield in each category gives me a better understanding of the corporate bond market.
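One simple way to flag outsized yields is to bucket bonds by rating and maturity and z-score the yields within each bucket. This is a sketch with made-up data; the column names match the scraped headers, and the 1.0 z-score cutoff is an arbitrary choice.

```python
# Flag bonds whose yield sits well above peers of the same rating/maturity.
# Data and the z-score threshold are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "Symbol":   ["AAA1", "AAA2", "AAA3", "AAA4"],
    "Moody's":  ["Aaa", "Aaa", "Aaa", "Aaa"],
    "Maturity": ["2030", "2030", "2030", "2030"],
    "Yield":    [3.00, 3.05, 2.95, 4.40],  # the last yield looks excessive
})

grouped = df.groupby(["Moody's", "Maturity"])["Yield"]
df["zscore"] = (df["Yield"] - grouped.transform("mean")) / grouped.transform("std")

# Candidates for a closer look -- an unusually rich yield for the bucket.
outliers = df[df["zscore"] > 1.0]
print(outliers["Symbol"].tolist())  # -> ['AAA4']
```

An outsized yield here is only a candidate, of course; it may reflect a call feature, subordination, or stale pricing rather than a true mispricing.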