Web Scraping with Python for Yield Arbitrage
If you've ever tried to look up a corporate bond quote on the fly without a broker or a Bloomberg terminal, you know how frustrating and difficult it is. Even if you have those resources available, you probably still find it tedious to analyze the data the way you'd like. I'd often wondered why bond quotes weren't accessible to the general public the way stock quotes are.
When I first became interested in financial markets, one of my favorite websites was finviz.com. Finviz is an incredible free resource that lets you filter and analyze equities with seemingly endless criteria to choose from. When I became interested in exploring credit, however, I found no equivalent. I couldn't even find yields for corporate bonds.
After scouring the internet for some time, I found a lead at finra.org, the Financial Industry Regulatory Authority's website. FINRA is an organization authorized by Congress to regulate brokerages and exchanges. It operates TRACE, the Trade Reporting and Compliance Engine, created in 2002. TRACE supposedly enhances the integrity of the market by giving individuals access to fixed-income transaction history, but you'd never find it unless you knew exactly what you were looking for. It is also painfully tedious to use and nearly worthless for learning anything about the market, although it did have the raw data I wanted.
Recalling some of the Python I picked up this past summer playing around with simple-moving-average and sentiment-based stock-trading algorithms, I got to work on a new module. I needed just a few libraries and a webdriver to start: pandas, BeautifulSoup, and Selenium. Selenium let my code open and operate a new web browser.
I had webdriver.Chrome open FINRA's bond center in the browser. Then I used time.sleep(5) to have the module wait five seconds before each step, both to hide the fact that the browser was being operated by code rather than a human and to make it seem as though the module were reading the user agreement it was about to violate. According to that agreement, FINRA may track my use and could terminate my access to the database, which wouldn't be the end of the world. Then, using a few driver.find_element_by_xpath lines, I had the module click Agree, navigate to the advanced search form, select corporate bonds, select the ratings I desired, and click Search to return the results, with time.sleeps all throughout.
The results for Moody's A1-Aaa with S&P's AAA to A- came to 5,071 bonds, or 254 pages. So I wrote a few more lines instructing the module to collect the raw data for export to Excel while sifting through each page slowly, like a human. The logic behind this step was: wait two seconds, parse the HTML for each container of bond information, append the data, click the Next button, and repeat until the Next button can no longer be clicked. Each container of bond data was then organized by its HTML headers: issuer name, symbol, callable, sub-product type, coupon, maturity, ratings, price, and yield. Thanks to openpyxl, df.to_excel('data.xlsx') wrote the data to an Excel spreadsheet titled data, and os.startfile('data.xlsx') opened it up for me.
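The parse-and-append step for a single page can be sketched like this with BeautifulSoup and pandas. The class names (`bond-row`, `bond-cell`) and the sample HTML are made up for illustration; the real page's markup would have to be inspected to find the actual containers.

```python
# A minimal sketch of parsing one page of results, assuming each bond is
# rendered as a container with one cell per column. Class names are hypothetical.
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["Issuer Name", "Symbol", "Callable", "Sub-Product Type",
           "Coupon", "Maturity", "Moody's", "S&P", "Price", "Yield"]

sample_page = """
<div class="bond-row">
  <span class="bond-cell">ACME CORP</span>
  <span class="bond-cell">ACME1234</span>
  <span class="bond-cell">Yes</span>
  <span class="bond-cell">Corporate Bond</span>
  <span class="bond-cell">3.25</span>
  <span class="bond-cell">06/15/2030</span>
  <span class="bond-cell">Aa2</span>
  <span class="bond-cell">AA</span>
  <span class="bond-cell">101.5</span>
  <span class="bond-cell">3.05</span>
</div>
"""

rows = []
soup = BeautifulSoup(sample_page, "html.parser")
for container in soup.find_all("div", class_="bond-row"):
    cells = [c.get_text(strip=True)
             for c in container.find_all("span", class_="bond-cell")]
    rows.append(dict(zip(COLUMNS, cells)))  # one dict per bond

df = pd.DataFrame(rows, columns=COLUMNS)
# df.to_excel("data.xlsx")       # requires openpyxl
# os.startfile("data.xlsx")      # Windows-only convenience
```

In the real scraper this parsing would run inside the pagination loop, with the accumulated `rows` converted to a DataFrame once the Next button can no longer be clicked.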
In theory, bonds with similar duration and ratings should trade at relatively similar yields. Sorting and filtering my new data lets me find outlying, excessive yields as possible arbitrage opportunities, and knowing the typical yield in each category gives me a better understanding of the corporate bond market.
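One simple way to flag outsized yields is to bucket bonds by rating and maturity and z-score the yields within each bucket. This is a sketch with made-up data; the column names match the scraped headers, and the 1.0 z-score cutoff is an arbitrary choice.

```python
# Flag bonds whose yield sits well above peers of the same rating/maturity.
# Data and the z-score threshold are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "Symbol":   ["AAA1", "AAA2", "AAA3", "AAA4"],
    "Moody's":  ["Aaa", "Aaa", "Aaa", "Aaa"],
    "Maturity": ["2030", "2030", "2030", "2030"],
    "Yield":    [3.00, 3.05, 2.95, 4.40],  # the last yield looks excessive
})

grouped = df.groupby(["Moody's", "Maturity"])["Yield"]
df["zscore"] = (df["Yield"] - grouped.transform("mean")) / grouped.transform("std")

# Candidates for a closer look -- an unusually rich yield for the bucket.
outliers = df[df["zscore"] > 1.0]
print(outliers["Symbol"].tolist())  # -> ['AAA4']
```

An outsized yield here is only a candidate, of course; it may reflect a call feature, subordination, or stale pricing rather than a true mispricing.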