Reverse-Engineering Screaming Frog: What I Learned Building a Python API Around Crawl Data
For years, my workflow with Screaming Frog looked the same: run a crawl in the GUI, export CSVs tab by tab, then join and filter everything in spreadsheets.
It worked. But it was deeply manual. Last month I started asking a simple question:
What if working with crawl data never meant leaving code at all? This article is a summary of what I learned during the first month of building a Python library to read and automate Screaming Frog crawl data programmatically.
Not a launch post. Just a technical breakdown of the journey so far.
1. The Real Limitation Wasn’t Configuration — It Was Output
I had already built sf-config-tool, which lets you configure Screaming Frog crawls programmatically. That solved half the problem. But even using the CLI, the only way to extract data was through predefined exports: tab/filter combinations as flat CSV files.
That’s when I looked at the .dbseospider file format more closely.
2. What’s Inside a .dbseospider File?
A .dbseospider file is essentially an Apache Derby database: the full crawl stored as relational tables rather than as flat exports.
That changes everything. Instead of exporting CSVs, you can connect directly to the database from Python.
Inside that database sit every crawled URL, its response data, and the complete set of link relationships between pages.
Even better, the Derby database contains fields not surfaced in default GUI views/exports. Accessing it was the first breakthrough.
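As a first experiment, here is a minimal sketch of reaching an embedded Derby database from Python over JDBC with jaydebeapi. The jar path, database path, and the assumption that the project resolves to a Derby database directory on disk are mine, not part of any shipped tooling.

```python
import jaydebeapi

# Assumptions: derby.jar comes from an Apache Derby distribution, and DB_PATH
# points at the Derby database directory backing a .dbseospider project.
DERBY_JAR = "/opt/derby/lib/derby.jar"
DB_PATH = "/crawls/example-site/db"

conn = jaydebeapi.connect(
    "org.apache.derby.jdbc.EmbeddedDriver",
    f"jdbc:derby:{DB_PATH}",
    jars=DERBY_JAR,
)

# First look at the schema: list the user tables Derby knows about.
cur = conn.cursor()
cur.execute("SELECT TABLENAME FROM SYS.SYSTABLES WHERE TABLETYPE = 'T'")
for (table_name,) in cur.fetchall():
    print(table_name)
cur.close()
```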
3. Raw Data Is Not Usable Data
Once the database was accessible, a new problem appeared: the internal column names are not the ones users recognize from the GUI.
Instead of familiar labels like Address, Status Code, or Title 1, you get raw internal schema identifiers.
To make this usable, I exported every possible tab and filter from Screaming Frog (628 CSV files) to understand how the internal schema maps to user-facing labels. This mapping layer is now what allows querying crawl data in Python using familiar column names.
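In practice, the mapping layer can be as simple as a rename step on top of raw SQL. The internal identifiers and table names below are placeholders (the real mapping comes from those 628 files), so treat this as a sketch of the idea rather than the library's API.

```python
import pandas as pd

# Hypothetical internal-name -> GUI-label mapping; the real mapping layer
# was derived from the exported tab/filter CSVs.
COLUMN_MAP = {
    "URI": "Address",
    "RESPONSE_CODE": "Status Code",
    "PAGE_TITLE": "Title 1",
    "META_ROBOTS": "Meta Robots 1",
}

def query_crawl(conn, sql: str) -> pd.DataFrame:
    """Run raw SQL against the crawl database and return a DataFrame
    with user-facing column names."""
    cur = conn.cursor()
    cur.execute(sql)
    columns = [desc[0] for desc in cur.description]
    rows = cur.fetchall()
    cur.close()
    return pd.DataFrame(rows, columns=columns).rename(columns=COLUMN_MAP)

# Usage (table name is illustrative only):
# df = query_crawl(conn, "SELECT URI, RESPONSE_CODE, PAGE_TITLE FROM CRAWL_URLS")
# broken = df[df["Status Code"] >= 400]
```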
4. The Link Graph Is the Real Power
One of the most powerful discoveries was the internal link graph. Screaming Frog stores every relationship: source URL, target URL, anchor text, and link type.
In the GUI, you explore this manually. In code, it becomes a graph you can traverse.
That enables things like orphan-page detection, click-depth analysis, and internal-linking audits, computed directly from the stored relationships (see the sketch below).
All scriptable. All reproducible.
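A hedged sketch of what traversal can look like, assuming the link rows can be fetched as (source, target, anchor) tuples; it leans on networkx rather than anything Screaming Frog ships.

```python
import networkx as nx

def build_link_graph(link_rows):
    """link_rows: iterable of (source_url, target_url, anchor_text) tuples,
    e.g. fetched from the crawl database's link table (names illustrative)."""
    graph = nx.DiGraph()
    for source, target, anchor in link_rows:
        graph.add_edge(source, target, anchor=anchor)
    return graph

def orphan_candidates(graph, start_url):
    """URLs with no internal inlinks, other than the start page itself."""
    return [n for n in graph.nodes if n != start_url and graph.in_degree(n) == 0]

def click_depths(graph, start_url):
    """Click depth of every URL reachable from the start page."""
    return nx.single_source_shortest_path_length(graph, start_url)
```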
5. Computed Fields Required Rerunning Calculations
Not all values in the GUI are stored directly. “Indexability,” for example, is computed from the status code, robots directives, canonical tags, and robots.txt rules.
To reproduce GUI-consistent results, I had to reconstruct the mappings and computed logic behind Screaming Frog's decisions and replicate them in SQL and Python. Accessing raw data isn't enough; you need to rebuild the tool's internal business logic.
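To make the idea concrete, here is a deliberately simplified indexability check. It is not Screaming Frog's exact decision logic, just the general shape of the rules such a field has to combine.

```python
def indexability(url, status_code, meta_robots, canonical_url, blocked_by_robots):
    """Simplified indexability verdict; NOT Screaming Frog's exact logic."""
    if blocked_by_robots:
        return "Non-Indexable", "Blocked by robots.txt"
    if 300 <= status_code < 400:
        return "Non-Indexable", "Redirected"
    if status_code >= 400:
        return "Non-Indexable", f"Error ({status_code})"
    if meta_robots and "noindex" in meta_robots.lower():
        return "Non-Indexable", "Noindex"
    if canonical_url and canonical_url != url:
        return "Non-Indexable", "Canonicalised"
    return "Indexable", ""
```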
6. The Format Problem
Screaming Frog works with three output formats: plain CSV exports, .seospider project files, and .dbseospider database projects.
The CLI can export CSVs and save .seospider, but not .dbseospider directly. That’s a major constraint.
The workaround:
Now any crawl can become a portable, queryable database.
7. Full Automation
With programmatic crawl configuration (sf-config-tool), headless CLI crawling, and direct database access through the mapping layer, it becomes possible to configure a crawl, run it, query the results, and build reports end to end.
All from Python. No GUI. No manual exports.
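A rough end-to-end sketch, gluing the pieces together with subprocess. The CLI flags are the documented Screaming Frog ones, but the binary name, paths, and the query_crawl() helper from earlier are assumptions that vary per setup.

```python
import subprocess

# 1. Run a headless crawl and save the project (binary name and paths vary per install).
subprocess.run(
    [
        "screamingfrogseospider",
        "--crawl", "https://example.com",
        "--headless",
        "--save-crawl",
        "--output-folder", "/crawls/2024-06",
    ],
    check=True,
)

# 2. Open the resulting crawl database (see the Derby sketch above) and query it,
#    e.g. pull every non-200 URL with the hypothetical query_crawl() helper.
# df = query_crawl(conn, "SELECT URI, RESPONSE_CODE FROM CRAWL_URLS WHERE RESPONSE_CODE <> 200")
```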
Real-World Use Case:
A monthly enterprise crawl (~80k URLs) where the SEO team needs to answer the same questions every time: what changed since last month, what broke, and what needs fixing first.
Before: manual GUI filtering + multiple CSV exports + spreadsheet joins. Now: one Python workflow that queries Derby directly and outputs a prioritized fix list in minutes, fully reproducible month to month.
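As an illustration of what that workflow can produce, a month-over-month comparison might look like the snippet below. The snapshot files and column labels are hypothetical; the point is that the whole comparison is a few lines of pandas once the data is queryable.

```python
import pandas as pd

# Hypothetical monthly snapshots saved from the crawl database.
prev = pd.read_parquet("crawl_2024_05.parquet")
curr = pd.read_parquet("crawl_2024_06.parquet")

# URLs that were indexable last month but are non-indexable now.
merged = curr.merge(
    prev[["Address", "Indexability"]],
    on="Address", how="left", suffixes=("", " (prev)"),
)
regressions = merged[
    (merged["Indexability"] == "Non-Indexable")
    & (merged["Indexability (prev)"] == "Indexable")
]
regressions.to_csv("fix_list_2024_06.csv", index=False)
```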
8. Alpha and Early Feedback
The first alpha batch is now running on real sites.
Early feedback is starting to come in, and the most interesting part isn't speed.
It's that workflows that were previously manual can now be automated end-to-end.