Separating Workflow from Computation: A Lesson from Building a Backtesting Engine
I’m deeply grateful to the leaders and senior engineers at Amazon who taught me invaluable design principles. One of the most impactful was separating workflow from computation. This lesson became clear when my team had to build a simulator for Amazon’s retail pricing systems. The original system tightly coupled core logic with data sources, forcing us to duplicate that logic when creating the simulator. Maintaining two codebases quickly became a nightmare. When we ran tests, the simulator’s results often didn’t match reality, and we spent countless hours investigating why. Was it a bug in the simulator? Or was it a discrepancy in the logic between the two systems? Each mismatch required careful debugging and examination of the same logic in both codebases. It was a vicious cycle of inefficiency and frustration.
The real pain point was the duplication of effort. Writing a test wasn’t enough—we had to reimplement the same logic in production to ensure consistency. This not only doubled the work but also introduced the risk of subtle differences creeping in over time. The longer we maintained two codebases, the harder it became to keep them in sync. It wasn’t until we refactored the system—making the logic stateless and decoupling data access—that we unified the codebase, creating a maintainable and scalable solution.
The principle is simple: isolate the how (core logic) from the what (data workflow). This allows the same logic to be reused across contexts—backtesting, live trading, or simulation—without duplication. It’s a design philosophy that has since become a cornerstone of my approach to building robust, scalable systems.
A Simple Example
To illustrate this principle, let’s consider a trading system that calculates a stock’s average price over a time window. In a tightly coupled design, the logic for calculating the average price might be intertwined with the code that fetches the stock data. Here’s what that might look like:
# Tightly coupled design
def calculate_average_price(stock_ticker, start_time, end_time):
    stock_data = fetch_stock_data_from_database(stock_ticker, start_time, end_time)  # Workflow
    total_price = sum(data['price'] for data in stock_data)  # Computation
    return total_price / len(stock_data)
In this example, the calculate_average_price function is responsible for both fetching the data and performing the computation. This makes it difficult to reuse the core logic in other contexts, such as a backtester that uses historical data or a simulator that generates synthetic data.
Now, let’s refactor this to separate the workflow (data fetching) from the computation (average price calculation):
# Separated design: Core logic
def calculate_average_price(stock_data):
    return sum(data['price'] for data in stock_data) / len(stock_data)

# Separated design: Workflow
def fetch_and_calculate_average_price(stock_ticker, start_time, end_time):
    stock_data = fetch_stock_data_from_database(stock_ticker, start_time, end_time)
    return calculate_average_price(stock_data)
In this refactored version, the calculate_average_price function is now purely focused on the computation and knows nothing about where the data comes from. The fetch_and_calculate_average_price function handles the workflow, fetching the data and passing it to the core logic. This separation allows us to reuse the calculate_average_price function in different contexts, such as a backtester or simulator, without duplicating code.
For example, in a backtester, we might use historical data stored in a CSV file:
def fetch_and_calculate_average_price_from_csv(stock_ticker, start_time, end_time):
    stock_data = fetch_stock_data_from_csv(stock_ticker, start_time, end_time)
    return calculate_average_price(stock_data)
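The snippet above assumes a fetch_stock_data_from_csv helper. As a rough sketch of what such a loader might look like (the file naming convention and column names are illustrative assumptions, not the actual format we used):

import csv
from datetime import datetime

def fetch_stock_data_from_csv(stock_ticker, start_time, end_time):
    # Read ticks for one ticker from a local CSV file and keep only the
    # rows inside the requested time window.
    rows = []
    with open(f"{stock_ticker}.csv", newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.fromisoformat(row["timestamp"])
            if start_time <= ts <= end_time:
                rows.append({"timestamp": ts, "price": float(row["price"])})
    return rows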
By separating workflow from computation, we’ve created a more modular, reusable, and testable system. This principle is especially powerful when building complex systems like backtesters and live execution engines, where the same core logic often needs to operate in multiple contexts.
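That testability is easy to demonstrate. Because calculate_average_price takes plain data, a simulator or unit test can feed it synthetic ticks generated entirely in memory. A minimal sketch (the helper name and price model here are just for illustration):

import random

def simulate_average_price(num_ticks=100, base_price=100.0):
    # Generate synthetic ticks in memory; no database or file is involved.
    synthetic_data = [{'price': base_price + random.gauss(0, 1)} for _ in range(num_ticks)]
    return calculate_average_price(synthetic_data)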
Applying the Principle to Backtesting
When designing our backtesting engine, we faced a critical decision that highlighted the importance of separating workflow from computation. We had two potential approaches:

- A vectorized approach: preload the entire historical dataset and compute results with bulk array operations, optimizing for raw backtest speed.
- An event-based approach: feed data to the strategy one tick at a time, mirroring how a live trading system receives it.
Initially, the team leaned toward the vectorized approach. After all, it appeared to be the more optimized solution for speed, especially when dealing with large datasets. However, we soon realized that this approach came with a significant downside: it made it nearly impossible to reuse the same logic between the backtester and the live trading system. In live trading, data isn’t available all at once—it’s streamed tick by tick. If the backtester relied on preloaded data and vectorized computations, we’d have to maintain two separate codebases, duplicating the core logic and introducing potential inconsistencies.
This is where the principle of separating workflow from computation guided our decision-making. By choosing the event-based approach, we ensured that the core logic remained independent of how data was fed into it. Whether the data came from a preloaded dataset in the backtester or a live stream in production, the same logic could be reused without modification. This not only reduced maintenance overhead but also ensured that the backtester behaved as closely as possible to the live environment, improving the reliability of our testing.
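To make the event-based idea concrete, here is a rough sketch of the shape such a design can take (the class and method names are my own for illustration, not our actual engine’s API): the strategy consumes one tick at a time, and only the loop that feeds it differs between backtesting and live trading.

class AveragePriceStrategy:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def on_tick(self, tick):
        # Core logic: update the running average one tick at a time.
        self.total += tick['price']
        self.count += 1
        return self.total / self.count

def run_backtest(strategy, historical_ticks):
    # Backtest workflow: replay preloaded historical ticks through the same logic.
    for tick in historical_ticks:
        strategy.on_tick(tick)

# Live workflow: the identical on_tick method would instead be subscribed to the
# market data feed, so the computation never knows which context it is running in.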
Of course, the concern with the event-based approach was performance. Would processing data tick by tick be too slow for practical use? To address this, we profiled the system and discovered something surprising: the speed bottleneck wasn’t in the logic itself but in the data loading process. Armed with this insight, we focused our optimization efforts on building a fast-loading cache for the backtester. The result was a system that combined the best of both worlds—speed comparable to a vectorized approach and the reusability of an event-based design.
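The caching itself can be simple. One common pattern, sketched below under the assumption that the raw data lives in CSV files as in the earlier example (our actual cache format isn’t described here), is to parse the source once and store the result in a fast binary format so repeated backtest runs skip the expensive parse:

import os
import pickle

def load_ticks_cached(stock_ticker, start_time, end_time, cache_dir="cache"):
    # Cache key includes the requested window so different ranges don't collide.
    os.makedirs(cache_dir, exist_ok=True)
    key = f"{stock_ticker}_{start_time:%Y%m%d}_{end_time:%Y%m%d}.pkl"
    cache_path = os.path.join(cache_dir, key)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)  # fast path: already parsed
    ticks = fetch_stock_data_from_csv(stock_ticker, start_time, end_time)
    with open(cache_path, "wb") as f:
        pickle.dump(ticks, f)  # slow path: parse once, cache for the next run
    return ticks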
The Hidden Benefit: Faster Iteration
Over time, we realized an additional, significant benefit of our chosen approach: faster iteration. Because the same code used in the backtester could be seamlessly deployed to shadow production and then live trading, we were able to move from analysis to backtesting to live execution within days. Once the backtest code was written, it could be deployed to shadow trading the next day and go live soon after. This rapid iteration cycle allowed us to test and deploy strategies faster than ever before.
In hindsight, the ability to save a few seconds in a single backtest run paled in comparison to the competitive advantage of being able to bring new strategies to market quickly. By prioritizing modularity, reusability, and alignment with live trading, we built a system that not only performed well but also enabled us to adapt and innovate at speed.
Conclusion
This experience reinforced the value of making architectural decisions that prioritize long-term flexibility and operational efficiency over short-term gains. By separating workflow from computation, we created a system that was not only fast and reliable but also adaptable to the ever-changing demands of live trading. It’s a testament to the power of thoughtful design principles and the importance of profiling to identify true performance bottlenecks.
I’m grateful to have learned these lessons and principles from the leaders and colleagues at Amazon. Their guidance has been instrumental in shaping my approach to building systems, and I’m excited to pass on this knowledge so it can benefit more people. As we continue to build and refine our systems, I’m reminded of the incredible value of designing with reusability and modularity in mind. It’s a principle that continues to guide my work today, and I’m excited to see how it will shape the systems of tomorrow.
"One-Code-Base - BTL" occasionally haunts my dreams.
Hi Jimmy, completely resonate with your thought on separating logic/worflow from data. You might like our thoughts on similar lines for an enterprise usage of data for business intelligence, analytics and AI. https://www.garudax.id/pulse/why-cloud-native-applications-replacing-traditional-data-bi-bcigf/?trackingId=ud%2BksSTSLApOwJYgSFu04w%3D%3D
I just hope Guice is less central in your trading system then in the pricing simulator