Backtesting is often hailed as the cornerstone of algorithmic trading, yet it remains one of the most misunderstood steps in the entire process. I remember poring over countless articles and books on the subject, only to be left with vague concepts and little practical clarity. 

Man asking himself if the curve will go up

 In this article, we’ll dive deep into the concept of backtesting, shedding light on its theory and uncovering the seven most common pitfalls to avoid. Let’s unravel this critical yet elusive part of quantitative finance together! 

 

Backtesting, What Is It?

Backtesting is like watching a replay of your trading strategy to see how it would have performed on past data. Basically, you feed your strategy with historical data to check if it would’ve been a winner in the past. As E. Chan puts it in Algorithmic Trading:  

"Backtesting is the process of feeding historical data to your trading strategy to see how it would have performed." 

In simple terms, it’s about checking how your strategy played in the past, hoping it’ll keep shining in the future. It’s a super crucial step (arguably the most important one) when you’re building your trading bot.

There are several ways to approach backtesting, which can be split into two main schools of thought:

  1. The time-based method: You work with historical data and see how the strategy would’ve performed over different time periods.
  2. The statistical or combinatorial method: Here, you create scenarios using historical data to test whether your strategy holds up in various contexts.

Both methods share the same goal: ensuring the strategy can stand the test of time. Now, there’s always a bit of debate between fans of the two approaches, but the statistical method is often preferred by the pros. And it’s not just about showing off. There are legit reasons behind it (I’ll write a separate article comparing them in detail, but that’s for another time).

That said, the time-based method is still the most popular because it’s quick to set up and less of a headache. But don’t get too carried away with your backtest, even if it looks perfect. As M.L. De Prado says in Advances in Financial Machine Learning:

"Even if your backtest is flawless, it is probably wrong." 

Translation? Stay humble—no matter how shiny your backtest results look, there’s a good chance they’re misleading! 


Python Frameworks for Backtesting: Navigating the Jungle of Tools

Alright, now that we know what backtesting is, let’s get down to business: which tools should you use to test your strategies in Python? Here’s a handy guide to help you navigate the jungle of frameworks.

Table of the Backtesting Libraries for Python

My Personal Choice

Personally, I no longer rely on prebuilt backtesting libraries. Why? Because it allows me to control everything—absolutely everything—in my code. No unnecessary features, I can get exactly the metrics I want, and I have the freedom to run my own tests. On the flip side, it means I have to be super careful about coding errors, and it does take a bit more time. But honestly, I think the flexibility and complete control over the process more than make up for it.

And let’s be real—it forces me to truly understand what’s happening under the hood, especially when it comes to calculating PnLs (profit and loss). And honestly, that deep understanding of trading is incredibly valuable.

Finally, as you’ll see, the methodology for backtesting isn’t all that complicated. With a bit of practice, you’ll even be able to replicate it quite easily.
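To make that concrete before moving on, here’s a minimal sketch of the kind of hand-rolled loop I’m talking about (a simplified illustration, not my production code). It assumes a pandas DataFrame with a `close` column and a `position` column of -1/0/+1 signals you’ve already computed, and derives the strategy PnL from simple returns:

```python
import pandas as pd

def backtest_pnl(df: pd.DataFrame, fee: float = 0.0) -> pd.DataFrame:
    """Minimal vectorized backtest.

    Expects 'close' and 'position' columns. 'position' is the exposure
    decided at the close of each bar (-1, 0 or +1); it earns the NEXT
    bar's return, hence the shift, which also avoids lookahead bias.
    """
    out = df.copy()
    out["market_return"] = out["close"].pct_change()
    out["strategy_return"] = out["position"].shift(1) * out["market_return"]
    # Charge a proportional fee each time the position changes (very rough cost model)
    out["strategy_return"] -= fee * out["position"].diff().abs()
    out["equity"] = (1 + out["strategy_return"].fillna(0)).cumprod()
    return out
```

From the resulting `strategy_return` and `equity` columns you can compute exactly the metrics you care about (Sharpe, drawdown, hit rate, and so on), which is precisely the point of skipping the prebuilt libraries.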

 

The Temporal Backtesting Method

There are two main approaches to temporal backtesting, each suited to specific situations:

  1. The Classic Train-Test Split
  2. Walk Forward Testing
 

1. Train-Test Split: The Classic Method

The basic idea here is to divide your dataset into two independent parts: one for training your strategy (train set) and the other for validation (test set). Then, you train your model on the train set and test its performance on the test set, assuming that the relationship learned in the early data will hold for future data.

The traditional split is typically 80/20, where you use 80% of the data for training and 20% for testing. This is a common choice, somewhat based on the Pareto principle (80% of the results come from 20% of the efforts). But be careful: in trading, overfitting is way too easy. The model might fit the training data perfectly but fail when confronted with unseen situations.

Personally, I prefer a more balanced split, 50/50. Why? Because the more data you have for testing your model, the better you can evaluate its robustness in real-world situations. Plus, you reduce the risk of overfitting and increase the likelihood that your test set is truly representative of future market conditions.

The issue with the 80/20 split is that sometimes the remaining 20% doesn't cover the full variety of future market scenarios. On the other hand, with a 50/50 split, you increase your chances of having a test sample that captures the market’s fluctuations more effectively, especially over long periods. This is really my first step when exploring a concept and testing how it reacts to real market movements.

Diagram of a Train Test Split

Pro tip: consider adding a "gap" between your train and test sets to avoid information leaking from one to the other (a close cousin of lookahead bias, i.e. letting future data influence past decisions). In trading, information tends to persist, so what happens at the end of the train set can still contaminate the start of the test set. A small gap of, say, 10 data points is often enough to remove this bias.
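In practice, the split logic can stay very simple. Here’s a sketch, assuming your data lives in a time-ordered pandas DataFrame; the 50/50 ratio and the 10-point gap are just the choices discussed above, not magic numbers:

```python
import pandas as pd

def chrono_split(df: pd.DataFrame, train_ratio: float = 0.5, gap: int = 10):
    """Chronological train/test split with a gap to limit information leakage."""
    split = int(len(df) * train_ratio)
    train = df.iloc[:split]
    test = df.iloc[split + gap:]  # drop 'gap' observations between the two sets
    return train, test
```

You fit the strategy on `train` only, then evaluate it once on `test`.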

 

2. Walk Forward Testing: For Adaptive Strategies

When it comes to adaptive strategies, the traditional train-test split becomes a bit too rigid. Here, your model can't stay fixed on a set training window, as it needs to constantly adjust to market changes. That's where Walk Forward Testing comes into play.

The idea? Divide your data into several samples, and then, for each sample, perform a series of optimizations. In practice:

  • Optimize the parameters of your strategy on the training set of a sample,
  • Test that strategy on the test set of the same sample,
  • Move on to the next sample, and repeat the process.

The advantage of walk forward over the classic train-test split is that it allows your strategy to adapt to market changes, with parameters evolving at each iteration. This gives you a better sense of how robust your model is over time.

But be cautious, as this method comes with its own challenges:

  • You’re reducing the size of your train and test sets, so you need a lot of data to ensure each sample is representative.
  • Since each test set may be relatively small, you risk getting biased results if your samples don't cover a broad range of market conditions. This is where a 50/50 split becomes crucial at each iteration! It maximizes data diversity and minimizes the risk of bias.

Diagram of a Walk Forward Split

Important: Make sure the sample windows have a fixed length, and that the entire train+test block slides forward together. If you only move or grow the test window, the training data end up dominating your results, which can lead to overfitting. And just like in the simple split, keep a gap between the train and test sets of each sample, as in the sketch below.
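Here is a sketch of a fixed-window walk-forward generator under those constraints: fixed-length windows, the whole train + test block sliding forward, and a gap between train and test. The window sizes in the usage comment are arbitrary placeholders, and `optimize` / `evaluate` are hypothetical helpers standing in for your own code:

```python
import pandas as pd

def walk_forward_splits(df: pd.DataFrame, train_size: int, test_size: int,
                        gap: int = 10):
    """Yield (train, test) pairs of fixed length that slide forward together."""
    start = 0
    block = train_size + gap + test_size
    while start + block <= len(df):
        train = df.iloc[start:start + train_size]
        test = df.iloc[start + train_size + gap:start + block]
        yield train, test
        start += test_size  # slide the whole block forward by one test window

# Usage sketch: optimize on each train window, evaluate on the matching test window
# for train, test in walk_forward_splits(data, train_size=500, test_size=500):
#     params = optimize(train)                 # hypothetical optimizer
#     results.append(evaluate(test, params))   # hypothetical evaluation
```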

 

To Conclude ...

Remember, the goal of backtesting is not to find an edge on your train set (that’s easy with an 80/20 split, or with an overly permissive walk forward, but it’s a trap). No, the goal is to validate a real edge on a test set that’s as large and independent as possible, because that’s where the reality of the market unfolds. Trust me, you’ll thank me later when your strategy turns out to be more robust and less fragile in the face of unpredictable market swings. That’s the beauty of the 50/50 split: you’re not just building a model that performs well on the past, you’re building a strategy that holds up in the future. And that’s far more valuable than any flashy result on your train set. 😁

 

The Pitfalls of Backtesting: Beware of False Hopes

As you’ve probably realized, backtesting can be an extremely powerful tool. But once you start analyzing your strategies on historical data, there are several traps to avoid so you don’t get fooled by illusions that quietly derail your approach. There might be some repetition here… but that’s the essence of pedagogy! Here are the most common pitfalls:

 

1. Overfitting the Model: When Too Much Performance Kills Performance

Overfitting is a classic trap that occurs when a model is too finely tuned to past data, creating the illusion of exceptional performance. That illusion is usually misleading because markets are constantly evolving: an over-optimized model learns anomalies or noise that won’t repeat in the future, making its predictions unreliable. It’s crucial to test the model’s robustness over different periods and on fresh data. In other words, “Backtesting while researching is like drinking and driving. Do not research under the influence of a backtest.” (De Prado) A backtest should never be the sole driver of your research; it only means something alongside rigorous tests on diverse, unbiased data.

 

2. Ignoring Transaction Costs: A Costly Trap

Another classic trap: forgetting to factor in transaction costs in your backtest. Between the spreads, brokerage commissions, and slippage (the gap between the order and its actual execution), these costs can seriously eat into your profits. Sometimes, you’re so focused on the buy and sell signals that you forget the real costs associated with the trades. These fees might seem small, but over a large number of trades, they can make a big difference. If you don't include them in your backtest, you might think your strategy is more profitable than it really is.
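As a rough illustration, here’s how a flat cost assumption can be injected into per-trade returns. The 0.1% cost per unit of turnover is purely an assumption; swap in your broker’s actual commissions, the spread, and an estimate of slippage:

```python
import pandas as pd

def net_returns(gross_returns: pd.Series, positions: pd.Series,
                cost_per_turnover: float = 0.001) -> pd.Series:
    """Subtract a proportional cost every time the position changes."""
    turnover = positions.diff().abs().fillna(0)
    return gross_returns - cost_per_turnover * turnover
```

Run your backtest once with costs and once without: if the edge disappears when fees are included, it was never really there.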

Trading fees Knight

 

3. Selection Bias: Watch Out for Fake Winning Results

Selection bias (what De Prado calls backtest overfitting) appears when you backtest a strategy over and over, tweaking parameters or swapping features. After enough testing and adjusting, you will eventually land on a strategy that looks incredibly profitable. But beware: those good results might just be a product of chance or of overfitting the data. In reality, it’s crucial to consider how many trials you ran before judging a strategy’s validity. The more tests you perform, the higher the chance of stumbling on an exceptional result, but that doesn’t make it reproducible. As M.L. De Prado says, “Every backtest result must be reported in conjunction with all the trials involved in its production. Absent that information, it is impossible to assess the backtest’s ‘false discovery’ probability.” In practice, this means always reporting backtest results alongside every strategy you tried and its outcome, so you can estimate the probability of a false discovery. It’s laborious, but it’s the right approach. I must admit I don’t always do it, but keeping it in mind helps me qualitatively assess my results. (I’ve since switched to a statistical approach, which has been much more helpful in this regard!)
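A back-of-the-envelope way to keep yourself honest is to at least count your trials and adjust your significance threshold accordingly. The sketch below assumes independent trials with a single p-value each, which is a crude simplification of De Prado’s framework, not his actual method:

```python
def family_wise_error(p_single: float, n_trials: int) -> float:
    """Probability of at least one false discovery among n independent trials,
    each with individual false-positive probability p_single (Sidak formula)."""
    return 1 - (1 - p_single) ** n_trials

# A strategy that looks significant at p = 0.05, found after 50 trials:
print(family_wise_error(0.05, 50))  # ~0.92, i.e. very likely a false discovery
```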

 

4. The Lookahead Bias: Beware of the Future

Lookahead bias occurs when you use future information to make decisions about the past. For example, you might feed your strategy a company’s earnings figures on dates before they were actually published. This is a mistake because, in reality, you wouldn’t have had access to that data at the time. This bias leads to unrealistically good decisions, and if you base your strategy on it, your backtest results will be far too optimistic.
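In code, the most common source of lookahead bias is acting on a signal during the very bar used to compute it. Here’s a small illustrative example with a moving-average crossover (the 20/50 windows are arbitrary), showing the wrong and the right way to turn the signal into a position:

```python
import pandas as pd

def add_signal(df: pd.DataFrame) -> pd.DataFrame:
    """Moving-average crossover, lagged so trades never peek at the future."""
    out = df.copy()
    fast = out["close"].rolling(20).mean()
    slow = out["close"].rolling(50).mean()
    # WRONG: trading on the same bar the crossover is observed uses that bar's close
    # out["position"] = (fast > slow).astype(int)
    # RIGHT: act on the next bar, once the information is actually available
    out["position"] = (fast > slow).astype(int).shift(1)
    return out
```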

 

5. An Unrepresentative Growth Phase: Beware of Exploding Stocks

Imagine you're backtesting a strategy on stocks of companies that are experiencing rapid growth during your analysis period, but don't maintain that performance once you go live. The danger here is getting tricked by temporary phases of high growth that aren't reflective of the future performance of those stocks. If you don’t ensure that past performance is stable and sustainable, you might end up betting on a strategy that looks brilliant on paper but fails in the long run.

 

6. Delisted Assets: An Easy Trap to Avoid

Another pitfall to watch out for involves assets that no longer exist. If your historical data only contains stocks that are still listed today, every company that went bankrupt or was delisted has been silently removed, and your backtest will look better than it should; this is the textbook form of survivorship bias. Conversely, if your strategy leans on assets that have since been delisted, you can’t actually buy or sell them going forward, so you can’t replicate the backtest live. Either way, you end up basing your strategy on information that is no longer actionable.

 

7. Liquidity: The Reality of the Market

Liquidity is another factor that's easy to overlook in a backtest. When testing a strategy, you may end up backtesting an asset with low liquidity. The result? If you were to place an order in reality, you might have to pay a much higher price than expected or fail to buy/sell at the price you wanted. In practice, trading stocks or assets with low liquidity will introduce additional costs, and your backtest results will be distorted if you don't account for this reality.
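A simple sanity check is to compare your intended trade size with the asset’s average daily dollar volume and scale down (or skip) anything where you’d represent a meaningful share of the market. The 1% participation cap and 20-day window below are assumptions for illustration only:

```python
import pandas as pd

def cap_trade_size(target_dollars: float, close: pd.Series, volume: pd.Series,
                   max_participation: float = 0.01) -> float:
    """Cap an order at a fraction of the 20-day average dollar volume."""
    adv = (close * volume).rolling(20).mean().iloc[-1]  # needs >= 20 bars of history
    return min(target_dollars, max_participation * adv)
```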

When my backtest gives good results

Finally, to Conclude

Backtesting is an essential step to test and validate your trading strategy, but be careful not to fall into common traps. Here are the key points to keep in mind:

  1. Overfitting: Don’t be blinded by perfect results on your historical data; this could be overfitting or a false positive.
  2. Transaction costs: Don’t forget to include real-world costs in your backtests for more realistic results.
  3. Biases: Be vigilant about selection bias, survivorship bias (delisted assets), lookahead bias, and periods of unsustainable growth that can skew your analysis.

With these points in mind, you can avoid illusions and work on a more robust strategy that can actually be applied in the real market.

However, as M.L. De Prado says:

 "Backtesting is not a research tool. Feature importance is." 

Backtesting is a powerful tool, but it shouldn’t be confused with a strategy in itself. It’s meant to test and refine ideas, not to replace the design work. Just like in my previous article about trading bots: a bot without a solid design won’t be profitable. De Prado advocates an approach where market understanding comes through features, and that’s how we make this understanding comprehensible to an algorithm.

To dive deeper, I invite you to check out my article on backtesting explained step by step, with concrete code, to better integrate these concepts and apply them effectively to your trading strategies.

 

Don’t hesitate to comment, share, and most importantly, code!
I wish you an excellent day and lots of success in your trading projects!
A big hug, and see you very soon! ✌️



References:

  • "Algorithmic Trading: Winning Strategies and Their Rationale", Ernie Chan, May 2013.
  • "Advances in Financial Machine Learning", Marcos López de Prado, May 2018.