Timing Luck in Factor Investing
How robust are your backtests, really?
SUMMARY
- The starting date matters for a backtest
- Annual returns can vary by more than 10% depending on the starting date, which calls robustness into question
- Averaging into portfolios across multiple starting dates can reduce timing luck
INTRODUCTION
The objective of quantitative investing is to separate skill from luck by applying a rigorous, evidence-based approach to strategy evaluation. In practice, this means conducting backtests over long horizons that span multiple market regimes, using a realistic universe of tradable instruments, incorporating transaction and market impact costs, and validating results through out-of-sample tests or across different markets.
All strategies are sensitive to modeling assumptions, though some are more consequential than others. For example, high-turnover approaches such as statistical arbitrage are particularly exposed to transaction costs, whereas lower-turnover equity strategies like quality investing are less affected, as their underlying fundamentals evolve more slowly.
One assumption that is often overlooked – but still materially important – is deceptively simple: the choice of starting date. In this article, we examine the impact of timing luck, using the momentum factor as a case study.
TIMING LUCK & THE MOMENTUM FACTOR
We construct a long–short momentum index for the U.S. equity market by ranking stocks based on their past 12-month returns, excluding the most recent month. The top 20% form the long portfolio, while the bottom 20% form the short portfolio. The strategy is rebalanced monthly.
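A minimal sketch of the ranking step described above, assuming month-end prices for each stock are available as plain lists. The function names (`momentum_signal`, `long_short_portfolio`) and the toy tickers are hypothetical. They only illustrate the 12-month lookback, the one-month skip, and the 20% cutoffs, and are not meant to represent Finominal's actual implementation.

```python
from typing import Dict, List, Tuple


def momentum_signal(month_end_prices: List[float], lookback: int = 12, skip: int = 1) -> float:
    """Return over the past `lookback` months, excluding the most recent `skip` months."""
    if len(month_end_prices) < lookback + 1:
        raise ValueError("need at least lookback + 1 month-end prices")
    start = month_end_prices[-(lookback + 1)]  # price `lookback` months ago
    end = month_end_prices[-(skip + 1)]        # price `skip` months ago
    return end / start - 1.0


def long_short_portfolio(signals: Dict[str, float], quantile: float = 0.2) -> Tuple[List[str], List[str]]:
    """Split a universe into long (top quantile) and short (bottom quantile) legs."""
    ranked = sorted(signals, key=lambda ticker: signals[ticker], reverse=True)
    n = max(1, int(len(ranked) * quantile))  # names per leg, rounded down
    return ranked[:n], ranked[-n:]


# Toy universe: hypothetical tickers and signal values, not real data.
toy_signals = {f"S{i}": i / 10 for i in range(10)}
longs, shorts = long_short_portfolio(toy_signals)
print("long:", longs, "short:", shorts)
```

In a full backtest, this selection would be repeated at every rebalance date, with the signal recomputed from the prices available on that date.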
The first portfolio is initiated on Monday, February 25, 2002, and additional portfolios are launched on each subsequent trading day. Comparing the performance of the five portfolios formed during the first week reveals remarkably similar trajectories over the roughly 25-year period from 2002 to 2026. Nevertheless, small differences persist: the portfolio initiated on Monday delivers the weakest performance, while the one initiated on Friday performs the best.
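To make the staggered launch dates concrete, the sketch below lists the first start dates from February 25, 2002, and the rebalance schedule that each start date implies. It rests on two simplifying assumptions that are not from the article: weekdays stand in for trading days (exchange holidays are ignored), and "monthly" is approximated as every 21 trading days. The helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import List


def first_weekdays(start: date, count: int) -> List[date]:
    """Return the first `count` weekdays on or after `start` (holidays ignored)."""
    days: List[date] = []
    current = start
    while len(days) < count:
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days.append(current)
        current += timedelta(days=1)
    return days


def rebalance_day_indices(offset: int, n_trading_days: int, period: int = 21) -> List[int]:
    """Trading-day indices on which a portfolio launched at `offset` rebalances."""
    return list(range(offset, n_trading_days, period))


# The five portfolios launched in the first week.
first_week = first_weekdays(date(2002, 2, 25), 5)
for offset, launch in enumerate(first_week):
    schedule = rebalance_day_indices(offset, n_trading_days=64)
    print(launch.strftime("%A %Y-%m-%d"), "rebalances on day indices", schedule)
```

Under these assumptions, 21 consecutive offsets cover every possible monthly schedule, after which the pattern repeats.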
However, when we extend the comparison to all portfolios initiated during the first month, the dispersion becomes much more pronounced. While the performance paths remain highly similar – each capturing common features such as the momentum crash during the Global Financial Crisis in 2009 – the terminal values diverge significantly.
Comparing the CAGRs of the 21 momentum indices initiated during the first month, one per trading day, reveals a range of 2.4% to 3.7% for the period from 2002 to 2026. These figures are reported before transaction costs, market impact, and management fees. Once such frictions are incorporated, the lowest-performing portfolio would likely have delivered little to no net return. In real terms – after adjusting for inflation – this would translate into a materially negative outcome.
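As a rough illustration of how a gap of this size compounds, the sketch below converts between terminal values and CAGR. The 2.4% and 3.7% figures come from the text. The 24-year horizon and the function names are assumptions made for illustration, and no costs are modeled.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two index levels."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def terminal_value(start_value: float, annual_rate: float, years: float) -> float:
    """Index level after compounding `annual_rate` for `years`."""
    return start_value * (1.0 + annual_rate) ** years


YEARS = 24  # assumed horizon, roughly 2002 to 2026
low = terminal_value(1.0, 0.024, YEARS)   # weakest start date in the text
high = terminal_value(1.0, 0.037, YEARS)  # strongest start date in the text

print(f"1.00 grows to {low:.2f} at 2.4% and to {high:.2f} at 3.7%")
print(f"relative gap in terminal value: {high / low - 1.0:.0%}")
```

The relative gap in terminal value is considerably larger than the 1.3 percentage point gap in CAGR, which is why the index paths look similar yet end far apart.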
We can extend the analysis by examining the year-by-year dispersion of returns. In certain periods, such as 2021, the differences are particularly striking: the best-performing momentum portfolio generated a positive return, while both the average and worst-performing portfolios recorded losses.
The average annual spread between the worst- and best-performing momentum portfolios was 5.2% over the 2002 to 2026 period. Intuitively, one might expect the largest dispersion to occur during the Global Financial Crisis; however, the widest gaps instead appear during the COVID-19 period in 2021 and the subsequent bear market in 2022.
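The year-by-year comparison can be sketched as follows, assuming calendar-year returns for each start-date portfolio have already been computed. The numbers below are made-up placeholders that only demonstrate the calculation, not the article's data, and the function names are hypothetical.

```python
from typing import Dict, List


def yearly_spread(returns_by_year: Dict[int, List[float]]) -> Dict[int, float]:
    """Best minus worst portfolio return for each calendar year."""
    return {year: max(rets) - min(rets) for year, rets in returns_by_year.items()}


def average_spread(spreads: Dict[int, float]) -> float:
    """Simple mean of the yearly best-minus-worst spreads."""
    return sum(spreads.values()) / len(spreads)


# Placeholder returns for three start-date portfolios (NOT real data).
toy_returns = {
    2020: [0.01, 0.05, -0.02],
    2021: [0.03, -0.04, -0.01],
}

spreads = yearly_spread(toy_returns)
for year, spread in sorted(spreads.items()):
    print(year, f"spread = {spread:.1%}")
print(f"average spread = {average_spread(spreads):.1%}")
```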
Source: Finominal
FURTHER THOUGHTS
Scientific investing aims to produce results that are robust and, in principle, reproducible under similar assumptions and data. But if a backtest yields return differences of more than 10% simply due to the choice of starting date, how scientific are those results?
The answer is: not very. While quantitative research has become increasingly sophisticated, relatively little attention has been paid to the impact of timing luck, except perhaps by Corey Hoffstein (see "Rebalance Timing Luck: The (Dumb) Luck of Smart Beta"). This is a meaningful omission, as many quantitative strategies operate with thin margins and could appear far less attractive once timing effects are properly accounted for.
Fortunately, this issue is straightforward to mitigate. Rather than relying on a single start date, quant researchers can average into strategies across multiple entry points, thereby reducing sensitivity to timing and producing more robust outcomes.
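One way to read "averaging into" a strategy is to split capital equally across several start-date portfolios and hold the blend. The sketch below assumes each tranche's periodic returns are already computed and aligned in time. The function names and toy returns are hypothetical, and equal-weighted tranches that are never rebalanced against each other are a simplifying assumption, not necessarily the approach the article has in mind.

```python
from typing import List


def nav_path(returns: List[float], start: float = 1.0) -> List[float]:
    """Cumulative index level implied by a series of periodic returns."""
    nav = [start]
    for r in returns:
        nav.append(nav[-1] * (1.0 + r))
    return nav


def tranched_nav(tranche_returns: List[List[float]]) -> List[float]:
    """Equal-capital blend of several start-date portfolios (tranches)."""
    if len({len(r) for r in tranche_returns}) != 1:
        raise ValueError("all tranches must cover the same number of periods")
    paths = [nav_path(r) for r in tranche_returns]
    # Average the index levels at each point in time.
    return [sum(levels) / len(levels) for levels in zip(*paths)]


# Toy returns for two tranches (hypothetical, not real data).
toy_tranches = [
    [0.10, 0.00],
    [0.00, -0.10],
]
blend = tranched_nav(toy_tranches)
print([round(level, 4) for level in blend])
```

Because the blended terminal value is the average of the tranche terminal values, it can never match the worst single start date, which is the sense in which averaging reduces sensitivity to timing.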
RELATED PAPERS
Hoffstein et al., "Rebalance Timing Luck: The (Dumb) Luck of Smart Beta", 2020.
ABOUT THE AUTHOR
Nicolas Rabener is the CEO & Founder of Finominal, which empowers professional investors with data, technology, and research insights to improve their investment outcomes. Previously he created Jackdaw Capital, an award-winning quantitative hedge fund. Before that Nicolas worked at GIC and Citigroup in London and New York. Nicolas holds a Master of Finance from HHL Leipzig Graduate School of Management, is a CAIA charter holder, and enjoys endurance sports (Ironman & 100km Ultramarathon).