
Double the performance with half the risk? Modified 2xSPYTIPS

After reading @Epi's post (https://getqu.in/kIkZh6/) about 2xSPYTIPS, and since I was looking for strategy diversification anyway, I put a small percentage into this strategy. Now I would like to invest more, but it doesn't look very robust to me. That's why I've run a lot of tests and would like to present my results to you.

I already wrote a similar post 3 months ago, but I will go into much more depth here and at the end there will also be a strategy.


What is it all about?

@Epi's trend-following strategy uses two indices and their moving averages: the S&P 500 (e.g. $SPY) and TIPS ($TIP). The idea is based on the momentum factor. In simple terms:

Markets move in cycles and you want to be in on the upside, but not on the downside. Trends often continue and the strategy tries to follow them well. As soon as a trend reversal occurs, you reallocate (between cash and the investment).

This prevents you from taking deep drawdowns (losses from the last high).

This significantly reduces the risk (which is defined here only as the maximum drawdown). @Epi suggests going leveraged (2x) on the S&P 500 (hereinafter referred to as "SPY") in the event of a buy signal.


Starting position (abstract 2xSPYTIPS):

"Invest in the 2xSPY if the SPY and TIPS close above their SMA (simple moving average) at the end of the month, otherwise hold cash"

@Epi suggested 200 for the SMA length, as it is a good, round value. However, 150 seemed to work better, which is why I use the strategy with a 150-day SMA as a starting point. I also use daily data from 2003 (rather than monthly data from 2000, as @Epi does), as I only have data for TIPS from 2003 onwards.

An additional assumption is that "cash" does not yield interest. The SPY index used is the S&P 500 Total Return Index, which includes dividends.
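The starting rule can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: the function names `sma` and `buy_signal` are hypothetical, prices are plain lists of daily closes, and days without enough history for the SMA count as "no signal".

```python
def sma(prices, window):
    """Simple moving average; None until `window` closes are available."""
    out = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the close that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def buy_signal(spy, tips, window=150):
    """True on days where both SPY and TIPS close above their own SMA."""
    spy_sma, tips_sma = sma(spy, window), sma(tips, window)
    return [
        s_avg is not None and t_avg is not None and s > s_avg and t > t_avg
        for s, s_avg, t, t_avg in zip(spy, spy_sma, tips, tips_sma)
    ]
```

A `True` means "hold the 2x SPY", a `False` means "hold cash"; when the signal is actually checked is a separate question.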

Furthermore, I assumed that the end of the month is nothing special and that the strategy could just as well be executed on the 15th, for example. The abstract version is the following: "Check the signal every n trading days and act accordingly". A month has approx. 21 trading days (approx. 251 / 12). Now you just have to define a start day. There are 21 possibilities for this (up to isomorphism: start day 22 gives the same schedule as start day 1, just one period later).

Example 1: Start day 5 -> check signal on trading day 5, 26, 47, 68, etc.

Example 2: Start day 17 -> check signal on trading day 17, 38, 59, 80, etc.
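The two examples can be reproduced with a tiny helper. `check_days` is a hypothetical name and `total_days` is just an illustrative cutoff:

```python
def check_days(start_day, n=21, total_days=100):
    """Trading days (1-based) on which the signal is checked."""
    return list(range(start_day, total_days + 1, n))


# One schedule per possible start day, 1..21
schedules = {day: check_days(day) for day in range(1, 22)}
```

`check_days(5)` starts with 5, 26, 47, 68 and `check_days(17)` with 17, 38, 59, 80, matching the examples above.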

What if you test all 21 possible start days?

attachment

This is the data from December 2003 to May 2025. The metrics tested are:

  • CAGR: annual percentage return
  • Max DD: maximum drop from the last high
  • Volatility: measure of the average daily fluctuation
  • Beta: complicated volatility, puts fluctuations in relation to a benchmark, ignores outliers (1 = fluctuates the same as benchmark, 2 = twice as strong fluctuations)
  • Alpha: annual risk-adjusted excess return compared to a benchmark in percent, uses beta for risk adjustment
  • Sharpe: simple measure for risk-adjusted excess return, uses volatility, meaningless without comparison
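For readers who want to recompute such metrics, here is a minimal sketch using textbook definitions on lists of daily returns. The exact formulas behind the charts are not stated in this post, so the annualisation with 251 trading days and the zero risk-free rate (in line with cash not yielding interest) are assumptions, as are all function names.

```python
import math

TRADING_DAYS = 251  # assumption: approx. trading days per year, as used above


def mean(xs):
    return sum(xs) / len(xs)


def cagr(returns):
    """Annualised growth rate of the compounded daily returns."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (TRADING_DAYS / len(returns)) - 1


def max_drawdown(returns):
    """Largest drop from a previous high, as a negative number."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1)
    return worst


def volatility(returns):
    """Annualised standard deviation of daily returns."""
    m = mean(returns)
    var = sum((r - m) ** 2 for r in returns) / (len(returns) - 1)
    return math.sqrt(var * TRADING_DAYS)


def beta(returns, bench):
    """Covariance with the benchmark divided by the benchmark's variance."""
    mr, mb = mean(returns), mean(bench)
    cov = sum((r - mr) * (b - mb) for r, b in zip(returns, bench))
    return cov / sum((b - mb) ** 2 for b in bench)


def alpha(returns, bench):
    """Annualised excess return after adjusting for beta (risk-free rate = 0)."""
    return (mean(returns) - beta(returns, bench) * mean(bench)) * TRADING_DAYS


def sharpe(returns):
    """Annualised mean return divided by annualised volatility."""
    return mean(returns) * TRADING_DAYS / volatility(returns)
```

With these definitions the benchmark itself has a beta of 1 and an alpha of 0, and a series that is exactly twice the benchmark's daily return has a beta of 2.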


The benchmark is of course the SPY and the red lines show how the SPY performs. The blue lines are average values over all possible starting days.

@Epi reports a 16.2% CAGR with a drawdown of approx. -20% (on monthly data). On daily data, even the best start day here has a maximum drawdown worse than -30%, while the average is a miserable -50%. The 16.2% CAGR is also closer to the best case.

After seeing this, it was clear to me that I would not continue with this strategy. But what now?


Problems and solutions for 2xSPYTIPS

The main problem is that the strategy often reacts too slowly during periods of weakness. During the Covid crash in particular, it could take a whole month to react.

attachment

Here we also see the 60% drawdown. The month-end strategy was pretty close to the best case and thus the strategy seemed good.

attachment

This is the strategy this year. You can also see huge differences here.

You are often late on the way back up, too.

Various modifications have been suggested here in the forum, for example weekly or daily execution. However, this leads to the "problem" that many more trades are generated, which eat up fees (and effort), especially when the price oscillates around the SMA. One way to tackle this is to add hysteresis (a two-point switching system), where you buy when the price is e.g. 1% above the SMA and sell when it is 1% below. This only solves the problem for small fluctuations and introduces additional complexity. I have opted for a cooldown approach instead: after each trade, you wait at least k days before trading again.
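To make the hysteresis idea concrete, here is a minimal sketch for a single price series. It is not the approach I ended up using; `hysteresis_positions` and the `band` parameter are hypothetical names, and `sma_values` is assumed to hold one SMA value (or `None`) per day.

```python
def hysteresis_positions(prices, sma_values, band=0.01):
    """Buy above SMA * (1 + band), sell below SMA * (1 - band), else keep the position."""
    invested = False
    positions = []
    for price, avg in zip(prices, sma_values):
        if avg is not None:
            if not invested and price > avg * (1 + band):
                invested = True
            elif invested and price < avg * (1 - band):
                invested = False
        positions.append(invested)
    return positions
```

Inside the band nothing happens, so small swings around the SMA no longer trigger trades, but a swing larger than the band still does.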


First interim result

2xSPYTIPS is not robust. Nevertheless, the idea of using the TIPS and SPY indicator seems interesting. My modified strategy is as follows:

"Buy a 2xlev SPY when TIPS and SPY price are above their SMA at the end of a period of n days, as long as the last trade is at least k days away. Switch to cash if TIPS or SPY are below their SMA at the end of a period of n days, as long as the last trade is at least k days away."

Now there are 3 parameters that can be optimized:

  • n - the investment interval in days (for 2xSPYTIPS this was approx. 21)
  • k - the number of cooldown days after a trade (for 2xSPYTIPS 0)
  • the value of the SMA (for 2xSPYTIPS 200 or 150 depending on the variant)
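The modified rule with its parameters n and k can be sketched as follows. This is a simplified illustration, not my actual backtest code: `cooldown_positions` is a hypothetical name, `signal[i]` is assumed to be the end-of-day buy signal (TIPS and SPY both above their SMA), and days are counted in trading days.

```python
def cooldown_positions(signal, n=1, k=15, start_day=0):
    """Return one boolean per day: True = hold the 2x SPY, False = hold cash.

    The position in entry i is decided at the close of day i and would
    earn the return of the following day.
    """
    invested = False
    last_trade = None  # index of the most recent trade, None before the first one
    positions = []
    for day, sig in enumerate(signal):
        is_check_day = day >= start_day and (day - start_day) % n == 0
        cooled_down = last_trade is None or day - last_trade >= k
        if is_check_day and cooled_down and sig != invested:
            invested = sig  # switch to the side the signal points to
            last_trade = day
        positions.append(invested)
    return positions
```

With `n=21` and `k=0` this reduces to checking roughly once a month without a cooldown; with `n=1` the signal is checked daily and only the cooldown limits the number of trades.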


In the following simulations, I always looked at the minimum alpha and the maximum drawdown over all possible start days, as worst-case performance and risk metrics respectively.


First simulations

First, I realized that n should probably be 1. The following is just one of many simulations that confirm this assumption. It also makes sense, as this is the quickest way to react to signal changes.

attachment

As I said, there were many other tests that led to this result, which is why I set n to 1 for the other simulations.

This means that if there is currently no cooldown, action is taken immediately when the signal changes. With n fixed, I was able to test one more parameter: the SPY SMA and the TIPS SMA are now two separate parameters.

I was able to recognize an interesting pattern for TIPS:

attachment

There seems to be a general sweet spot here between 200 and 250 SMA. There was no such clear picture for SPY SMA. This seems to confirm once again that the TIPS indicator is extremely important for performance.

There were also two sweet spots for the cooldown days: at around 10 and around 50.

Below you can see the final scatter plot of the simulation, with the 3 parameters on the axes. The color represents the alpha of the strategy: purple is the weakest and yellow the best. Note that even purple marks a top strategy, as the poor strategies are already hidden here.

attachment

There are 3 clusters (of seemingly good strategies):

  • approx. 150 SPY SMA, 200 TIPS SMA, 10 cooldown days
  • approx. 270 SPY SMA, 200 TIPS SMA, 10 cooldown days
  • approx. 270 SPY SMA, 250 TIPS SMA, 50 cooldown days

It is also important to mention that these are not individual outliers but compact clusters of similar parameters, each representing similarly good strategies. It can therefore be assumed that the general strategy is reasonably consistent and that a point in the middle of a cluster represents a relatively robust strategy.

However, various robustness tests follow, which some strategies do not pass...


Robustness?

After this step (and a few other simulations), I picked out 12 strategies that are surrounded by other good strategies as far as possible.


The first small robustness test was to simulate the strategies not only over 2003-2023, as before, but also separately over 2003-2013 and 2013-2023.

This does more than split the range into two 10-year periods: in particular, it separates the 2008 financial crisis from the 2020 Covid crash. The strategies had to show a positive alpha in both phases. 3 failed to do so, leaving 9.


Monte Carlo is Formula 1, isn't it?

It is important to avoid overfitting in order to ensure robustness. But how do you test for overfitting? This is where the Monte Carlo permutation test comes into play.

The basic idea is that the price path of the indicator (or indicators) is "rolled" anew: the daily percentage changes are randomly rearranged. The result is a permutation whose statistical properties (performance, risk, variance, etc.) correspond almost exactly to those of the original indicator.

Here is an example of such permutations:

attachment

Same start, same end, different progression.
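A minimal sketch of such a permutation, assuming price lists of equal length. `permute_paths` is a hypothetical name; it shuffles the daily price ratios multiplicatively and applies one shared random order to all series passed in, so that several indicators stay aligned day by day.

```python
import random


def permute_paths(price_series, seed=None):
    """Shuffle the daily changes of several price series with one shared order."""
    rng = random.Random(seed)
    n_days = len(price_series[0])
    order = list(range(1, n_days))  # index i stands for the change from day i-1 to day i
    rng.shuffle(order)
    permuted = []
    for prices in price_series:
        path = [prices[0]]  # same start
        for i in order:
            path.append(path[-1] * prices[i] / prices[i - 1])
        permuted.append(path)
    return permuted
```

Because multiplication is commutative, every permuted path ends at the original final price (up to rounding); only the route in between changes.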

The optimization is then carried out on the new price path; in other words, the best parameters for the permuted path are searched for. If the strategy found this way generates a higher alpha with the same or a lower drawdown than the one optimized on the real data, the permutation counts as better. Otherwise it counts as worse.

A strategy is robust if strategies optimized on random permutations perform worse than the one optimized on the original data. This indicates that the strategy is really finding patterns in the data and not just memorizing the data. I tested over 1000 permutations between 2003-2023 and calculated a p-value for each of the 9 strategies (the proportion of permutations that produced a better strategy). It should therefore be as low as possible (>0.1: discard, <0.1: okay, <0.05: good, <0.025: very good).
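The comparison and the p-value can be written down compactly. This sketch assumes each result is a pair `(alpha, max_drawdown)` with the drawdown stored as a negative number; `is_better` and `permutation_p_value` are hypothetical names, and the plain proportion is used as the p-value, as described above.

```python
def is_better(candidate, reference):
    """Higher alpha with the same or a smaller drawdown (drawdowns are negative)."""
    cand_alpha, cand_dd = candidate
    ref_alpha, ref_dd = reference
    return cand_alpha > ref_alpha and cand_dd >= ref_dd


def permutation_p_value(real_result, permuted_results):
    """Share of permutations whose result beats the result on the real data."""
    better = sum(is_better(p, real_result) for p in permuted_results)
    return better / len(permuted_results)
```

For example, if 2 out of 4 permutations beat the real result, the p-value is 0.5, which would mean the strategy is no better than what can be found on shuffled data.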


PS1: Two indicators have to be permuted for our strategy (SPY and TIPS). The daily changes of both indicators were mixed up with the same (random) pattern. Thus the correlation of the indices is maintained.

PS2: Long memory and volatility clustering are not preserved by permutations. This is not a problem, as the procedure can detect overfitting but cannot prove its absence. Since the trend-following strategy relies on these properties, the test is tilted in our favor when they are missing, so we can safely discard all strategies that do not pass it.


Second interim result

4 of the 9 strategies had a p-value > 0.1 and were eliminated. Here is an overview of the 5 strategies for the "final":

attachment

Performance on test data

Next, the strategies from 2024 - May 2025 were tested. Admittedly, this is quite a small test area, but again it is intended to be used to exclude strategies rather than to confirm robustness.

Here, all SPYTIPS strategies perform poorly in terms of alpha, as the period is relatively volatile. By way of comparison, the traditional 2xSPYTIPS generated an average alpha of -14.4 and even -22.5 in the worst case during this period. The maximum drawdown was -25.3%. The benchmark (SPY) had a drawdown of -18.8% (and an alpha of 0). Now the finalists:

attachment

The second and fourth strategies deliver weak values, while the first delivers surprisingly good values.


Second round Monte Carlo

The Monte Carlo permutation test is now performed again, this time on the test data. Here, however, no strategy is optimized on the permutation; instead, the same strategy that was used on the real data is applied. This checks whether the strategy would also have worked on random permutations. If so, its result does not stem from patterns in the real data. Again, I count the better permutations: a permutation is better if the strategy produces a higher alpha with the same or a lower drawdown on the synthetic data. Again, a lower p-value is better. However, there are no good guideline values here.

(Since I ran into the picture limit, here are the p-values in writing, in the order of the five finalists:)

  • Strategy 1: 0.109
  • Strategy 2: 0.625
  • Strategy 3: 0.259
  • Strategy 4: 0.697
  • Strategy 5: 0.205

The test once again confirms the poor performance of strategies 2 and 4, which, with p-values above 0.5, do even worse than random.


Winners and a few statistics

Strategy 1 is obviously the best. It comes out on top in all tests and is even more stable and profitable on test data (measured by drawdown) than the benchmark and especially than 2xSPYTIPS. But how good is the strategy now?

attachment

It even clearly beats the best case scenario of the 2xSPYTIPS strategy. It achieves almost double the SPY performance with half the maximum drawdown.

However, it not only requires almost 5 trades per year, but you usually have to react to the indicators on a daily basis, which makes it one of the more active strategies (and thus contradicts the idea that it should be a simple strategy with little effort). You could also test weekly execution again here, as all the data is available on Sunday, for example, and you can trade on Monday morning. That would probably come at the expense of risk. I have called it 2xSPYTIPS-Cool because it essentially adopts the idea of the 2xSPYTIPS strategy, but extends it with the cooldown.


Conclusion

Although my tests did not detect overfitting or a lack of robustness, further testing is still required, especially on longer test periods. Nevertheless, the results look promising.

It should also be mentioned that the parameters are very round numbers (150, 200, 15), which may indicate patterns in the market, as many market participants pay attention to these numbers.

However, I will perform further robustness tests, as small changes to the strategy can significantly increase the drawdown (to about -40%). The (270, 270, 16) and (270, 200, 16) strategies also looked very promising.

I will discuss this in a separate post, as this one has become quite long.

I hope you were able to take something away and it wasn't too complicated and technical. If anything is unclear, please feel free to ask.

Interesting ideas and suggestions for further tests / strategies / parameters are also very welcome.

45 Comments

Wow! Great backtest that illustrates how complex optimizing a strategy can be!
I follow your research very closely, because the potentially high drawdowns when trading at unfavorable times also bother me a bit. Your test has confirmed this.

Can I link to your article in the strategy presentation?
@Epi I would love to. Thank you for inspiring me with your strategy :)
Hello @SemiGrowth, thank you very much for your detailed testing, reporting and your meticulousness on this topic, which @Epi has introduced us to. Now the question arose for me as to how you set up or program such a bot. Would it perhaps be possible for you to write a post about it? That would make me and I'm sure many others in the forum very happy. Thanks in advance!
@gorehammer I can certainly do that sometime. Remind me again if I forget :)
An absolutely impressive piece of work! I can't wait to see what you end up implementing.
Apart from the fact that I can't technically implement such a backtest, the practicality of a strategy has always been the decisive factor for me. This also includes only having to trade at predictable times and at reasonable intervals. So: at the turn of the month 😅
I try to remedy the shortcomings of SPYTIPS through strategy diversification, whereby each strategy has certain characteristics and takes on certain tasks in the portfolio. Without resorting to ratios, this can be described as follows:
* 1xGTAA (+ 40%) is the stability anchor, decent performance with very low drawdowns, tendency to underperform in bull markets.
* 3xGTAA (max. 30%) as a performance-optimized return booster with noticeable but controlled risk.
* SPYTIPS (max. 30%, enriched with unleveraged gold) as a wildcard, less good key figures but outperformance in bull markets.
1xGTAA runs for me from 1/23, 3x GTAA from 6/24 and SPYTIPS from 3/25. It is becoming apparent that a systematic outperformance can be achieved compared to the global equity market.
PS: I use SMA175 for all assets in SPYTIPS, the rules are the same as yours, but before going into cash, gold is held as long as it has positive momentum.

What other strategies are you pursuing? Or should SPYTIPS be the only one?
@randomdude cool idea you have there. In general, I'm not yet pursuing any strategies (apart from SPYTIPS). But I will look into GTAA again in the future. I also have other strategies on my radar. Nevertheless, I don't want to completely move away from single stocks in the long term because I simply enjoy them and have been relatively successful with them, at least so far. I also have a stable ETF core, which will probably not be reduced.
I'm just going to start putting new money into strategies and thus significantly increase the proportion in the long term.
I also tried crazy things like going short 2xlev instead of holding cash xD. Due to the 2008 crisis, the strategy could make 25%pa, but it was just a little gimmick for now
@SemiGrowth It was similar for me. I continue to hold my "old" ETFs, but no longer expand them. When it became clear after a while that the active strategies were performing as the backtests suggested, Buy&Hold suddenly looked very, very old.
@randomdude maybe I feel the same way
Insanely interesting contribution! May I ask which programs you use to do such back tests?
@mhu Plain Python. So open the editor and write Python code :)
The data is pulled via a financial API.
For this test you only need the daily closing prices of the S&P 500 and TIPS. I built the leveraged version synthetically (following the same pattern as Epi). It even performs slightly worse than the real leveraged ETF (e.g. $CL2).
PS: $CL2 technically tracks the MSCI USA rather than the S&P 500, but the two run almost identically and also have almost the same composition.
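A synthetic leveraged series can be built roughly like this. This is only a sketch with assumptions: `leveraged_prices` is a hypothetical name, leverage is reset daily, and financing costs and fund fees are ignored, so it is not necessarily the exact pattern used in the backtest.

```python
def leveraged_prices(prices, leverage=2.0, start=100.0):
    """Apply `leverage` times the daily return of `prices`, compounding daily."""
    out = [start]
    for prev, cur in zip(prices, prices[1:]):
        daily_return = cur / prev - 1
        out.append(out[-1] * (1 + leverage * daily_return))
    return out
```

For the closes 100, 110, 99 this gives roughly 100, 120, 96: +10% becomes +20% and -10% becomes -20%, which also shows why a leveraged product loses more than twice as much over a choppy stretch (-4% versus -1%).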
@SemiGrowth Yes, but be careful with the fund currency! The $CL2 has performed so well in recent years because the fund currency is EUR.
@randomdude In the end, we can only invest in euros anyway. The simulations were all in dollars. Of course, you have to consider the currency risk, but you always have that when you invest in foreign shares
Interesting article - I would be interested to know how this is really implemented. Do you need trading robots that then buy for you when the corresponding indicator is reached or do you simply wait to see if an indicator is reached on a site like TradingView, for example, and then buy manually? In the case of automation, which brokers actually do this, are there special brokers who provide APIs for this?
@TheStoic Good question, I have written a bot for my strategies that sends me a push notification. I still have to trade manually.
I haven't dealt with fully automated trading yet
@SemiGrowth Thanks for your answer! Where do you get the data for the bot? Is it freely available, e.g. via Yahoo Finance or TradingView, or do you collect it yourself?
I had already written to the former operator of CashCalender about this and he actually collected all the relevant data himself using website crawlers (e.g. after the publication of quarterly reports)
@TheStoic Yahoo finance API is usually sufficient. More exotic data is sometimes also available from FRED. So far I have not needed more
Thank you very much for the article and the work you put into it. Posts like this are exactly why I am on GQ 👍🏻
Can you already tell from your work and all the tests how you would have performed in a defined period, e.g. the last 3, 5 or 10 years compared to an MSCI World or other index?
@Migu11 Yes, I can already see that. I'll write another post about it later. But in general it looks very good. The investment would have increased 48-fold since 2003. However, the S&P 500 has only increased about 10-fold in the same period. If you still get interest on the cash, things will go even better
@SemiGrowth The indicators of (150,200,15) could soon trigger a buy signal. Have you had a chance to carry out further tests? Or do you think the strategy is reliable enough to enter now?
@Redfox77 I have actually just received the entry signal. I ran another overfitting test, which didn't turn out so well. However, it only tests the strategy itself and not the specific parameters. Also, the test was actually a bit faulty and I would like to rewrite it. According to this test, the probability of overfitting is around 50%. I am still working on a possible third indicator, but that will probably take some time.
I probably won't jump straight in tomorrow morning, but will carry out the tests and rough calculations with the third indicator.

But it should be clear that the strategy is in any case much safer than the traditional SPYTIPS
@SemiGrowth Many thanks for your efforts! I look forward to further test results.
Extremely interesting article, thank you very much!

Just to make sure I have understood this correctly: Does the "cool down" period (15 days) start on the day of purchase or the next day?
@Spouh You hold the investment for at least k days. If you trade after the market close, the count starts the next day; if you trade before the open, it starts on the current day.
@SemiGrowth Thank you very much for your answer!
So if I have understood this correctly, the deadline starts on the day of purchase.
For example, if I bought on May 2, 2025, I would not check until May 23 (15 working days).
Is that correct?
@Spouh So let's assume that the indicators are on buy at the close on May 1st, then you buy early on May 2nd and only look at the indicators again at the close on the 22nd (and then trade early on the 23rd if there is a sell signal).
@Spouh if you buy right at the close, then you buy directly on May 1 after the US close and can then trade again on the 22nd directly after the US close
@SemiGrowth I had actually understood that, thanks for the confirmation!

Regarding the buy/sell signal: are the underlyings used the S&P 500 index (SPX) rather than the ETF, and the TIP ETF from iShares in USD (ISIN: US4642871762)?
@Spouh the S&P 500 Total Return index (^SP500TR) and the (TIP). I only have the tickers in my head but it should be your TIP ETF
@Spouh it is the TIP ETF, but in dollars and with dividends
@SemiGrowth Thanks again for your feedback.
Does it really change anything in your backtest if you take ^SPXTR instead of ^SPX?
@SemiGrowth TIP ETF would then be $IDTP (USD, accumulating)?
Boy I hope you do this for a living.
@TopperHarley What do you mean by that?
By that I mean that this level of professionalism, knowledge and methodological expertise goes far beyond the average user here and investor, and I hope that you sell these skills well.
@TopperHarley Ah, I see. Thank you in any case. I'm still studying (IT). I have business as a kind of "hobby". Maybe I'll go in the direction of business informatics. I definitely enjoy it, which means I'm constantly learning new things.