This means the validation is strong enough to progress, but the next smart step is a structured paper-trading phase instead of moving straight to live capital.
This example currently behaves as a selective trend-following breakout model, not an all-weather system expected to hold up equally well in every market condition.
The strongest fit is in orderly higher-timeframe trend conditions where continuation logic stays clean. It is not framed as a strategy that must remain active in every market phase.
One shared profile can still be a reasonable starting point, but we highlight when separate market-specific profiles would be safer than pretending one setup fits every target market with equal credibility.
Keep the configuration fixed and move to a structured paper-trading phase before any live capital is considered. Validation strength does not remove the need for implementation discipline.
BTC is used here as a benchmark-first starting point because it is one of the most liquid and decision-relevant crypto markets. The goal is not to showcase a random symbol. The goal is to show how we start where decision quality is strongest, then widen only if the strategy earns broader validation.
This sample shows the first validation step on a benchmark market before widening to more assets. The point is to establish whether the strategy has credible edge in a decision-relevant market before claiming broader portability.
This example is intentionally framed as paper-trade next because validation strength and live readiness are different questions. Signal handling, execution discipline, and forward confidence still matter after a strong historical result.
This sample starts with a single benchmark crypto market and validates there first, before any broader expansion claim is made.
Return here is measured relative to starting capital, so it can exceed 100%. A 100% return means capital doubled. A 300% return means it became four times the starting capital.
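As a quick illustration of that arithmetic, here is a minimal sketch that converts a percentage return into a multiple of starting capital. The function name capital_multiple is a hypothetical helper used only for this example.

```python
def capital_multiple(return_pct: float) -> float:
    """Convert a percentage return on starting capital into an ending-capital multiple."""
    return 1.0 + return_pct / 100.0


# Illustrative inputs only: +100% doubles capital, +300% quadruples it.
for r in (100.0, 300.0, -50.0):
    print(f"{r:+.1f}% return -> {capital_multiple(r):.2f}x starting capital")
```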
5 of 7 splits pass (71.43%). The 2022-2023 split fails with PF 0.71, -44.30% drawdown, and negative Sharpe. The strategy has a genuine edge in trending conditions but lacks a mechanism to avoid sustained bearish market phases.
Both parameters are stable across +/-20% variation. No overfitting detected through parameter sensitivity.
Failed the 2022 full bear year: -44.30% drawdown, 5 consecutive losses. The trend-following signal generates bullish flips during relief rallies within macro downtrends. Add a 200-day EMA filter to block entries when macro trend is bearish.
With only 38 trades over the full period, the Monte Carlo p-value of 0.47 means 47% of randomly shuffled sequences match or beat the observed return. The other evidence points to an edge (profit factor above 1, consistent win rate), but the low trade count means this test cannot confirm it statistically. This is a known limitation of low-frequency trend strategies - not a disqualifying finding, but a risk to note.
Add: only enter long trades when close is above the 200-period EMA. This single change would have blocked all 5 losing trades in the 2022 bear year, converting a -44.30% period to flat. Estimated improvement: max DD drops to ~25-28%, walk-forward pass rate increases to 7/7. Re-submit for re-audit at €49.
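A minimal sketch of this macro filter, assuming plain Python lists of closing prices and raw long-entry signals. The function names, the choice to seed the EMA with the first close, and the short period in the demo are illustrative assumptions, not the audited implementation. Charting platforms often seed the EMA differently, so early values can differ.

```python
def ema(values, period):
    """Exponential moving average with smoothing 2 / (period + 1).

    Assumption: the series is seeded with the first value.
    """
    alpha = 2.0 / (period + 1)
    out = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * v + (1 - alpha) * prev
        out.append(prev)
    return out


def filter_long_entries(closes, raw_signals, period=200):
    """Keep a long-entry signal only on bars where the close is above the EMA."""
    trend = ema(closes, period)
    return [bool(sig) and close > e
            for sig, close, e in zip(raw_signals, closes, trend)]


# Toy data with a short period so the effect is visible on a few bars.
closes = [100, 102, 104, 103, 95, 90, 92]
raw = [True] * len(closes)
print(filter_long_entries(closes, raw, period=3))
```

In the toy data the last bar is a small bounce (90 to 92) inside a decline. The close is still below the EMA, so the entry is blocked, which is the relief-rally case the filter targets. With the default period of 200, the first few hundred bars should be treated as warm-up.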
Scale position size inversely to current ATR. Smaller sizing during high-volatility market phases reduces portfolio-level drawdown without changing signal quality. Estimated improvement on the 2018 period: drawdown from -30.8% to roughly -18% to -22%.
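One common way to size inversely to ATR is to risk a fixed fraction of equity per trade and treat a multiple of ATR as the expected adverse move. The sketch below assumes that approach. The function name, the risk fraction, and the ATR multiple are hypothetical values, not settings taken from this report.

```python
def atr_scaled_size(equity, risk_fraction, atr, atr_multiple=2.0):
    """Units to hold so that a move of atr_multiple * ATR costs risk_fraction of equity."""
    if atr <= 0:
        raise ValueError("ATR must be positive")
    return (equity * risk_fraction) / (atr_multiple * atr)


# Doubling ATR halves the position size, all else equal.
calm = atr_scaled_size(equity=10_000, risk_fraction=0.01, atr=500.0)
volatile = atr_scaled_size(equity=10_000, risk_fraction=0.01, atr=1_000.0)
print(calm, volatile)
```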
The 2022 bear-year failure and its -44.30% drawdown breach the deployment threshold (maximum drawdown <= 35.00%). Apply the 200 EMA macro filter and re-submit. Expected verdict after fix: PASS.
This example strategy currently behaves more like a selective trend-following breakout model than a broad all-weather system. The edge appears when market structure is orderly enough for continuation logic to remain credible.
The same core logic can remain viable across several assets without behaving identically on all of them. We use this section to explain when one market looks cleaner than another and why future refinements should be tested market by market instead of assumed universal.
When the evidence suggests that one shared configuration is no longer the most honest answer across all target markets, the report can recommend separate market-specific profiles rather than forcing one setup to serve every asset with equal credibility.
Each test in this report answers a different question about your strategy. This guide explains what each test measures, what the numbers mean, and how to use the results to improve your strategy before deploying real capital.
What it measures: The overall performance of your strategy across the full historical dataset - return, drawdown, profit factor, win rate, trade count, and time in market.
Why it matters: This is the starting point. If the full-period numbers look strong but the other tests don't confirm them, the strategy is likely curve-fitted to history. If the full-period numbers are weak, the strategy has already failed and no further validation is needed.
Profit Factor >= 1.5 - the minimum threshold for a tradeable edge. PF of 2.0+ is strong. PF of 3.0+ is excellent.
Max Drawdown <= 35% - the maximum loss from peak to trough. Above 35% means most traders would abandon the strategy mid-drawdown.
Total Trades >= 8 - a minimum sample, not proof of significance. With fewer than 8 trades the results aren't statistically meaningful.
Time in market - how long you're actually exposed. Lower is better if returns are similar - less time at risk.
Sharpe ratio >= 1.0 - risk-adjusted return. Below 0.5 means returns don't justify the volatility taken.
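The metrics above can be sketched as follows, assuming a list of per-trade profit and loss values, an equity curve, and a series of per-period returns. The function names are illustrative. The Sharpe calculation assumes a zero risk-free rate and 365 periods per year (daily bars on a market that trades every day), which may differ from the conventions used in the report.

```python
import math


def profit_factor(trade_pnls):
    """Gross profit divided by gross loss (infinite if there are no losing trades)."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return math.inf if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown_pct(equity_curve):
    """Largest peak-to-trough decline, as a positive percentage of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100.0)
    return worst


def sharpe_ratio(period_returns, periods_per_year=365):
    """Annualised mean / standard deviation of returns (risk-free rate assumed zero)."""
    n = len(period_returns)
    mean = sum(period_returns) / n
    var = sum((r - mean) ** 2 for r in period_returns) / (n - 1)
    if var == 0:
        raise ValueError("returns have zero variance")
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def baseline_checks(pf, max_dd_pct, trades, sharpe):
    """Apply the thresholds listed in this guide to already-computed metrics."""
    return {
        "profit_factor": pf >= 1.5,
        "max_drawdown": max_dd_pct <= 35.0,
        "trade_count": trades >= 8,
        "sharpe": sharpe >= 1.0,
    }


print(profit_factor([300, -100, 200, -100]))   # gross profit 500 / gross loss 200
print(max_drawdown_pct([100, 120, 90, 130]))   # decline from 120 to 90
```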
What it measures: Whether your strategy works on data it has never seen before. The full dataset is divided into 7 sequential time windows. The strategy is tested on each window independently - not on the full period at once.
Why it matters: A strategy that only looks profitable because it was optimised on historical data will fail walk-forward validation. It cannot fake results on unseen data. This is the closest simulation to what will happen in live trading.
5+ of 7 splits passing means the strategy has genuine, repeatable edge across different market conditions.
A failing split tells you exactly which market condition breaks the strategy - use that to identify what fix is needed.
All 7 passing is the gold standard. Combined with other tests, this is what makes a strategy PASS-ready.
Consecutive failing splits (e.g. 2022 and 2023 both fail) suggest market-condition dependency - the strategy only works in certain market types.
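A minimal sketch of the split-and-score logic, assuming the data is indexed by bar number and that each split has already been judged pass or fail. How a single split is scored is not specified here, so the helper names and the example pass/fail pattern are illustrative only.

```python
def sequential_splits(n_bars, n_splits=7):
    """Return (start, end) index pairs for contiguous, non-overlapping windows."""
    base, extra = divmod(n_bars, n_splits)
    bounds, start = [], 0
    for i in range(n_splits):
        end = start + base + (1 if i < extra else 0)  # spread the remainder over early windows
        bounds.append((start, end))
        start = end
    return bounds


def walk_forward_summary(split_passed):
    """Count passing splits and the longest run of consecutive failures."""
    passed = sum(split_passed)
    total = len(split_passed)
    longest = run = 0
    for ok in split_passed:
        run = 0 if ok else run + 1
        longest = max(longest, run)
    return {
        "passed": passed,
        "total": total,
        "pass_rate_pct": round(100.0 * passed / total, 2),
        "longest_fail_run": longest,
    }


print(sequential_splits(100))
# Hypothetical pattern: 5 of 7 pass, with two failures back to back.
print(walk_forward_summary([True, True, True, False, False, True, True]))
```

A longest_fail_run of 2 or more corresponds to the consecutive-failure case described above, which points to market-condition dependency rather than an isolated bad window.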
What it measures: What happens to performance when each parameter is varied by +/-20% from its default value - one at a time. For example, if your ATR period is 10, we test 8 and 12.
Why it matters: A strategy that only works at exactly one specific parameter value was probably curve-fitted to that value. A robust strategy maintains its edge when parameters are slightly adjusted - because the underlying logic is sound, not the exact numbers.
STABLE - the profit factor stays above 1.5 across all variants. The edge is robust to parameter choice.
REJECT - one or more variants drop below 1.5 or become unprofitable (PF below 1.0). The strategy is fragile at this parameter.
A "banana shape" PF curve (one peak, drops sharply either side) is the classic overfitting signature - avoid deploying.
If all parameters are STABLE, you have good evidence the strategy edge comes from the logic, not lucky parameter selection.
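The variation and the STABLE/REJECT rule can be sketched like this, assuming the profit factor for each variant has already been produced by a backtest. The function names and the sample PF values are hypothetical. Integer parameters such as an indicator period would need rounding after the +/-20% shift.

```python
def vary_parameter(default, pct=20.0):
    """Return the low, default, and high values for a one-at-a-time sensitivity test."""
    delta = default * pct / 100.0
    return [default - delta, default, default + delta]


def classify_sensitivity(pf_by_variant, min_pf=1.5):
    """STABLE if every variant keeps PF at or above min_pf, otherwise REJECT."""
    return "STABLE" if all(pf >= min_pf for pf in pf_by_variant.values()) else "REJECT"


print(vary_parameter(10))  # ATR period 10 -> variants 8, 10, 12
print(classify_sensitivity({8: 1.9, 10: 2.1, 12: 1.8}))
print(classify_sensitivity({8: 1.2, 10: 2.1, 12: 1.8}))
```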
What it measures: How your strategy performs during the 7 most hostile named market events in crypto history - 2018 crash, COVID, LUNA collapse, FTX fraud, and more. Each period is tested in isolation.
Why it matters: A strategy that only shows strong results because of the 2020-2021 bull run is dangerous to deploy. Most bear markets and crashes are preceded by exactly the type of signal a trend strategy generates. This test determines if your strategy survives when conditions turn hostile.
AVOIDED - 0 trades. Best case. The strategy correctly identified a non-tradeable environment and stayed flat.
PROFITABLE - took trades and made money despite hostile conditions. Exceptional.
SURVIVED - took trades with drawdown under 35%. Acceptable.
FAILED - drawdown exceeded 35%, or multiple consecutive losses. This event is the primary risk to live deployment.
If FAILED: identify which condition triggered entries during the crash and add a filter to block it (typically a macro trend filter).
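The four outcomes can be expressed as a small decision rule. This is a sketch under stated assumptions: the guide does not quantify "multiple consecutive losses", so the loss_streak_limit of 3 is a hypothetical value, and checking FAILED before PROFITABLE is an assumed precedence.

```python
def classify_stress_period(trades, net_pnl, max_dd_pct, max_consecutive_losses,
                           dd_limit_pct=35.0, loss_streak_limit=3):
    """Map one stress period's results to AVOIDED, FAILED, PROFITABLE, or SURVIVED."""
    if trades == 0:
        return "AVOIDED"
    # Assumption: a drawdown breach or a long losing streak fails the period
    # even if it ended with a net profit.
    if max_dd_pct > dd_limit_pct or max_consecutive_losses >= loss_streak_limit:
        return "FAILED"
    if net_pnl > 0:
        return "PROFITABLE"
    return "SURVIVED"


# The 2022 figures quoted in this sample: 5 consecutive losses, 44.30% drawdown.
print(classify_stress_period(trades=5, net_pnl=-1.0,
                             max_dd_pct=44.3, max_consecutive_losses=5))
```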
What it measures: Whether your strategy's results could have happened by chance. We run 1,000 simulations where your exact trades are shuffled into random order. If most random sequences produce similar returns, the edge may be luck rather than skill.
Why it matters: With only 10-20 trades, it's possible to look profitable purely by luck. A strategy with 100+ trades and a p-value below 0.05 shows statistically significant evidence of an edge. A strategy with 12 trades and p-value of 0.40 may simply have been lucky on a few large winners.
p-value < 0.05 - statistically significant. Fewer than 5% of random sequences match or beat your strategy. Strong confidence.
p-value 0.05-0.20 - borderline. Some confidence, but not conclusive. Monitor live carefully.
p-value > 0.20 - not statistically significant. The results may be luck. Needs more trades before deployment.
The fix: more trades give the test more statistical power, so a real edge shows up as a lower p-value. Run on more assets, use a shorter timeframe, or extend the test period.
Note: low-frequency strategies (1D, swing) will almost always show borderline p-values due to low trade count. This is expected - weight this finding accordingly.
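The sketch below shows how a randomisation p-value of this kind can be computed. Note that the summed (or compounded) return of a fixed trade list does not depend on the order of the trades, so this example assumes a sign-flip randomisation as the null model instead of a plain reshuffle: each trade's sign is flipped at random, which simulates a strategy with no directional edge. This is a swapped-in technique for illustration, not necessarily the procedure behind the p-values in this report, and the function name is hypothetical.

```python
import random


def sign_flip_p_value(trade_returns, n_sims=1000, seed=0):
    """Fraction of sign-flipped simulations whose total return matches or beats the observed one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    observed = sum(trade_returns)
    hits = 0
    for _ in range(n_sims):
        simulated = sum(r if rng.random() < 0.5 else -r for r in trade_returns)
        if simulated >= observed:
            hits += 1
    return hits / n_sims


# Illustrative trade lists, not data from this report.
consistent_winners = [0.05] * 20
no_edge = [0.1, -0.1] * 5
print(sign_flip_p_value(consistent_winners))
print(sign_flip_p_value(no_edge))
```

With few trades, only a handful of distinct outcomes are possible, which is one reason low trade counts tend to leave the p-value high even when the other metrics look healthy.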
What it measures: Whether the same strategy logic produces positive results on other timeframes (4H, 1H, 15m). A genuinely robust edge should be visible on at least 2 timeframes.
Why it matters: If a strategy only works on exactly one timeframe (e.g. 1D) but fails on 4H and 1H, the edge may be an artefact of that specific data granularity rather than a real market phenomenon.
Passes 2+ timeframes - strong indicator of genuine, durable edge.
Only passes 1 timeframe - not disqualifying for daily strategies (1D is structurally different from intraday), but note the limitation.
Fails all other timeframes - treat as a warning. Deploy on the tested timeframe only, with extra caution.
Intraday strategies should pass at least 2 intraday timeframes (e.g. a 1H strategy should also work on 4H) to be considered robust.