TradeAudit - Independent Strategy Validation
Report #TA-2026-0047
Issued: April 3, 2026
Engine v2.1 - tradeaudit.org

Trend-Following Strategy v1 - BTC Market Index - Daily (1D)

Walk-forward validated 2017-2026 - 7 splits - 7 bear events tested - 1,000 Monte Carlo simulations
CAUTION

Strategy has real edge but fails 2022 bear market stress test

PF 2.92 - 63.20% win rate - 38 trades. The strategy failed the 2022 full bear year with a -44.30% drawdown and 5 consecutive losses. The trend filter cannot distinguish short-term bounces from the sustained downtrend around them. The benchmark comparison against Buy & Hold remains available in the performance summary. Not recommended for live deployment without a 200-day EMA macro filter.

82
Audit score
This layout works best as a fast orientation layer. It helps the reader see where confidence is strongest and why a strong report can still lead to a paper-trading-first recommendation.
Statistical edge
88%
Walk-forward stability
79%
Parameter robustness
74%
Market-condition survival
83%
Implementation readiness
58%

Current Recommendation

Paper-trade next

This means the validation is strong enough to progress, but the next smart step is a structured paper-trading phase instead of moving straight to live capital.

What This Strategy Is

This example currently behaves as a selective trend-following breakout model rather than an all-weather system that should be expected to stay equally credible across all market conditions.

Where It Fits Best

The strongest fit is in orderly higher-timeframe trend conditions where continuation logic stays clean. It is not framed as a strategy that must remain active in every market phase.

Configuration Choice

One shared profile can still be a reasonable starting point, but we highlight when separate market-specific profiles would be safer than pretending one setup fits every target market with equal credibility.

Immediate Next Move

Keep the configuration fixed and move to a structured paper-trading phase before any live capital is considered. Validation strength does not remove the need for implementation discipline.

Benchmark-first starting point

BTC is used here as a benchmark-first starting point because it is one of the most liquid and decision-relevant crypto markets. The goal is not to showcase a random symbol. The goal is to show how we start where decision quality is strongest, then widen only if the strategy earns broader validation.

Single benchmark first

This sample shows the first validation step on a benchmark market before widening to more assets. The point is to establish whether the strategy has credible edge in a decision-relevant market before claiming broader portability.

Historical strength still needs implementation discipline

This example is intentionally framed as paper-trade next because validation strength and live readiness are different questions. Signal handling, execution discipline, and forward confidence still matter after a strong historical result.

Crypto benchmark

This sample uses a benchmark-first crypto market and a single-benchmark-first validation path before any broader expansion claim is made.

Net return
+3842%
Full period
Profit factor
2.92
Threshold >=1.5
Max drawdown
-44.30%
Threshold <=35.00%
Total trades
38
4.2 per year
Win rate
63.20%
24W / 14L
Avg win
+31.40%
Per trade
Avg loss
-12.10%
R:R 2.60
Sharpe ratio
0.84
Risk-adjusted
Time in market
34%
Avg 47 days/trade
Avg trade dur.
47
Calendar days
Buy & Hold
+6100%
BTC passive benchmark
Walk-forward
5 / 7
Splits passed

How to read large returns

Return here is measured relative to starting capital, so it can exceed 100%. A 100% return means capital doubled. A 300% return means it became four times the starting capital.
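The conversion above can be sketched in a few lines of Python. The function name `return_to_multiple` is ours for illustration and is not part of the audit engine; the inputs are the examples from the paragraph plus the two headline figures from this report.

```python
def return_to_multiple(return_pct: float) -> float:
    """Convert a percentage return on starting capital into a capital multiple."""
    return 1.0 + return_pct / 100.0


# 100% and 300% are the worked examples; 3842% and 6100% are the
# strategy and Buy & Hold figures quoted in this report.
for pct in (100.0, 300.0, 3842.0, 6100.0):
    print(f"{pct:+.0f}% -> {return_to_multiple(pct):.2f}x starting capital")
```

Read this way, the strategy's +3842% corresponds to roughly 39 times starting capital and Buy & Hold's +6100% to 62 times.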

[Figure: Equity curve - strategy vs buy & hold (2017-2026, indexed to 100). Strategy +3842%, Buy & Hold +6100%; the 2022 bear period is marked.]
[Figure: Underwater equity (drawdown from peak, 2017-2026). Maximum drawdown -44.30%.]
[Figure: Profit factor per walk-forward split (threshold: 1.5).]
[Figure: Individual trade P&L distribution (38 trades).]
Period | Return | Profit Factor | Max DD | Trades | Sharpe | Verdict
2017-2018 | +187% | 3.41 | -31.2% | 6 | 1.12 | Pass
2018-2019 | +241% | 4.12 | -28.4% | 5 | 1.34 | Pass
2019-2020 | +156% | 2.89 | -19.7% | 6 | 0.98 | Pass
2020-2021 | +612% | 5.33 | -22.1% | 7 | 2.41 | Pass
2021-2022 | +44% | 1.61 | -38.9% | 5 | 0.51 | Caution
2022-2023 | -31.00% | 0.71 | -44.30% | 5 | -0.62 | Fail
2023-2026 | +389% | 3.84 | -24.6% | 4 | 1.81 | Pass

Walk-forward finding

5 of 7 splits pass (71.4%). The 2021-2022 split is marked Caution (PF 1.61, -38.9% drawdown), and the 2022-2023 split fails with PF 0.71, a -44.30% drawdown, and a negative Sharpe ratio. The strategy has a genuine edge in trending conditions but lacks a mechanism to avoid sustained bearish market phases.

Parameter | Base | -20% | +20% | PF range | Stability
ATR period | 10 | 8 -> PF 2.71 | 12 -> PF 3.04 | 2.71-3.04 | Stable
Multiplier | 3.0 | 2.4 -> PF 2.38 | 3.6 -> PF 3.11 | 2.38-3.11 | Stable

Sensitivity finding - positive

Both parameters are stable across +/-20% variation. No overfitting detected through parameter sensitivity.

Event | Period | Result | Detail
2018 Crypto Winter | Jan -> Dec 2018 | Survived | 4 trades, -30.8% drawdown, BTC -83%
2020 COVID Crash | Feb -> Apr 2020 | Profitable | 1 trade, +17.4%, caught recovery
2021 May Crash | May -> Jul 2021 | Avoided | 0 trades, no exposure
2021 H2 Chop | Nov 2021 -> Jan 2022 | Avoided | 0 trades, no exposure
2022 LUNA Collapse | May -> Jun 2022 | Avoided | 0 trades, no exposure
2022 FTX Collapse | Nov 2022 | Avoided | 0 trades, no exposure
2022 Full Bear Year | Jan -> Dec 2022 | Failed | 5 trades, -44.30% drawdown, all losses

Bear market finding - critical

The strategy failed the 2022 full bear year: -44.30% drawdown, 5 consecutive losses. The trend-following signal flips bullish during relief rallies inside a macro downtrend. Add a 200-day EMA filter to block entries when the macro trend is bearish.

Timeframe | Return | PF | Max DD | Trades/yr | Sharpe | Verdict
1D (submitted) | +3842% | 2.92 | -44.30% | 4.20 | 0.84 | Caution
4H | +2,104% | 2.41 | -51.2% | 14.8 | 0.67 | Fail
1H | +487% | 1.63 | -68.4% | 42.1 | 0.31 | Fail
Observed return
+3842%
Simulation median
+3,791%
p-value
0.47
Significance
Not significant (p-value above 0.20)

Monte Carlo finding

With only 38 trades over the full period, the Monte Carlo p-value of 0.47 means 47% of randomly shuffled sequences match or beat the observed return, well above the 0.05 level needed for statistical significance. The profit factor and win rate are consistent with an edge, but the low trade count means this test cannot confirm it. This is a known limitation of low-frequency trend strategies - not a disqualifying finding, but a risk to note.

1 - Add 200-day EMA macro filter (critical)

Only enter long trades when the close is above the 200-period EMA. This single change would have blocked all 5 losing trades in the 2022 bear year, turning a period with a -44.30% drawdown into a flat one. Estimated improvement: max DD drops to roughly 25-28% and the walk-forward pass rate rises to 7/7. Re-submit for re-audit at €49.
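A minimal sketch of such a filter follows. It assumes daily closes in a plain Python list, seeds the EMA with the first close, and uses the conventional smoothing factor 2 / (period + 1). The function names `ema` and `allow_long` are illustrative, not the strategy's actual code, and requiring a full `period` of history before allowing entries is our own assumption.

```python
from typing import List


def ema(values: List[float], period: int) -> List[float]:
    """Exponential moving average, seeded with the first value."""
    alpha = 2.0 / (period + 1)
    out: List[float] = []
    prev = None
    for v in values:
        prev = float(v) if prev is None else alpha * v + (1.0 - alpha) * prev
        out.append(prev)
    return out


def allow_long(closes: List[float], period: int = 200) -> bool:
    """Permit a long entry only if the latest close is above the EMA.

    Returns False until at least `period` bars are available (assumption).
    """
    if len(closes) < period:
        return False
    return closes[-1] > ema(closes, period)[-1]


# Synthetic series for illustration only, not market data.
uptrend = [100.0 + i for i in range(250)]
downtrend = [400.0 - i for i in range(250)]
print(allow_long(uptrend), allow_long(downtrend))
```

The existing entry signal would then be combined with `allow_long(...)`, so a bullish flip during a relief rally is ignored while price remains below the long-term average.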

2 - ATR-based position sizing (recommended)

Scale position size inversely to current ATR. Smaller sizing during high-volatility market phases reduces portfolio-level drawdown without changing signal quality. Estimated improvement on the 2018 period: DD from -30.8% to roughly -18% to -22%.
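One common way to implement this is sketched below, under stated assumptions: ATR is taken as a simple mean of true ranges (Wilder's smoothing is an alternative), and a fixed fraction of equity is risked per trade. The `risk_fraction` parameter and both function names are hypothetical, and reusing the strategy's ATR period (10) and multiplier (3.0) for sizing is our assumption rather than something the report specifies.

```python
from typing import List, Tuple

Bar = Tuple[float, float, float]  # (high, low, close)


def average_true_range(bars: List[Bar], period: int = 10) -> float:
    """Simple-mean ATR over the last `period` true ranges."""
    if len(bars) < period + 1:
        raise ValueError("need at least period + 1 bars")
    true_ranges = []
    for (high, low, _), (_, _, prev_close) in zip(bars[1:], bars[:-1]):
        true_ranges.append(
            max(high - low, abs(high - prev_close), abs(low - prev_close))
        )
    return sum(true_ranges[-period:]) / period


def position_size(equity: float, risk_fraction: float,
                  atr: float, atr_multiplier: float = 3.0) -> float:
    """Units to hold so that an adverse move of atr * multiplier costs risk_fraction of equity."""
    if atr <= 0:
        raise ValueError("ATR must be positive")
    return equity * risk_fraction / (atr * atr_multiplier)


# Synthetic bars for illustration only.
bars = [(110.0, 100.0, 105.0)] * 11
atr = average_true_range(bars, period=10)
print(atr, position_size(10_000.0, 0.01, atr))
```

Because size is divided by ATR, doubling volatility halves the position, which is the inverse scaling described above.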

3 - Do not deploy in current form

The -44.30% drawdown in the failed 2022 bear year exceeds the deployment threshold of <=35.00%. Apply the 200 EMA macro filter and re-submit. Expected verdict after fix: PASS.

Strategy Behavior

This example strategy currently behaves more like a selective trend-following breakout model than a broad all-weather system. The edge appears when market structure is orderly enough for continuation logic to remain credible.

Cross-Market Fit

The same core logic can remain viable across several assets without behaving identically on all of them. We use this section to explain when one market looks cleaner than another and why future refinements should be tested market by market instead of assumed universal.

Configuration Recommendation

When the evidence suggests that one shared configuration is no longer the most honest answer across all target markets, the report can recommend separate market-specific profiles rather than forcing one setup to serve every asset with equal credibility.

Each test in this report answers a different question about your strategy. This guide explains what each test measures, what the numbers mean, and how to use the results to improve your strategy before deploying real capital.

01 Performance Summary - Foundation

What it measures: The overall performance of your strategy across the full historical dataset - return, drawdown, profit factor, win rate, trade count, and time in market.

Why it matters: This is the starting point. If the full-period numbers look strong but the other tests don't confirm them, the strategy is likely curve-fitted to history. If the full-period numbers are weak, no further validation is needed.

How to interpret

Profit Factor >= 1.5 - the minimum threshold for a tradeable edge. PF of 2.0+ is strong. PF of 3.0+ is excellent.
Max Drawdown <= 35% - the maximum loss from peak to trough. Above 35% means most traders would abandon the strategy mid-drawdown.
Total Trades >= 8 - fewer than 8 trades means the results aren't statistically meaningful.
Time in market - how long you're actually exposed. Lower is better if returns are similar - less time at risk.
Sharpe ratio >= 1.0 - risk-adjusted return. Below 0.5 means returns don't justify the volatility taken.
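As a rough illustration of how these thresholds apply, the sketch below computes profit factor and maximum drawdown from raw inputs and then checks this report's headline numbers against the four numeric thresholds listed above. All names (`profit_factor`, `max_drawdown_pct`, `Summary`, `check_summary`) are ours and are not part of the audit engine.

```python
from dataclasses import dataclass
from typing import Dict, List


def profit_factor(trade_pnls: List[float]) -> float:
    """Gross profit divided by gross loss; infinite if there are no losing trades."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss


def max_drawdown_pct(equity: List[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a positive percentage."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak * 100.0)
    return worst


@dataclass
class Summary:
    pf: float
    max_dd_pct: float  # positive magnitude, e.g. 44.30 for -44.30%
    total_trades: int
    sharpe: float


def check_summary(s: Summary) -> Dict[str, bool]:
    """Apply the thresholds from the interpretation list above."""
    return {
        "profit_factor": s.pf >= 1.5,
        "max_drawdown": s.max_dd_pct <= 35.0,
        "total_trades": s.total_trades >= 8,
        "sharpe": s.sharpe >= 1.0,
    }


# Headline figures from this report's performance summary.
checks = check_summary(Summary(pf=2.92, max_dd_pct=44.30, total_trades=38, sharpe=0.84))
print(checks)
```

For this report, profit factor and trade count clear their thresholds while max drawdown and Sharpe ratio do not, which matches the CAUTION verdict.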

02 Walk-Forward Validation - Most important test

What it measures: Whether your strategy works on data it has never seen before. The full dataset is divided into 7 sequential time windows. The strategy is tested on each window independently - not on the full period at once.

Why it matters: A strategy that only looks profitable because it was optimised on historical data will fail walk-forward validation. It cannot fake results on unseen data. This is the closest simulation to what will happen in live trading.

How to interpret

5+ of 7 splits passing means the strategy has genuine, repeatable edge across different market conditions.
A failing split tells you exactly which market condition breaks the strategy - use that to identify what fix is needed.
All 7 passing is the gold standard. Combined with other tests, this is what makes a strategy PASS-ready.
Consecutive failing splits (e.g. 2022 and 2023 both fail) suggest market-condition dependency - the strategy only works in certain market types.
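The two checks above, the pass count and the search for consecutive failures, can be sketched as follows. The verdicts are copied from this report's walk-forward table; the function names are illustrative, and a Caution split is counted as not passing, consistent with the 5 / 7 figure in the summary.

```python
from typing import List


def pass_rate(verdicts: List[str]) -> float:
    """Fraction of splits whose verdict is exactly 'Pass'."""
    return sum(v == "Pass" for v in verdicts) / len(verdicts)


def longest_fail_run(verdicts: List[str]) -> int:
    """Length of the longest run of consecutive 'Fail' verdicts."""
    longest = current = 0
    for v in verdicts:
        current = current + 1 if v == "Fail" else 0
        longest = max(longest, current)
    return longest


# Split verdicts in chronological order, 2017-2018 through 2023-2026.
verdicts = ["Pass", "Pass", "Pass", "Pass", "Caution", "Fail", "Pass"]
print(f"pass rate: {pass_rate(verdicts):.1%}")
print(f"longest fail run: {longest_fail_run(verdicts)}")
```

Here the single failing split is isolated rather than part of a run, although it directly follows the Caution split.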

03 Parameter Sensitivity (+/-20%) - Overfitting check

What it measures: What happens to performance when each parameter is varied by +/-20% from its default value - one at a time. For example, if your ATR period is 10, we test 8 and 12.

Why it matters: A strategy that only works at exactly one specific parameter value was probably curve-fitted to that value. A robust strategy maintains its edge when parameters are slightly adjusted - because the underlying logic is sound, not the exact numbers.

How to interpret

STABLE - the profit factor stays above 1.5 across all variants. The edge is robust to parameter choice.
REJECT - one or more variants drop below 1.5 or become unprofitable (PF below 1.0). The strategy is fragile at this parameter.
A "banana shape" PF curve (one peak, drops sharply either side) is the classic overfitting signature - avoid deploying.
If all parameters are STABLE, you have good evidence the strategy edge comes from the logic, not lucky parameter selection.
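A small sketch of the one-at-a-time variation and the STABLE / REJECT rule is shown below. It treats "stays above 1.5" as PF >= 1.5, which is our reading of the threshold. The function names are illustrative, and the profit factors are the ones reported in this audit's sensitivity table rather than values computed here.

```python
from typing import Iterable, Tuple


def variants(base: float, pct: float = 0.20) -> Tuple[float, float]:
    """Return the (-pct, +pct) variants of a parameter, rounded to avoid float noise."""
    return (round(base * (1 - pct), 6), round(base * (1 + pct), 6))


def classify(pf_values: Iterable[float], threshold: float = 1.5) -> str:
    """STABLE if every variant keeps PF at or above the threshold, else REJECT."""
    return "STABLE" if all(pf >= threshold for pf in pf_values) else "REJECT"


print(variants(10))   # ATR period base value from the report
print(variants(3.0))  # multiplier base value from the report

# Profit factors reported for the -20% and +20% variants of each parameter.
print(classify([2.71, 3.04]))  # ATR period
print(classify([2.38, 3.11]))  # multiplier
```

In practice each variant would be re-run through the backtest to obtain its profit factor; that step is outside this sketch.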

04 Bear Market Stress Tests - Survival check

What it measures: How your strategy performs during the 7 most hostile named market events in crypto history - 2018 crash, COVID, LUNA collapse, FTX fraud, and more. Each period is tested in isolation.

Why it matters: A strategy that only shows strong results because of the 2020-2021 bull run is dangerous to deploy. Most bear markets and crashes are preceded by exactly the type of signal a trend strategy generates. This test determines if your strategy survives when conditions turn hostile.

How to interpret

AVOIDED - 0 trades. Best case. The strategy correctly identified a non-tradeable environment and stayed flat.
PROFITABLE - took trades and made money despite hostile conditions. Exceptional.
SURVIVED - took trades with drawdown under 35%. Acceptable.
FAILED - drawdown exceeded 35%, or multiple consecutive losses. This event is the primary risk to live deployment.
If FAILED: identify which condition triggered entries during the crash and add a filter to block it (typically a macro trend filter).
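The four labels can be expressed as a simple decision rule, sketched here. The guide does not say how many losses count as "multiple consecutive losses", so that check is disabled unless a hypothetical `loss_streak_limit` is supplied. The function name and the example inputs are illustrative and are not report data.

```python
from typing import Optional


def classify_event(trades: int, net_return_pct: float, max_dd_pct: float,
                   loss_streak: int = 0, dd_limit: float = 35.0,
                   loss_streak_limit: Optional[int] = None) -> str:
    """Label a stress-test event; max_dd_pct is a positive magnitude."""
    if trades == 0:
        return "AVOIDED"
    if max_dd_pct > dd_limit:
        return "FAILED"
    # Optional rule: the guide leaves the streak length unspecified (assumption).
    if loss_streak_limit is not None and loss_streak >= loss_streak_limit:
        return "FAILED"
    if net_return_pct > 0:
        return "PROFITABLE"
    return "SURVIVED"


# Illustrative inputs only.
print(classify_event(trades=0, net_return_pct=0.0, max_dd_pct=0.0))
print(classify_event(trades=1, net_return_pct=12.0, max_dd_pct=6.0))
print(classify_event(trades=4, net_return_pct=-9.0, max_dd_pct=30.0))
print(classify_event(trades=5, net_return_pct=-30.0, max_dd_pct=44.0))
```

The drawdown check runs before the profitability check, so an event that ends positive but breaches the 35% limit along the way is still labelled FAILED.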

05 Monte Carlo - Statistical Significance - Confidence check

What it measures: Whether your strategy's results could have happened by chance. We run 1,000 simulations where your exact trades are shuffled into random order. If most random sequences produce similar returns, the edge may be luck rather than skill.

Why it matters: With only 10-20 trades, it's possible to look profitable purely by luck. A strategy with 100+ trades and a p-value below 0.05 has statistically proven edge. A strategy with 12 trades and p-value of 0.40 may simply have been lucky on a few large winners.

How to interpret

p-value < 0.05 - statistically significant. Fewer than 5% of random sequences match or beat your strategy. Strong confidence.
p-value 0.05-0.20 - borderline. Some confidence, but not conclusive. Monitor live carefully.
p-value > 0.20 - not statistically significant. The results may be luck. Needs more trades before deployment.
The fix: more trades. A larger sample gives the test more power, so a real edge will tend to show a lower p-value. Either run on more assets, use a shorter timeframe, or extend the test period.
Note: low-frequency strategies (1D, swing) will almost always show borderline p-values due to low trade count. This is expected - weight this finding accordingly.
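The sketch below shows one way such a p-value could be computed. It rests on an assumption about the method: when every trade compounds the full equity, reordering the same trades leaves the final return unchanged, so this version resamples trades with replacement (a bootstrap) rather than purely shuffling them. This is our substitution for illustration and not a confirmed description of the audit engine; the function names and the trade list are hypothetical.

```python
import random
from typing import List


def compound(returns_pct: List[float]) -> float:
    """Compounded total return (%) from a sequence of per-trade returns (%)."""
    equity = 1.0
    for r in returns_pct:
        equity *= 1.0 + r / 100.0
    return (equity - 1.0) * 100.0


def monte_carlo_p_value(trades_pct: List[float], n_sims: int = 1000,
                        seed: int = 0) -> float:
    """Share of resampled sequences whose return matches or beats the observed one."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = compound(trades_pct)
    hits = 0
    for _ in range(n_sims):
        sample = rng.choices(trades_pct, k=len(trades_pct))  # with replacement
        if compound(sample) >= observed:
            hits += 1
    return hits / n_sims


# Hypothetical per-trade returns, for illustration only.
trades = [31.0, -12.0, 28.0, -10.0, 40.0, -14.0, 22.0, 35.0]
print(monte_carlo_p_value(trades))
```

With few trades, a handful of large winners dominates each resampled outcome, which is why low trade counts tend to produce p-values far from the 0.05 level.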

06 Cross-Timeframe Robustness - Robustness check

What it measures: Whether the same strategy logic produces positive results on other timeframes (4H, 1H, 15m). A genuinely robust edge should be visible on at least 2 timeframes.

Why it matters: If a strategy only works on exactly one timeframe (e.g. 1D) but fails on 4H and 1H, the edge may be an artefact of that specific data granularity rather than a real market phenomenon.

How to interpret

Passes 2+ timeframes - strong indicator of genuine, durable edge.
Only passes 1 timeframe - not disqualifying for daily strategies (1D is structurally different from intraday), but note the limitation.
Fails all other timeframes - treat as a warning. Deploy on the tested timeframe only, with extra caution.
Intraday strategies should pass at least 2 intraday TFs (e.g. 1H strategy should also work on 4H) to be considered robust.