A Timezone Bug Almost Made Me Abandon a Profitable Strategy

−870 Pips from Five Lines of Code

Weeks of work. Walk-forward backtest on 2.5 years of Darwinex tick data: 302 trades, 59.6% win rate, profit factor 1.48, survived five rounds of honest deflation. Deployed live on a VPS with MetaTrader 5.

First week: 1 win, 5 losses, −€504. Five percent drawdown in days.

So I did what you’re supposed to do — ran an independent re-backtest on MT5’s own OHLC bars to check whether the edge was real or a Darwinex-specific artifact. If it reproduced on different data, the live losses were just variance. If not, I’d been curve-fitting.

The re-backtest: 445 trades, 15.5% win rate, −870 pips.

I nearly shut the whole thing down. “Edge only exists on Darwinex data. Classic overfitting.” Walked away from the keyboard.

I was wrong. The re-backtest had a timezone bug that shifted every bar by 2–3 hours. Fixed the parsing, same data: 401 trades, 51.9% WR, +350.6 pips, PF 1.21.

The strategy works. I almost buried it because of five lines of timestamp code.

EET ≠ UTC (Obviously, In Hindsight)

MetaTrader 5 exports CSV timestamps in Darwinex server time — EET/EEST (Eastern European Time). UTC+2 in winter, UTC+3 in summer.

My code parsed them as UTC. Because of course it did.

MT5 CSV says:    2023-06-15 14:00:00
I parsed it as:  2023-06-15 14:00:00 UTC
It actually was: 2023-06-15 14:00:00 EEST = 2023-06-15 11:00:00 UTC

Three hours off in summer. Two in winter. No error. No crash. Clean output with 445 trades. Just the wrong 445 trades.
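To make the size of the error concrete, here is a minimal sketch that measures the gap between the two readings of the same stamp. It assumes `Europe/Bucharest` is an acceptable stand-in for the server's EET/EEST clock; `offset_hours` is a name made up for this illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumption: Europe/Bucharest follows the same EET/EEST rules as the server clock.
SERVER_TZ = ZoneInfo("Europe/Bucharest")

def offset_hours(stamp: str) -> float:
    """Hours between reading `stamp` as UTC and reading it as server time."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    as_utc = naive.replace(tzinfo=timezone.utc)
    as_server = naive.replace(tzinfo=SERVER_TZ)
    # Subtracting aware datetimes compares the underlying instants.
    return (as_utc - as_server).total_seconds() / 3600

summer = offset_hours("2023-06-15 14:00:00")
winter = offset_hours("2023-01-16 14:00:00")
print(summer, winter)  # 3.0 2.0
```

On Windows, `zoneinfo` usually needs the `tzdata` package installed, since the OS ships no IANA database.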

This is what makes timezone bugs uniquely dangerous — they don’t scream. They whisper. You get plausible-looking results that happen to be complete garbage.

Why Three Hours Kills This Strategy

A moving average doesn’t care about timezones. An EMA of price is an EMA of price regardless of bar labels. But this strategy is built on temporal structure — the signal lives in the clock, not just the price:

Session filters. London Open is 07:00–11:00 UTC. Shift that 3 hours and you’re placing orders during the Asian session lull. The edge clusters at session transitions — we showed this in the NN filter post. Wrong timezone, wrong sessions, no edge.
Candle boundaries. A 30-minute bar starting at 14:00 EET captures different price action than one starting at 14:00 UTC. The multi-bar pattern that generates entries is time-sensitive. Misalign the bars and you detect phantom setups or miss real ones.
Daily range calculations. Multi-timeframe range zones computed from daily bars. Shift the day boundary and every zone shifts with it.
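Weekend filtering. Friday's last bar at 23:30 EET is really 21:30 UTC. Read as UTC it stays labeled 23:30, after the market has already closed, and every bar around the weekly open and close is mislabeled by the same 2–3 hours. A weekend filter keyed to UTC then either lets ghost candles through or throws away valid Friday data.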

Net effect: 2.2× more signals detected (noise from misaligned candle boundaries), zero overlap with validated setups, session filter checking completely wrong hours. −870 pips.
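A rough sketch of how the session filter goes wrong, assuming a half-open 07:00–11:00 UTC window and `Europe/Bucharest` as a proxy for server time. The function names are hypothetical and this is not the strategy's actual filter.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

SERVER_TZ = ZoneInfo("Europe/Bucharest")  # assumption: proxy for the EET/EEST server clock
FMT = "%Y-%m-%d %H:%M:%S"

def parse_as_utc(stamp: str) -> datetime:
    """The buggy path: label the server-time stamp as UTC."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=timezone.utc)

def parse_as_server(stamp: str) -> datetime:
    """The corrected path: attach server time, then convert to UTC."""
    local = datetime.strptime(stamp, FMT).replace(tzinfo=SERVER_TZ)
    return local.astimezone(timezone.utc)

def in_london_open(ts_utc: datetime) -> bool:
    """True for bars opening inside the 07:00-11:00 UTC window (end exclusive)."""
    return time(7, 0) <= ts_utc.time() < time(11, 0)

early = "2023-06-15 08:00:00"  # summer server time, really 05:00 UTC
late = "2023-06-15 13:00:00"   # summer server time, really 10:00 UTC

# Phantom: the buggy parser accepts a bar that is actually pre-London.
phantom = in_london_open(parse_as_utc(early)) and not in_london_open(parse_as_server(early))
# Missed: the buggy parser rejects a bar that is actually inside the window.
missed = in_london_open(parse_as_server(late)) and not in_london_open(parse_as_utc(late))
print(phantom, missed)
```

Both failure modes show up at once: the filter admits bars it should reject and rejects bars it should admit.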

The Friday Close Tell

Here’s how I caught it. The last bar every Friday is stamped 23:30 in the CSV, winter and summer alike. FX markets close around 21:00–22:00 UTC on Fridays, so that stamp has to land just before the close:

23:30 EET = 21:30 UTC (winter)  ✓ — matches FX close
23:30 EEST = 20:30 UTC (summer) ✓ — matches FX close
23:30 UTC = ... 1.5 hours after market close? Impossible.

That “impossible” is what cracked it. Cross-referenced a bar labeled “14:00” in the MT5 CSV against Darwinex’s 11:00 UTC bar — identical OHLC. Three-hour offset. Summer. EEST.

DST transitions confirmed it: offset changes on the last Sunday of March and October, exactly matching EET/EEST rules.
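The anchor check can be automated. This is a sketch only, assuming the CSV stamps are bar open times on 30-minute bars and that "plausible" means the bar ends on a Friday between 21:00 and 22:00 UTC. `plausible_friday_close` is a hypothetical helper, not part of any library.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

BAR = timedelta(minutes=30)  # assumption: 30-minute bars stamped by open time

def bar_end_utc(stamp: str, tz) -> datetime:
    """Interpret `stamp` in `tz`, convert to UTC, and add one bar length."""
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=tz).astimezone(timezone.utc) + BAR

def plausible_friday_close(stamp: str, tz) -> bool:
    """True if the bar ends on a Friday within 21:00-22:00 UTC."""
    end = bar_end_utc(stamp, tz)
    return end.weekday() == 4 and time(21, 0) <= end.time() <= time(22, 0)

eet = ZoneInfo("Europe/Bucharest")
winter_stamp = "2023-01-13 23:30:00"  # a Friday in winter
summer_stamp = "2023-06-16 23:30:00"  # a Friday in summer

for name, tz in [("UTC", timezone.utc), ("EET/EEST", eet)]:
    ok = plausible_friday_close(winter_stamp, tz) and plausible_friday_close(summer_stamp, tz)
    print(name, ok)
```

Read as UTC, the last Friday bar would end at midnight on Saturday, which the check rejects. Read as EET/EEST, it ends at 22:00 UTC in winter and 21:00 UTC in summer.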

The Fix

from datetime import datetime, timezone
from zoneinfo import ZoneInfo

EET_TZ = ZoneInfo("Europe/Bucharest")  # EET/EEST with automatic DST

# Before (wrong): the naive server-time stamp is labeled as UTC
dt = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
ts_ms = int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

# After (correct): attach the server timezone, then convert to UTC
naive_dt = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")
local_dt = naive_dt.replace(tzinfo=EET_TZ)
utc_dt = local_dt.astimezone(timezone.utc)
ts_ms = int(utc_dt.timestamp() * 1000)

Five lines. −870 pips → +351.

I stared at this diff for a while.

The Numbers

Metric	Broken (parsed as UTC)	Fixed (EET→UTC)	Golden backtest (tick data)
Trades	445	401	302
Win rate	15.5%	51.9%	59.6%
Profit factor	< 1	1.21	1.48
Total pips	−870	+350.6	+463.5

The fixed MT5 results don’t match the golden backtest exactly. Expected — and actually reassuring:

Resolution gap. Golden backtest resolves SL/TP on actual ticks at millisecond granularity. MT5 re-backtest uses bar-level resolution — high/low of subsequent bars, always checking SL before TP. It’s biased conservative by design.
Data source differences. MT5 bars are pre-aggregated by the broker; golden backtest aggregates from raw ticks. Minor OHLC differences exist.
Same direction. Both profitable, both positive PF, both show longs outperforming shorts. The bar-level version is a floor, not a ceiling.
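The conservative bias in the bar-level version comes from the order of the checks. A minimal sketch for a long trade, assuming each bar exposes only its high and low; `Bar` and `resolve_long` are names invented here, not the backtester's real code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

@dataclass
class Bar:
    high: float
    low: float

def resolve_long(sl: float, tp: float, bars: Iterable[Bar]) -> Optional[str]:
    """Walk the bars after entry; return "SL", "TP", or None if neither is hit."""
    for bar in bars:
        # SL is checked first: when one bar spans both levels we cannot know
        # which was touched first, so the loss is assumed.
        if bar.low <= sl:
            return "SL"
        if bar.high >= tp:
            return "TP"
    return None

# One bar touches both levels: tick data might show a win, bars record a loss.
ambiguous = resolve_long(sl=1.0950, tp=1.1050, bars=[Bar(high=1.1060, low=1.0940)])
clean_win = resolve_long(sl=1.0950, tp=1.1050, bars=[Bar(1.1010, 1.0990), Bar(1.1055, 1.1000)])
print(ambiguous, clean_win)
```

Every ambiguous bar is booked as a loss, which is why the bar-level result works as a floor.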

If the two backtests had exactly matched, I’d be more suspicious, not less.

Five More Bugs Hiding in Plain Sight

The timezone wasn’t alone. The live pipeline had quietly drifted from the validated config — five parameters, each a “reasonable” deployment shortcut, each shaving off a piece of the edge:

Parameter	Golden backtest	Live (broken)	What it did
Min signal size	3.0 pips	0.5 pips	About 2.2× as many signals (6,966 vs 3,220 in the MT5 sweep), mostly noise
30m lookback	Full history	500 bars (~10 days)	Missed older but valid signals (strategy uses setups up to 20 days old)
Daily lookback	Full history	90 bars	Insufficient for multi-week range calculations
Weekend bars	Filtered	Included	Ghost candles creating phantom signals
Session filter	Hard reject	Label only	Allowed trades outside London/NY (where the edge doesn’t exist)

Every single one of these made sense at deployment time. “0.5 pips catches more setups.” “500 bars saves memory.” “Let the session filter just tag, not reject — we might learn something.”

No. You ship what you tested. Full stop. This is the quant equivalent of pushing code that doesn’t match what passed CI — and wondering why production is on fire.
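A pre-deploy check for this can be tiny. The sketch below encodes the five parameters from the table under made-up key names (the real config schema isn't shown in this post) and reports every value that differs from the validated one.

```python
# Hypothetical key names; the values come from the table above.
GOLDEN = {
    "min_signal_pips": 3.0,
    "lookback_30m_bars": None,    # None = full history
    "lookback_daily_bars": None,  # None = full history
    "filter_weekend_bars": True,
    "session_filter": "reject",
}

LIVE = {
    "min_signal_pips": 0.5,
    "lookback_30m_bars": 500,
    "lookback_daily_bars": 90,
    "filter_weekend_bars": False,
    "session_filter": "label",
}

def config_drift(golden: dict, live: dict) -> dict:
    """Return {key: (golden_value, live_value)} for every key that differs."""
    keys = sorted(set(golden) | set(live))
    return {k: (golden.get(k), live.get(k)) for k in keys if golden.get(k) != live.get(k)}

drift = config_drift(GOLDEN, LIVE)
for key, (expected, actual) in drift.items():
    print(f"{key}: backtest={expected!r} live={actual!r}")
```

If `drift` is non-empty, the deploy should stop until someone explains each line.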

Small Signals Are Noise

Parameter sweep on minimum signal size, to verify 3.0 pips isn’t a cherry-picked magic number:

Threshold	Signals	Trades	WR%	PF	Pips
0.5	6,966	467	52.2%	1.14	+256
1.0	5,844	459	52.3%	1.16	+281
2.0	4,273	436	52.1%	1.17	+298
3.0	3,220	401	51.9%	1.21	+351
4.0	2,496	355	52.1%	1.25	+386
5.0	1,934	296	49.3%	1.22	+318

Sub-3-pip signals look like market microstructure noise rather than meaningful price displacement: lowering the threshold adds trades but dilutes the edge (PF 1.14 at 0.5 vs 1.21 at 3.0). At 4.0 the PF improves to 1.25 but trade count drops to 355. At 5.0 the win rate slips to 49.3% and total pips fall back to +318 on 296 trades. 3.0 is a sensible floor, not an optimum: the sweep validates a threshold, not a holy number.
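For reference, the WR, PF and Pips columns can all be derived from a list of per-trade results. A minimal sketch, assuming results are already expressed in pips; `summarize` is an illustrative name and the sample trades are invented, not taken from the sweep.

```python
def summarize(trade_pips: list[float]) -> dict:
    """Trade count, win rate, profit factor and net pips for one backtest run."""
    wins = [p for p in trade_pips if p > 0]
    losses = [p for p in trade_pips if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)  # stored as a positive number
    return {
        "trades": len(trade_pips),
        "win_rate": len(wins) / len(trade_pips) if trade_pips else 0.0,
        # Profit factor is a ratio of two non-negative sums: it is never
        # negative, and a value below 1 means a net loss.
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "pips": gross_profit - gross_loss,
    }

# Invented sample, purely to show the shape of the output.
stats = summarize([10.0, -5.0, 20.0, -5.0])
print(stats)
```

Running this once per threshold over that threshold's trade list gives one row of the table.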

Two Backtests Agree. Reality Hasn’t Voted Yet.

Two independent backtests now converge: the golden tick-data version (+463 pips, PF 1.48) and the fixed MT5 bar-level version (+351 pips, PF 1.21). Both profitable. Both show the same structural patterns — session edge, longs outperform shorts, Monday dominance.

But backtests agreeing with each other is not backtests agreeing with the market. The strategy hadn’t been re-tested live with the corrected parameters when I wrote this. Week 1’s −€504 had multiple causes: timezone bug, config drift, and possibly just variance on six trades.

The Monte Carlo 5th percentile from the golden backtest was +188 pips. The edge is real but thin. A neural network filter pushes the profit factor from 1.48 to 2.22 by dropping low-confidence setups — but even that hadn’t faced live markets with the corrected config yet.
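For readers who haven't seen the earlier post, a 5th-percentile figure of this kind can be produced by resampling the trade list. The sketch below is a generic bootstrap with replacement, not necessarily the method used for the +188 number, and the sample trades are invented.

```python
import random

def bootstrap_total_pips(trade_pips: list[float], n_runs: int = 2000, seed: int = 0) -> list[float]:
    """Resample the trade list with replacement; return sorted run totals."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(trade_pips)
    totals = [sum(rng.choices(trade_pips, k=n)) for _ in range(n_runs)]
    totals.sort()
    return totals

def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank style percentile on an already sorted list."""
    idx = int(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[idx]

# Invented per-trade results in pips, for illustration only.
sample_trades = [12.0, -8.0, 15.0, -8.0, 9.0, -8.0, 20.0, -8.0, 11.0, 6.0]
totals = bootstrap_total_pips(sample_trades)
p5, p50 = percentile(totals, 5), percentile(totals, 50)
print(p5, p50)
```

The 5th percentile answers "how bad is a plausibly unlucky run of the same trades", which is the number to compare against a rough first week.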

I’m trading it again. With open eyes and correct timestamps.

What I Learned (Carved in Stone)

Timezone bugs are silent killers. No exceptions, no crashes, no warnings. Clean execution, plausible output. You’d never know unless you checked the Friday close bars. I got lucky that FX has a known weekly close time I could anchor against. Crypto doesn’t.

“Doesn’t work on different data” sometimes means “you broke the data.” When you see −870 pips, the path of least emotional resistance is “it was never real.” Sometimes the actual explanation is boring: you fed the strategy garbage and it returned garbage. Always diff your params and data pipeline before writing the eulogy.

Config drift is a deployment bug. Software engineers have CI/CD pipelines for exactly this. Quants deploy with “close enough” parameters and wonder why live diverges from backtest. Treat trading config like code — version it, hash it, diff it before every deploy.
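One cheap way to "hash it" is to fingerprint the canonical JSON form of the config and store that fingerprint next to the backtest results. This is a sketch under the assumption that the config is a JSON-serializable dict; `config_fingerprint` and the key names are hypothetical.

```python
import hashlib
import json

def config_fingerprint(config: dict) -> str:
    """SHA-256 of the config serialized with sorted keys and fixed separators."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

validated = {"min_signal_pips": 3.0, "session_filter": "reject"}
reordered = {"session_filter": "reject", "min_signal_pips": 3.0}
drifted = {"min_signal_pips": 0.5, "session_filter": "reject"}

same = config_fingerprint(validated) == config_fingerprint(reordered)
changed = config_fingerprint(validated) != config_fingerprint(drifted)
print(same, changed)
```

Key order doesn't affect the fingerprint, but any changed value does, so a mismatch at startup is a reason to refuse to trade.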

FX timestamps are a minefield. MT5’s Python API returns UTC epochs. MT5’s CSV export uses server time, and Darwinex’s server runs on EET/EEST. Other brokers use EST, GMT, or their own local time. Never assume UTC. Verify against a known anchor: Friday close is the easiest.

Cross-validation requires timezone alignment. Using MT5 data to validate Darwinex tick results is a sound idea. My execution introduced a bug worse than what I was trying to catch. The irony wasn’t lost on me.

Previous posts in this series: Starting from the End (Monte Carlo validation), From PyTorch to 27 Megabytes (NN trade filter). Both reference the same strategy before this bug was discovered.