Alpha Hunt Kit: falsify trading ideas before optimizing them

Alpha Hunt Kit is a tiny command-line harness for testing one trading idea without letting it mutate into a pretty backtest.

You give it a hypothesis JSON, an OHLCV CSV, and a cost assumption. It runs one fixed-config backtest, compares the result against a shuffled-timing null, then writes the verdict to disk: keep testing, or bury the idea.

That is the whole product. Not a bot. Not a broker connector. Not a magic strategy repo. A small machine for making your coding agent prove an idea is not obvious garbage before it spends another hour optimizing it.

Repo: github.com/thiagosucupira/alpha-hunt-kit

Agent instructions: docs/agent_prompt.md

The bundled fixture only proves the plumbing works. Real edge still needs real data.

Run it once

git clone https://github.com/thiagosucupira/alpha-hunt-kit.git
cd alpha-hunt-kit
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

alpha-hunt init --demo
alpha-hunt trial experiments/current.json --data fixtures/bars/demo_ohlcv.csv --budget-seconds 30

That first run should create:

.alpha-hunt/runs/<run_id>/metrics.json
.alpha-hunt/runs/<run_id>/report.md
a new row in state/experiments.tsv
a KEEP_CANDIDATE or DISCARD_CANDIDATE decision

If the demo works, the harness works. Nothing more. Fixture data is a smoke test, not an alpha claim.

What a clean agent run looks like

Screenshot: a real interactive bash session with real stdout. alpha-hunt init --demo, then alpha-hunt trial returning KEEP_CANDIDATE with null-model metrics, then alpha-hunt status reminding that fixture data proves plumbing only, not edge.

The actual loop

One candidate, one gate

The whole loop in 15 seconds: one frozen candidate goes through one trial, one null gate, then either harder tests or the graveyard.

A candidate is not “try these parameters.” A candidate is a falsifiable claim:

source or observation,
hypothesis,
frozen JSON config,
explicit cost assumption,
local bars,
timing-null comparison,
keep or bury.
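The inputs in that list map naturally onto an immutable record. Here is a minimal sketch that uses a Python dataclass in place of the hypothesis JSON. The class name `Candidate`, its field names, and `freeze_config` are hypothetical and are not the kit's actual schema. The point is that the claim is frozen and fingerprinted, so changing one parameter produces a new candidate instead of quietly mutating the old one.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def freeze_config(params: dict) -> str:
    """Canonicalize parameters so key order and whitespace cannot change the hash."""
    return json.dumps(params, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class Candidate:
    """Hypothetical shape of one falsifiable claim; not the kit's real schema."""
    source: str        # where the idea came from (paper, graveyard inversion, ...)
    hypothesis: str    # the claim, stated so that it can fail
    config: str        # canonical JSON of the frozen parameters
    cost_bps: float    # explicit cost assumption, in basis points
    data_path: str     # local bars the trial runs on

    def fingerprint(self) -> str:
        """Stable short hash of the whole claim; any edit yields a new ID."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

Because the dataclass is frozen, assigning to a field after construction raises `dataclasses.FrozenInstanceError`. Two configs that differ only in key order get the same fingerprint.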

The built-in null is deliberately cheap: keep the candidate’s own position inventory, shuffle it across timestamps, then ask whether the real timing beats the 95th percentile of the shuffled runs. Passing that gate does not mean “trade it.” It means “this is no longer obviously garbage; test it harder.”
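Here is a minimal NumPy sketch of that gate. It rests on several assumptions. `positions` holds one value per bar (for example -1, 0, +1). `returns` holds each bar's close-to-close return. The position chosen at bar t earns bar t+1's return. The function names, the gross-PnL metric, and the 1000-shuffle default are illustrative and are not the kit's implementation.

```python
import numpy as np


def timing_pnl(positions: np.ndarray, returns: np.ndarray) -> float:
    """Gross PnL when the position decided at bar t earns bar t+1's return.

    positions[t] may only use information up to bar t's close, so it is
    paired with returns[t + 1], never with returns[t].
    """
    return float(np.sum(positions[:-1] * returns[1:]))


def shuffled_timing_null(positions, returns, n_shuffles=1000, seed=0):
    """Real timing vs. the same position inventory placed at random timestamps."""
    positions = np.asarray(positions, dtype=float)
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    real = timing_pnl(positions, returns)
    # A permutation keeps every long/short/flat bar the candidate took,
    # only moved in time, so the null tests timing and nothing else.
    null = np.array(
        [timing_pnl(rng.permutation(positions), returns) for _ in range(n_shuffles)]
    )
    p95 = float(np.percentile(null, 95))
    return {
        "real": real,
        "null_p95": p95,
        "verdict": "KEEP_CANDIDATE" if real > p95 else "DISCARD_CANDIDATE",
    }
```

The sketch compares gross timing PnL on purpose. Shuffling a persistent position series usually raises its turnover. Charging costs on the shuffled paths would therefore make the null easier to beat and flatter the real timing. The cost assumption still has to be cleared by the real run on its own.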

Failing it means bury the idea and move on.

alpha-hunt bury <run_id> --reason "failed shuffled timing null"

The graveyard is not housekeeping. It is the compounding asset. Tomorrow’s agent should not waste a run rediscovering yesterday’s corpse.
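The graveyard compounds because deduplication is cheap. Here is a minimal sketch of a check an agent could run before spending a run. It assumes the ledger is a tab-separated file with hypothetical `config_hash` and `status` columns and a hypothetical `buried` status value. The real state/experiments.tsv layout may differ.

```python
import csv
import hashlib
import json
from pathlib import Path


def config_hash(config: dict) -> str:
    """Order-insensitive fingerprint of a frozen candidate config."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def already_buried(config: dict, ledger: Path) -> bool:
    """True if this exact config already has a buried row in the ledger.

    Column names `config_hash` / `status` and the value "buried" are
    hypothetical; adapt them to whatever the ledger actually records.
    """
    if not ledger.exists():
        return False
    target = config_hash(config)
    with ledger.open(newline="") as fh:
        for row in csv.DictReader(fh, delimiter="\t"):
            if row.get("config_hash") == target and row.get("status") == "buried":
                return True
    return False
```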

A serious hunt also needs auditable trade artifacts. This example is from one of our own backtests, not from the bundled toy fixture:

Chart: matplotlib candlestick chart of one backtested USDJPY CE short, with entry, stop loss (SL), take profit (TP), and exit annotated on the same chart. This is the kind of artifact a hypothesis should leave behind before anyone believes the story.
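A chart is only auditable if the numbers behind it are stored too. Here is a minimal sketch of a per-trade record. It assumes prices are stored as plain floats and times as ISO 8601 strings in UTC. `TradeRecord` and its fields are hypothetical and are not the kit's output format. The R-multiple is the realized move divided by the initial distance to the stop. It is included because anyone can recompute it from the same prices the chart shows.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TradeRecord:
    """Hypothetical per-trade artifact: enough to redraw the chart and re-check the math."""
    symbol: str
    side: str           # "long" or "short"
    entry: float
    stop_loss: float
    take_profit: float
    exit: float
    entry_time: str     # ISO 8601, UTC
    exit_time: str      # ISO 8601, UTC

    def r_multiple(self) -> float:
        """Realized move divided by the initial risk to the stop (sign-aware for shorts)."""
        direction = 1.0 if self.side == "long" else -1.0
        risk = direction * (self.entry - self.stop_loss)
        return direction * (self.exit - self.entry) / risk

    def to_json(self) -> str:
        """Serialize the record plus its derived R-multiple for the run directory."""
        return json.dumps({**asdict(self), "r_multiple": self.r_multiple()}, sort_keys=True)
```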

Where ideas should come from

Rank sources by how fast they can become an observable:

Graveyard inversions. If an idea failed for a specific reason, test the smallest change that attacks that reason.
Market microstructure and FX papers. Good papers usually give observables, not strategies. That is enough.
Your own bars. Once plumbing works, fixture data must disappear from the evidence chain.

Bad source: a screenshot, a vibe, or a parameter grid with no thesis.

Bring your own data

The repo expects an ordinary OHLCV CSV with this header:

timestamp,open,high,low,close,volume
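Before any trial, the file itself can be checked. Here is a minimal sketch of a standalone validator, which is not part of the kit. It assumes ISO 8601 timestamps that `datetime.fromisoformat` can parse. It also assumes the timestamps are either all naive or all timezone-aware, because mixing them raises `TypeError` on comparison. Unparseable numbers raise `ValueError` instead of being skipped.

```python
import csv
import io
from datetime import datetime

EXPECTED = ["timestamp", "open", "high", "low", "close", "volume"]


def validate_ohlcv(text: str) -> list[str]:
    """Return the problems found in an OHLCV CSV; an empty list means it passed."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED:
        return [f"header {reader.fieldnames} != {EXPECTED}"]
    problems = []
    previous = None
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        ts = datetime.fromisoformat(row["timestamp"])
        o, hi, lo, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        if previous is not None and ts <= previous:
            problems.append(f"line {line_no}: timestamp not strictly increasing")
        if hi < max(o, c) or lo > min(o, c):
            problems.append(f"line {line_no}: high/low inconsistent with open/close")
        if float(row["volume"]) < 0:
            problems.append(f"line {line_no}: negative volume")
        previous = ts
    return problems
```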

Run the same candidate against real local bars:

alpha-hunt trial experiments/current.json --data path/to/your_bars.csv --budget-seconds 120

Before trusting a result, write down timezone, spread/cost assumption, missing-bar policy, symbol convention, and exactly what information was available when the signal fired.
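Two of those items, timezone and missing-bar policy, can be made explicit in code. Here is a minimal sketch. It assumes the vendor writes naive timestamps in one known zone and that bars arrive on a fixed step. `to_utc` and `missing_bars` are hypothetical helpers. A fixed-offset `timezone` does not model daylight saving, so a vendor zone that observes DST would need `zoneinfo.ZoneInfo` instead.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_utc(raw: str, source_tz: tzinfo) -> datetime:
    """Pin a vendor timestamp to its source zone, then convert to UTC."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is not None:
        # Already offset-aware: trust the offset in the file.
        return parsed.astimezone(timezone.utc)
    return parsed.replace(tzinfo=source_tz).astimezone(timezone.utc)


def missing_bars(stamps: list[datetime], step: timedelta) -> list[datetime]:
    """Expected bar times on the fixed step grid that never appear in the data."""
    present = set(stamps)
    gaps = []
    t, end = min(stamps), max(stamps)
    while t <= end:
        if t not in present:
            gaps.append(t)
        t += step
    return gaps
```

For FX, weekend closures will show up as gaps. Whether they count as missing bars is exactly the policy to write down.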

Why this exists

Agents are good at producing variations. Trading research mostly needs the opposite: disciplined deletion.

Alpha Hunt Kit makes the deletion path cheaper than the optimization path. If the idea cannot beat a frozen config and a cheap null on local data, it should not get a longer backtest, a nicer chart, or a story.

Clone it, run one candidate, bury one bad idea cleanly. That is already progress.

