Harness.WTF
The AI-dev harness that makes complex projects actually finish. One protocol file · one zero-dep CLI · GitHub as the only source of truth. Tool-agnostic: Claude Code, Codex, anything that reads AGENTS.md.
It runs forever →
Effort tiered to risk. Build by default; spike only with a named unknown; framework-creep tripwires halt noodling.
It never stops →
Binary, machine-checkable acceptance. A closed Definition of Done. Green means STOP — and overclaiming is refused by the tooling itself.
It regresses →
One bounded unit per fresh context, state checkpointed to GitHub. Degradation can't accumulate.
git clone https://github.com/LookNoHandsMom/harness-wtf
export PATH="$PWD/harness-wtf:$PATH"
harness-wtf init . # or --linked for a zero-drift fleet spoke
The proof, honestly labeled.
Design simulation (~6,000 trials, reproducible): fresh-context-per-unit is the load-bearing mechanism — ablate it and completion collapses to ~0%. Binary criteria cut ~35% of modeled effort; risk-tiering ~22%; a heavy never-satisfied framework costs ~2×.
Field A/B (pre-registered bar, same model both arms): run 0001 (8 small tasks, same model both arms): completion 100%=100%, rework 0=0, harness 2.64× tokens — failed the bar on the small-task null domain, exactly as the sim predicts. Complex-project runs accumulate next. We publish failures; that's the point.
Battle-tested: replaced a heavyweight predecessor on a production-grade repo — −15,636 lines of protocol machinery, every kept gate green, all 208 real issues preserved.Worked examples: wordfreq · mdtoc · Quickstart: zero → first finished unit in <15 min