Harness.WTF

The AI-dev harness that makes complex projects actually finish. One protocol file · one zero-dep CLI · GitHub as the only source of truth. Tool-agnostic: Claude Code, Codex, anything that reads AGENTS.md.

It runs forever →

Effort tiered to risk. Build by default; spike only with a named unknown; framework-creep tripwires halt noodling.

It never stops →

Binary, machine-checkable acceptance. A closed Definition of Done. Green means STOP — and overclaiming is refused by the tooling itself.

It regresses →

One bounded unit per fresh context, state checkpointed to GitHub. Degradation can't accumulate.

git clone https://github.com/LookNoHandsMom/harness-wtf
export PATH="$PWD/harness-wtf:$PATH"
harness-wtf init .        # or --linked for a zero-drift fleet spoke

The proof, honestly labeled.

Design simulation (~6,000 trials, reproducible): fresh-context-per-unit is the load-bearing mechanism — ablate it and completion collapses to ~0%. Binary criteria cut ~35% of modeled effort; risk-tiering ~22%; a heavy never-satisfied framework costs ~2×.

Field A/B (pre-registered bar, same model both arms): run 0001 (8 small tasks, same model both arms): completion 100%=100%, rework 0=0, harness 2.64× tokens — failed the bar on the small-task null domain, exactly as the sim predicts. Complex-project runs accumulate next. We publish failures; that's the point.

Battle-tested: replaced a heavyweight predecessor on a production-grade repo — −15,636 lines of protocol machinery, every kept gate green, all 208 real issues preserved.

Worked examples: wordfreq · mdtoc  ·  Quickstart: zero → first finished unit in <15 min