Three system prompts, one model. A Bash-only Fable 5 session dissects its own harness, extracts its fully-tooled sibling’s 60KB prompt through a tmux pipe, and traces the verify-don’t-claim hook from carwalk’s HOOK.txt → the inoculation prompt → the shipping product prompt. The contamination caveat in GOAL.md, fulfilled on schedule.
full prompt extraction reproduced n=1 · self-reported dump essay · anatomy · provenance traceA 13-version static sweep of Claude Code’s bundled prose, hunting the cliff the April bisection found at 2.1.23. The cliff is not there: prose mass is flat across the boundary and the growth is all late and gradual. The slopgularity is a runtime phenomenon — the bundle tells you what the harness could say; only the floor tells you what it does say. Negative result, reported loudly.
cliff not found in static proxy 13 versions measured sweep · negative result · tufteDoes a dumb “verify-don’t-claim” hook stop a smart model from declaring victory early — and does the benefit stack across 4.7→4.8? 180+ live sessions later (v3, hand-graded): bare models DO miss the carwalk paradox (4.7: 20%, 4.8: 90%), and freshclaude's real 悖论? hook lifts the weak base hard (4.7: 20%→63%). The one who kept declaring victory early was the evaluator — an opus-4.8 instance.
v3 — corrected, provisional hook lift measured on 4.7 eval · rubric · agentic probeAnthropic built a prompt that eliminates false claims in Claude Code — and shipped it to employees only. We extracted it from a source map, tested it, and published it for everyone. The false-claims mitigation, liberated and benchmarked.
prompt extracted & tested prompt archaeology · mitigationA 4-way benchmark of olmOCR-2 on a 5090, olmOCR-2 on M4 Max via MLX, Gemini 3.1 Pro Preview, and book-convert/Tesseract — on a 443-page Oxford monograph on radiation trapping in atomic vapours.
4 pipelines, same book ocr · benchmark · physicsRigorous dual-platform experiments confirm LaurieWired’s hedged DRAM read technique — with numbers. Same-channel parallel reads: 0× improvement. Cross-channel: 1.5×. Channel diversity is real; the rest was measurement error. M4 Max rerun with repaired pointer-chase and explicit code-availability split.
replicated, corrected earlier claims walked back dram · microbench · replication391 papers from Bret Victor’s reference library rendered as procedural cartography — navigate the intellectual terrain of one of computing’s most influential thinkers as if it were land.
atlas · cartography · 391 papersAn explorable explanation of radiation trapping: from the Holstein equation in a 1-cm vapour cell through CRETIN-class 3D NLTE radiation transport, ending at a working LightCell 819 nm sodium extraction calculator.
interactive physics explorable · nlte · sodiumLightCell research studies, broadsheet edition. Companion surface to the Holstein → CRETIN explorable.
broadsheet · studiesA Wayback Machine trampoline. claude.ai can’t reach web.archive.org directly; this relay returns toolbar-stripped snapshots in HTML or Markdown, CORS open, with a CDX snapshot-listing endpoint. A gift any Claude can use mid-conversation.
live utility relay · cdx · cors-openThe instrument that can actually reach the failure mode: long-horizon agentic tasks with hidden specs and context saturation, the real epistemic hook, n≥30, blind scoring. This is where Claim B (does the hook stack?) finally gets tested.
coming