The Codex rescue: when the handoff prompt is the product

For two weeks I had been trying to fix a PnL bug in a prediction-market analysis pipeline. The reported PnL disagreed with the Polymarket ground truth for many traders, and the gaps were not small: for one address the scored figure was $32.8M against an actual -$8.3M. I had been debugging with Claude Code the whole time, and the sessions had turned into an expensive merry-go-round: a plausible hypothesis, a partial fix, a fresh anomaly, another hypothesis.

At some point I gave up and switched to Codex. What I want to note is not that Codex is better — I am not convinced it generally is — but that the thing that actually worked was the handoff prompt, not the model.

What I wrote

The first message I sent Codex was, in effect, an incident postmortem. It opened with:

For the last two weeks, I’ve been trying to fix a recalcitrant bug […] I’ve been debugging with Claude Code, but Claude’s performance has been atrocious. I am hoping that you can do better.

Then four sections, in this order:

Why it’s a full rebuild. Pointed at the exact commit (36269eb) that bumped PIPELINE_VERSION from 1 to 2, quoted the runtime line where _can_run_incremental() logs pipeline version mismatch (stored=1, current=2). Falling back to full rebuild., and noted that the bump was technically correct because mint_cost had changed formula.
Per-step timing, instrumented in the session I had just killed. A table of the twelve pipeline steps with their wall times: step 5 at 5h 15m, step 6 at 4h 55m, step 8 at 2h 33m, step 9 still running past four hours. Totalling more than 40h.
Architecture problem. The pipeline had exactly two modes, incremental and full rebuild. A one-line semantic change in a mid-pipeline step forced everything to redo, including the five hours of phantom-sell reconstruction that had nothing to do with mint_cost. I laid out which steps the change actually invalidated (5b, 6, maybe 9–12) and which it did not (5, 7, 8).
Workaround built this session. A per-trader verification script that reconstructed PnL directly from raw data in 30–60 seconds and bypassed the pipeline tables entirely.

Codex’s first response was not a patch. It was a plan that reframed the bug. Instead of treating the 40-hour rebuild as a performance problem to optimise, it pointed at the full-vs-incremental dichotomy as the actual defect. From there the session converged quickly.

What I think is going on

I don’t think Codex had some special insight that Claude lacked. The two models are close enough in capability that I would not bet on one reliably outperforming the other on a bug like this. What changed was the prompt. My Claude sessions had grown organically — I would describe the next symptom, Claude would propose the next fix, and the accumulated context was a pile of partial hypotheses. When I sat down to write to Codex, the fact that I was switching models forced me to compress: I had to state, in one message, what was true. That reconstruction is what produced the timing table, the invalidation matrix, and the workaround inventory.

In other words: the handoff prompt is the product. If I had written that same prompt to Claude in a fresh session, I suspect the outcome would have been similar.

What I’d do differently

I want this to be a reflex, not a two-week-delayed rescue. Two concrete takeaways:

When a debugging session starts to feel like a merry-go-round — three or four failed hypotheses in a row — that is the signal to stop and write the postmortem. Not “let me try one more thing,” because trying one more thing is how you lose another two weeks.
The postmortem template is basically what I sent Codex: what’s actually happening, measured; what invalidates what; what I’ve already built as a workaround; what the goal is. The structure forces you to separate facts from guesses, and the agent you send it to — any agent — has an easier job.

Still an open question for me: is there a good way to get the current session’s own transcript into the reset prompt automatically, rather than reconstructing it from scratch? That would remove the main cost of the “just start over” move.