Why flat routers can't catch up on long runs.

Mar 31, 5 min

Routing per call optimizes the wrong thing on long horizons. retries are pure cost and drift compounds. a short note on what we learned.

The temptation

if a cheaper model handles step two well, why pay for the strong model on every step? send the easy ones to the cheap model, route the hard ones up the ladder, save real money. this is the flat-router pitch.

Why it doesn't catch up

it optimizes the wrong thing. on a long run, the cost that dominates isn't the per-step price. it's the cost of recovering from a step that went wrong. the cheap model handles step two fine, then it handles step three in a way that subtly corrupts the working state, and the strong model on step four spends three calls fixing what step three got wrong.

we ran the numbers. flat routing trims a small amount of spend per long run and gives up task completion in roughly the same proportion. there's no free lunch hiding in per-call routing on long-horizon work.
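
a toy version of that accounting, to make the step-three example concrete. the prices, the `run_cost` helper, and the three recovery calls billed at the strong price are all made-up illustration values, not the measurements mentioned above.

```python
def run_cost(step_prices, corrupts, recovery_calls, strong_price):
    """Total cost of one run in arbitrary units.

    step_prices: price paid for each step's own call
    corrupts: True where a step silently corrupts the working state
    recovery_calls: extra strong-model calls needed to repair one corruption
    """
    total = 0
    for price, bad in zip(step_prices, corrupts):
        total += price
        if bad:
            # The repair is paid at the strong model's price, not the cheap one.
            total += recovery_calls * strong_price
    return total


# Hypothetical prices per call, in arbitrary integer units.
STRONG, CHEAP = 100, 20

# Four-step run, strong model everywhere, nothing goes wrong.
all_strong = run_cost([STRONG] * 4, [False] * 4, 3, STRONG)

# Same run with steps two and three routed to the cheap model;
# step three corrupts the state and costs three strong calls to repair.
routed = run_cost(
    [STRONG, CHEAP, CHEAP, STRONG],
    [False, False, True, False],
    3,
    STRONG,
)

print(all_strong, routed)
```

with these made-up prices the routed run saves 160 units on steps two and three, then spends 300 on recovery, so it comes out at 540 against 400 for the all-strong run.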

Retry-on-fail has the same shape

retrying a failed task with a stronger model is the cleaner version of the same idea. it works on a one-shot task. on a long run, a retry costs you the whole previous run plus the new one, and the failure mode that broke the first run tends to show up again in the second.
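
a rough sketch of why the one-shot case and the long-run case come apart. everything here is an assumption for illustration: the function names, the prices, the 0.95 per-step success rate, steps failing independently, failure only being noticed at the end of the run, and the strong retry always succeeding (which is generous to retry).

```python
def retry_expected_cost(n_steps, cheap_price, strong_price, cheap_step_success):
    """Expected cost of: run cheap first, rerun everything on strong if it fails."""
    # Chance the cheap run gets through every step (steps assumed independent).
    p_run = cheap_step_success ** n_steps
    first_run = n_steps * cheap_price
    second_run = n_steps * strong_price
    # The first run is always paid; the second is paid whenever the first fails.
    return first_run + (1 - p_run) * second_run


def strong_only_cost(n_steps, strong_price):
    """Cost of running every step on the strong model from the start."""
    return n_steps * strong_price


# Hypothetical numbers: cheap call 0.2, strong call 1.0, cheap step succeeds 95% of the time.
for n in (1, 40):
    retry = retry_expected_cost(n, 0.2, 1.0, 0.95)
    strong = strong_only_cost(n, 1.0)
    print(n, "retry cheaper" if retry < strong else "strong-only cheaper")
```

under these assumptions retry wins at one step and loses at forty, because the chance of a clean cheap run shrinks with every step while the cost of throwing the run away grows.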

What worked instead

we stopped routing models and started measuring trajectory. once you have a step-level drift signal, you can act on the moment the run starts going wrong rather than the moment it fails. that's where skalpel ended up.
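
to show what "acting on the moment the run starts going wrong" could look like, here is a minimal sketch. it is not how skalpel works. `DriftMonitor`, `first_flagged_step`, the smoothing factor, the threshold, and the idea that each step yields a drift score between 0 and 1 are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class DriftMonitor:
    """Smooths per-step drift scores and flags when the smoothed level gets too high."""

    threshold: float = 0.5  # hypothetical cut-off for "the run is going wrong"
    alpha: float = 0.5      # weight given to the newest step's score
    level: float = 0.0      # running smoothed drift level

    def update(self, step_drift: float) -> bool:
        # Exponentially weighted moving average of the step-level scores.
        self.level = self.alpha * step_drift + (1 - self.alpha) * self.level
        return self.level > self.threshold


def first_flagged_step(drift_scores, monitor):
    """Return the 1-based step where the monitor first fires, or None if it never does."""
    for step, score in enumerate(drift_scores, start=1):
        if monitor.update(score):
            return step
    return None


# Hypothetical run: two clean steps, then the state starts drifting.
flagged = first_flagged_step([0.0, 0.0, 1.0, 1.0, 1.0, 1.0], DriftMonitor())
clean = first_flagged_step([0.0, 0.0, 0.0], DriftMonitor())
print(flagged, clean)
```

the smoothing is a design choice in this sketch: a single noisy step does not trip the monitor, but sustained drift does, and it trips while the run is still going rather than after it has failed.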

ryan

@ryanndngg