The shape of a failed run.

Mar 18, 6 min

We plotted thousands of failed long-horizon traces. The same silhouette shows up in most of them, and the engines target that silhouette.

The experiment

We took every failed long-horizon trace from our internal runs over the last quarter and plotted the drift signal over time. "Failed" means the agent did not produce a working patch for the task it was given.

The silhouette

The same shape shows up in the majority of failed runs. Drift sits flat at the noise floor for the first eight to twelve steps, then climbs steadily over the middle third of the run. By the last quarter, the agent is making decisions that an external reader can tell are wrong.

The failed runs that don't fit the silhouette are early failures, usually inside the first three steps, that come from the prompt itself being unsatisfiable.
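
As a rough illustration of the two groups described above, here is a minimal sketch that labels a failed run's drift trace. It is not how the traces were actually analyzed: the function name `classify_failed_trace`, the `noise_floor` value, and the eight-step flat window are all assumptions made up for this example.

```python
from statistics import mean


def classify_failed_trace(drift, noise_floor=0.1, flat_steps=8):
    """Label a failed run's drift trace.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not values taken from the real analysis.
    """
    n = len(drift)
    # Runs that end inside the first three steps are the early-failure group.
    if n <= 3:
        return "early_failure"
    # Too short to show a flat start followed by a climb.
    if n < flat_steps + 3:
        return "other"

    third = n // 3
    flat = drift[:flat_steps]          # opening steps
    middle = drift[third:2 * third]    # middle third of the run
    tail = drift[2 * third:]           # final stretch

    flat_ok = max(flat) <= noise_floor
    climbing = middle[-1] > middle[0] and mean(tail) > noise_floor
    return "silhouette" if flat_ok and climbing else "other"


# Flat for ten steps, a steady climb for ten, then high for ten.
example = [0.05] * 10 + [0.05 + 0.05 * i for i in range(1, 11)] + [0.6] * 10
label = classify_failed_trace(example)
```

A real check would likely fit a slope to the middle third instead of comparing its endpoints. The endpoint comparison just keeps the sketch short.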

What the silhouette tells us

If the failure mode has a stereotyped shape, the engines can target that shape. The trajectory engine watches for the climbing-drift signature and intervenes once drift crosses a threshold. Before the threshold, it does nothing.
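
The watch-then-intervene behavior can be sketched in a few lines. This is an assumption-heavy toy, not the trajectory engine itself: the `watch` function, the `threshold` value of 0.5, and the rule of acting once per upward crossing are all hypothetical choices for illustration.

```python
def watch(drift_signal, threshold=0.5):
    """Return the steps at which a hypothetical watcher would intervene.

    Assumption: the watcher acts once each time drift rises above the
    threshold, and stays idle while drift is at or below it.
    """
    interventions = []
    above = False  # whether the previous step was already over the threshold
    for step, drift in enumerate(drift_signal):
        if drift > threshold and not above:
            interventions.append(step)  # upward crossing: act here
        above = drift > threshold
    return interventions


acted_at = watch([0.1, 0.2, 0.6, 0.7, 0.3, 0.8])  # crossings at steps 2 and 5
idle = watch([0.1] * 5)                            # never crosses: no action
```

In this toy the drift values are fixed in advance. In a live run, an intervention would presumably change the drift that follows it, which the sketch does not model.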

How often the engine acts

This is the question we hear from anyone evaluating skalpel for production. The answer: on most long runs, the engine acts between one and four times across the whole run. On short runs, it usually doesn't act at all. Most of the time, the engine is a watcher.

ryan
