long-form agent work.
finished.

skalpel is a set of engines that keep claude code on course over long runs. we intervene at the decisions where agents drift, leave the prompt and the model untouched, and ship with zero measured quality loss across two weeks of evals.

fig.01on-task signal across a long agent run
onoff00102030stepdriftsheldunguidedskalpel
02where we are
zero
quality loss

measured against the same model running unguided on the same tasks. held across two weeks of evals.

14d
eval window

swe-bench plus our internal long-horizon set. every shipping change re-runs the full battery.

3+
engines, growing

trajectory, drift detection, compression. more in flight.

03the engines

not one engine. several, each doing one job.

> 01

trajectory engine.

the primary engine. intervenes at decision boundaries when the run is about to drift. acts only when it should; otherwise it watches.

> 02

drift detector.

produces the on-task signal we steer on. one number per agent decision, conditioned on the run's history.

> 03

compressor.

holds long agent context on-task as the run grows. structurally and semantically. the model sees the same shape of context at step fifty as it did at step five.

04vs others

what happens on a long claude code run.

same model, same prompt, same repo. observed over thirty steps of real engineering work.

approachwhat we observed
claude code, unguideddrifts mid-run
flat per-call routersame drift, no help
retry on faildoubles the bill, still drifts
skalpelholds the trajectory, zero quality loss
05developer questions

what to ask before installing anything.

what about quality?
two weeks of evals across the standard agent benchmarks and our internal long-horizon set. zero quality loss versus the same model unguided. we hold the engines to that bar before shipping.
what does skalpel see?
only what your agent already sends to claude. we optimize the trajectory in flight. nothing is stored.
do you train on my prompts?
no. prompts and completions are never used as training data. contractually, not aspirationally.
what happens if skalpel goes down?
your agent keeps working. nothing we do sits on the critical path.
can i bring my own anthropic key?
yes. byok by default. you keep the relationship with anthropic. skalpel just shapes the run.
how much does it cost?
free.

let the run finish.