01research

engines for long-form agent work.

long agent runs drift. we build small engines that watch for the moments where drift happens and steer the run at the decision boundary, without touching your prompt or your model. zero quality loss across two weeks of evals.

fig.01on-task signal across a long agent run
onoff00102030stepdriftsheldunguidedskalpel
02engines
enginestatusjob
trajectory engineshippingintervenes at decision boundaries when the run is about to drift. acts when it should and is silent the rest of the time.
drift detectorshippingproduces the on-task signal we steer on. one number per agent decision, conditioned on the run's history.
compressorshippingholds long agent context on-task as the run grows. trims structurally and semantically, not by raw token count.
re-anchor primitivein flightbrings a drifting run back to its original goal without restarting it. early internal evals; not in the install path yet.
03method

we steer the trajectory, not the prompt.

we do not edit your prompt. we do not fine-tune. we do not route to a different model. we do not run a second model in the background. those interventions all have a reason we rejected them, written up on the docs.

we measure trajectory at every agent decision and intervene only at the boundaries where the run is committing to a path it can't easily back out of. when the drift detector reports high-confidence drift, the trajectory engine narrows the choice set. otherwise the engines watch.

internal evals run on two weeks of long-horizon traces and the standard agent benchmarks. quality matches the unguided baseline within noise. the methodology is being written up in pieces in the docs.