What is a run?
One sanderling test invocation. Fresh install,
spec-driven exploration, then the trace lands in
runs/<timestamp>/. Typically minutes to hours, not
seconds.
A run is not analogous to a unit test. A closer framing is: boot a fuzzer for an hour and see what breaks. Violations are recorded in the trace and exploration continues, so one run can surface many bugs.
Lifecycle
sanderling test --spec spec.ts --bundle-id com.example.app --duration 30m
│
├── uninstall and reinstall the app (clean slate, every run)
├── boot the sidecar, connect the agent socket
├── bundle the spec, load it into goja
│
├── step 0..N: pause, capture state, evaluate properties, pick action, resume, dispatch
│
└── terminate when --duration elapses (or SIGINT)
└── trace written to ./runs/<timestamp>/
├── trace.jsonl
├── screenshots/
└── meta.json
Why runs are long and linear
sanderling does not restart the app every N steps. Each restart throws away two things.
Novelty and coverage signal. The exploration strategy weights actions by whether they reach previously unseen state. Restarting resets that history.
Deep app states. Many screens take many actions to reach: nested settings, a loaded cart, post-checkout flows. A 50-step prefix to reach "cart with 3 items" does not happen if every run starts cold.
Long-linear trajectories find bugs that restart-based testing structurally cannot.
Setup cost amortizes
Preconditions (login, onboarding, consent dialogs) are written as weighted action generators gated on extractors. See writing specs. They fire only when applicable, so login happens once per run, not per step.
| Run length | Login cost | % of run |
|---|---|---|
| 5 min | ~15s | 5% |
| 30 min | ~15s | 0.8% |
| 1 hour | ~15s | 0.4% |
| CI: 3 seeds × 10 min | ~45s total | 2.5% |
At any non-trivial run length, preconditions are a rounding error.
Session state
Session tokens, keychain, shared prefs, cookies, and other
app-managed persistence survive the full run. If the app logs the user
out mid-run, the doLogin generator re-fires automatically
because its gating extractor (onLoginScreen) becomes true
again. No retry logic. No special-casing.
Termination
A run ends when either of these happens.
--durationelapses.- The process is interrupted (SIGINT).
The trace is written incrementally, so an interrupted run is still fully inspectable.
Additional termination conditions (--max-steps,
--exit-on-violation, hard crash handling) land in v0.1.0.