Case study: Folio

Folio is a personal-ledger app: log in, create accounts, record credits and debits. It is built once in Kotlin Multiplatform and ships to Android, iOS, and web from a single Compose codebase. The repo carries it under examples/folio, with its spec at examples/folio/sanderling/spec.ts.

It also carries a bug, the kind that survives manual testing and example-based suites and ships to production. This page follows how sanderling finds it.

The bug

Folio's add-transaction form saves on submit and navigates home. The submit handler does not disable the button while the save is in flight:

AddTransactionEvent.Submit -> submit()

Tap submit twice fast and two transactions post. The balance moves by twice what you typed.

Nobody writes this test. A manual tester taps submit once, sees the right number, moves on. A scripted test encodes the same single tap. The bug lives in a sequence no one thought to script: the double tap, on this screen, with a pending save. That is the class of bug sanderling exists to catch.

Telling sanderling what is true

You do not script the double tap. You state the invariant and let sanderling find the inputs that break it.

The amount the user types must equal the amount the balance moves:

const submitMovesBalanceByTypedAmount = always(
  next(() => {
    if (route.current !== "home") return true;
    const action = lastAction.current;
    if (action?.kind !== "Tap" && action?.kind !== "DoubleTap") return true;
    if (!JSON.stringify(action.on ?? "").includes("TxnSubmit")) return true;
    const typed = parseTypedAmount(txnAmountField.previous?.text);
    if (typed === 0) return true;
    return Math.abs(totalBalance.current - (totalBalance.previous ?? 0)) === typed;
  })
);

always checks the formula at every step; next lets it compare the step before a submit to the step after. The guards narrow it to the one transition that matters, a submit that lands back on home, and the last line states the rule: the balance moved by exactly the typed amount. Double-submit moves it by twice that, and the formula is false.

The values it reads come from extractors, which pull state out of the UI tree once per step:

const route = extract<string | null>("route", s => {
  if (s.ax.find({ testTag: "AddTransactionScreen" })) return "add-transaction";
  if (s.ax.find({ testTag: "HomeScreen" })) return "home";
  // ...other screens
  return null;
});

const totalBalance = extract("totalBalance", s =>
  s.ax.findAll([{ testTag: "HomeScreen" }, { testTag: "AccountCard" }])
    .reduce((sum, c) => sum + parseDollarCents(c.find({ testTag: "AccountBalance" })?.text), 0));

Every Folio screen and control carries a testTag. Compose exposes it as the resource-id on Android and the accessibility identifier on iOS, so one selector resolves on both. extract runs against the live tree each step; properties and actions read .current and .previous, never the raw state.

A second property states what new accounts must look like: a freshly created account starts at zero.

const newAccountBalanceIsZero = always(
  next(() => {
    const before = new Set((accounts.previous ?? []).map(a => a.name));
    return accounts.current
      .filter(a => !before.has(a.name))
      .every(a => a.balance === 0);
  })
);

Reaching the screens that matter

A fuzzer that pokes at random never logs in, and never reaches a transaction form. The spec gives sanderling enough to drive the real flows, no more.

Login is a precondition, exported as setup. The runner runs it before anything else and falls through to the main pool once it yields nothing:

const login = actions(() => {
  if (loggedIn.current) return [];
  const email = loginEmailField.current;
  if (email && !email.text) return [InputText({ into: email, text: DEMO_EMAIL })];
  const pwd = loginPasswordField.current;
  if (pwd && !pwd.text) return [InputText({ into: pwd, text: DEMO_PASSWORD })];
  const submit = loginSubmit.current;
  return submit ? [Tap({ on: submit })] : [];
});

export const setup = login;

It reads the form and offers the next move, never tracking what it did before. If a stray tap logs the user out mid-run, loggedIn flips and login re-engages on its own.

The transaction flow spans three screens. whenRoute keeps it eligible only where it applies, and returns every reasonable next action, so sometimes the form is filled and submitted, sometimes submitted empty:

const addTxn = whenRoute(route, ["home", "ledger", "add-transaction"], () => {
  if (route.current === "home") {
    const cards = accountCards.current;
    return cards.length ? [Tap({ on: from(cards).generate() })] : [];
  }
  if (route.current === "ledger") {
    const btn = addTxnButton.current;
    return btn ? [Tap({ on: btn })] : [];
  }
  const field = txnAmountField.current, submit = txnSubmit.current;
  const out = [];
  if (field) out.push(InputText({ into: field, text: String(amounts.generate()) }));
  if (submit) out.push(Tap({ on: submit }));
  return out;
});

The action pool weights the flows by how much they are worth testing. The transaction chain dominates, account creation stays in the mix so new accounts keep appearing, and defaultActions keeps a quarter of the budget on untargeted exploration: random taps, edge-case text in every field, scrolls, swipes, double taps.

export const actionsRoot = weighted(
  [45, addTxn],
  [25, addAccount],
  [25, defaultActions],
  [5, doubleTaps],
);

That is the whole input. Two invariants, a way in, and a weighted sense of where to spend time. Nothing here names the bug.

What the run does

sanderling launches Folio, logs in, and starts exploring. Most steps are unremarkable: open an account, add a transaction, watch the balance move by exactly what was typed, submitMovesBalanceByTypedAmount holds.

Then a step lands two taps on submit before the first save settles. Two transactions post. The balance jumps by twice the typed amount. At that step the formula evaluates false and the run records a violation: the step, the screenshot, the offending action, and the residual formula that failed.

The run does not stop. It keeps exploring and keeps checking, so one run surfaces every violation it can reach, not just the first. Open the trace with sanderling replay, press . to jump to the violation, and step across the boundary to watch the balance double.

sanderling did not know about this bug. It was given what the app guarantees and a realistic way to use it, and it found the sequence that breaks the guarantee. That is the point: you describe what must always be true, and the explorer finds what you would not have thought to try.

From here

The complete spec is examples/folio/sanderling/spec.ts. Folio uses just as a runner: from examples/folio, just install then just test drives it on an Android emulator or device, just test-ios on an iOS simulator, and just web in Chrome. Then sanderling replay and press . to land on the violation.

To point sanderling at your own app, see getting started. For every selector, operator, action, and sampler, see the spec language reference.