Swarm Testing for Topology Exploration¶
Swarm testing is an opt-in sampling strategy for motel check. The default
random strategy keeps sampled metrics close to the topology's configured
probabilities. The swarm strategy instead partitions the choice space so
sampled runs hit low-probability and retry-heavy corners more quickly.
Use swarm when you want to ask: "what structural bounds appear if several unlikely choices happen together?" Keep random sampling when you want empirical percentiles from the configured distribution.
motel check --sample-strategy swarm --samples 100 --seed 42 topology.yaml
Choice Points¶
Swarm testing treats these engine decisions as boolean choice points:
| Kind | When it exists | true means |
false means |
|---|---|---|---|
| Operation error | An operation has an effective error_rate between 0 and 1 |
The operation's own error fires | The operation's own error does not fire |
| Call probability | An effective call has probability between 0 and 1 |
The call is emitted | The call is skipped |
| Retry activation | An effective call has retries greater than 0 |
Retry attempts are taken until the last attempt | The first attempt is final |
The model is built from the effective topology for each scenario set. That
means scenario add_calls, remove_calls, and error-rate overrides are applied
before choice points are enumerated.
Scenario activation itself is not a swarm choice point. motel check already
enumerates every distinct set of co-active scenarios and runs sampling for each
set separately.
Strategy¶
For each sampled run, swarm testing creates a set of forced decisions and lets
all unforced choices use the normal engine RNG. The first run forces all choice
points enabled, covering error-conditioned calls and retry-heavy branches. The
second run forces operation errors off while probabilistic calls and retries are
enabled, covering healthy-path calls guarded by condition: on-success. The
third run forces all choice points disabled. Subsequent runs force individual
choice points in both directions while also fixing a random subset of other
points.
This gives two useful behaviours:
- A small sample can expose rare fan-out or span growth across both error and success paths that pure random sampling is unlikely to observe.
- Later samples still explore mixed partitions, such as one retry-heavy path activating while unrelated call probabilities vary normally.
The strategy never changes static analysis. MaxDepth, MaxFanOut, and
MaxSpans remain conservative upper bounds. Swarm only changes the sampled
observations and percentile summaries reported by check.
Interpreting Results¶
Swarm percentiles are not production-frequency percentiles. They describe the distribution of chosen partitions, not the probability distribution encoded in the topology. This is useful for stress exploration and regression tests, but random sampling is the better default for empirical threshold checks.
Retry activation can force retry control flow even when the child operation would otherwise succeed. This models the retry path structurally: additional attempt spans and retry counters appear, while child span error status remains owned by the child operation's error decision.