motel: Swarm Testing for Topology Exploration
2026-06-13T14:58:55Z by Showboat 0.6.1
Swarm testing is an opt-in sampling strategy for motel check. The default random sampler follows the probabilities in the topology, which is useful for empirical percentiles. The swarm sampler fixes subsets of probabilistic choices to their extremes so a small number of samples can exercise rare combinations, retry paths, and error-conditioned branches.
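The idea can be sketched in a few lines of Python. This is a minimal illustration of pinning choice points to their extremes, not motel's implementation: the function names, the `forced` mapping, and the `pin_rate` parameter are all assumptions made for the example.

```python
import random

def swarm_partition(n_choices, rng, pin_rate=0.5):
    """Pick a subset of choice points and pin each to an extreme (0.0 or 1.0).

    pin_rate is a hypothetical knob for this sketch, not a motel setting.
    """
    return {
        i: rng.choice([0.0, 1.0])
        for i in range(n_choices)
        if rng.random() < pin_rate
    }

def sample_calls(probabilities, rng, forced=None):
    """Return one bool per call: True when the call fires in this sample.

    Pinned choice points use the forced value; the rest follow the
    configured probability.
    """
    forced = forced or {}
    return [rng.random() < forced.get(i, p) for i, p in enumerate(probabilities)]

probabilities = [1e-15] * 5  # five rare, independent choice points

# Random strategy: every choice follows its configured probability.
print(sample_calls(probabilities, random.Random(42)))

# Swarm strategy: draw a partition first, then sample under it.
rng = random.Random(42)
partition = swarm_partition(len(probabilities), rng)
print(partition, sample_calls(probabilities, rng, forced=partition))
```

When a partition pins every choice to 1.0, all five calls fire in a single sample, which is a combination the random strategy would practically never reach.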
Rare fan-out choices
This topology has one root operation and five optional backend calls. Each call is configured with a probability of 1e-15, so a single random sample is expected to miss all of them. Static analysis still reports the structural upper bound because all five calls could fire together.
cat > /tmp/swarm-rare-fanout.yaml << 'EOF'
version: 1
services:
  gateway:
    operations:
      request:
        duration: 10ms
        calls:
          - target: backend.one
            probability: 0.000000000000001
          - target: backend.two
            probability: 0.000000000000001
          - target: backend.three
            probability: 0.000000000000001
          - target: backend.four
            probability: 0.000000000000001
          - target: backend.five
            probability: 0.000000000000001
  backend:
    operations:
      one:
        duration: 5ms
      two:
        duration: 5ms
      three:
        duration: 5ms
      four:
        duration: 5ms
      five:
        duration: 5ms
traffic:
  rate: 10/s
EOF
echo 'wrote /tmp/swarm-rare-fanout.yaml'
wrote /tmp/swarm-rare-fanout.yaml
build/motel check --samples 1 --seed 42 --sample-strategy random /tmp/swarm-rare-fanout.yaml
PASS max-depth: 1 (limit: 10)
path: gateway.request → backend.one
p50: 0 p95: 0 p99: 0 max: 0 (1 samples)
PASS max-fan-out: 5 (limit: 100)
worst: gateway.request
p50: 0 p95: 0 p99: 0 max: 0 (1 samples)
PASS max-spans: 6 static worst-case, 1 observed/1 samples (limit: 10000)
p50: 1 p95: 1 p99: 1 max: 1 (1 samples)
build/motel check --samples 1 --seed 42 --sample-strategy swarm /tmp/swarm-rare-fanout.yaml
PASS max-depth: 1 (limit: 10)
path: gateway.request → backend.one
p50: 1 p95: 1 p99: 1 max: 1 (1 samples)
PASS max-fan-out: 5 (limit: 100)
worst: gateway.request
p50: 5 p95: 5 p99: 5 max: 5 (1 samples)
PASS max-spans: 6 static worst-case, 6 observed/1 samples (limit: 10000)
p50: 6 p95: 6 p99: 6 max: 6 (1 samples)
With the same seed and sample count, swarm reaches the structural corner case immediately: all five rare calls fire in the first partition. The static max-spans value does not change; only the sampled observation changes from 1 span to 6 spans. This is the main reason to use swarm when checking whether a topology has hidden fan-out or span-count cliffs.
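To put the random strategy's odds in perspective, the arithmetic below estimates how likely a single random sample is to see any of the rare calls, or all five at once. It is a back-of-the-envelope sketch that assumes the five calls are sampled independently, which the topology file does not state explicitly.

```python
import math

p = 1e-15   # per-call probability from the topology
calls = 5

# Chance that at least one of the five calls fires in one random sample.
# log1p/expm1 avoid the rounding loss of computing 1 - (1 - p) ** calls directly.
p_any = -math.expm1(calls * math.log1p(-p))

# Chance that all five calls fire together in one random sample.
p_all = p ** calls

print(f"P(at least one call) ~ {p_any:.3g}")
print(f"P(all five calls)    ~ {p_all:.3g}")
```

Under that independence assumption the values come out near 5e-15 and 1e-75, so no practical number of random samples would reach the six-span case that swarm observes in one.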
Retry path activation
Retries are also choice points for swarm exploration. In a normally sampled trace, retries appear only when a child attempt fails or times out. Swarm can force the retry control flow so that the sampled trace includes the extra attempt spans even when the child operation would otherwise succeed.
cat > /tmp/swarm-retries.yaml << 'EOF'
version: 1
services:
  gateway:
    operations:
      request:
        duration: 10ms
        calls:
          - target: worker.step
            retries: 2
            retry_backoff: 1ms
  worker:
    operations:
      step:
        duration: 5ms
traffic:
  rate: 10/s
EOF
echo 'wrote /tmp/swarm-retries.yaml'
wrote /tmp/swarm-retries.yaml
build/motel check --samples 1 --seed 42 --sample-strategy random /tmp/swarm-retries.yaml
PASS max-depth: 1 (limit: 10)
path: gateway.request → worker.step
p50: 1 p95: 1 p99: 1 max: 1 (1 samples)
PASS max-fan-out: 3 (limit: 100)
worst: gateway.request
p50: 1 p95: 1 p99: 1 max: 1 (1 samples)
PASS max-spans: 4 static worst-case, 2 observed/1 samples (limit: 10000)
p50: 2 p95: 2 p99: 2 max: 2 (1 samples)
build/motel check --samples 1 --seed 42 --sample-strategy swarm /tmp/swarm-retries.yaml
PASS max-depth: 1 (limit: 10)
path: gateway.request → worker.step
p50: 1 p95: 1 p99: 1 max: 1 (1 samples)
PASS max-fan-out: 3 (limit: 100)
worst: gateway.request
p50: 3 p95: 3 p99: 3 max: 3 (1 samples)
PASS max-spans: 4 static worst-case, 4 observed/1 samples (limit: 10000)
p50: 4 p95: 4 p99: 4 max: 4 (1 samples)
The retry topology has a static max-spans value of 4: one gateway span plus three worker attempts (the initial call and two retries). Random sampling observes two spans because the first attempt succeeds and no retry is needed. Swarm forces the retry activation choice, so with a single sample the observed fan-out (3) and span count (4) match the structural worst case.
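The static figures in both examples are consistent with a simple recursive bound in which every call counts once per possible attempt. The sketch below is inferred from the outputs shown here and is not taken from motel's source; the `static_bounds` function and the dictionary layout are hypothetical, and the call graph is assumed to be acyclic.

```python
def static_bounds(topology, root):
    """Return (max_fan_out, max_spans) for the subtree rooted at `root`.

    `topology` maps an operation name to a list of calls. Each call is a
    dict with a "target" and an optional "retries" count.
    """
    fan_out = 0
    spans = 1  # the span for `root` itself
    worst_child_fan_out = 0
    for call in topology.get(root, []):
        attempts = 1 + call.get("retries", 0)  # initial attempt plus retries
        child_fan_out, child_spans = static_bounds(topology, call["target"])
        fan_out += attempts
        spans += attempts * child_spans
        worst_child_fan_out = max(worst_child_fan_out, child_fan_out)
    return max(fan_out, worst_child_fan_out), spans

rare_fanout = {
    "gateway.request": [
        {"target": f"backend.{name}"}
        for name in ("one", "two", "three", "four", "five")
    ]
}
retries = {"gateway.request": [{"target": "worker.step", "retries": 2}]}

print(static_bounds(rare_fanout, "gateway.request"))  # (fan-out, spans)
print(static_bounds(retries, "gateway.request"))
```

For these two topologies the sketch yields fan-out 5 with 6 spans and fan-out 3 with 4 spans, matching the static values in the check output.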
Choosing a strategy
Use random sampling when percentile checks should reflect the topology's configured probabilities. Use swarm sampling when you want to stress the structural shape of the topology and quickly expose rare combinations. Swarm percentile lines describe the partitions explored by the strategy, not production-frequency percentiles.
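A small simulation illustrates why the two sets of percentiles should not be compared directly. The 1% call probability, the 50% pinning rate, and the `fan_out` helper are hypothetical values chosen for the example and do not describe how motel partitions its samples.

```python
import random
import statistics

def fan_out(probabilities, rng, swarm):
    """Count how many calls fire in one sampled trace."""
    fired = 0
    for p in probabilities:
        if swarm and rng.random() < 0.5:
            p = rng.choice([0.0, 1.0])  # pin this choice to an extreme
        fired += rng.random() < p
    return fired

probabilities = [0.01] * 5  # hypothetical: each call fires in 1% of traces
rng = random.Random(42)
random_counts = [fan_out(probabilities, rng, swarm=False) for _ in range(1000)]
swarm_counts = [fan_out(probabilities, rng, swarm=True) for _ in range(1000)]

print("random median fan-out:", statistics.median(random_counts))
print("swarm median fan-out: ", statistics.median(swarm_counts))
```

The random median stays at zero because most traces make no optional calls, while the swarm median is pulled upward by the pinned choices. That shift reflects the exploration strategy rather than expected production traffic.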