Validating motel check Against Published Trace Studies
motel check computes static worst-case bounds and Monte Carlo percentile
distributions for trace depth, fan-out, and span count. The Alibaba and Meta
trace studies report empirical distributions for the same structural metrics,
measured from production traffic. This document validates check against
those independently published numbers: topologies under docs/examples/ were
constructed to exhibit the reported characteristics, and the test suite in
pkg/synth/empirical_test.go verifies that check reproduces them.
Method
- Model the structural characteristics each paper reports as a topology
  (docs/examples/alibaba-call-graph.yaml, docs/examples/meta-wide-fanout.yaml).
- Verify the static analysis (MaxDepth, MaxFanOut, MaxSpans) returns exactly
  the values the topology was constructed to exhibit.
- Run Monte Carlo sampling (1000 traces, fixed seed) and verify the
  p50/p95/p99/max distributions stay within the published ranges and never
  exceed the static bounds.
This validates the analysis engine in both directions: the static DFS agrees with hand-computed expectations on realistic graph shapes, and the simulation engine produces trace populations consistent with both the static bounds and the published distributions.
Alibaba (Luo et al., IEEE TPDS 2022)
The paper analyses production call graphs from Alibaba clusters and reports:
| Published metric | Reported value | Modelled in topology | check agrees |
|---|---|---|---|
| Call graph depth | 2–6 for top services | Longest path depth 6 | Yes |
| Children set sizes | 1–10 | Max fan-out 10 | Yes |
| Repeated call rate | 16.2% | 3 of 18 call edges (16.7%) | Yes |
| Cache-dominated leaf tier | Heavy memcached access | memcached is the common leaf | n/a (shape) |
motel check --seed 42 docs/examples/alibaba-call-graph.yaml reports:
PASS max-depth: 6 (limit: 10)
path: gateway.POST /checkout → orchestrator.compose → product.detail → inventory.check → reservation.hold → cache.get → memcached.get
p50: 6 p95: 6 p99: 6 max: 6 (1000 samples)
PASS max-fan-out: 10 (limit: 100)
worst: orchestrator.compose
p50: 8 p95: 10 p99: 10 max: 10 (1000 samples)
PASS max-spans: 24 static worst-case, 24 observed/1000 samples (limit: 10000)
p50: 19 p95: 24 p99: 24 max: 24 (1000 samples)
The static bounds (depth 6, fan-out 10, spans 24) match hand-computed expectations exactly. The sampled distributions behave as the model predicts: fan-out varies between the deterministic floor (5 children) and the bound (10) according to the per-call probabilities, span counts spread below the worst case, and no sampled value exceeds a static bound.
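The relationship between the deterministic floor, the static bound, and the sampled percentiles can be sketched as follows. The split into 5 unconditional and 5 optional children matches the floor and bound quoted above, but the per-call probability of 0.6 is an assumption chosen for illustration (the real probabilities live in the topology file). The function names and the nearest-rank percentile rule are likewise hypothetical, not motel's code.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// sampleFanOut draws one trace's fan-out: fixed children always fire,
// each optional child fires independently with probability p.
func sampleFanOut(r *rand.Rand, fixed, optional int, p float64) int {
	n := fixed
	for i := 0; i < optional; i++ {
		if r.Float64() < p {
			n++
		}
	}
	return n
}

// simulate draws n fan-out samples from a fixed seed and returns them sorted.
func simulate(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = sampleFanOut(r, 5, 5, 0.6) // 0.6 is an assumed probability
	}
	sort.Ints(out)
	return out
}

// percentile returns the nearest-rank q-th percentile of a non-empty,
// ascending-sorted slice.
func percentile(sorted []int, q int) int {
	rank := (q*len(sorted) + 99) / 100 // ceil(q/100 * n) in integer arithmetic
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	s := simulate(42, 1000)
	fmt.Printf("p50: %d p95: %d p99: %d max: %d (%d samples)\n",
		percentile(s, 50), percentile(s, 95), percentile(s, 99), s[len(s)-1], len(s))
}
```

Whatever the probabilities, every sample in this model lies between 5 and 10, so no percentile can exceed the static bound.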
The repeated call rate is checked against the topology definition rather
than check output: check does not report repeated calls as a metric, so
the test computes the fraction of call edges with count > 1 directly from
the parsed config (16.7%, within tolerance of the published 16.2%).
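The repeated-call computation is simple enough to show in full. The sketch below is illustrative: `CallEdge` stands in for whatever the parsed config exposes, the edge names are placeholders, and the tolerance of one percentage point is an assumption, since the actual tolerance is defined in the test suite.

```go
package main

import (
	"fmt"
	"math"
)

// CallEdge is a hypothetical stand-in for one parsed call edge.
type CallEdge struct {
	Caller, Callee string
	Count          int // number of times the caller invokes the callee
}

// repeatedCallRate returns the fraction of edges with Count > 1.
func repeatedCallRate(edges []CallEdge) float64 {
	if len(edges) == 0 {
		return 0
	}
	repeated := 0
	for _, e := range edges {
		if e.Count > 1 {
			repeated++
		}
	}
	return float64(repeated) / float64(len(edges))
}

// withinTolerance reports whether got is within tol of want.
func withinTolerance(got, want, tol float64) bool {
	return math.Abs(got-want) <= tol
}

func main() {
	// 18 edges, 3 of them repeated, mirroring the counts in the table above.
	edges := make([]CallEdge, 18)
	for i := range edges {
		edges[i] = CallEdge{Caller: "svc", Callee: fmt.Sprintf("dep-%d", i), Count: 1}
	}
	edges[0].Count, edges[1].Count, edges[2].Count = 3, 2, 2

	rate := repeatedCallRate(edges)
	fmt.Printf("repeated call rate: %.1f%%\n", rate*100) // 16.7%
	fmt.Println("within tolerance:", withinTolerance(rate, 0.162, 0.01))
}
```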
Meta (Huye et al., USENIX ATC 2023; Du et al., ICS 2025)
Huye et al. characterise Meta's request workflows as wide and shallow at the aggregation tier; Du et al. quantify children set sizes of up to 50 at Meta (versus 1–10 at Alibaba). The modelled topology is a feed-style workflow with an aggregator fanning out to 40 ranking leaves plus cache and metadata calls. For the public ATC 2023 summary data import workflow, see meta-trace-import.md.
| Published metric | Reported value | Modelled in topology | check agrees |
|---|---|---|---|
| Children set sizes | Up to 50 (Meta) | Fan-out 50 | Yes |
| Workflow shape | Wide, shallow aggregation | Depth 3, 54 spans | Yes |
motel check --seed 42 docs/examples/meta-wide-fanout.yaml reports:
PASS max-depth: 3 (limit: 10)
path: web.GET /feed → feed-agg.rank → social-graph.follows → social-db.query
p50: 3 p95: 3 p99: 3 max: 3 (1000 samples)
PASS max-fan-out: 50 (limit: 100)
worst: feed-agg.rank
p50: 50 p95: 50 p99: 50 max: 50 (1000 samples)
PASS max-spans: 54 static worst-case, 54 observed/1000 samples (limit: 10000)
p50: 54 p95: 54 p99: 54 max: 54 (1000 samples)
This topology is fully deterministic, so it doubles as an exactness check: every sampled trace must realise the static worst case, and the percentile distributions must be constant. They are.
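The two properties checked here (no sample above the static bound, and equality with the bound for a deterministic topology) can be expressed as a single assertion helper. This is a sketch under the assumption that samples arrive as a slice of integers; `checkAgainstBound` is a hypothetical name and not part of motel.

```go
package main

import "fmt"

// checkAgainstBound verifies that no sample exceeds the static bound.
// With exact set, every sample must also equal the bound, which is the
// expectation for a fully deterministic topology.
func checkAgainstBound(samples []int, bound int, exact bool) error {
	for i, s := range samples {
		if s > bound {
			return fmt.Errorf("sample %d: %d exceeds static bound %d", i, s, bound)
		}
		if exact && s != bound {
			return fmt.Errorf("sample %d: %d differs from static bound %d", i, s, bound)
		}
	}
	return nil
}

func main() {
	deterministic := []int{54, 54, 54, 54}  // every trace realises the worst case
	probabilistic := []int{19, 21, 24, 17} // spread below the worst case

	fmt.Println(checkAgainstBound(deterministic, 54, true))  // <nil>
	fmt.Println(checkAgainstBound(probabilistic, 24, false)) // <nil>
	fmt.Println(checkAgainstBound(probabilistic, 24, true))  // reports a mismatch
}
```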
Discrepancies and limitations
No disagreements between check and the modelled values were found. The
following published metrics could not be validated through check and are
documented as limitations:
- Overlap rate (77.1%, Luo et al.). Measures how often the same call
  graph topology recurs across traces of the same entry service. This is a
  cross-trace population metric; check analyses one topology and reports
  per-trace structure, so there is no corresponding output to compare. The
  simulation engine does produce overlapping topologies (probabilistic calls
  make trace shapes recur), but quantifying that would require a new metric.
- Repeated call rate (16.2%, Luo et al.). Validated against the topology
  definition, not check output, because check does not report repeated calls
  as a separate metric. Repeated calls are reflected in fan-out and span
  counts (an edge with count: 3 contributes 3 to its caller's fan-out).
- Topology dynamics over time (Huye et al.). Meta's paper reports churn
  in the service topology itself. motel models this with scenarios
  (add_calls/remove_calls), but check analyses the base topology without
  scenario overlays, so dynamics are out of scope here.
- Depth distribution shape. Luo et al. report that most call graphs are
  shallow with a long tail. A single topology cannot reproduce a
  population-level depth distribution across many distinct entry services;
  the modelled topology instead represents one deep (depth-6) entry service
  from the top of the published range. The property-based generator in
  pkg/synth/check_test.go (genRealisticConfig) covers the population view by
  drawing many topologies from the published distributions.
Reproducing
make build
build/motel check --seed 42 docs/examples/alibaba-call-graph.yaml
build/motel check --seed 42 docs/examples/meta-wide-fanout.yaml
go test ./pkg/synth/ -run TestEmpirical -v
References
- Luo et al., "An In-Depth Study of Microservice Call Graph and Runtime Performance," IEEE TPDS, 2022. IEEE Xplore
- Luo et al., "Characterizing Microservice Dependency and Performance: Alibaba Trace Analysis," SoCC, 2021. ACM DL
- Huye, Shkuro, and Sambasivan, "Lifting the Veil on Meta's Microservice Architecture," USENIX ATC, 2023. USENIX
- Du et al., "A Microservice Graph Generator with Production Characteristics," ICS, 2025. arXiv
- Related work survey