
How motel import Builds a Topology from Traces

This document walks through the inference pipeline step by step, using 4 real traces generated from a small topology. Every decision the code makes is shown against the actual data.

Files in this directory

File                    Purpose
topology.yaml           The source topology used to generate the traces
traces.jsonl            10 curated spans (4 traces) in stdouttrace format
inferred-topology.yaml  The topology produced by motel import

You can reproduce the import yourself:

motel import docs/explanation/import-pipeline/traces.jsonl

The source topology

Three services, one root operation, one probabilistic call:

api.handle ──always──▶ database.query
     │
     └──50% chance──▶ cache.lookup

The api service has a deployment.environment: staging attribute and a 10% error rate on handle. See topology.yaml for the full definition.

The raw trace data

We generated traces with motel run --stdout and hand-picked 4 that illustrate different paths through the topology:

Trace 1 (017acb8b): api → database + cache, no error

handle   [api]      span=7a48dd5f  parent=00000000  21:16:45.401 → .429  (28.3ms)  OK
├─ query [database] span=3073cfe2  parent=7a48dd5f  21:16:45.413 → .417  (4.4ms)   OK
└─ lookup[cache]    span=46509810  parent=7a48dd5f  21:16:45.413 → .414  (1.2ms)   OK

Both children start at the same time (.413) — this is a parallel call.

Trace 2 (009f737e): api → database only, no error

handle   [api]      span=58edc0f5  parent=00000000  21:16:44.284 → .316  (32.1ms)  OK
└─ query [database] span=6424ff5b  parent=58edc0f5  21:16:44.299 → .302  (3.2ms)   OK

No cache call this time — the 50% probability meant this trace skipped it.

Trace 3 (09d79ff5): api → database + cache, error

handle   [api]      span=da94c8aa  parent=00000000  21:16:44.183 → .211  (28.4ms)  ERROR
├─ query [database] span=e5acac49  parent=da94c8aa  21:16:44.192 → .202  (9.2ms)   OK
└─ lookup[cache]    span=e2cd9a6d  parent=da94c8aa  21:16:44.192 → .194  (1.3ms)   OK

The error is on api.handle itself — the children completed successfully.

Trace 4 (2c6c5e27): api → database only, error

handle   [api]      span=28a23c72  parent=00000000  21:16:43.980 → 44.001  (20.6ms)  ERROR
└─ query [database] span=f3adceaf  parent=28a23c72  21:16:43.988 → .994  (6.1ms)   OK

Stage 1: Parse spans

Code: ParseSpans() in span.go

The importer reads the 10 lines of JSON. Each line is a stdouttrace span with SpanContext, Parent, Name, StartTime, EndTime, Status, and Attributes fields.

The format is auto-detected: the first JSON object has a SpanContext key, which identifies it as stdouttrace (as opposed to OTLP, which would have resourceSpans).
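The detection rule can be sketched in a few lines of Go. This is a minimal illustration, not motel's implementation: the function name detectFormat and the returned labels are assumptions; only the two key names (SpanContext and resourceSpans) come from the text above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// detectFormat inspects the top-level keys of the first JSON object.
// The returned format names are illustrative labels, not motel's own.
func detectFormat(first []byte) (string, error) {
	var keys map[string]json.RawMessage
	if err := json.Unmarshal(first, &keys); err != nil {
		return "", fmt.Errorf("first record is not a JSON object: %w", err)
	}
	if _, ok := keys["SpanContext"]; ok {
		return "stdouttrace", nil
	}
	if _, ok := keys["resourceSpans"]; ok {
		return "otlp", nil
	}
	return "", errors.New("unrecognised trace format")
}

func main() {
	format, err := detectFormat([]byte(`{"Name":"handle","SpanContext":{"TraceID":"017acb8b"}}`))
	fmt.Println(format, err)
}
```

Decoding into json.RawMessage values keeps the check cheap: only the top-level keys are examined and the nested span data is left unparsed.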

Each span is normalised into a common Span struct:

  • TraceID and SpanID: taken from SpanContext
  • ParentID: taken from Parent.SpanID (00000000... means root)
  • Service: extracted from the synth.service attribute
  • Operation: the span Name
  • StartTime / EndTime: parsed from RFC3339 timestamps
  • IsError: true when Status.Code == "Error"
  • Attributes: all non-internal attributes (excluding synth.* keys)

After parsing, we have 10 Span values — a flat list with no structure.
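As a rough sketch of that normalisation, the Go below decodes one stdouttrace line into a simplified Span. The rawSpan layout follows the fields listed above plus the usual stdouttrace attribute shape (Key plus a nested Value); the Span type, parseSpan, the truncated IDs, and the date in the sample line are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// rawSpan mirrors only the stdouttrace fields mentioned in the text.
type rawSpan struct {
	Name        string
	SpanContext struct{ TraceID, SpanID string }
	Parent      struct{ SpanID string }
	StartTime   time.Time
	EndTime     time.Time
	Status      struct{ Code string }
	Attributes  []struct {
		Key   string
		Value struct{ Value any }
	}
}

// Span is a hypothetical stand-in for the common struct described above.
type Span struct {
	TraceID, SpanID, ParentID string
	Service, Operation        string
	StartTime, EndTime        time.Time
	IsError                   bool
	Attributes                map[string]any
}

func parseSpan(line []byte) (Span, error) {
	var r rawSpan
	if err := json.Unmarshal(line, &r); err != nil {
		return Span{}, err
	}
	s := Span{
		TraceID:    r.SpanContext.TraceID,
		SpanID:     r.SpanContext.SpanID,
		ParentID:   r.Parent.SpanID,
		Operation:  r.Name,
		StartTime:  r.StartTime, // time.Time decodes RFC3339 strings
		EndTime:    r.EndTime,
		IsError:    r.Status.Code == "Error",
		Attributes: map[string]any{},
	}
	for _, a := range r.Attributes {
		switch {
		case a.Key == "synth.service":
			s.Service, _ = a.Value.Value.(string)
		case strings.HasPrefix(a.Key, "synth."):
			// Internal key: not copied into Attributes.
		default:
			s.Attributes[a.Key] = a.Value.Value
		}
	}
	return s, nil
}

func main() {
	// The date is a placeholder; the walkthrough only shows times of day.
	line := `{"Name":"handle","SpanContext":{"TraceID":"017acb8b","SpanID":"7a48dd5f"},` +
		`"Parent":{"SpanID":"00000000"},"StartTime":"2024-01-01T21:16:45.401Z",` +
		`"EndTime":"2024-01-01T21:16:45.429Z","Status":{"Code":"Unset"},` +
		`"Attributes":[{"Key":"synth.service","Value":{"Type":"STRING","Value":"api"}},` +
		`{"Key":"deployment.environment","Value":{"Type":"STRING","Value":"staging"}}]}`
	s, err := parseSpan([]byte(line))
	fmt.Println(s.Service, s.Operation, s.EndTime.Sub(s.StartTime), s.Attributes, err)
}
```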

Stage 2: Build trace trees

Code: BuildTrees() in tree.go

The flat spans are grouped by TraceID (4 groups), then linked into trees by matching each span's ParentID to another span's SpanID.

For trace 017acb8b:

  1. Index all spans by SpanID: {7a48dd5f: handle, 3073cfe2: query, 46509810: lookup}
  2. handle has parent 00000000 (all zeros) → it's a root
  3. query has parent 7a48dd5f → child of handle
  4. lookup has parent 7a48dd5f → child of handle

Result:

handle (root)
├── query
└── lookup

The same process produces 4 trees:

Trace     Tree
017acb8b  handle → [query, lookup]
009f737e  handle → [query]
09d79ff5  handle → [query, lookup]
2c6c5e27  handle → [query]
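The grouping and linking steps can be sketched as follows. This is an illustrative version only: the Span and Node types and the buildTrees signature are assumptions, and treating a span whose parent is missing from the input as a root is a choice made here, not something the text specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// Span carries only the fields needed for linking (hypothetical type).
type Span struct {
	TraceID, SpanID, ParentID, Operation string
}

type Node struct {
	Span     Span
	Children []*Node
}

// isRoot treats an all-zero (or empty) parent ID as "no parent".
func isRoot(parentID string) bool {
	return strings.Trim(parentID, "0") == ""
}

// buildTrees indexes spans by (trace, span ID), then attaches each span
// to its parent. It returns the root nodes of every trace.
func buildTrees(spans []Span) map[string][]*Node {
	nodes := map[string]*Node{}
	for _, s := range spans {
		nodes[s.TraceID+"/"+s.SpanID] = &Node{Span: s}
	}
	roots := map[string][]*Node{}
	for _, s := range spans { // second pass preserves input order
		n := nodes[s.TraceID+"/"+s.SpanID]
		parent, ok := nodes[s.TraceID+"/"+s.ParentID]
		if isRoot(s.ParentID) || !ok || parent == n {
			// Assumption: orphans are kept as roots rather than dropped.
			roots[s.TraceID] = append(roots[s.TraceID], n)
			continue
		}
		parent.Children = append(parent.Children, n)
	}
	return roots
}

func main() {
	trees := buildTrees([]Span{
		{"017acb8b", "7a48dd5f", "00000000", "handle"},
		{"017acb8b", "3073cfe2", "7a48dd5f", "query"},
		{"017acb8b", "46509810", "7a48dd5f", "lookup"},
	})
	for _, root := range trees["017acb8b"] {
		fmt.Print(root.Span.Operation, " →")
		for _, c := range root.Children {
			fmt.Print(" ", c.Span.Operation)
		}
		fmt.Println()
	}
}
```

Keying the index by trace ID as well as span ID keeps spans from different traces apart even if their span IDs happen to collide.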

Stage 3: Collect statistics

Code: StatsCollector.CollectFromTrees() in stats.go

The collector walks each tree recursively, accumulating per-operation data.

Duration statistics

For each (service, operation) pair, every span's duration is recorded:

Service   Operation  Durations (ms)          Mean    StdDev
api       handle     28.3, 32.1, 28.4, 20.6  27.4ms  4.8ms
database  query      4.4, 3.2, 9.2, 6.1      5.7ms   2.6ms
cache     lookup     1.2, 1.3                1.3ms   83µs

The mean and sample standard deviation (n-1 denominator) are computed by MeanDuration() and StdDevDuration(). When stddev is non-zero, the duration is formatted as mean +/- stddev (e.g. 27ms +/- 4.8ms).
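For readers who want to check the table, here is a small sketch of the two statistics using the n-1 denominator. The free functions meanDuration and stdDevDuration are stand-ins written for this example; the real MeanDuration() and StdDevDuration() may be structured differently, and the ms and durations helpers exist only to build test data.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

func meanDuration(ds []time.Duration) time.Duration {
	if len(ds) == 0 {
		return 0
	}
	var sum time.Duration
	for _, d := range ds {
		sum += d
	}
	return sum / time.Duration(len(ds))
}

// stdDevDuration is the sample standard deviation (n-1 denominator).
// With fewer than two samples it returns 0.
func stdDevDuration(ds []time.Duration) time.Duration {
	if len(ds) < 2 {
		return 0
	}
	mean := float64(meanDuration(ds))
	var sumSquares float64
	for _, d := range ds {
		diff := float64(d) - mean
		sumSquares += diff * diff
	}
	return time.Duration(math.Sqrt(sumSquares / float64(len(ds)-1)))
}

// ms converts fractional milliseconds to a Duration (demo helper).
func ms(f float64) time.Duration {
	return time.Duration(f * float64(time.Millisecond))
}

// durations builds a slice from millisecond values (demo helper).
func durations(values ...float64) []time.Duration {
	out := make([]time.Duration, len(values))
	for i, v := range values {
		out[i] = ms(v)
	}
	return out
}

func main() {
	handle := durations(28.3, 32.1, 28.4, 20.6)
	fmt.Println(meanDuration(handle), stdDevDuration(handle))
}
```

Run against the four api.handle durations, this gives approximately 27.35ms and 4.83ms, in line with the rounded 27.4ms and 4.8ms shown in the table.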

Error counts

Service   Operation  Total  Errors  Rate
api       handle     4      2       50%
database  query      4      0       -
cache     lookup     2      0       -

FormatErrorRate() only emits an error_rate field when errors > 0. When --min-traces is greater than 1, import also uses it as the per-operation sample target. Operations below that target produce a confidence warning with the operation sample count and error count. The YAML is still emitted, but the duration and error-rate estimates need review.
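The "only emit when errors > 0" rule might look like the sketch below. The lowercase formatErrorRate, its two-value return, and the rounding to two decimal places are assumptions for illustration; the text only confirms that the field is omitted for zero errors and that 2 errors in 4 spans renders as 50%.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// formatErrorRate returns the percentage string and whether the field
// should be emitted at all. Zero errors means no error_rate field.
func formatErrorRate(errors, total int) (string, bool) {
	if errors <= 0 || total <= 0 {
		return "", false
	}
	pct := 100 * float64(errors) / float64(total)
	// Assumption: round to two decimals and drop trailing zeros.
	pct = math.Round(pct*100) / 100
	return strconv.FormatFloat(pct, 'f', -1, 64) + "%", true
}

func main() {
	for _, op := range []struct {
		name          string
		errors, total int
	}{
		{"api.handle", 2, 4},
		{"database.query", 0, 4},
		{"cache.lookup", 0, 2},
	} {
		if rate, ok := formatErrorRate(op.errors, op.total); ok {
			fmt.Printf("%s: error_rate: %s\n", op.name, rate)
		} else {
			fmt.Printf("%s: (no error_rate field)\n", op.name)
		}
	}
}
```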

Call counts (for probability)

For each parent operation, the collector records how many times each child was called and how many times the parent was invoked total:

Parent      Child           Times called  Parent invocations  Probability
api.handle  database.query  4             4                   4/4 = 1.0
api.handle  cache.lookup    2             4                   2/4 = 0.5

database.query appears in all 4 traces → probability 1.0 (always called). cache.lookup appears in 2 of 4 traces → probability 0.5.

In the YAML output, a call with probability 1.0 is listed as a plain string target and the probability is omitted. A call with probability below 1.0 is written as a mapping with an explicit probability field. When --min-traces is greater than 1, calls observed fewer times than that sample target are reported on stderr as low-confidence call probability estimates. Calls observed every time the parent ran are exempt from this warning.
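The division and the string-versus-mapping choice can be sketched like this. The callKey type and both function names are hypothetical, and the sketch assumes a child is counted at most once per parent invocation, so the ratio never exceeds 1.

```go
package main

import "fmt"

type callKey struct{ Parent, Child string }

// callProbabilities divides each child's call count by the number of
// times its parent was invoked.
func callProbabilities(calls map[callKey]int, invocations map[string]int) map[callKey]float64 {
	out := map[callKey]float64{}
	for k, n := range calls {
		if total := invocations[k.Parent]; total > 0 {
			out[k] = float64(n) / float64(total)
		}
	}
	return out
}

// renderCall returns the value that would be marshalled for one call:
// a plain string when the call always happens, otherwise a mapping.
func renderCall(target string, probability float64) any {
	if probability >= 1.0 {
		return target
	}
	return map[string]any{"target": target, "probability": probability}
}

func main() {
	calls := map[callKey]int{
		{"api.handle", "database.query"}: 4,
		{"api.handle", "cache.lookup"}:   2,
	}
	invocations := map[string]int{"api.handle": 4}
	probs := callProbabilities(calls, invocations)
	for _, child := range []string{"cache.lookup", "database.query"} {
		fmt.Printf("%v\n", renderCall(child, probs[callKey{"api.handle", child}]))
	}
}
```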

Stage 4: Infer call style

Code: isParallel() / isSequential() in stats.go

When a parent has 2+ children, the collector votes on whether the calls were parallel or sequential by examining timestamps.

For traces 1 and 3 (the ones where handle calls both query and lookup):

  • Trace 1: query starts at .413, lookup starts at .413 — difference is 0ms, well within the 1ms threshold → parallel
  • Trace 3: query starts at .192, lookup starts at .192 — same start time → parallel

Both votes are parallel, so no call_style field appears in the output (parallel is the default). If the votes had favoured sequential, the YAML would include call_style: sequential.

Traces 2 and 4 have only one child each, so no vote is cast. When --min-traces is greater than 1, import reports the vote counts on stderr if the vote total is small or a meaningful minority of votes disagrees with the winner, so the inferred call style can be checked against the real service behaviour.
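A simplified version of the vote is sketched below. It is not the real isParallel(): it assumes "parallel" means every sibling starts within 1ms of the earliest sibling, counts everything else as sequential, and resolves ties in favour of the parallel default. The votes type and the startsAt helper are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

// parallelThreshold is the 1ms window mentioned in the text.
const parallelThreshold = time.Millisecond

// isParallel reports whether all sibling start times fall within the
// threshold. Assumption: the spread is measured earliest to latest.
func isParallel(starts []time.Time) bool {
	if len(starts) < 2 {
		return false
	}
	earliest, latest := starts[0], starts[0]
	for _, t := range starts[1:] {
		if t.Before(earliest) {
			earliest = t
		}
		if t.After(latest) {
			latest = t
		}
	}
	return latest.Sub(earliest) <= parallelThreshold
}

type votes struct{ Parallel, Sequential int }

// cast records one vote; parents with fewer than two children abstain.
func (v *votes) cast(starts []time.Time) {
	if len(starts) < 2 {
		return
	}
	if isParallel(starts) {
		v.Parallel++
	} else {
		v.Sequential++
	}
}

// callStyle returns "" for parallel, since the default is omitted.
func (v *votes) callStyle() string {
	if v.Sequential > v.Parallel {
		return "sequential"
	}
	return ""
}

// startsAt builds start times from millisecond offsets (demo helper).
func startsAt(offsetsMs ...int) []time.Time {
	out := make([]time.Time, len(offsetsMs))
	for i, ms := range offsetsMs {
		out[i] = time.Unix(0, 0).Add(time.Duration(ms) * time.Millisecond)
	}
	return out
}

func main() {
	var v votes
	v.cast(startsAt(413, 413)) // trace 1: query and lookup
	v.cast(startsAt(299))      // trace 2: query only, no vote
	v.cast(startsAt(192, 192)) // trace 3: query and lookup
	v.cast(startsAt(988))      // trace 4: query only, no vote
	fmt.Printf("parallel=%d sequential=%d call_style=%q\n", v.Parallel, v.Sequential, v.callStyle())
}
```

With the four example traces this yields two parallel votes and no sequential votes, so the call style stays empty.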

Stage 5: Detect service attributes

Code: inferServiceAttributes() in infer.go

The importer scans all spans and finds attributes that have the same value on every span of a given service. These are promoted to service-level attributes in the topology.

Service   Attribute               Values seen   Constant?
api       deployment.environment  staging (×4)  Yes, promoted
database  (none)
cache     (none)

Internal attributes (synth.service, synth.operation, synth.scenarios, service.name, and telemetry.sdk.*) are excluded from this analysis.

Result: only api gets a resource_attributes section in the YAML.
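One way to express the "same value on every span" test is sketched here. The Span type, constantAttributes, and isInternal are hypothetical names; the sketch also assumes attribute values are strings and that an attribute missing from some spans of a service does not count as constant.

```go
package main

import (
	"fmt"
	"strings"
)

type Span struct {
	Service    string
	Attributes map[string]string
}

// isInternal mirrors the exclusion list given in the text.
func isInternal(key string) bool {
	return key == "service.name" ||
		strings.HasPrefix(key, "synth.") ||
		strings.HasPrefix(key, "telemetry.sdk.")
}

// constantAttributes returns, per service, the attributes that appear
// with one identical value on every span of that service.
func constantAttributes(spans []Span) map[string]map[string]string {
	type seen struct {
		value    string
		count    int
		constant bool
	}
	spanCount := map[string]int{}
	perService := map[string]map[string]*seen{}
	for _, s := range spans {
		spanCount[s.Service]++
		if perService[s.Service] == nil {
			perService[s.Service] = map[string]*seen{}
		}
		for k, v := range s.Attributes {
			if isInternal(k) {
				continue
			}
			e := perService[s.Service][k]
			if e == nil {
				perService[s.Service][k] = &seen{value: v, count: 1, constant: true}
				continue
			}
			e.count++
			e.constant = e.constant && e.value == v
		}
	}
	out := map[string]map[string]string{}
	for svc, attrs := range perService {
		for k, e := range attrs {
			if e.constant && e.count == spanCount[svc] {
				if out[svc] == nil {
					out[svc] = map[string]string{}
				}
				out[svc][k] = e.value
			}
		}
	}
	return out
}

func main() {
	spans := []Span{
		{"api", map[string]string{"deployment.environment": "staging", "synth.service": "api"}},
		{"api", map[string]string{"deployment.environment": "staging", "synth.service": "api"}},
		{"database", map[string]string{"synth.service": "database"}},
	}
	fmt.Println(constantAttributes(spans))
}
```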

Stage 6: Compute traffic rate

Code: computeWindow() in infer.go

The traffic rate is calculated from root span timestamps:

  • Earliest root: trace 4 at 21:16:43.980
  • Latest root: trace 1 at 21:16:45.401
  • Window: 1.42 seconds
  • Rate: 4 traces / 1.42s ≈ 3/s

The rate is formatted as 3/s in the traffic section.
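The arithmetic can be reproduced with a short sketch. The trafficRate function and the rootsAt helper are illustrative names, the root start times are entered as millisecond offsets past 21:16:00, and rounding to a whole number with %.0f is an assumption about how 2.8/s becomes 3/s.

```go
package main

import (
	"fmt"
	"time"
)

// trafficRate returns the observation window (earliest to latest root
// start) and the number of traces per second within it.
func trafficRate(rootStarts []time.Time) (time.Duration, float64) {
	if len(rootStarts) == 0 {
		return 0, 0
	}
	earliest, latest := rootStarts[0], rootStarts[0]
	for _, t := range rootStarts[1:] {
		if t.Before(earliest) {
			earliest = t
		}
		if t.After(latest) {
			latest = t
		}
	}
	window := latest.Sub(earliest)
	if window <= 0 {
		// Assumption: a zero-length window yields no usable rate.
		return 0, 0
	}
	return window, float64(len(rootStarts)) / window.Seconds()
}

// rootsAt builds timestamps from millisecond offsets (demo helper).
func rootsAt(offsetsMs ...int) []time.Time {
	out := make([]time.Time, len(offsetsMs))
	for i, ms := range offsetsMs {
		out[i] = time.Unix(0, 0).Add(time.Duration(ms) * time.Millisecond)
	}
	return out
}

func main() {
	// Traces 1 to 4: 21:16:45.401, :44.284, :44.183, :43.980.
	window, rate := trafficRate(rootsAt(45401, 44284, 44183, 43980))
	fmt.Printf("window=%v rate=%.0f/s\n", window, rate)
}
```

For the four example roots this prints a window of 1.421s and a rate of 3/s, matching the figures above.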

Stage 7: Report confidence diagnostics

Code: ReportConfidenceDiagnostics() in diagnostics.go

Before writing YAML, import reviews the collected statistics for weak evidence:

  • operations below the --min-traces sample target, including the observed error count
  • downstream calls observed fewer times than that target
  • call-style inference with few votes or meaningfully mixed parallel/sequential evidence

Diagnostics are written to stderr as warnings. They do not make the import fail, and because they go to stderr they do not end up in topology YAML that has been redirected to a file. With the default --min-traces=1, only the trace-count warnings are emitted.
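The first of those checks, operations below the sample target, could be sketched as follows. The opStats type, the lowSampleWarnings function, and the exact warning wording are all invented for this example; only the rule (warn when samples fall below --min-traces, stay silent at the default of 1, write to stderr) comes from the text.

```go
package main

import (
	"fmt"
	"os"
)

type opStats struct {
	Service, Operation string
	Samples, Errors    int
}

// lowSampleWarnings returns one message per operation whose sample
// count is below the target. A target of 1 or less disables the check.
func lowSampleWarnings(ops []opStats, minTraces int) []string {
	if minTraces <= 1 {
		return nil
	}
	var warnings []string
	for _, op := range ops {
		if op.Samples < minTraces {
			warnings = append(warnings, fmt.Sprintf(
				"warning: %s.%s has %d samples (%d errors), below --min-traces=%d",
				op.Service, op.Operation, op.Samples, op.Errors, minTraces))
		}
	}
	return warnings
}

func main() {
	ops := []opStats{
		{"api", "handle", 4, 2},
		{"database", "query", 4, 0},
		{"cache", "lookup", 2, 0},
	}
	for _, w := range lowSampleWarnings(ops, 3) {
		// Writing to stderr keeps redirected YAML output clean.
		fmt.Fprintln(os.Stderr, w)
	}
}
```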

Stage 8: Marshal to YAML

Code: MarshalConfig() in marshal.go

All the collected data is assembled into the topology YAML format. Services and operations are sorted alphabetically for deterministic output.
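Go map iteration order is deliberately unspecified, so deterministic output needs an explicit sort. The sketch below shows the usual pattern; sortedKeys is a hypothetical helper, not a function from marshal.go.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the map's keys in ascending order so that
// repeated runs emit services and operations in the same sequence.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	services := map[string][]string{
		"database": {"query"},
		"api":      {"handle"},
		"cache":    {"lookup"},
	}
	for _, name := range sortedKeys(services) {
		fmt.Println(name, services[name])
	}
}
```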

Each decision maps to a line in inferred-topology.yaml:

# Inferred from 4 traces (10 spans) observed over 1.4 seconds ← trace/span count, window
version: 1
services:
  api:
    resource_attributes:
      deployment.environment: staging       ← stage 5: constant attribute
    operations:
      handle:
        duration: 27ms +/- 4.8ms            ← stage 3: mean +/- stddev
        error_rate: 50%                      ← stage 3: 2 errors / 4 total
        calls:
          - target: cache.lookup             ← stage 3: probability < 1.0
            probability: 0.5                 ←   so written as mapping
          - database.query                   ← stage 3: probability = 1.0
  cache:                                     ←   so written as plain string
    operations:
      lookup:
        duration: 1.3ms +/- 83µs
  database:
    operations:
      query:
        duration: 5.7ms +/- 2.6ms
traffic:
  rate: 3/s                                  ← stage 6: 4 traces / 1.42s

Note: no call_style field appears because all votes were parallel (the default).

Stage 9: Round-trip validation

Code: validateRoundTrip() in infer.go

As a final safety check, the generated YAML is written to a temp file and loaded back through synth.LoadConfig() and synth.ValidateConfig(). This catches any inconsistency between the marshal format and what the synth engine expects — duration formats that don't parse, unknown fields, broken call references, etc.

If round-trip validation fails, the import returns an error rather than emitting a topology that motel run would reject.

$ motel validate docs/explanation/import-pipeline/inferred-topology.yaml
Configuration valid: 3 services, 1 root operation
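The write-then-reload pattern can be sketched without the synth package. Here the load parameter stands in for synth.LoadConfig() followed by synth.ValidateConfig(), whose real signatures are not shown in this document; checkNonEmpty is a placeholder loader used only to make the example runnable.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validateRoundTrip writes the generated YAML to a temp file and hands
// the path to load, returning an error if the reload fails.
func validateRoundTrip(yamlBytes []byte, load func(path string) error) error {
	f, err := os.CreateTemp("", "inferred-topology-*.yaml")
	if err != nil {
		return err
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(yamlBytes); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	if err := load(f.Name()); err != nil {
		return fmt.Errorf("round-trip validation failed: %w", err)
	}
	return nil
}

var errEmptyConfig = errors.New("config is empty")

// checkNonEmpty is a placeholder; a real loader would parse and
// validate the topology instead of only checking the file size.
func checkNonEmpty(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	if len(data) == 0 {
		return errEmptyConfig
	}
	return nil
}

func main() {
	fmt.Println(validateRoundTrip([]byte("version: 1\n"), checkNonEmpty))
	fmt.Println(validateRoundTrip(nil, checkNonEmpty))
}
```

Passing the loader as a function keeps the sketch independent of the engine and makes the failure path easy to exercise.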

What wasn't inferred

The import produces a starting point, not a finished topology. Things the importer cannot determine from trace data alone:

  • Scenario overrides — there's no way to know which behaviour changes are intentional vs normal operation
  • Traffic patterns — only the average rate is computed, not whether it's uniform, diurnal, or bursty
  • Queue depth, circuit breakers, backpressure — these are simulation parameters, not observable from traces
  • Attribute distributions — only constant attributes are detected. Per-span varying attributes (like request IDs) are dropped
  • Duration distribution shape — the synth engine uses a normal distribution, but real durations are often log-normal or bimodal

The header comment in the output reminds users to review and adjust.