The Coppice gateway already speaks tracing — the Rust
idiom for structured span + event logging. Making those spans visible
to Jaeger / Tempo / Grafana is a matter of bolting one exporter layer
onto the existing subscriber and sprinkling
#[tracing::instrument] on the handler entry points. No
rewrite, no separate metrics pipeline, no per-span allocation tax
when the exporter is off.
The attachment
In e2b-compat/src/main.rs, the subscriber composition
used to be one call to fmt().with_env_filter(…).init().
It is now:
let tracer = init_otel_tracer()?;
match tracer {
    Some(t) => tracing_subscriber::registry()
        .with(env_filter)
        .with(fmt_layer)
        .with(tracing_opentelemetry::layer().with_tracer(t))
        .init(),
    None => tracing_subscriber::registry()
        .with(env_filter)
        .with(fmt_layer)
        .init(),
}
init_otel_tracer returns Some(tracer) iff
OTEL_EXPORTER_OTLP_ENDPOINT is set in the environment.
Unset, the function returns None on its first
std::env::var call — no tonic client, no background
batch task, no network I/O. Behaviour is byte-for-byte what it was
before the B1 patch.
Set, it builds an OTLP/gRPC exporter pointed at the endpoint
(http://collector:4317 in production,
http://localhost:4317 against a local container for
smoke-testing), attaches a resource with
service.name=$OTEL_SERVICE_NAME (default
e2b-compat), and installs the resulting
tracer provider as the global OTel provider, so a graceful shutdown
flushes pending batches before exit.
The span set
The handlers that produce a span:
| Span | Site | Attributes |
|---|---|---|
| sandbox.create | routes::create_sandbox | template, cpu_count, memory_mb, sandbox_id (recorded after UUID gen) |
| sandbox.kill | routes::kill_sandbox | sandbox_id |
| sandbox.pause / sandbox.resume | routes::pause_sandbox, resume_sandbox | sandbox_id |
| sandbox.execute | envd::execute | code_len, language, sandbox_id |
| kernel.spawn | kernel::spawn_kernel | sandbox_id |
| backend.create | FreeBSDJailBackend::create / create_with_limits | sandbox_id, template, cpu_count, memory_mb, writable_layer_mb |
| backend.kill_internal | state::kill_sandbox_internal | sandbox_id |
| files.read / files.write / files.list / files.make_dir / files.rename / files.remove | files.rs handlers | path, byte count (for writes) |
| reaper.sweep | reaper::sweep per 10-s tick | scanned, reaped |
Spans nest naturally because #[tracing::instrument] keeps
the current span entered across .await points. A
POST /sandboxes produces one outer
sandbox.create that parents one
backend.create and one kernel.spawn; the
collector renders them as a waterfall.
Request payloads (the POST body for /execute, the bytes
going to /files) are deliberately not captured —
high-cardinality, often sensitive, and not useful for latency
debugging. We record their lengths instead. A separate
COPPICE_TRACE_VERBOSE=1 flag could relax this later; we
don’t expose one yet.
Enabling
OTEL_EXPORTER_OTLP_ENDPOINT=http://collector:4317 \
OTEL_SERVICE_NAME=e2b-compat \
./target/release/e2b-compat
On the collector side, tools/otel/collector.yaml is a
two-line config that opens the OTLP receiver and pipes spans to a
debug exporter (stdout) — zero external deps, useful for
smoke-testing that the pipeline is wired. Uncomment the
otlp/jaeger exporter and point it at a
jaegertracing/all-in-one container for a UI. See
tools/otel/README.md for the copy-paste sequence.
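The checked-in file isn't reproduced here, but a minimal collector config of that shape (assumed, not a copy of tools/otel/collector.yaml) looks like:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:              # prints spans to the collector's stdout
    verbosity: detailed
  # otlp/jaeger:      # uncomment and point at jaegertracing/all-in-one
  #   endpoint: jaeger:4317
  #   tls:
  #     insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]   # add otlp/jaeger here for the UI
```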
Example transcript
Running benchmarks/rigs/otel-smoke.sh against a local
collector with the debug exporter prints one block per exported span:
Span #0
Trace ID : bc9d1467866e5b7ace9442125eaffd49
Parent ID : (root)
ID : 6a3bcc3b20c11a0b
Name : sandbox.create
Kind : Internal
Start time : 2026-04-23 00:32:11.501 UTC
End time : 2026-04-23 00:32:11.512 UTC
Attributes:
-> service.name: Str(e2b-compat-smoke)
-> sandbox_id: Str(785b4d23af564452a3b6c636f41af452)
-> template: Str(python)
-> cpu_count: Str(None)
-> memory_mb: Str(None)
Span #1
Trace ID : bc9d1467866e5b7ace9442125eaffd49
Parent ID : 6a3bcc3b20c11a0b
Name : backend.create
Attributes:
-> sandbox_id: Str(785b4d23af564452a3b6c636f41af452)
-> template: Str(python)
In Jaeger the same trace renders as a two-level waterfall:
sandbox.create at the root, backend.create
and kernel.spawn as children. The latency cost of each
stage is visible at a glance.
Out of scope (deliberately)
Metrics. Per-sandbox CPU / memory gauges are
B2’s territory — they ship as a
Prometheus text endpoint at /metrics and an
rctl(8)-based sampler, not as OTLP metrics. We could fan
out to OTLP metrics later but there’s no compelling reason; every
host that runs a collector also runs Prometheus scraping.
Log aggregation. tracing events fire
inside every span but we don’t forward them as OTLP logs. The
gateway logs to stderr and rc.d/coppice_gateway pipes
that to /var/log/coppice.log. A future
tracing-opentelemetry log layer is a one-liner if the
operator prefers Loki / Tempo.
Propagation from the SDK. The E2B Python/Node SDKs don’t emit W3C-TraceContext headers today. A trace starts at the gateway, not at the caller. Bridging requires an SDK patch that isn’t in scope for a compat-shim project.
Receipt
benchmarks/rigs/otel-smoke.sh starts a collector
(docker/podman if present; falls back to scraping gateway stderr
otherwise), drives one create + one kill, and asserts a
sandbox.create span reaches the collector. Rig is
FreeBSD-optional — on a dev host without ZFS, the sandbox create
fails at the backend but the span still fires and exports,
which is what the rig asserts.