Parity gaps — what's left

What separates a demo from a shippable thing is the thoroughness of the “remaining work” list. This is that list, kept live. Every row below is either closed (measured, documented, cross-linked) or open (blocker, has-a-plan, or acknowledged-gap). New work lands by moving a row from open to closed and adding a benchmark.

Tier 1 — parity blockers

The items whose absence would prevent a real deployment.

gap · state · what closing it takes
Sandbox → external NAT egress · closed 2026-04-22 · pf nat on vm-public from 10.77.0.0/24 to any -> (vm-public) in the root ruleset. Gotcha: putting it on re0 doesn’t work because re0 is a member of the vm-public bridge — pf sees the packet on vm-public first, so re0-NAT never matches. From sbx-a: ICMP to 1.1.1.1 = 18.7 ms RTT, HTTP = 40 ms. Rig: benchmarks/rigs/net/ext-egress.sh.
Per-sandbox pf anchors at scale · closed 2026-04-22 · Rig policy-anchor-churn.sh swept N = 1, 10, 100, 500, 1000, 2000. Load latency (median of 5): N=1000 → 1.43 ms, N=2000 → 1.51 ms; flush ~1.0 ms across the range. Flat — no cliff. FreeBSD’s default is set limit anchors 512; raising it to 4096 in the root ruleset is required for N>~500 and was the only knob we had to touch (not net.pf.request_maxcount). Cross-anchor enforcement verified with policy-anchor-isolation.sh: a block rule in cube_scale/sandbox-5 dropped sbx-a→sbx-b:32055; an unrelated block in cube_scale/sandbox-6 did not touch 32055 (sandbox-6’s rule correctly confined to its own port 32066). Teardown (policy-anchor-teardown.sh): pfctl -a cube_scale/sandbox-N -F rules clears the anchor; states survive a rule flush (expected pf semantics), so sandbox release must also call pfctl -k <src> -k <dst>. See Anchor scale for the full table.
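Taken together, the two rows above imply a small root-ruleset shape plus a per-sandbox anchor lifecycle. A minimal sketch, reconstructed from the rows — the anchor names, IPs, and example port are the rig's illustrative values, not a shipped config:

```
# Root ruleset (pf.conf) — assumed shape:
set limit anchors 4096                                     # default 512 caps out near N≈500
nat on vm-public from 10.77.0.0/24 to any -> (vm-public)   # NAT on the bridge, not re0
anchor "cube_scale/*"                                      # per-sandbox anchors load under this tree

# Per-sandbox lifecycle (pfctl):
echo 'block drop quick proto tcp from any to any port 32055' \
  | pfctl -a cube_scale/sandbox-5 -f -      # load one sandbox's rules: ~1.5 ms even at N=2000
pfctl -a cube_scale/sandbox-5 -F rules      # teardown: flush the anchor's rules
pfctl -k 10.77.0.2 -k 10.77.0.3             # states survive the flush; kill them explicitly
```

The anchor tree is what keeps mutation O(1) per sandbox: loading or flushing `cube_scale/sandbox-5` never touches a sibling's rules.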

Tier 2 — closes measurable gaps

Items whose absence is visible in the numbers but doesn’t block deployment.

gap · state · what closing it takes
dummynet rate limiting · closed 2026-04-22 · ipfw + dummynet pipe on sbx-a’s egress (ipfw add pipe 1 ip from 10.77.0.2 to any out). Four caps swept, each achieving >95% of the configured rate: 10 Mbit/s → 9.63, 100 Mbit/s → 96.8, 500 Mbit/s → 482, 1000 Mbit/s → 954 (baseline unshaped 6750 Mbit/s). Rule-load wall-time 1.56–2.20 ms, flat across caps. Per-sandbox isolation verified: with sbx-a piped at 100 Mbit/s, sbx-b → sbx-a still clears 7271 Mbit/s (72× the cap) because the pipe rule only matches src=10.77.0.2. Pre-load gotcha: ipfw’s default rule on stock kernels is deny ip from any to any — the rig sets net.inet.ip.fw.default_to_accept=1 via kenv+sysctl before kldload ipfw, which is the only way to avoid locking out ssh the moment the module attaches. Bridge-side subtlety: with net.link.bridge.pfil_member=1 (required for the pf anchor), ipfw sees bridge packets twice per direction; matching … out instead of leaving the rule unqualified avoids double-charging the pipe budget and restores full-rate shaping. Teardown flushes the pipe + rule and unloads both modules when the rig loaded them; default-accept is reasserted before module eviction. Rig: rate-limit-dummynet.sh. Receipt: benchmarks/results/rate-limit-dummynet-2026-04-22.txt. See Measured on honor.
Observability — diagnose.sh · closed 2026-04-22 · Single-command bundle at tools/diagnose.sh. Aggregates pfctl -s info + pfctl -sr -vv across root and every anchor under the cube tree (sorted by packet count, hits flagged [HIT]) + a pfctl -ss state slice + tcpdump -r /var/log/pflog + netstat -i bridge/tap columns + the contents of every cube_deny-prefixed table. Filter flags: --ip 10.77.0.5, --sandbox sandbox-5 (resolves via coppice-pool-ctl list), --since 30s. Emits hints when pfil_member=0, the root cube anchor is missing, or pf is disabled — the three enforcement holes documented in Gotchas. Demo/regression: benchmarks/rigs/net/diagnose-demo.sh installs a scoped cube_demo/diagnose anchor with a cube_deny_demo table, generates traffic, runs the tool, and asserts the table + rule-counter lines surface. Sample capture: benchmarks/results/diagnose-demo-2026-04-22.txt.
cc=50 cold start on bhyve · closed 2026-04-22 · Re-sampled on bhyve-durable-prewarm-pool after fixing the rig’s proof-of-life poll loop (added a 1 ms sleep between bhyvectl --get-stats probes — without it, N parallel pollers at cc=50 compete with the very vCPUs they’re waiting to observe). Clean numbers: cc=1 20 ms / cc=10 105 ms / cc=20 333 ms mean, p50 138, p95 791 / cc=50 995 ms mean, p50 1053, p95 1290, p99 1309. This is load-on-system physics, not a bhyve ceiling — 50 guests SIGCONT-resumed simultaneously share 16 physical threads. An earlier draft reported 2143 ms / 2903 ms for cc=50; that was the rig’s polling loop competing with itself. On a 32-thread host, cc=50 would fall back toward the cc=10 band. Receipts: bhyve-durable-prewarm-pool.sh (rig fix) + bhyve-cc50-2026-04-22.txt.
Upstream-quality vmm-vnode patch · partial · Works and is measured. Still needs: MI header consolidation, ATF test expansion, and an empirically captured INVARIANTS GPF repro. See patches/upstream-review.md.
SDK filesystem API (files.read/write/list/watch/remove) · open · envd exposes GET/POST /files/?path= and a WebSocket watcher on /files/watch. Not implemented in e2b-compat; SDK calls 404 today. The FreeBSD primitive for watch is kqueue(2) EVFILT_VNODE. Rig: benchmarks/rigs/fs-crud.sh writes 1 MiB, reads back with checksum, walks a 10-deep tree, watches for an external touch, asserts <50 ms event latency. See feature-audit § Filesystem.
SDK commands API (commands.run streaming) · open · Our POST /sandboxes/:id/exec is one-shot; the SDK expects streaming stdio + a PID-tracked handle, with kill(pid) and send_stdin. REST, portable. Rig: benchmarks/rigs/commands-api.sh runs yes | head -n 1000, asserts line-by-line streaming, then run("sleep 30") + kill returns within 1 s.
Per-sandbox metrics (GET /sandboxes/:id/metrics) · open · Route returns []. rctl -hu jail:<name> yields CPU and RSS; bhyvectl --get-stats for microVMs. Rig: benchmarks/rigs/metrics-sampling.sh spins CPU in-sandbox, asserts cpuUsedPct > 50 within 5 s.
Per-sandbox logs (GET /sandboxes/:id/logs) · open · Route returns []. Source: envd stderr + jail console. Ring buffer + cursor. Rig: capture envd output, assert the SDK’s log iterator sees the emitted lines in order.
Live per-sandbox network update (PUT /sandboxes/:id/network) · open · Route 501 today. The primitive is closed (pf anchor mutation at 1.5 ms p95); the handler translation from the SDK allow_out/deny_out payload to pfctl -a calls is the missing piece. Rig: allow_out=["1.1.1.1/32"], assert curl 1.1.1.1 succeeds and curl 8.8.8.8 fails.
TTL reaper · open · We store end_at but have no background sweeper. Cube’s CubeMaster does this cluster-side. Rig: create with timeout=5, wait 10 s, assert GET returns 404.
Durable snapshot endpoint (POST /sandboxes/:id/snapshots) · open · Distinct from pause/resume. Returns 501 today. The primitives exist (bhyvectl --suspend + ZFS snapshot); what’s missing is the named-fork-point semantic. Rig: benchmarks/rigs/snapshot-fork.sh: sandbox.snapshot() → sandbox.fork(id), assert state divergence.
OCI template import · open · Cube’s cubemastercli tpl create-from-image consumes OCI refs. We bake templates ad hoc. Rig: benchmarks/rigs/tpl-oci-import.sh pulls alpine, converts it to a ZFS dataset, launches a sandbox from it.
Node & Go SDK round-trip · open · The Python SDK is 7/7 green. Node (@e2b/code-interpreter) and Go (github.com/e2b-dev/go-sdk) use the same REST surface; we haven’t re-run the matrix in those languages. Rig: port examples/02-persistent-kernel.py to TS and Go, assert identical transcripts.
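Most of the open rows above are handler plumbing over primitives that already exist. The TTL reaper, for instance, is just a periodic sweep over the stored end_at values. A minimal in-memory sketch — SandboxStore and its method names are hypothetical, and a real sweep would also call coppice-pool-ctl release rather than only dropping the record:

```python
import time

class SandboxStore:
    """Hypothetical in-memory store; the real one backs the REST routes."""

    def __init__(self):
        self.sandboxes = {}  # sandbox id -> end_at (unix seconds)

    def create(self, sbx_id, timeout_s, now=None):
        now = time.time() if now is None else now
        self.sandboxes[sbx_id] = now + timeout_s

    def sweep(self, now=None):
        """One reaper pass: drop every sandbox whose end_at has passed."""
        now = time.time() if now is None else now
        expired = [i for i, end_at in self.sandboxes.items() if end_at <= now]
        for sbx_id in expired:
            # real version: coppice-pool-ctl release + mark route as 404
            del self.sandboxes[sbx_id]
        return expired

store = SandboxStore()
store.create("sbx-a", timeout_s=5, now=100.0)
store.create("sbx-b", timeout_s=60, now=100.0)
print(store.sweep(now=110.0))  # → ['sbx-a']; sbx-b still live
```

The rig's assertion (create with timeout=5, wait 10 s, GET → 404) is exactly one sweep interval plus the route returning 404 for any id no longer in the store.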

Tier 3 — genuinely still behind

Items where FreeBSD’s capability is categorically smaller than Linux+Cilium’s. Most don’t block Coppice specifically, but they’re honest gaps.

gap · state · what closing it takes
L7 policy · partial · Closed on semantics, partial on the intended substrate. Cilium on Linux enforces per-HTTP-request policy (method-deny, path-prefix-deny, header-deny) via its L7 proxy redirector. The FreeBSD answer is a userspace sidecar proxy in-path between the sandbox and egress — the Envoy-sidecar pattern. Envoy itself is not packaged on FreeBSD 15.0 (pkg search -x '^envoy' is empty; the upstream Bazel build doesn’t cleanly produce a FreeBSD binary). We therefore run haproxy 3.2.15 as the sidecar, which gives the three primitives we need (method-deny, path-prefix-deny, allow-all-else) via native ACLs with no Lua required. Intended-state Envoy config retained at tools/envoy/coppice-sidecar.yaml; running haproxy config at tools/envoy/coppice-sidecar-haproxy.cfg. Measured on sbx-a (N=5): sidecar startup 10 ms (bind-to-listening), per-request latency overhead ~80 µs (58-106 µs run-to-run) on a ~460 µs direct-loopback baseline, policy reload (haproxy -sf graceful swap) 12 ms, teardown 6-7 ms. 3/3 policy-matrix probes green (GET /get=200, POST /post=403, GET /admin/secret=403). Coppice-path example: GET /foo=200, POST /bar=403. Gotcha captured in the rig: fresh VNET jails often have only inet6 ::1 on lo0, so haproxy’s default 127.0.0.1 bind fails silently — the rig auto-aliases 127.0.0.1/8. Rig: benchmarks/rigs/net/l7-policy-envoy.sh. Receipt: l7-policy-envoy-2026-04-22.txt. Still open: Envoy itself (Bazel+FreeBSD port work); header-match ACLs beyond the three we ship (haproxy supports them, we just haven’t rigged them); transparent interception (currently reverse-proxy in the sidecar — curl --proxy-style forward proxying is possible via option http-use-proxy-header but adds no semantic coverage). For the agent-sandbox threat model, this gap is now functionally closed: the enforcement surface is equivalent, the numbers are measured, and the Envoy-upstreaming work is a substrate swap.
IPv6 parity · closed 2026-04-22 · Lab extended to dual-stack via the additive lab-setup-freebsd-v6.sh: fd77::/64 ULA on cubenet0, fd77::2/fd77::3 on sbx-a/sbx-b, merged cube_policy anchor with a cube_deny_v6 table (quick-block, mirrors the v4 shape), NAT66 on re0 (not vm-public — the v6 egress interface is asymmetric from v4: honor’s global SLAAC address lives on re0 directly, vm-public has no v6; the v4 NAT-on-bridge story inverted). Measurements on the same host state: TCP_RR p50 8 µs v6 vs 8 µs v4 (tied), mean 8.46 vs 8.27 µs (+2.3%); iperf3 single-stream 6.19 Gbit/s v6 vs 7.34 Gbit/s v4 (v6 at 84% of v4, ~16% header+rule-block overhead); pfctl -a cube_policy -t cube_deny_v6 -T add fd77::3 + pfctl -k fd77::3 blocks ping6 100%, delete restores on the next packet; external egress to 2606:4700:4700::1111: 23.8 ms via NAT66 vs 23.5 ms host-direct (NAT66 add-latency below ICMP sample noise). Rig: run-net-bench-v6.sh. Receipt: benchmarks/results/net-v6-2026-04-22.txt. See /appendix/ebpf-to-pf § IPv6 parity.
Linux+Cilium head-to-head numbers · open · Our “Cube/eBPF typical” column is published figures, not a measured run on comparable hardware. Would close with a second honor-class box running Cilium + the same rigs.
virtio-fs in FreeBSD base · acknowledged · FreeBSD 15.0 does not ship it. 9p-over-virtio is the substitute. See /essays/caveats.
bpftrace-class observability · acknowledged · DTrace covers 80-90% of it, but not all. See /appendix/ebpf-on-freebsd.
Browser-sandbox demo (Playwright + CDP) · open · Cube’s examples/browser-sandbox runs headless Chromium behind a CDP WebSocket, reachable through 9000-<id>.<domain>. www/chromium exists on FreeBSD 15 but is heavy; www/firefox-esr + marionette is a lighter alternative. Rig: sandbox with a browser template, Playwright connects via cubeproxy, page.goto + page.screenshot.
Capsicum-wrapped envd · open · envd today runs with jail-only confinement. Wrapping its per-request workers in cap_enter tightens the “LLM-generated Python reaches the kernel” surface. Rig: benchmarks/rigs/capsicum-envd.sh asserts open() outside the sandbox returns ECAPMODE.
Multi-node cluster overlay · acknowledged · Cube’s CubeMaster advertises multi-node. Coppice is single-host. The FreeBSD answer is vxlan(4) or wireguard; untested. Outside the measured-single-host mandate but noted for completeness. A rig would need a second honor-class host.
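The L7-policy row reduces, on the haproxy side, to a handful of native ACLs. A sketch of the three primitives the row names (frontend/backend names, bind address, and upstream are illustrative; the shipped config is tools/envoy/coppice-sidecar-haproxy.cfg):

```
# Hypothetical haproxy sidecar: method-deny, path-prefix-deny, allow-all-else.
frontend coppice_sidecar
    bind 127.0.0.1:8080
    mode http
    acl deny_method  method   POST            # method-deny  → POST /post = 403
    acl deny_prefix  path_beg /admin          # prefix-deny  → GET /admin/secret = 403
    http-request deny deny_status 403 if deny_method
    http-request deny deny_status 403 if deny_prefix
    default_backend sandbox_egress            # allow-all-else → GET /get = 200

backend sandbox_egress
    mode http
    server upstream 10.77.0.2:80
```

Policy reload is the standard haproxy -sf graceful swap, which is where the measured 12 ms comes from.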

Closed (for the record)

Items that were Tier 1 blockers and are now measured + documented.

gap · closed on · receipt
Durable snapshot-restore · 2026-04-22 · Built SNAPSHOT kernel + two-tier pool. bhyve-durable-prewarm-pool: 17 ms at cc=1. See /appendix/snapshot-cloning.
Cross-guest memory dedup · 2026-04-22 · vmm-vnode patch, N=1000 × 256 MiB in 9.1 GiB host RAM. See /appendix/vmm-vnode-patch.
E2B SDK compat (lifecycle) · earlier · 10/10 Python SDK calls pass. /appendix/e2b-compat.
E2B envd /execute (run_code) · 2026-04-22 · NDJSON over :49999, jexec python3 backend. /appendix/run-code-protocol.
E2B persistent kernel (state across run_code) · 2026-04-22 · ipykernel spawned in-jail on sandbox create; an in-jail Python bridge translates iopub → NDJSON. x = 42 / print(x) → 42 / np.array([1,2,3]) → text/plain, pandas → text/html, matplotlib → image/png, NameError → error+traceback. 7/7 checks pass via the e2b-code-interpreter Python SDK. Rig: benchmarks/rigs/jupyter-e2e.sh. /appendix/run-code-protocol.
Intra-sandbox network isolation · 2026-04-22 · Two VNET jails, pf deny-by-default anchor, 7 µs p50 TCP_RR. /appendix/ebpf-to-pf.
Policy update under traffic load · 2026-04-22 · No visible contention. At ~14 Gbit/s intra-host through cubenet0, per-op mutation p99 is 1.25 ms (idle 1.21 ms) and iperf3 throughput stays within run-to-run noise. pf’s table-lock is finer-grained than feared. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-churn-under-load.sh.
Per-sandbox pf anchors at scale · 2026-04-22 · N=2000 sibling anchors, median load latency 1.51 ms, flush 1.08 ms. Flat across N=1…2000. Cross-anchor isolation + teardown verified. Sysctl: raise set limit anchors from the default 512 to 4096. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-anchor-churn.sh.
External → sandbox via rdr/DNAT · 2026-04-22 · pf cube_rdr nat anchor + dnsmasq for *.coppice.lan + Go cubeproxy for the E2B-style <port>-<id>.<domain> Host-header split. LAN-peer curl to both 192.168.1.182:30001 (plain rdr) and 80-sbxa.coppice.lan:30080 (rdr → proxy) returns 200. rdr add-latency 0.24 µs p50 over 9.63 µs bare s2s; rdr rule-add flat at 1.2 ms from N=10 to N=100; cubeproxy overhead +109 µs p50 / +187 µs p99 on a 96 µs direct-HTTP baseline. See /appendix/ebpf-to-pf and benchmarks/rigs/net/ext-to-sandbox.sh.
cubenet lifecycle integration · 2026-04-22 · Controller at tools/coppice-pool-ctl.sh. checkout allocates IP + tap (kernel auto-assigned — pinning the tap number trips make_dev_sv with EEXIST and panics the host; three reboots of learning) + a per-sandbox cube/sandbox-<id> anchor. release flushes the anchor, kills pf states, destroys the tap. N=10 end-to-end (pool-cubenet-e2e.sh): checkout mean 21.5 ms / p95 23 ms, release mean 308 ms / p95 610 ms (release is ifconfig tap destroy-bound, not pf-bound), anchors verified 10/10, taps verified 10/10, deny-mutation enforcement confirmed, neighbor unaffected, 0 pf states leaked. Downstream bhyve wiring lives in bhyve-durable-prewarm-pool-cubenet.sh.
Multi-stream throughput scaling · 2026-04-22 · The suspect (pf state-table contention) was wrong. Rig throughput-multistream.sh swept P = 1, 2, 4, 8, 16, 32, 64, 128 streams, TCP and UDP, at MTU 1500. Not a cliff — a flat, noisy plateau. TCP: P=1 → 7.10 Gbit/s, P=16 → 7.00 Gbit/s, P=128 → 6.44 Gbit/s (absolute numbers ~half the quiescent 14.6 Gbit/s baseline because three sibling subagents were sharing honor during the run; the shape is what the question asked about). Monotonic sag 1→128 is ≤15%. The pf state table is nowhere near contended: high-water 180-1024 states against a 131072-bucket hash (load factor ≤0.008); insertion rate 1-33/s (new-flow setup, expected); search rate 1.2-1.8 M/s, flat across P. Host CPU pegs at ~16-17% (= one of 16 threads) regardless of stream count — the bottleneck is the single-threaded TCP sender/receiver path, not pf. UDP confirms it: iperf3 pushes 148 Gbit/s of attempted send at P=128 but 96% is dropped by a 100%-CPU receiver. Raising net.pf.states_hashsize / net.pf.source_nodes_hashsize (loader-time tunables; defaults 131072 / 32768) would not move the inflection because the hash is not the constrained resource — load factor is <1%. For the Coppice workload (agent sandboxes, 100s of low-bitrate flows at ~Mbit/s each), this gap does not matter: pf has three orders of magnitude of headroom on state count and searches/sec. See /appendix/ebpf-to-pf § Multi-stream scaling. Receipts: benchmarks/results/throughput-multistream-2026-04-22.txt.
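The external-ingress row depends on cubeproxy splitting the E2B-style <port>-<id>.<domain> Host header into a target port and sandbox id. A minimal sketch of that split — the function name and error handling are hypothetical; the real proxy is the Go cubeproxy behind the cube_rdr anchor:

```python
def split_host(host, domain="coppice.lan"):
    """Parse an E2B-style '<port>-<id>.<domain>' Host header into (port, sandbox_id).

    Hypothetical sketch of cubeproxy's routing split, e.g.
    '80-sbxa.coppice.lan' -> (80, 'sbxa').
    """
    suffix = "." + domain
    if not host.endswith(suffix):
        raise ValueError(f"host {host!r} is not under {domain}")
    label = host[: -len(suffix)]           # e.g. '80-sbxa'
    port_s, sep, sbx_id = label.partition("-")
    if not sep or not port_s.isdigit() or not sbx_id:
        raise ValueError(f"bad sandbox label {label!r}")
    return int(port_s), sbx_id

print(split_host("80-sbxa.coppice.lan"))   # → (80, 'sbxa')
```

partition splits at the first hyphen, so sandbox ids containing hyphens survive intact; everything before the first hyphen must be the numeric port.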

Method

Every open item has three things attached to it before it closes:

  1. A script under benchmarks/rigs/, executable from a fresh clone assuming honor access.
  2. A numbers section added to the relevant appendix page, citing that script.
  3. An update to this page moving the row from open to closed with the date and a link to the receipt.
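The contract in step 1 is easiest to see as a skeleton: a rig runs a workload, writes a receipt, and asserts on it, exiting nonzero on failure. A hypothetical minimal shape — the path and the stand-in workload are illustrative, not an actual rig:

```shell
#!/bin/sh
# Hypothetical rig skeleton: workload -> receipt -> assertion.
set -eu

RECEIPT="${RECEIPT:-/tmp/example-rig.txt}"   # real rigs write under benchmarks/results/

fail() { echo "FAIL: $*" >&2; exit 1; }

# --- workload: record 100 samples, as a real rig would drive pfctl or iperf3 ---
: > "$RECEIPT"
i=0
while [ "$i" -lt 100 ]; do
    echo "sample $i ok" >> "$RECEIPT"
    i=$((i + 1))
done

# --- assert + summarize: a rig fails loudly, never silently ---
n=$(wc -l < "$RECEIPT" | tr -d ' ')
[ "$n" -eq 100 ] || fail "expected 100 samples, got $n"
echo "PASS: $n samples recorded in $RECEIPT"
```

The receipt file is what steps 2 and 3 cite; the PASS/FAIL exit status is what keeps a re-run from fresh clone honest.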

No item closes on vibes. If the measurement surprises us, the prose changes; the measurement stays.