Parity gaps — what's left

What separates a demo from a shippable thing is the thoroughness of the “remaining work” list. This is that list, kept live. Every row below is either closed (measured, documented, cross-linked) or open (blocker, has-a-plan, or acknowledged-gap). New work lands by moving a row from open to closed and adding a benchmark.

Tier 1 — parity blockers

The items whose absence would prevent a real deployment.

gap | state | what closing it takes
Sandbox → external NAT egress | closed 2026-04-22 | pf nat on vm-public from 10.77.0.0/24 to any -> (vm-public) in the root ruleset. Gotcha: putting it on re0 doesn’t work because re0 is a member of the vm-public bridge; pf sees the packet on vm-public first, so a NAT rule on re0 never matches. From sbx-a: ICMP to 1.1.1.1 = 18.7 ms RTT, HTTP = 40 ms. Rig: benchmarks/rigs/net/ext-egress.sh.
Per-sandbox pf anchors at scale | closed 2026-04-22 | Rig policy-anchor-churn.sh swept N = 1, 10, 100, 500, 1000, 2000. Load latency (median of 5): N=1000 → 1.43 ms, N=2000 → 1.51 ms; flush ~1.0 ms across the range. Flat, no cliff. FreeBSD’s default is set limit anchors 512; raising it to 4096 in the root ruleset is required for N>~500 and was the only knob we had to touch (net.pf.request_maxcount did not need changing). Cross-anchor enforcement verified with policy-anchor-isolation.sh: a block rule in cube_scale/sandbox-5 dropped sbx-a→sbx-b:32055; an unrelated block in cube_scale/sandbox-6 did not touch 32055 (sandbox-6’s rule stayed confined to its own port, 32066). Teardown (policy-anchor-teardown.sh): pfctl -a cube_scale/sandbox-N -F rules clears the anchor; states survive a rule flush (expected pf semantics), so sandbox release must also call pfctl -k <src> -k <dst>. See Anchor scale for the full table.
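
The teardown note in the anchor row (flush the anchor's rules, then kill the surviving states) can be sketched as a small command builder. This is a minimal sketch, not the code in policy-anchor-teardown.sh: the helper name release_commands and the example addresses are assumptions made for illustration, and only the two pfctl invocations come from the row itself. The function returns argv lists instead of running anything, so it can be inspected off-host.

```python
from typing import List


def release_commands(anchor: str, src: str, dst: str) -> List[List[str]]:
    """Build the pfctl calls for releasing one sandbox (hypothetical helper).

    Flushing rules leaves existing states in place, so the state kill
    is listed after the flush.
    """
    return [
        # Clear every rule loaded under the sandbox's anchor.
        ["pfctl", "-a", anchor, "-F", "rules"],
        # Kill states from src to dst that survived the rule flush.
        ["pfctl", "-k", src, "-k", dst],
    ]


if __name__ == "__main__":
    # Placeholder addresses; substitute the sandbox's real source and peer.
    for argv in release_commands("cube_scale/sandbox-5", "10.77.0.2", "10.77.0.3"):
        print(" ".join(argv))
```

A release path would hand each list to something like subprocess.run(argv, check=True) as root; keeping the builder pure makes the ordering easy to assert in a unit test.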

Tier 2 — closes measurable gaps

Items whose absence is visible in the numbers but doesn’t block deployment.

gap | state | what closing it takes
dummynet rate limiting | closed 2026-04-22 | ipfw + dummynet pipe on sbx-a’s egress (ipfw add pipe 1 ip from 10.77.0.2 to any out). Four caps swept, each achieving >95% of the configured rate: 10 Mbit/s → 9.63, 100 Mbit/s → 96.8, 500 Mbit/s → 482, 1000 Mbit/s → 954 (unshaped baseline 6750 Mbit/s). Rule-load wall-time 1.56–2.20 ms, flat across caps. Per-sandbox isolation verified: with sbx-a piped at 100 Mbit/s, sbx-b → sbx-a still clears 7271 Mbit/s (72× the cap) because the pipe rule only matches src=10.77.0.2. Pre-load gotcha: ipfw’s default rule on stock kernels is deny ip from any to any, so the rig sets net.inet.ip.fw.default_to_accept=1 via kenv+sysctl before kldload ipfw, which is the only way to avoid locking out ssh the moment the module attaches. Bridge-side subtlety: with net.link.bridge.pfil_member=1 (required for the pf anchor), ipfw sees bridge packets twice per direction; matching … out instead of an unqualified rule avoids charging the pipe budget twice and restores full-rate shaping. Teardown flushes the pipe + rule and unloads both modules when the rig loaded them; default-accept is reasserted before module eviction. Rig: rate-limit-dummynet.sh. Receipt: benchmarks/results/rate-limit-dummynet-2026-04-22.txt. See Measured on honor.
Observability: diagnose.sh | closed 2026-04-22 | Single-command bundle at tools/diagnose.sh. Aggregates pfctl -s info + pfctl -sr -vv across root and every anchor under the cube tree (sorted by packet count, hits flagged [HIT]) + pfctl -ss state slice + tcpdump -r /var/log/pflog + netstat -i bridge/tap columns + every cube_deny-prefixed table’s contents. Filter flags: --ip 10.77.0.5, --sandbox sandbox-5 (resolves via coppice-pool-ctl list), --since 30s. Emits hints when pfil_member=0, the root cube anchor is missing, or pf is disabled (the three enforcement holes documented in Gotchas). Demo/regression: benchmarks/rigs/net/diagnose-demo.sh installs a scoped cube_demo/diagnose anchor with a cube_deny_demo table, generates traffic, runs the tool, and asserts the table + rule-counter lines surface. Sample capture: benchmarks/results/diagnose-demo-2026-04-22.txt.
cc=50 cold start on bhyve | closed 2026-04-22 | Re-sampled on bhyve-durable-prewarm-pool after fixing the rig’s proof-of-life poll loop (added a 1 ms sleep between bhyvectl --get-stats probes; without it, N parallel pollers at cc=50 compete with the very vCPUs they’re waiting to observe). Clean numbers: cc=1 20 ms / cc=10 105 ms / cc=20 333 ms mean, p50 138, p95 791 / cc=50 995 ms mean, p50 1053, p95 1290, p99 1309. This is load-on-system physics, not a bhyve ceiling: 50 guests SIGCONT-resumed simultaneously share 16 physical threads. An earlier draft reported 2143 ms / 2903 ms for cc=50; that was the rig’s polling loop competing with itself. On a 32-thread host, cc=50 would fall back toward the cc=10 band. Receipts: bhyve-durable-prewarm-pool.sh (rig fix) + bhyve-cc50-2026-04-22.txt.
Upstream-quality vmm-vnode patch | partial | Works, measured. Still needs: MI header consolidation, ATF test expansion, INVARIANTS GPF repro captured empirically. See patches/upstream-review.md.
SDK filesystem CRUD API (files.read/write/list/make_dir/rename/remove/exists) | closed 2026-04-24 | e2b-compat/src/files.rs serves the Python SDK’s filesystem surface on the host side against the sandbox’s ZFS clone, with the same path-safety gate every route shares (absolute path required, .. rejected, canonicalized prefix check). Connect-RPC endpoints at /filesystem.Filesystem/{Stat,ListDir,MakeDir,Move,Remove} plus the octet-stream read/write paths and batch/download-token helpers are live. Receipt: examples/08-filesystem.py round-trips write/read, list, nested mkdir, rename, remove, and traversal rejection. See /appendix/filesystem-api and feature-audit § Filesystem.
SDK filesystem watch (files.watch) | closed 2026-04-24 | Closed with the real Connect server-streaming /filesystem.Filesystem/WatchDir route the SDK calls. The shipping implementation is host-side snapshot-diff polling over the jail rootfs, and the receipt is benchmarks/rigs/files-watch.sh: it opens the Node SDK’s watchDir(), mutates the jail rootfs directly from honor, and asserts create/write/rename/remove plus recursive nested-create events arrive within the configured latency budget. Latest transcript: benchmarks/results/files-watch/latest.txt. See /appendix/filesystem-api.
SDK commands API (commands.run streaming) | closed | e2b-compat/src/commands.rs speaks Connect-RPC at /process.Process/{Start,Connect,List,SendInput,SendSignal,CloseStdin,Update} with JSON codec and 5-byte envelope framing, plus an NDJSON alias at POST /commands for curl rigs. Spawns jexec -l -U root e2b-<id>, line-reads stdio into ProcessEvent.data envelopes, tees into the per-sandbox LogBuffer, surfaces kill as SIGKILL via SendSignal. Metrics: coppice_commands_{started,finished,active}. Receipt: examples/11-commands-stream.py. Writeup: /appendix/commands-streaming.
Per-sandbox metrics (GET /sandboxes/:id/metrics) | closed | Background sampler reads rctl -h -u jail:e2b-<id> + zfs get used,quota every 10 s; route returns the latest reading as a single-element array. Same numbers also land as labeled Prometheus gauges (coppice_sandbox_cpu_percent{sandbox,template}, …_memory_bytes, …_disk_used_bytes). Rig: benchmarks/rigs/per-sandbox-metrics-smoke.sh. Writeup: /appendix/per-sandbox-metrics.
Per-sandbox logs (GET /sandboxes/:id/logs) | closed | Per-sandbox Arc<LogBuffer> (bounded 1024-entry deque, drop-oldest, latching truncated flag). kernel::spawn_kernel pipes ipykernel stdout + stderr and tees each line into the ring as source="kernel". Handler serves snapshot with ?limit=N and ?since=<rfc3339> filters; falls back to jexec e2b-<id> tail -n 200 /var/log/messages (tagged source="syslog") when the buffer is empty. Per-sandbox counter coppice_sandbox_log_lines_total. CLI: coppice sandbox logs <id> [--limit N] [--since 5m] [--follow] [--json]. Rig: five LogBuffer unit tests plus three CLI integration tests against a mock gateway. See /appendix/per-sandbox-logs.
Live per-sandbox network update (PUT /sandboxes/:id/network) | closed 2026-04-24 | Route is live in e2b-compat/src/routes.rs. The gateway now translates SDK allow_out/deny_out/air_gapped payloads into per-sandbox pf fragments under coppice/sandbox-<short>, preserves DNS reachability to 10.78.0.1/fd77::1, and explicitly kills live pf states when policy flips so existing TCP/UDP/ICMP flows do not survive an air-gap transition. Coverage rides benchmarks/rigs/air-gapped-smoke.sh and benchmarks/rigs/vnet-smoke.sh. There are no remaining E2B-surface parity gaps on this page now.
TTL reaper | closed | e2b-compat/src/reaper.rs: spawned with tokio::spawn at startup, 10-s tick, shared kill_sandbox_internal teardown path (same as DELETE). coppice_sandboxes_reaped_total counter. Rig benchmarks/rigs/reaper-test.sh (timeout=5, wait 12 s, assert 404 + counter advanced) passes on honor.
Durable snapshot endpoint (POST /sandboxes/:id/snapshots) | closed | #durable (2026-04-22). Five endpoints + five CLI subcommands + rig benchmarks/rigs/snapshot-fork.sh green 7/7 on honor. Fork parity with cold create (same stand_up_vnet_jail helper); registry persisted to /var/lib/coppice/snapshots.json, reconciled against zfs list at startup. v1 preserves filesystem state (cold-starts rootfs on fork); live-memory resume stays on the bhyve path. See /appendix/durable-snapshots.
OCI template import | closed | #oci-import (2026-04-22). POST /templates with {name, from: "oci:<ref>"} shells out to tools/coppice-import-oci.sh: buildah pull → buildah mount → rsync --numeric-ids onto a fresh ZFS dataset → zfs snapshot @base → hot-reload the TemplateRegistry. CLI: coppice tpl import-oci <name> <oci-ref>; receipt: examples/14-oci-template.sh imports quay.io/dougrabson/freebsd14-minimal and asserts freebsd-version -r inside the resulting sandbox. Linux-only OCI images import cleanly but fail at exec without linuxulator (documented). See /appendix/oci-templates.
Node & Go SDK round-trip | closed 2026-04-24 | Now backed by reproducible rigs, not just example directories. benchmarks/rigs/sdk-node-roundtrip.sh runs the repo-pinned Node surface (e2b@2.19.0 + @e2b/code-interpreter@2.4.0) through create/list/kill plus stateful runCode. sdk-go-roundtrip.sh runs the Go receipts against the same gateway, using github.com/xerpa-ai/e2b-go v0.1.0 for paginator coverage and raw HTTP for envd/lifecycle where the exercised local path relies on loopback debug routing instead of wildcard DNS. Latest captured transcripts live at benchmarks/results/sdk-roundtrip/latest-node.txt and latest-go.txt. See feature-audit § SDKs.
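
The path-safety gate named in the filesystem CRUD row (absolute path required, .. rejected, canonicalized prefix check) can be illustrated with a short Python sketch. The shipping gate lives in e2b-compat/src/files.rs and is written in Rust; the function name resolve_in_sandbox, the UnsafePath exception, and the exact ordering of checks below are assumptions for illustration, not a transcription of that code.

```python
import os


class UnsafePath(ValueError):
    """Raised when a client-supplied path fails the gate (hypothetical type)."""


def resolve_in_sandbox(root: str, client_path: str) -> str:
    """Map a sandbox-absolute path onto the host-side clone rooted at `root`."""
    # 1. Absolute path required.
    if not client_path.startswith("/"):
        raise UnsafePath("path must be absolute")
    # 2. Reject any '..' component before touching the filesystem.
    if ".." in client_path.split("/"):
        raise UnsafePath("'..' components are not allowed")
    # 3. Canonicalize (resolving symlinks) and require the result to stay
    #    under the canonical root.
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, client_path.lstrip("/")))
    if candidate != real_root and not candidate.startswith(real_root + os.sep):
        raise UnsafePath("path escapes the sandbox root")
    return candidate
```

The third check is the one that catches a symlink planted inside the sandbox that points at a host path: the string contains no .. and is absolute, yet its canonical form lands outside the root.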

Tier 3 — genuinely still behind

Items where FreeBSD’s capability is categorically smaller than Linux+Cilium’s. Most don’t block Coppice specifically, but they’re honest gaps.

gap | state | what closing it takes
L7 policy | partial | Closed on semantics, partial on intended substrate. Cilium on Linux enforces per-HTTP-request policy (method-deny, path-prefix-deny, header-deny) via its L7 proxy redirector. The FreeBSD answer is a userspace sidecar proxy in-path between the sandbox and egress, the Envoy-sidecar pattern. Envoy itself is not packaged on FreeBSD 15.0 (pkg search -x '^envoy' returns nothing; the upstream Bazel build doesn’t cleanly produce a FreeBSD binary). We therefore run haproxy 3.2.15 as the sidecar, which gives the three primitives we need (method-deny, path-prefix-deny, allow-all-else) via native ACLs with no Lua required. Intended-state Envoy config retained at tools/envoy/coppice-sidecar.yaml; running haproxy config at tools/envoy/coppice-sidecar-haproxy.cfg. Measured on sbx-a (N=5): sidecar startup 10 ms (bind-to-listening), per-request latency overhead ~80 µs (58-106 µs run-to-run) on a ~460 µs direct-loopback baseline, policy reload (haproxy -sf graceful swap) 12 ms, teardown 6-7 ms. 3/3 policy-matrix probes green (GET /get=200, POST /post=403, GET /admin/secret=403). Coppice-path example: GET /foo=200, POST /bar=403. Gotcha captured in the rig: fresh VNET jails often have only inet6 ::1 on lo0, so haproxy’s default 127.0.0.1 bind fails silently; the rig auto-aliases 127.0.0.1/8. Rig: benchmarks/rigs/net/l7-policy-envoy.sh. Receipt: l7-policy-envoy-2026-04-22.txt. Still open: Envoy itself (Bazel+FreeBSD port work); header-match ACLs beyond the three primitives we ship (haproxy supports them, we just haven’t rigged them); transparent interception (the sidecar currently acts as a reverse proxy; curl --proxy-style forward proxying is possible via option http-use-proxy-header but adds no semantic coverage). For the agent-sandbox threat model, this gap is now functionally closed: the enforcement surface is equivalent, the numbers are measured, and the Envoy-upstreaming work is a substrate swap.
IPv6 parity | closed 2026-04-22 | Lab extended to dual-stack via additive lab-setup-freebsd-v6.sh: fd77::/64 ULA on coppbhyve0, fd77::2/fd77::3 on sbx-a/sbx-b, merged cube_policy anchor with cube_deny_v6 table (quick-block, mirrors the v4 shape), NAT66 on re0 (not vm-public: the v6 egress interface differs from v4 because honor’s global SLAAC address lives on re0 directly and vm-public has no v6, so the v4 NAT-on-bridge story is inverted). Measurements on the same host state: TCP_RR p50 8 µs v6 vs 8 µs v4 (tied), mean 8.46 vs 8.27 µs (+2.3%); iperf3 single-stream 6.19 Gbit/s v6 vs 7.34 Gbit/s v4 (v6 at 84% of v4, ~16% header+rule-block overhead); pfctl -a cube_policy -t cube_deny_v6 -T add fd77::3 + pfctl -k fd77::3 blocks ping6 100%, delete restores on next packet; external egress to 2606:4700:4700::1111: 23.8 ms via NAT66 vs 23.5 ms host-direct (NAT66 add-latency below ICMP sample noise). Rig: run-net-bench-v6.sh. Receipt: benchmarks/results/net-v6-2026-04-22.txt. See /appendix/ebpf-to-pf § IPv6 parity.
Linux+Cilium head-to-head numbers | open | Our “Cube/eBPF typical” column comes from published figures, not a measured run on comparable hardware. Closing it takes a second honor-class box running Cilium with the same rigs.
virtio-fs in FreeBSD base | acknowledged | FreeBSD 15.0 does not ship it. 9p-over-virtio is the substitute. See /essays/caveats.
bpftrace-class observability | acknowledged | DTrace covers roughly 80-90% of what bpftrace does, but not all of it. See /appendix/ebpf-on-freebsd.
Browser-sandbox demo (Chromium via CDP) | closed 2026-04-22 | Closed after #69 unblocked VNET. Headless chromium 147 runs in a VNET jail at 10.78.0.251:9222 (socat proxy onto chromium’s [::1] listener, because chromium’s --headless=new ignores --remote-debugging-address), and a pure-Python CDP client drives navigation + screenshot from the host. Playwright has no FreeBSD wheel, which sank the first attempt; the new shape uses pychrome instead. Receipt: examples/09-browser-sandbox.py, tools/coppice-browser-demo.sh, and the reference screenshot at examples/fixtures/browser-example-screenshot.png. Full writeup in /appendix/browser-sandbox.
Capsicum-wrapped envd | open | Envd today runs with jail-only confinement. Wrapping its per-request workers in cap_enter tightens the “LLM-generated Python reaches the kernel” surface. Rig: benchmarks/rigs/capsicum-envd.sh asserts open() outside the sandbox returns ECAPMODE.
Multi-node cluster overlay | acknowledged | Cube’s CubeMaster advertises multi-node. Coppice is single-host. The FreeBSD answer is vxlan(4) or WireGuard, both untested here. Outside the measured-single-host mandate but noted for completeness. A rig would need a second honor-class host.
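
The three L7 primitives in the policy row (method-deny, path-prefix-deny, allow-all-else) amount to a small decision function. The sketch below is an illustration only: the rule set (deny POST, deny the /admin prefix) is an assumption inferred from the probe matrix, not a copy of tools/envoy/coppice-sidecar-haproxy.cfg, and the name decide is hypothetical. A return value of 200 stands for "forwarded upstream", since the real sidecar relays whatever the upstream answers.

```python
from typing import Tuple

# Assumed rule set, inferred from the probe matrix in the L7 policy row;
# the shipped haproxy ACLs may differ.
DENIED_METHODS = frozenset({"POST"})
DENIED_PATH_PREFIXES: Tuple[str, ...] = ("/admin",)


def decide(method: str, path: str) -> int:
    """Return 403 if the request is denied, 200 if it would be forwarded."""
    if method.upper() in DENIED_METHODS:
        return 403  # method-deny
    if path.startswith(DENIED_PATH_PREFIXES):
        return 403  # path-prefix-deny
    return 200  # allow-all-else


if __name__ == "__main__":
    for method, path in [("GET", "/get"), ("POST", "/post"), ("GET", "/admin/secret")]:
        print(method, path, decide(method, path))
```

Evaluating deny rules first and falling through to allow mirrors the "allow-all-else" shape: anything the two ACLs do not name passes untouched.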

Closed (for the record)

Items that were Tier 1 blockers and are now measured + documented.

gap | closed on | receipt
Durable snapshot-restore | 2026-04-22 | Built SNAPSHOT kernel + two-tier pool. bhyve-durable-prewarm-pool: 17 ms at cc=1. See /appendix/snapshot-cloning.
Cross-guest memory dedup | 2026-04-22 | vmm-vnode patch, N=1000 × 256 MiB in 9.1 GiB host RAM. See /appendix/vmm-vnode-patch.
E2B SDK compat (lifecycle) | earlier | 10/10 Python SDK calls pass. See /appendix/e2b-compat.
E2B envd /execute (run_code) | 2026-04-22 | NDJSON over :49999, jexec python3 backend. See /appendix/run-code-protocol.
E2B persistent kernel (state across run_code) | 2026-04-22 | ipykernel spawned in-jail on sandbox create, in-jail Python bridge translates iopub → NDJSON. x = 42 / print(x) → 42 / np.array([1,2,3]) → text/plain, pandas → text/html, matplotlib → image/png, NameError → error+traceback. 7/7 checks pass via the e2b-code-interpreter Python SDK. Rig: benchmarks/rigs/jupyter-e2e.sh. See /appendix/run-code-protocol.
Intra-sandbox network isolation | 2026-04-22 | Two VNET jails, pf deny-by-default anchor, 7 µs p50 TCP_RR. See /appendix/ebpf-to-pf.
Policy update under traffic load | 2026-04-22 | No visible contention. At ~14 Gbit/s intra-host through coppbhyve0 (the bhyve bench rig bridge, renamed from cubenet0 in #66), per-op mutation p99 is 1.25 ms (idle 1.21 ms) and iperf3 throughput stays within run-to-run noise. pf’s table-lock is finer-grained than feared. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-churn-under-load.sh.
Per-sandbox pf anchors at scale | 2026-04-22 | N=2000 sibling anchors, median load latency 1.51 ms, flush 1.08 ms. Flat across N=1…2000. Cross-anchor isolation + teardown verified. Ruleset knob (a pf.conf option, not a sysctl): raise set limit anchors from the default 512 to 4096. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-anchor-churn.sh.
Per-sandbox VNET / distinct IP per sandbox | 2026-04-22 | #69 (steps 1–7, commits f33cd4cc072a22). Each sandbox gets its own epair bridged to coppicenet0 (10.78.0.0/24), its own VNET, and an IP reservation returned as sandboxIP on GET /sandboxes/:id. Replaces the earlier shared-tap-with-uid-tagging design: pf anchors are now source-IP-scoped (from <sandbox_ip>), so one sandbox’s anchor cannot match host or sibling-jail traffic on the shared bridge. Air-gapped fragment adds a pass <sandbox_ip> → 10.78.0.1 rule so bridge-gateway control-plane (DNS, metadata) stays reachable when external internet is blocked. Rigs: benchmarks/rigs/vnet-smoke.sh (11 assertions, 3 sandboxes, distinct IPs + in-jail ifconfig match + host↔jail + jail↔jail L2 + external NAT + no phantom epairs on teardown) and air-gapped-smoke.sh (4/4 green after step 7). See /appendix/vnet-jail.
External → sandbox via rdr/DNAT | 2026-04-22 | pf cube_rdr nat anchor + dnsmasq for *.coppice.lan + Go cubeproxy for the E2B-style <port>-<id>.<domain> Host-header split. LAN-peer curl to both 192.168.1.182:30001 (plain rdr) and 80-sbxa.coppice.lan:30080 (rdr → proxy) returns 200. rdr add-latency 0.24 µs p50 over 9.63 µs bare s2s; rdr rule-add flat at 1.2 ms from N=10 to N=100; cubeproxy overhead +109 µs p50 / +187 µs p99 on 96 µs direct-HTTP. See /appendix/ebpf-to-pf and benchmarks/rigs/net/ext-to-sandbox.sh.
Bridge-lifecycle integration | 2026-04-22 | Controller at tools/coppice-pool-ctl.sh. checkout allocates IP + tap (kernel auto-assigned; pinning the tap number trips make_dev_sv with EEXIST and panics the host, a lesson that cost three reboots) + per-sandbox coppice/sandbox-<id> anchor. release flushes the anchor, kills pf states, destroys the tap. N=10 end-to-end (pool-coppicenet-e2e.sh): checkout mean 21.5 ms / p95 23 ms, release mean 308 ms / p95 610 ms (release is ifconfig tap destroy-bound, not pf-bound), anchors verified 10/10, taps verified 10/10, deny-mutation enforcement confirmed, neighbor unaffected, 0 pf states leaked. Downstream bhyve wiring lives in bhyve-durable-prewarm-pool-coppicenet.sh.
Multi-stream throughput scaling | 2026-04-22 | The suspect (pf state-table contention) was wrong. Rig throughput-multistream.sh swept P = 1, 2, 4, 8, 16, 32, 64, 128 streams, TCP and UDP, at MTU 1500. Not a cliff: a flat, noisy plateau. TCP: P=1 → 7.10 Gbit/s, P=16 → 7.00 Gbit/s, P=128 → 6.44 Gbit/s (absolute numbers are ~half the quiescent 14.6 Gbit/s baseline because three sibling subagents were sharing honor during the run; the shape is what the question asked about). Monotonic sag 1→128 is ≤15%. The pf state table is nowhere near contended: high-water 180-1024 states against a 131072-bucket hash (load factor ≤0.008); insertion rate 1-33/s (new-flow setup, expected); search rate 1.2-1.8 M/s, flat across P. Host CPU pegs at ~16-17% (= one of 16 threads) regardless of stream count; the bottleneck is the single-threaded TCP sender/receiver path, not pf. UDP confirms: iperf3 pushes 148 Gbit/s of attempted send at P=128 but 96% is dropped by a 100%-CPU receiver. Raising net.pf.states_hashsize / net.pf.source_nodes_hashsize (loader-time tunables; default 131072 / 32768) would not move the inflection because the hash is not the constrained resource (load factor is <1%). For the Coppice workload (agent sandboxes, 100s of low-bitrate flows at ~Mbit/s each), this gap does not matter: pf has three orders of magnitude of headroom on state count and searches/sec. See /appendix/ebpf-to-pf § Multi-stream scaling. Receipt: benchmarks/results/throughput-multistream-2026-04-22.txt.
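
The E2B-style <port>-<id>.<domain> Host-header split from the rdr/DNAT row can be sketched in a few lines. The real cubeproxy is written in Go; this Python version is an illustration under stated assumptions: the names Route and split_host are hypothetical, sandbox ids are assumed to be alphanumeric, and IPv6-literal Host headers are not handled.

```python
import re
from typing import NamedTuple, Optional


class Route(NamedTuple):
    port: int        # port inside the sandbox
    sandbox_id: str  # short sandbox identifier
    domain: str      # wildcard domain the proxy serves


# <port>-<id>.<domain>; the id character set is an assumption.
_HOST_RE = re.compile(
    r"^(?P<port>\d{1,5})-(?P<id>[a-z0-9]+)\.(?P<domain>[a-z0-9.-]+)$"
)


def split_host(host_header: str) -> Optional[Route]:
    """Parse a Host header; return None when it is not in the split format."""
    # Drop an optional ":<listen-port>" suffix (hostnames only, no IPv6 literals).
    hostname = host_header.partition(":")[0].lower()
    match = _HOST_RE.match(hostname)
    if match is None:
        return None
    port = int(match.group("port"))
    if not 1 <= port <= 65535:
        return None
    return Route(port, match.group("id"), match.group("domain"))
```

With the hostname from the row, split_host("80-sbxa.coppice.lan:30080") yields port 80 and id sbxa, while a bare address such as 192.168.1.182:30001 returns None, which a proxy could treat as the plain-rdr case.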

Method

Every open item has three things attached to it before it closes:

  1. A script under benchmarks/rigs/, executable from a fresh clone assuming honor access.
  2. A numbers section added to the relevant appendix page, citing that script.
  3. An update to this page moving the row from open to closed with the date and a link to the receipt.

No item closes on vibes. If the measurement surprises us, the prose changes; the measurement stays.