Parity gaps — what's left

What separates a demo from a shippable thing is the thoroughness of the “remaining work” list. This is that list, kept live. Every row below is either closed (measured, documented, cross-linked) or open (blocker, has-a-plan, or acknowledged-gap). New work lands by moving a row from open to closed and adding a benchmark.

Tier 1 — parity blockers

The items whose absence would prevent a real deployment.

gap · state · what closing it takes
Sandbox → external NAT egress · closed 2026-04-22 · pf nat on vm-public from 10.77.0.0/24 to any -> (vm-public) in the root ruleset. Gotcha: putting it on re0 doesn’t work because re0 is a member of the vm-public bridge — pf sees the packet on vm-public first, so re0-NAT never matches. From sbx-a: ICMP to 1.1.1.1 = 18.7 ms RTT, HTTP = 40 ms. Rig: benchmarks/rigs/net/ext-egress.sh.
Per-sandbox pf anchors at scale · closed 2026-04-22 · Rig policy-anchor-churn.sh swept N = 1, 10, 100, 500, 1000, 2000. Load latency (median of 5): N=1000 → 1.43 ms, N=2000 → 1.51 ms; flush ~1.0 ms across the range. Flat — no cliff. FreeBSD’s default is set limit anchors 512; raising it to 4096 in the root ruleset is required for N>~500 and was the only knob we had to touch (not net.pf.request_maxcount). Cross-anchor enforcement verified with policy-anchor-isolation.sh: a block rule in cube_scale/sandbox-5 dropped sbx-a→sbx-b:32055; an unrelated block in cube_scale/sandbox-6 did not touch 32055 (sandbox-6’s rule correctly confined to its own port 32066). Teardown (policy-anchor-teardown.sh): pfctl -a cube_scale/sandbox-N -F rules clears the anchor; states survive a rule flush (expected pf semantics), so sandbox release must also call pfctl -k <src> -k <dst>. See Anchor scale for the full table.
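Taken together, the two rows above imply a small root-ruleset shape plus a per-sandbox anchor lifecycle. A minimal sketch, reconstructed from the rows — the anchor names, IPs, and example port are the rig's illustrative values, not a shipped config:

```
# Root ruleset (pf.conf) — assumed shape:
set limit anchors 4096                                     # default 512 caps out near N≈500
nat on vm-public from 10.77.0.0/24 to any -> (vm-public)   # NAT on the bridge, not re0
anchor "cube_scale/*"                                      # per-sandbox anchors load under this tree

# Per-sandbox lifecycle (pfctl):
echo 'block drop quick proto tcp from any to any port 32055' \
  | pfctl -a cube_scale/sandbox-5 -f -      # load one sandbox's rules: ~1.5 ms even at N=2000
pfctl -a cube_scale/sandbox-5 -F rules      # teardown: flush the anchor's rules
pfctl -k 10.77.0.2 -k 10.77.0.3             # states survive the flush; kill them explicitly
```

The anchor tree is what keeps mutation O(1) per sandbox: loading or flushing `cube_scale/sandbox-5` never touches a sibling's rules.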

Tier 2 — closes measurable gaps

Items whose absence is visible in the numbers but doesn’t block deployment.

gap · state · what closing it takes
dummynet rate limiting · closed 2026-04-22 · ipfw + dummynet pipe on sbx-a’s egress (ipfw add pipe 1 ip from 10.77.0.2 to any out). Four caps swept, each achieving >95% of the configured rate: 10 Mbit/s → 9.63, 100 Mbit/s → 96.8, 500 Mbit/s → 482, 1000 Mbit/s → 954 (baseline unshaped 6750 Mbit/s). Rule-load wall-time 1.56–2.20 ms, flat across caps. Per-sandbox isolation verified: with sbx-a piped at 100 Mbit/s, sbx-b → sbx-a still clears 7271 Mbit/s (72× the cap) because the pipe rule only matches src=10.77.0.2. Pre-load gotcha: ipfw’s default rule on stock kernels is deny ip from any to any — the rig sets net.inet.ip.fw.default_to_accept=1 via kenv+sysctl before kldload ipfw, which is the only way to avoid locking out ssh the moment the module attaches. Bridge-side subtlety: with net.link.bridge.pfil_member=1 (required for the pf anchor), ipfw sees bridge packets twice per direction; matching … out instead of leaving the rule unqualified avoids double-charging the pipe budget and restores full-rate shaping. Teardown flushes the pipe + rule and unloads both modules when the rig loaded them; default-accept is reasserted before module eviction. Rig: rate-limit-dummynet.sh. Receipt: benchmarks/results/rate-limit-dummynet-2026-04-22.txt. See Measured on honor.
Observability — diagnose.sh · closed 2026-04-22 · Single-command bundle at tools/diagnose.sh. Aggregates pfctl -s info + pfctl -sr -vv across root and every anchor under the cube tree (sorted by packet count, hits flagged [HIT]) + a pfctl -ss state slice + tcpdump -r /var/log/pflog + netstat -i bridge/tap columns + the contents of every cube_deny-prefixed table. Filter flags: --ip 10.77.0.5, --sandbox sandbox-5 (resolves via coppice-pool-ctl list), --since 30s. Emits hints when pfil_member=0, the root cube anchor is missing, or pf is disabled — the three enforcement holes documented in Gotchas. Demo/regression: benchmarks/rigs/net/diagnose-demo.sh installs a scoped cube_demo/diagnose anchor with a cube_deny_demo table, generates traffic, runs the tool, and asserts the table + rule-counter lines surface. Sample capture: benchmarks/results/diagnose-demo-2026-04-22.txt.
cc=50 cold start on bhyve · closed 2026-04-22 · Re-sampled on bhyve-durable-prewarm-pool after fixing the rig’s proof-of-life poll loop (added a 1 ms sleep between bhyvectl --get-stats probes — without it, N parallel pollers at cc=50 compete with the very vCPUs they’re waiting to observe). Clean numbers: cc=1 20 ms / cc=10 105 ms / cc=20 333 ms mean, p50 138, p95 791 / cc=50 995 ms mean, p50 1053, p95 1290, p99 1309. This is load-on-system physics, not a bhyve ceiling — 50 guests SIGCONT-resumed simultaneously share 16 physical threads. An earlier draft reported 2143 ms / 2903 ms for cc=50; that was the rig’s polling loop competing with itself. On a 32-thread host, cc=50 would fall back toward the cc=10 band. Receipts: bhyve-durable-prewarm-pool.sh (rig fix) + bhyve-cc50-2026-04-22.txt.
Upstream-quality vmm-vnode patch · partial · Works and is measured. Still needs: MI header consolidation, ATF test expansion, and an empirically captured INVARIANTS GPF repro. See patches/upstream-review.md.
SDK filesystem API (files.read/write/list/watch/remove) · open · envd exposes GET/POST /files/?path= and a WebSocket watcher on /files/watch. Not implemented in e2b-compat; SDK calls 404 today. The FreeBSD primitive for watch is kqueue(2) EVFILT_VNODE. Rig: benchmarks/rigs/fs-crud.sh writes 1 MiB, reads back with checksum, walks a 10-deep tree, watches for an external touch, asserts <50 ms event latency. See feature-audit § Filesystem.
SDK commands API (commands.run streaming) · open · Our POST /sandboxes/:id/exec is one-shot; the SDK expects streaming stdio + a PID-tracked handle, with kill(pid) and send_stdin. REST, portable. Rig: benchmarks/rigs/commands-api.sh runs yes | head -n 1000, asserts line-by-line streaming, then run("sleep 30") + kill returns within 1 s.
Per-sandbox metrics (GET /sandboxes/:id/metrics) · open · Route returns []. rctl -hu jail:<name> yields CPU and RSS; bhyvectl --get-stats for microVMs. Rig: benchmarks/rigs/metrics-sampling.sh spins CPU in-sandbox, asserts cpuUsedPct > 50 within 5 s.
Per-sandbox logs (GET /sandboxes/:id/logs) · open · Route returns []. Source: envd stderr + jail console. Ring buffer + cursor. Rig: capture envd output, assert the SDK’s log iterator sees the emitted lines in order.
Live per-sandbox network update (PUT /sandboxes/:id/network) · open · Route 501 today. The primitive is closed (pf anchor mutation at 1.5 ms p95); the handler translation from the SDK allow_out/deny_out payload to pfctl -a calls is the missing piece. Rig: allow_out=["1.1.1.1/32"], assert curl 1.1.1.1 succeeds and curl 8.8.8.8 fails.
TTL reaper · open · We store end_at but have no background sweeper. Cube’s CubeMaster does this cluster-side. Rig: create with timeout=5, wait 10 s, assert GET returns 404.
Durable snapshot endpoint (POST /sandboxes/:id/snapshots) · open · Distinct from pause/resume. Returns 501 today. The primitives exist (bhyvectl --suspend + ZFS snapshot); what’s missing is the named-fork-point semantic. Rig: benchmarks/rigs/snapshot-fork.sh: sandbox.snapshot() → sandbox.fork(id), assert state divergence.
OCI template import · open · Cube’s cubemastercli tpl create-from-image consumes OCI refs. We bake templates ad hoc. Rig: benchmarks/rigs/tpl-oci-import.sh pulls alpine, converts it to a ZFS dataset, launches a sandbox from it.
Node & Go SDK round-trip · open · The Python SDK is 7/7 green. Node (@e2b/code-interpreter) and Go (github.com/e2b-dev/go-sdk) use the same REST surface; we haven’t re-run the matrix in those languages. Rig: port examples/02-persistent-kernel.py to TS and Go, assert identical transcripts.
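Most of the open rows above are handler plumbing over primitives that already exist. The TTL reaper, for instance, is just a periodic sweep over the stored end_at values. A minimal in-memory sketch — SandboxStore and its method names are hypothetical, and a real sweep would also call coppice-pool-ctl release rather than only dropping the record:

```python
import time

class SandboxStore:
    """Hypothetical in-memory store; the real one backs the REST routes."""

    def __init__(self):
        self.sandboxes = {}  # sandbox id -> end_at (unix seconds)

    def create(self, sbx_id, timeout_s, now=None):
        now = time.time() if now is None else now
        self.sandboxes[sbx_id] = now + timeout_s

    def sweep(self, now=None):
        """One reaper pass: drop every sandbox whose end_at has passed."""
        now = time.time() if now is None else now
        expired = [i for i, end_at in self.sandboxes.items() if end_at <= now]
        for sbx_id in expired:
            # real version: coppice-pool-ctl release + mark route as 404
            del self.sandboxes[sbx_id]
        return expired

store = SandboxStore()
store.create("sbx-a", timeout_s=5, now=100.0)
store.create("sbx-b", timeout_s=60, now=100.0)
print(store.sweep(now=110.0))  # → ['sbx-a']; sbx-b still live
```

The rig's assertion (create with timeout=5, wait 10 s, GET → 404) is exactly one sweep interval plus the route returning 404 for any id no longer in the store.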

Tier 3 — genuinely still behind

Items where FreeBSD’s capability is categorically smaller than Linux+Cilium’s. Most don’t block Coppice specifically, but they’re honest gaps.

gap · state · what closing it takes
L7 policy · partial · Closed on semantics, partial on the intended substrate. Cilium on Linux enforces per-HTTP-request policy (method-deny, path-prefix-deny, header-deny) via its L7 proxy redirector. The FreeBSD answer is a userspace sidecar proxy in-path between the sandbox and egress — the Envoy-sidecar pattern. Envoy itself is not packaged on FreeBSD 15.0 (pkg search -x '^envoy' is empty; the upstream Bazel build doesn’t cleanly produce a FreeBSD binary). We therefore run haproxy 3.2.15 as the sidecar, which gives the three primitives we need (method-deny, path-prefix-deny, allow-all-else) via native ACLs with no Lua required. Intended-state Envoy config retained at tools/envoy/coppice-sidecar.yaml; running haproxy config at tools/envoy/coppice-sidecar-haproxy.cfg. Measured on sbx-a (N=5): sidecar startup 10 ms (bind-to-listening), per-request latency overhead ~80 µs (58-106 µs run-to-run) on a ~460 µs direct-loopback baseline, policy reload (haproxy -sf graceful swap) 12 ms, teardown 6-7 ms. 3/3 policy-matrix probes green (GET /get=200, POST /post=403, GET /admin/secret=403). Coppice-path example: GET /foo=200, POST /bar=403. Gotcha captured in the rig: fresh VNET jails often have only inet6 ::1 on lo0, so haproxy’s default 127.0.0.1 bind fails silently — the rig auto-aliases 127.0.0.1/8. Rig: benchmarks/rigs/net/l7-policy-envoy.sh. Receipt: l7-policy-envoy-2026-04-22.txt. Still open: Envoy itself (Bazel+FreeBSD port work); header-match ACLs beyond the three we ship (haproxy supports them, we just haven’t rigged them); transparent interception (currently reverse-proxy in the sidecar — curl --proxy-style forward proxying is possible via option http-use-proxy-header but adds no semantic coverage). For the agent-sandbox threat model, this gap is now functionally closed: the enforcement surface is equivalent, the numbers are measured, and the Envoy-upstreaming work is a substrate swap.
IPv6 parity · closed 2026-04-22 · Lab extended to dual-stack via the additive lab-setup-freebsd-v6.sh: fd77::/64 ULA on cubenet0, fd77::2/fd77::3 on sbx-a/sbx-b, merged cube_policy anchor with a cube_deny_v6 table (quick-block, mirrors the v4 shape), NAT66 on re0 (not vm-public — the v6 egress interface is asymmetric from v4: honor’s global SLAAC address lives on re0 directly, vm-public has no v6; the v4 NAT-on-bridge story inverted). Measurements on the same host state: TCP_RR p50 8 µs v6 vs 8 µs v4 (tied), mean 8.46 vs 8.27 µs (+2.3%); iperf3 single-stream 6.19 Gbit/s v6 vs 7.34 Gbit/s v4 (v6 at 84% of v4, ~16% header+rule-block overhead); pfctl -a cube_policy -t cube_deny_v6 -T add fd77::3 + pfctl -k fd77::3 blocks ping6 100%, delete restores on the next packet; external egress to 2606:4700:4700::1111: 23.8 ms via NAT66 vs 23.5 ms host-direct (NAT66 add-latency below ICMP sample noise). Rig: run-net-bench-v6.sh. Receipt: benchmarks/results/net-v6-2026-04-22.txt. See /appendix/ebpf-to-pf § IPv6 parity.
Linux+Cilium head-to-head numbers · open · Our “Cube/eBPF typical” column is published figures, not a measured run on comparable hardware. Would close with a second honor-class box running Cilium + the same rigs.
virtio-fs in FreeBSD base · acknowledged · FreeBSD 15.0 does not ship it. 9p-over-virtio is the substitute. See /essays/caveats.
bpftrace-class observability · acknowledged · DTrace covers 80-90% of it, but not all. See /appendix/ebpf-on-freebsd.
Browser-sandbox demo (Playwright + CDP) · open · Cube’s examples/browser-sandbox runs headless Chromium behind a CDP WebSocket, reachable through 9000-<id>.<domain>. www/chromium exists on FreeBSD 15 but is heavy; www/firefox-esr + marionette is a lighter alternative. Rig: sandbox with a browser template, Playwright connects via cubeproxy, page.goto + page.screenshot.
Capsicum-wrapped envd · open · envd today runs with jail-only confinement. Wrapping its per-request workers in cap_enter tightens the “LLM-generated Python reaches the kernel” surface. Rig: benchmarks/rigs/capsicum-envd.sh asserts open() outside the sandbox returns ECAPMODE.
Multi-node cluster overlay · acknowledged · Cube’s CubeMaster advertises multi-node. Coppice is single-host. The FreeBSD answer is vxlan(4) or wireguard; untested. Outside the measured-single-host mandate but noted for completeness. A rig would need a second honor-class host.
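The L7-policy row reduces, on the haproxy side, to a handful of native ACLs. A sketch of the three primitives the row names (frontend/backend names, bind address, and upstream are illustrative; the shipped config is tools/envoy/coppice-sidecar-haproxy.cfg):

```
# Hypothetical haproxy sidecar: method-deny, path-prefix-deny, allow-all-else.
frontend coppice_sidecar
    bind 127.0.0.1:8080
    mode http
    acl deny_method  method   POST            # method-deny  → POST /post = 403
    acl deny_prefix  path_beg /admin          # prefix-deny  → GET /admin/secret = 403
    http-request deny deny_status 403 if deny_method
    http-request deny deny_status 403 if deny_prefix
    default_backend sandbox_egress            # allow-all-else → GET /get = 200

backend sandbox_egress
    mode http
    server upstream 10.77.0.2:80
```

Policy reload is the standard haproxy -sf graceful swap, which is where the measured 12 ms comes from.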

Closed (for the record)

Items that were Tier 1 blockers and are now measured + documented.

gap · closed on · receipt
Durable snapshot-restore · 2026-04-22 · Built SNAPSHOT kernel + two-tier pool. bhyve-durable-prewarm-pool: 17 ms at cc=1. See /appendix/snapshot-cloning.
Cross-guest memory dedup · 2026-04-22 · vmm-vnode patch, N=1000 × 256 MiB in 9.1 GiB host RAM. See /appendix/vmm-vnode-patch.
E2B SDK compat (lifecycle) · earlier · 10/10 Python SDK calls pass. /appendix/e2b-compat.
E2B envd /execute (run_code) · 2026-04-22 · NDJSON over :49999, jexec python3 backend. /appendix/run-code-protocol.
E2B persistent kernel (state across run_code) · 2026-04-22 · ipykernel spawned in-jail on sandbox create; an in-jail Python bridge translates iopub → NDJSON. x = 42 / print(x) → 42 / np.array([1,2,3]) → text/plain, pandas → text/html, matplotlib → image/png, NameError → error+traceback. 7/7 checks pass via the e2b-code-interpreter Python SDK. Rig: benchmarks/rigs/jupyter-e2e.sh. /appendix/run-code-protocol.
Intra-sandbox network isolation · 2026-04-22 · Two VNET jails, pf deny-by-default anchor, 7 µs p50 TCP_RR. /appendix/ebpf-to-pf.
Policy update under traffic load · 2026-04-22 · No visible contention. At ~14 Gbit/s intra-host through cubenet0, per-op mutation p99 is 1.25 ms (idle 1.21 ms) and iperf3 throughput stays within run-to-run noise. pf’s table-lock is finer-grained than feared. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-churn-under-load.sh.
Per-sandbox pf anchors at scale · 2026-04-22 · N=2000 sibling anchors, median load latency 1.51 ms, flush 1.08 ms. Flat across N=1…2000. Cross-anchor isolation + teardown verified. Sysctl: raise set limit anchors from the default 512 to 4096. See /appendix/ebpf-to-pf and benchmarks/rigs/net/policy-anchor-churn.sh.
External → sandbox via rdr/DNAT · 2026-04-22 · pf cube_rdr nat anchor + dnsmasq for *.coppice.lan + Go cubeproxy for the E2B-style <port>-<id>.<domain> Host-header split. LAN-peer curl to both 192.168.1.182:30001 (plain rdr) and 80-sbxa.coppice.lan:30080 (rdr → proxy) returns 200. rdr add-latency 0.24 µs p50 over 9.63 µs bare s2s; rdr rule-add flat at 1.2 ms from N=10 to N=100; cubeproxy overhead +109 µs p50 / +187 µs p99 on a 96 µs direct-HTTP baseline. See /appendix/ebpf-to-pf and benchmarks/rigs/net/ext-to-sandbox.sh.
cubenet lifecycle integration · 2026-04-22 · Controller at tools/coppice-pool-ctl.sh. checkout allocates IP + tap (kernel auto-assigned — pinning the tap number trips make_dev_sv with EEXIST and panics the host; three reboots of learning) + a per-sandbox cube/sandbox-<id> anchor. release flushes the anchor, kills pf states, destroys the tap. N=10 end-to-end (pool-cubenet-e2e.sh): checkout mean 21.5 ms / p95 23 ms, release mean 308 ms / p95 610 ms (release is ifconfig tap destroy-bound, not pf-bound), anchors verified 10/10, taps verified 10/10, deny-mutation enforcement confirmed, neighbor unaffected, 0 pf states leaked. Downstream bhyve wiring lives in bhyve-durable-prewarm-pool-cubenet.sh.
Multi-stream throughput scaling · 2026-04-22 · The suspect (pf state-table contention) was wrong. Rig throughput-multistream.sh swept P = 1, 2, 4, 8, 16, 32, 64, 128 streams, TCP and UDP, at MTU 1500. Not a cliff — a flat, noisy plateau. TCP: P=1 → 7.10 Gbit/s, P=16 → 7.00 Gbit/s, P=128 → 6.44 Gbit/s (absolute numbers ~half the quiescent 14.6 Gbit/s baseline because three sibling subagents were sharing honor during the run; the shape is what the question asked about). Monotonic sag 1→128 is ≤15%. The pf state table is nowhere near contended: high-water 180-1024 states against a 131072-bucket hash (load factor ≤0.008); insertion rate 1-33/s (new-flow setup, expected); search rate 1.2-1.8 M/s, flat across P. Host CPU pegs at ~16-17% (= one of 16 threads) regardless of stream count — the bottleneck is the single-threaded TCP sender/receiver path, not pf. UDP confirms it: iperf3 pushes 148 Gbit/s of attempted send at P=128 but 96% is dropped by a 100%-CPU receiver. Raising net.pf.states_hashsize / net.pf.source_nodes_hashsize (loader-time tunables; defaults 131072 / 32768) would not move the inflection because the hash is not the constrained resource — load factor is <1%. For the Coppice workload (agent sandboxes, 100s of low-bitrate flows at ~Mbit/s each), this gap does not matter: pf has three orders of magnitude of headroom on state count and searches/sec. See /appendix/ebpf-to-pf § Multi-stream scaling. Receipts: benchmarks/results/throughput-multistream-2026-04-22.txt.
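The external-ingress row depends on cubeproxy splitting the E2B-style <port>-<id>.<domain> Host header into a target port and sandbox id. A minimal sketch of that split — the function name and error handling are hypothetical; the real proxy is the Go cubeproxy behind the cube_rdr anchor:

```python
def split_host(host, domain="coppice.lan"):
    """Parse an E2B-style '<port>-<id>.<domain>' Host header into (port, sandbox_id).

    Hypothetical sketch of cubeproxy's routing split, e.g.
    '80-sbxa.coppice.lan' -> (80, 'sbxa').
    """
    suffix = "." + domain
    if not host.endswith(suffix):
        raise ValueError(f"host {host!r} is not under {domain}")
    label = host[: -len(suffix)]           # e.g. '80-sbxa'
    port_s, sep, sbx_id = label.partition("-")
    if not sep or not port_s.isdigit() or not sbx_id:
        raise ValueError(f"bad sandbox label {label!r}")
    return int(port_s), sbx_id

print(split_host("80-sbxa.coppice.lan"))   # → (80, 'sbxa')
```

partition splits at the first hyphen, so sandbox ids containing hyphens survive intact; everything before the first hyphen must be the numeric port.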

Method

Every open item has three things attached to it before it closes:

  1. A script under benchmarks/rigs/, executable from a fresh clone assuming honor access.
  2. A numbers section added to the relevant appendix page, citing that script.
  3. An update to this page moving the row from open to closed with the date and a link to the receipt.
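The contract in step 1 is easiest to see as a skeleton: a rig runs a workload, writes a receipt, and asserts on it, exiting nonzero on failure. A hypothetical minimal shape — the path and the stand-in workload are illustrative, not an actual rig:

```shell
#!/bin/sh
# Hypothetical rig skeleton: workload -> receipt -> assertion.
set -eu

RECEIPT="${RECEIPT:-/tmp/example-rig.txt}"   # real rigs write under benchmarks/results/

fail() { echo "FAIL: $*" >&2; exit 1; }

# --- workload: record 100 samples, as a real rig would drive pfctl or iperf3 ---
: > "$RECEIPT"
i=0
while [ "$i" -lt 100 ]; do
    echo "sample $i ok" >> "$RECEIPT"
    i=$((i + 1))
done

# --- assert + summarize: a rig fails loudly, never silently ---
n=$(wc -l < "$RECEIPT" | tr -d ' ')
[ "$n" -eq 100 ] || fail "expected 100 samples, got $n"
echo "PASS: $n samples recorded in $RECEIPT"
```

The receipt file is what steps 2 and 3 cite; the PASS/FAIL exit status is what keeps a re-run from fresh clone honest.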

No item closes on vibes. If the measurement surprises us, the prose changes; the measurement stays.