Coppice’s jails used to run with ip4=inherit, sharing
honor’s host IP. That was enough for the envd surface (the gateway
owns those listeners) but it collapses the moment a user wants to
expose anything from inside the jail — a Jupyter web
frontend, a user web app, Chromium’s CDP on :9222. It also made pf
filtering a uid/tag exercise rather than a subnet one. Step 3 of
#69 moved to VNET jails on a
dedicated bridge, and this page is the map of what changed.
Why we moved off ip4=inherit
Three problems, one switch.
First, listener collisions. Two sandboxes both wanting
:9222 for CDP can’t have it; they share the host’s IP,
and the kernel’s socket table is flat. Any serious browser-sandbox
story needs per-jail addressability, full stop.
Second, pf rule gymnastics. A pass quick from any
rule inside cube/sandbox-<short> matched all
traffic leaving the host IP — not just this sandbox’s — because from
pf’s point of view every jail was the host. Disambiguating meant
tagging by uid (the per-sandbox jail uid), which works, but is
fiddly and couples pf semantics to the jail backend. Source IPs are
what pf wants to filter on, and the new setup gives it one.
Third, browser-sandbox (tracked as #60) is
explicitly blocked without this. Playwright connects to a CDP
WebSocket and expects to reach it at a stable address; no address,
no browser sandbox.
Subnet plan
Two disjoint /24s on honor:
- 10.77.0.0/24: pre-existing bhyve microVM pool, on cubenet0. Untouched by this work.
- 10.78.0.0/24: new per-sandbox VNET jails, on coppicenet0. Gateway 10.78.0.1/24 sits on the bridge. The allocator hands out .10 through .250 (.0 network, .1 gateway, .2–.9 and .251–.255 reserved for infrastructure). 240 concurrent sandboxes before the pool is exhausted, which is enough for current scope and easy to widen to a /23 later.
The split is deliberate. bhyve rigs keep running on their own
bridge with their own address plan; jails get their own space; no
address reuse means no confused pf matches. The allocator lives in
e2b-compat/src/ipalloc.rs and starts from an empty state in every
gateway process.
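A minimal sketch of what such an allocator could look like, assuming a lowest-free-address policy and an in-memory set. Only the name IpAllocator::allocate comes from this page; the release method, the field layout, and the policy are illustrative assumptions, not the contents of ipalloc.rs.

```rust
use std::collections::BTreeSet;
use std::net::Ipv4Addr;

/// Hypothetical in-memory allocator for the jail subnet 10.78.0.0/24.
/// All state lives in this struct, so it is gone when the process exits.
pub struct IpAllocator {
    in_use: BTreeSet<u8>,
}

impl IpAllocator {
    const FIRST_HOST: u8 = 10; // .0 through .9 are reserved
    const LAST_HOST: u8 = 250; // .251 through .255 are reserved

    pub fn new() -> Self {
        IpAllocator { in_use: BTreeSet::new() }
    }

    /// Hands out the lowest free address (assumed policy), or None when
    /// every address in the range is taken.
    pub fn allocate(&mut self) -> Option<Ipv4Addr> {
        let host = (Self::FIRST_HOST..=Self::LAST_HOST)
            .find(|h| !self.in_use.contains(h))?;
        self.in_use.insert(host);
        Some(Ipv4Addr::new(10, 78, 0, host))
    }

    /// Returns an address to the pool. Yields true only if the address
    /// belongs to the jail subnet and was actually allocated.
    pub fn release(&mut self, ip: Ipv4Addr) -> bool {
        let [a, b, c, host] = ip.octets();
        (a, b, c) == (10, 78, 0) && self.in_use.remove(&host)
    }
}
```

Because nothing is persisted, a freshly constructed allocator knows nothing about jails that are already running, which is the gap discussed under startup reconstitution.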
Per-sandbox lifecycle
On POST /sandboxes, the backend:
- Allocates an IP from the pool (IpAllocator::allocate).
- Creates an epair with ifconfig epair create, which returns epair<N>a and epair<N>b as a pair.
- Adds the a-end to coppicenet0 on the host.
- Launches the jail with vnet=new and vnet.interface=epair<N>b, which hands the b-end into the jail’s fresh vnet.
- The jail’s exec.start (not exec.prestart, which is a landmine) assigns the IP inside the jail and adds the default route with route add default 10.78.0.1.
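The sequence above can be sketched as a few pure functions that build the command strings involved. The function names are hypothetical, and the exact ifconfig invocation inside exec.start is an assumption; only vnet=new, vnet.interface, the bridge name, and the route command come from this page.

```rust
use std::net::Ipv4Addr;

/// Host-side command that attaches the a-end to the bridge. `a_end` is the
/// name printed by `ifconfig epair create`, for example "epair7a".
pub fn bridge_attach_command(a_end: &str) -> String {
    format!("ifconfig coppicenet0 addm {a_end}")
}

/// Derives the b-end name from the a-end name: "epair7a" becomes "epair7b".
/// Returns None if the name does not end in 'a'.
pub fn b_end_of(a_end: &str) -> Option<String> {
    a_end.strip_suffix('a').map(|stem| format!("{stem}b"))
}

/// jail(8) parameters for the launch. The exec.start command line is an
/// assumed shape: it runs inside the jail, after the b-end has been handed
/// over, so it touches only the jail's own routing table.
pub fn jail_params(b_end: &str, ip: Ipv4Addr) -> Vec<String> {
    vec![
        "vnet=new".to_string(),
        format!("vnet.interface={b_end}"),
        format!(
            "exec.start=ifconfig {b_end} inet {ip}/24 up && route add default 10.78.0.1"
        ),
    ]
}
```

Keeping the address and route setup in exec.start rather than exec.prestart is the point of the sketch: the same two commands run on the host would modify the host's routing table instead.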
The exec.prestart vs exec.start distinction
matters enough to spell out. exec.prestart runs on the
host, before the vnet hand-off; the b-end isn’t in the jail’s stack
yet, so any ifconfig or route commands
poke the host’s routing table. On a host that already has a default
route, route add default hits “File exists” and the
jail comes up half-configured. exec.start runs inside
the jail after hand-off, against a clean empty route table, and
does the right thing.
Teardown reverses in the order that matters: jail -r
first (which returns the b-end to the host), then
ifconfig <a> destroy (which also removes the
b-end, since epair pairs die together). The a-end name is stashed
in a map keyed by sandbox id at create time so teardown can find
it.
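As a rough illustration of that bookkeeping, the sketch below keeps a map from sandbox id to a-end name and emits the teardown commands in the order described. The type and method names are hypothetical; the real map in the backend may look different.

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping sandbox id to the a-end interface name
/// recorded at create time.
#[derive(Default)]
pub struct EpairRegistry {
    a_ends: HashMap<String, String>,
}

impl EpairRegistry {
    /// Remembers which a-end belongs to a sandbox.
    pub fn record(&mut self, sandbox_id: &str, a_end: &str) {
        self.a_ends.insert(sandbox_id.to_string(), a_end.to_string());
    }

    /// Forgets the sandbox and returns its teardown commands in order:
    /// stop the jail first so the b-end returns to the host, then destroy
    /// the a-end, which removes the b-end with it.
    /// Returns None for an unknown sandbox id.
    pub fn teardown_commands(
        &mut self,
        sandbox_id: &str,
        jail_name: &str,
    ) -> Option<Vec<String>> {
        let a_end = self.a_ends.remove(sandbox_id)?;
        Some(vec![
            format!("jail -r {jail_name}"),
            format!("ifconfig {a_end} destroy"),
        ])
    }
}
```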
pf anchor semantics change
With ip4=inherit, a sandbox’s anchor rules read
from any and relied on anchor scoping to not bleed.
With VNET, every rule is source-IP-scoped:
block quick from 10.78.0.42 to <sandbox_deny>
pass quick from 10.78.0.42 to <sandbox_allow>
The air-gapped fragment’s terminal rule goes from
block quick all to
block quick from 10.78.0.42 to any. Critically,
lo0 inside the anchor now means the jail’s
loopback, not the host’s — each VNET has its own. That’s the clean
semantics we always wanted: in-jail services reach each other on
127.0.0.1 without accidentally reaching host services on the same
address.
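A small sketch of how the source-scoped rule text could be rendered from a sandbox's address. The function names are hypothetical, and whether the real backend builds rules as strings like this is an assumption; the rule shapes themselves are the ones shown above.

```rust
use std::net::Ipv4Addr;

/// Renders the two source-scoped rules for one sandbox.
pub fn scoped_rules(ip: Ipv4Addr) -> Vec<String> {
    vec![
        format!("block quick from {ip} to <sandbox_deny>"),
        format!("pass quick from {ip} to <sandbox_allow>"),
    ]
}

/// Renders the terminal rule of the air-gapped fragment, scoped to the
/// sandbox's source address instead of `block quick all`.
pub fn air_gapped_terminal_rule(ip: Ipv4Addr) -> String {
    format!("block quick from {ip} to any")
}
```

Since every rule names its source address, a rule loaded for one sandbox cannot match traffic from another, independent of how anchors are scoped.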
The coppicenet0 bridge bring-up
Bridge creation plus NAT is handled by
tools/coppice-net-setup.sh — idempotent,
dead-man-switched. It:
- Creates the bridge (if missing) with 10.78.0.1/24.
- Adds nat-anchor "cube/*" (at top of the root ruleset, before filter rules; pf requires this ordering) and anchor "cube/*" (with the other filter anchors) to the existing root pf. Siblings like cube_policy and cube/sandbox-* are preserved.
- Loads a NAT rule into cube/jail-nat: nat on vm-public from 10.78.0.0/24 to any -> (vm-public).
vm-public, not re0. Honor’s root pf has
set skip on re0 — meaning re0 is
invisible to pf. NAT out re0 would never fire. The
vm-public interface is the bhyve bridge that actually
carries uplink traffic, and that’s where the NAT rule lives.
The dead-man switch is the usual daemon(8) pattern:
reload rules now, schedule a revert after N seconds, cancel the
revert once the smoke test passes. If the script crashes before
the smoke test, root pf reverts to its previous state and ssh
stays alive.
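The timing logic of that pattern can be modelled in a few lines. This is an in-process sketch with hypothetical names, not the script's implementation: the script detaches the revert with daemon(8) so it survives the script dying, whereas a thread dies with its process. The sketch only shows the arm, test, and cancel sequence.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Arms a revert, applies a change, then cancels the revert only if the
/// smoke test passes within `grace`. Returns true if the change was kept.
pub fn apply_with_dead_man<A, R, S>(
    apply: A,
    revert: R,
    smoke_test: S,
    grace: Duration,
) -> bool
where
    A: FnOnce(),
    R: FnOnce() + Send + 'static,
    S: FnOnce() -> bool,
{
    let (cancel_tx, cancel_rx) = mpsc::channel::<()>();

    // Arm the revert before applying, so a hang after `apply` still rolls back.
    let timer = thread::spawn(move || match cancel_rx.recv_timeout(grace) {
        Ok(()) => false, // cancelled in time: keep the new state
        Err(_) => {
            // Timed out, or the sender went away without cancelling.
            revert();
            true
        }
    });

    apply();
    if smoke_test() {
        // A send error means the timer already fired; the join below reports it.
        let _ = cancel_tx.send(());
    } else {
        drop(cancel_tx); // disconnect, so the timer reverts immediately
    }

    let reverted = timer.join().expect("timer thread panicked");
    !reverted
}
```

A smoke test that runs longer than the grace period counts as a failure here: the revert fires and the function reports that the change was not kept.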
Startup reconstitution is punted
If the gateway restarts with live sandboxes still running, the
IpAllocator forgets which IPs are in use. A fresh
sandbox can in principle draw an IP that’s already live on an
existing jail. The code path (reconstitute_ip_reservations)
is stubbed in freebsd_jail.rs with a TODO and a sketch
of what it needs to do: jls for jail names,
ifconfig coppicenet0 for a-end members, map those to
b-end IPs inside each jail, re-register with the allocator.
In practice this rarely bites: gateway restarts typically tear sandboxes down as a side effect, and when they don’t, the next collision surfaces fast. It’s a correctness gap, not an operational one, and it stays on the list.
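One possible shape for the missing piece, offered as a sketch rather than the stubbed function's eventual body: parse the inet lines from ifconfig output captured inside each jail, then keep only addresses in the allocator range. It assumes ifconfig's default output format (address and netmask as separate words, not CIDR), and both function names are hypothetical.

```rust
use std::collections::BTreeSet;
use std::net::Ipv4Addr;

/// Extracts IPv4 addresses from ifconfig(8) output, looking for lines such as
/// "\tinet 10.78.0.42 netmask 0xffffff00 broadcast 10.78.0.255".
/// Lines in any other format are skipped.
pub fn parse_inet_addrs(ifconfig_output: &str) -> Vec<Ipv4Addr> {
    ifconfig_output
        .lines()
        .filter_map(|line| {
            let mut words = line.split_whitespace();
            match words.next() {
                Some("inet") => words.next()?.parse::<Ipv4Addr>().ok(),
                _ => None,
            }
        })
        .collect()
}

/// Keeps only addresses the allocator could have handed out
/// (10.78.0.10 through 10.78.0.250), ready to be re-registered.
pub fn reservations(observed: &[Ipv4Addr]) -> BTreeSet<Ipv4Addr> {
    observed
        .iter()
        .copied()
        .filter(|ip| {
            let [a, b, c, host] = ip.octets();
            (a, b, c) == (10, 78, 0) && (10..=250).contains(&host)
        })
        .collect()
}
```

Feeding the resulting set back into the allocator before it serves the first request would close the double-allocation window described above.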
DNS via local_unbound on the bridge gateway
With VNET the jail’s /etc/resolv.conf is inherited from
honor at clone time. Honor’s is nameserver 127.0.0.1 —
fine for honor (its own local_unbound listens there), but
inside a VNET jail that 127.0.0.1 is the jail’s empty
loopback, not the host’s resolver. DNS just dies.
Fix: run local_unbound (FreeBSD base, no ports) on the
bridge gateway 10.78.0.1:53 as well as 127.0.0.1,
and bake nameserver 10.78.0.1 into the jail-template
ZFS datasets so every future clone inherits the right answer.
tools/coppice-net-setup.sh drops a sentinel-wrapped
/var/unbound/conf.d/coppice.conf that adds the bridge
interface plus access-control: 10.78.0.0/24 allow, and
rewrites forward.conf to forward . to
1.1.1.1 and 9.9.9.9 while preserving any
LAN-scoped forward-zone blocks (so honor’s own
host mybox.lan keeps working).
The split-binding is dead-man-by-construction: the host’s own
resolver path is unchanged, so an unbound crash or misconfig won’t
take honor’s DNS with it — worst case, new sandboxes can’t resolve
until the service recovers. Air-gap compatibility rides free on
step 7’s pass quick from <ip> to 10.78.0.1 rule
— DNS queries are just more bridge-gateway traffic.
Adding interface: to unbound requires a full
service local_unbound restart, not a reload; the script
detects this and restarts only when a new interface needs to be
bound.
Air-gapped fragment learns to pass the gateway
The air-gapped fragment (see air-gapped)
installs a blanket block quick from <ip> to any
as the terminal rule. Under ip4=inherit this worked
because pass quick on lo0 covered gateway traffic —
the sandbox and gateway both lived on the host’s loopback. Under
VNET, the gateway is at 10.78.0.1 from the sandbox’s
point of view, which isn’t loopback and isn’t covered by the
other pass rules. A follow-up commit in step 7 added
pass quick from <ip> to 10.78.0.1 between the
loopback pass and the DNS allowlist, so air-gapped sandboxes stay
reachable for envd, metadata, and any other host-side control-plane
service bound on the bridge. “Air-gapped” means no external
internet, not no gateway.
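Why the position of the gateway pass matters can be shown with a toy model of quick evaluation, where the first matching rule decides. This is a deliberately simplified sketch with hypothetical types: it assumes every rule is already scoped to one sandbox's source address, and it leaves out the loopback pass and the DNS allowlist because their exact form is not shown on this page.

```rust
use std::net::Ipv4Addr;

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Action {
    Pass,
    Block,
}

/// Destination matcher for a simplified `quick` rule.
#[derive(Clone, Copy)]
pub enum Dst {
    Host(Ipv4Addr),
    Any,
}

pub struct QuickRule {
    pub action: Action,
    pub dst: Dst,
}

/// With `quick`, the first rule that matches ends evaluation.
/// Returns None if no rule matches.
pub fn evaluate(rules: &[QuickRule], dst: Ipv4Addr) -> Option<Action> {
    rules
        .iter()
        .find(|r| match r.dst {
            Dst::Host(h) => h == dst,
            Dst::Any => true,
        })
        .map(|r| r.action)
}

/// Gateway pass first, blanket block last.
pub fn air_gapped_rules() -> Vec<QuickRule> {
    vec![
        QuickRule { action: Action::Pass, dst: Dst::Host(Ipv4Addr::new(10, 78, 0, 1)) },
        QuickRule { action: Action::Block, dst: Dst::Any },
    ]
}
```

In this model, traffic to 10.78.0.1 passes and everything else is blocked. Swapping the two rules blocks the gateway as well, which is the failure the follow-up commit fixed.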
Summary
| component | value | note |
|---|---|---|
| bhyve pool subnet | 10.77.0.0/24 on cubenet0 | Unchanged by #69. Disjoint from jail subnet by design. |
| jail subnet | 10.78.0.0/24 on coppicenet0 | New in #69 step 3. |
| bridge gateway | 10.78.0.1/24 | Configured on coppicenet0; jail’s default route points here. |
| allocator range | 10.78.0.10 – 10.78.0.250 | IpAllocator in e2b-compat/src/ipalloc.rs. 240 concurrent sandboxes. |
| NAT anchor | cube/jail-nat | nat on vm-public from 10.78.0.0/24 to any -> (vm-public). Not re0 — pf skips re0. |
| root pf hooks | nat-anchor "cube/*" + anchor "cube/*" | Installed by tools/coppice-net-setup.sh, dead-man-switched. |
| per-sandbox anchor shape | from 10.78.0.<M> | Rules source-IP-scoped. lo0 now means jail’s own loopback. |
| DNS resolver | local_unbound on 10.78.0.1:53 | Base-system unbound, also listens on 127.0.0.1 so honor’s own DNS is untouched. Forwards . to 1.1.1.1 + 9.9.9.9. Template /etc/resolv.conf points at 10.78.0.1. |
| startup reconstitution | punted | TODO in freebsd_jail.rs; gateway restart with live jails may double-allocate. |
What this unblocks
- Chromium CDP on :9222. The browser-sandbox story (#60) was waiting for per-jail addressability. Playwright can now connect to ws://10.78.0.<M>:9222 directly from the host, or via tools/coppiceproxy for the <port>-<id>.<domain> case.
- Arbitrary in-jail listener ports. User web apps, Jupyter classic frontends, anything the sandbox wants to expose: all routable now, either by IP or through the L7 splitter.
- Cleaner per-sandbox pf posture. Rules read from <ip> rather than from any-plus-uid-tag. Easier to audit, easier to reason about.
- Cross-host VXLAN (future). The bridge is the seam. Wrap coppicenet0 in a VXLAN and sandboxes on two honors share a flat /24. Not on the roadmap yet, but the shape is right.
Cross-refs: wildcard DNS for the SDK-routing side of the story (gateway still owns envd, regardless of per-jail IPs), air-gapped for the pf fragment that uses the new source-IP scoping, and eBPF → pf for the broader pf-as-policy story.