The jail backend is fast and cheap and has been the gateway’s only substrate since the repo existed. The bhyve backend is the second, wired up after the Session A substrate work landed: same REST surface, same SDK shape, different isolation boundary. This appendix is the editorial companion to that wiring — why two backends co-exist, how the gateway picks between them, and what the v1 shim does and doesn’t do.
Why two backends
Jails are the right answer for most of the demo surface. They start
in tens of milliseconds, share a kernel, and the existing VNET/pf
plumbing puts every sandbox on its own 10.78.0.X address with
full-stack network isolation. For code-interpreter workloads — run
a cell, read a file, kill — the isolation ceiling jails offer is
plenty.
bhyve is the answer when the threat model demands a full guest kernel. The vmm-vnode patch lets one host hold a thousand 256 MiB microVMs without the memory arithmetic falling apart, and the pool-ctl substrate (see /appendix/bhyve-substrate) already keeps a handful of SIGSTOPped guests ready to go. The gateway’s job is to hand one of those pool slots to an SDK caller on demand and SSH into it when the caller asks to run something.
Both backends serve the same Backend trait. AppState stashes one
Arc<dyn Backend> for the jail path (always present) and an
Option<Arc<dyn Backend>> for the bhyve path (present only when the
operator opts in). Per-call dispatch picks the right one via a
backend_for(&template) helper that consults the template registry.
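The shape above can be sketched as follows. This is a minimal illustration, not the real trait: the actual Backend trait carries the full sandbox lifecycle, and the kind() method and demo_state() helper here are stand-ins so the example stays self-contained.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Stand-in for the real Backend trait (which has the full lifecycle).
trait Backend: Send + Sync {
    fn kind(&self) -> &'static str;
}

struct JailBackend;
impl Backend for JailBackend {
    fn kind(&self) -> &'static str { "jail" }
}

struct BhyveBackend;
impl Backend for BhyveBackend {
    fn kind(&self) -> &'static str { "bhyve" }
}

#[derive(Clone, Copy, PartialEq)]
enum BackendKind { Jail, Bhyve }

struct AppState {
    jail: Arc<dyn Backend>,           // always present
    bhyve: Option<Arc<dyn Backend>>,  // present only when the operator opts in
    registry: HashMap<String, BackendKind>,
}

impl AppState {
    // Consult the template registry to pick a backend. In this sketch,
    // unknown templates and a missing bhyve backend both fall back to
    // the jail path; the real helper's error handling is not shown.
    fn backend_for(&self, template: &str) -> Arc<dyn Backend> {
        match self.registry.get(template) {
            Some(BackendKind::Bhyve) => {
                self.bhyve.clone().unwrap_or_else(|| self.jail.clone())
            }
            _ => self.jail.clone(),
        }
    }
}

fn demo_state() -> AppState {
    let mut registry = HashMap::new();
    registry.insert("python".to_string(), BackendKind::Jail);
    registry.insert("python-bhyve".to_string(), BackendKind::Bhyve);
    AppState {
        jail: Arc::new(JailBackend),
        bhyve: Some(Arc::new(BhyveBackend)),
        registry,
    }
}

fn main() {
    let state = demo_state();
    println!("{}", state.backend_for("python-bhyve").kind());
    println!("{}", state.backend_for("python").kind());
}
```

The Option on the bhyve slot is what keeps dev laptops honest: when the operator never opted in, there is simply no second backend to dispatch to.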
Template flavouring
The registry grows two discovery passes. The jail pass is unchanged:
every directory matching <templates_root>/<name>-template becomes a
BackendKind::Jail entry. The bhyve pass is added when the operator
passes --bhyve-templates-root <path> (default unset): every file at
<path>/<name>.img becomes a BackendKind::Bhyve entry. The honor
box has /vms/templates/python-bhyve.img; dev laptops leave the flag
unset and the bhyve pass is a no-op.
If a name collides across the two passes — say someone builds a
python-bhyve-template jail and a python-bhyve.img image — the
bhyve entry wins. This is an operator mistake, not a data condition;
the bhyve substrate is the higher-fidelity path so we’d rather
surface the microVM.
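The two passes and the collision rule can be sketched like this. The suffix and extension conventions come from the text; the build_registry function itself is illustrative and is driven by pre-listed names rather than a real directory scan so it runs anywhere.

```rust
use std::collections::HashMap;
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendKind { Jail, Bhyve }

fn build_registry(jail_dirs: &[&str], bhyve_files: &[&str]) -> HashMap<String, BackendKind> {
    let mut reg = HashMap::new();
    // Jail pass: every <name>-template directory under <templates_root>.
    for dir in jail_dirs {
        if let Some(name) = dir.strip_suffix("-template") {
            reg.insert(name.to_string(), BackendKind::Jail);
        }
    }
    // Bhyve pass: every <name>.img under --bhyve-templates-root. It runs
    // second, so on a name collision the bhyve entry overwrites the jail one.
    for file in bhyve_files {
        let p = Path::new(file);
        if p.extension().and_then(|e| e.to_str()) == Some("img") {
            if let Some(name) = p.file_stem().and_then(|s| s.to_str()) {
                reg.insert(name.to_string(), BackendKind::Bhyve);
            }
        }
    }
    reg
}

fn main() {
    let reg = build_registry(
        &["python-template", "python-bhyve-template"],
        &["python-bhyve.img"],
    );
    // The python-bhyve collision resolves to the bhyve entry.
    println!("{:?}", reg.get("python-bhyve"));
}
```

Encoding the collision rule as pass order rather than an explicit check keeps the registry code boring, which is the point.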
GET /templates serialises the backend field alongside name and
path, so coppice tpl list shows it and the demo-portal UI can key
off it. The registry is otherwise unchanged — hot-reload
(POST /templates/reload) rescans both roots.
Dispatch
POST /sandboxes {templateID: "python-bhyve"} lands on the same
create_sandbox handler as before. The handler calls
state.backend_for(&req.template_id), which returns the right
Arc<dyn Backend> based on the registry’s backend field, and then
calls create_with_limits on it. From there down it’s the backend’s
problem: the jail backend does its ZFS clone + epair + jail -c
dance; the bhyve backend shells out to
/usr/local/sbin/coppice-bhyve-pool-ctl.sh checkout <template>,
parses the one-line JSON, and returns the allocated 10.77.0.X IP.
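A sketch of that checkout path follows. The script path and the checkout subcommand come from the text; the "ip" field name in the one-line JSON is an assumption for illustration, not the real pool-ctl schema, and the hand-rolled parse avoids a serde dependency in the example.

```rust
use std::process::Command;

// Shell out to pool-ctl and extract the allocated guest IP.
fn checkout_ip(template: &str) -> Result<String, String> {
    let out = Command::new("/usr/local/sbin/coppice-bhyve-pool-ctl.sh")
        .args(["checkout", template])
        .output()
        .map_err(|e| e.to_string())?;
    if !out.status.success() {
        return Err(format!("checkout failed: {}", out.status));
    }
    let line = std::str::from_utf8(&out.stdout).map_err(|e| e.to_string())?;
    parse_ip(line)
}

// Minimal one-line JSON field extraction ("ip" is an assumed field name).
fn parse_ip(line: &str) -> Result<String, String> {
    let key = "\"ip\":\"";
    let start = line.find(key).ok_or("no ip field in checkout output")? + key.len();
    let end = line[start..].find('"').ok_or("unterminated ip field")? + start;
    Ok(line[start..end].to_string())
}

fn main() {
    // Offline demonstration against a plausible-looking output line.
    let line = r#"{"template":"python-bhyve","ip":"10.77.0.12"}"#;
    println!("{}", parse_ip(line).unwrap());
}
```

Keeping the shell-out and the parse separate means the parse can be unit-tested without a pool on the box.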
DELETE /sandboxes/:id, /exec, /pause, /resume, and /connect
all route through the same backend_for helper using the live
sandbox’s template_id. Snapshot / fork paths stay on the jail
backend — bhyve v1 doesn’t implement them and the HTTP layer maps
those calls to the same 501-style Other("…not supported on bhyve backend v1") response every other unsupported-on-bhyve op returns.
The SSH-exec shim
Bhyve guests don’t host the gateway’s envd (a fully-fledged in-guest gRPC service is future work). In v1 the gateway reaches in over SSH:
ssh -i /root/coppice-signing/pool-key \
-o StrictHostKeyChecking=no \
-o UserKnownHostsFile=/dev/null \
-o ConnectTimeout=5 \
-o BatchMode=yes \
root@10.77.0.<N> -- <escaped cmd>
The pool image bakes the public half of the key into
/root/.ssh/authorized_keys and sets PerSourcePenalties no in
sshd_config — without that second edit OpenSSH 9.8’s
per-source-ip lockout trips on the probe loop the smoke rig hammers
during warm. Commands are single-quote-escaped before being
concatenated; the pool template’s /bin/sh handles the rest.
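The escaping step is the classic POSIX single-quote dance: close the quote, emit a backslash-escaped quote, reopen. A sketch, with an illustrative function name:

```rust
// Wrap an argument in single quotes for /bin/sh, escaping any embedded
// single quotes as '\'' (end quote, literal quote, reopen quote).
fn sh_single_quote(arg: &str) -> String {
    let mut out = String::with_capacity(arg.len() + 2);
    out.push('\'');
    for ch in arg.chars() {
        if ch == '\'' {
            out.push_str(r#"'\''"#);
        } else {
            out.push(ch);
        }
    }
    out.push('\'');
    out
}

fn main() {
    println!("{}", sh_single_quote("echo it's fine"));
}
```

Inside single quotes the shell expands nothing, which is why this is the safe baseline for handing arbitrary user commands to a remote /bin/sh.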
Cost: one SSH connect per /exec, somewhere between 20 and 60 ms
depending on whether the guest has already warmed its host key cache.
Acceptable for demo-portal latency; comfortably below the 147 ms
the REST → ready path spends on checkout.
The numbers
Two wall times to keep straight:
- Checkout → SSH-ready: 147 ms on honor. This is what the pool substrate reports from coppice-bhyve-pool-ctl.sh checkout — the gateway's POST /sandboxes adds one shell-out plus whatever the SDK round-trip costs, so end-to-end REST create → SSH works is ~180 ms.
- vCPU resume: 17 ms, reported in snapshot-cloning. That's the bhyve kernel-level SIGCONT → vCPU-runtime-advances number, and it's a lower bound on anything the substrate can do. The gap between 17 ms and 147 ms is SSH handshake plus readiness probes.
coppice_bhyve_checkout_ns_sum (divided by
coppice_bhyve_checkouts_total) gives the running mean in
production. coppice_bhyve_pool_available{template} and
coppice_bhyve_pool_in_use{template} are refreshed every 10 s from
pool-ctl list --json.
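The sum/total division is trivial but worth writing down, if only for the divide-by-zero guard before the first checkout. The counter names come from the text; the helper is illustrative.

```rust
// Running mean checkout latency in milliseconds, from the two counters
// coppice_bhyve_checkout_ns_sum and coppice_bhyve_checkouts_total.
fn mean_checkout_ms(checkout_ns_sum: u64, checkouts_total: u64) -> Option<f64> {
    if checkouts_total == 0 {
        return None; // no checkouts yet: avoid dividing by zero
    }
    Some(checkout_ns_sum as f64 / checkouts_total as f64 / 1.0e6)
}

fn main() {
    // Two checkouts totalling 294 ms reproduce the 147 ms mean from the text.
    println!("{:?}", mean_checkout_ms(294_000_000, 2));
}
```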
What this v1 doesn’t do
- pause/resume. The pool already SIGSTOPs warm entries and SIGCONTs on checkout. A second freeze layer gets us nothing; both methods return Other("not supported on bhyve backend v1 …").
- snapshot/fork/delete_snapshot. Needs per-entry ZFS snapshots on the pool's image directory — Session C on the substrate side. For v1 the jail path is the only durable-snapshot surface.
- Template signing. The jail backend verifies a signify(1) signature over the snapshot's ZFS guid before cloning (see /appendix/image-signing); bhyve templates are raw images and have no equivalent gate in v1. The COPPICE_REQUIRE_SIGNED_TEMPLATES=1 environment knob still applies to jails and only jails.
- CPU / memory limits. The pool entry's CPU and memory are baked into the image (bhyve -c N -m XG). The HTTP layer accepts cpuCount / memoryMB on create but the bhyve backend ignores them for now. The jail backend honours them via rctl(8) as before.
- Per-jail ifconfig lo0 up and friends. The pool image handles interface bring-up internally on boot; the gateway doesn't touch guest-side networking.
Pool lifecycle
Warm happens at gateway startup when
--bhyve-pool-size N is set. The gateway picks the first
bhyve template the registry lists, shells out to
coppice-bhyve-pool-ctl.sh warm <tpl> --count N, and
blocks startup until it returns. N=2 is ~15 s on honor. Operators
who want multiple templates warm at boot should run additional
POST /pool/<name>/warm (or the shell equivalent)
after startup — the gateway’s startup warm is a default, not a
policy.
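The startup warm reduces to building one argv and blocking on it. Sketched as argv construction rather than an actual shell-out so the example runs anywhere; the script path, warm subcommand, and --count flag come from the text, while warm_argv is an illustrative name.

```rust
// Build the argv the gateway hands to pool-ctl at startup when
// --bhyve-pool-size N is set.
fn warm_argv(template: &str, count: u32) -> Vec<String> {
    vec![
        "/usr/local/sbin/coppice-bhyve-pool-ctl.sh".to_string(),
        "warm".to_string(),
        template.to_string(),
        "--count".to_string(),
        count.to_string(),
    ]
}

fn main() {
    println!("{}", warm_argv("python-bhyve", 2).join(" "));
}
```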
Drain happens on SIGINT (Ctrl-C). The signal handler walks every
bhyve template in the registry and runs … drain <tpl>
best-effort before exit. A second Ctrl-C short-circuits.
Cross-references
- /appendix/bhyve-substrate — the pool-ctl script, template image build, and the 147 ms receipt on honor.
- /appendix/snapshot-cloning — the kernel-level 17 ms number this sits on top of.
- /appendix/image-signing — what the jail backend does that bhyve v1 doesn’t.
- e2b-compat/src/backend/bhyve.rs — the code.
- e2b-compat/src/state.rs::AppState::backend_for — the per-call dispatcher.