The jail backend is fast and cheap and has been the gateway’s only substrate since the repo existed. The bhyve backend is the second, wired up after the Session A substrate work landed: same REST surface, same SDK shape, different isolation boundary. This appendix is the editorial companion to that wiring — why two backends co-exist, how the gateway picks between them, and what the v1 shim does and doesn’t do.
Why two backends
Jails are the right answer for most of the demo surface. They start
in tens of milliseconds, share a kernel, and the existing VNET/pf
plumbing puts every sandbox on its own 10.78.0.X address with
full-stack network isolation. For code-interpreter workloads — run
a cell, read a file, kill — the isolation ceiling jails offer is
plenty.
bhyve is the answer when the threat model demands a full guest kernel. The vmm-vnode patch lets one host hold a thousand 256 MiB microVMs without the memory arithmetic falling apart, and the pool-ctl substrate (see /appendix/bhyve-substrate) already keeps a handful of SIGSTOPped guests ready to go. The gateway’s job is to hand one of those pool slots to an SDK caller on demand and SSH into it when the caller asks to run something.
Both backends serve the same Backend trait. AppState stashes one
Arc<dyn Backend> for the jail path (always present) and an
Option<Arc<dyn Backend>> for the bhyve path (present only when the
operator opts in). Per-call dispatch picks the right one via a
backend_for(&template) helper that consults the template registry.
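The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in state.rs: the one-method Backend trait, the JailBackend and BhyveBackend unit structs, the templates map, and the Result<_, String> return type are all assumptions made for the example.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Which substrate a template runs on (mirrors the registry's backend field).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BackendKind {
    Jail,
    Bhyve,
}

/// Hypothetical, trimmed-down stand-in for the real Backend trait.
pub trait Backend: Send + Sync {
    fn name(&self) -> &'static str;
}

pub struct JailBackend;
pub struct BhyveBackend;

impl Backend for JailBackend {
    fn name(&self) -> &'static str {
        "jail"
    }
}

impl Backend for BhyveBackend {
    fn name(&self) -> &'static str {
        "bhyve"
    }
}

pub struct AppState {
    /// Always present.
    pub jail: Arc<dyn Backend>,
    /// Present only when the operator opts in.
    pub bhyve: Option<Arc<dyn Backend>>,
    /// Assumed shape: template name -> backend kind from the registry.
    pub templates: HashMap<String, BackendKind>,
}

impl AppState {
    /// Pick the backend for a template, or explain why none is usable.
    pub fn backend_for(&self, template: &str) -> Result<Arc<dyn Backend>, String> {
        match self.templates.get(template) {
            None => Err(format!("unknown template: {template}")),
            Some(BackendKind::Jail) => Ok(Arc::clone(&self.jail)),
            Some(BackendKind::Bhyve) => self
                .bhyve
                .clone()
                .ok_or_else(|| format!("bhyve backend not enabled for {template}")),
        }
    }
}
```

Keeping the bhyve slot as an Option means a dev laptop without the opt-in flag fails a bhyve request at dispatch time rather than deep inside a pool-ctl shell-out.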
Inside bhyve, that backend now serves two guest shapes:
- SSH guests: the original python-bhyve path. Create returns a guest IP on 10.77.0.0/24 and the gateway reaches the guest over SSH for /exec and websocket shell traffic.
- Host-console guests: the current Windows path. Create returns no guest IP up front and instead attaches host-console metadata (VNC host/port) to the sandbox record so the portal can treat it as a framebuffer console rather than a normal in-guest desktop server.
Template flavouring
The registry grows two discovery passes. The jail pass is unchanged:
every directory matching <templates_root>/<name>-template becomes a
BackendKind::Jail entry. The bhyve pass is added when the operator
passes --bhyve-templates-root <path> (default unset): every file at
<path>/<name>.img becomes a BackendKind::Bhyve entry. The honor
box has /vms/templates/python-bhyve.img; dev laptops leave the flag
unset and the bhyve pass is a no-op.
If a name collides across the two passes — say someone builds a
python-bhyve-template jail and a python-bhyve.img image — the
bhyve entry wins. This is an operator mistake, not a data condition;
the bhyve substrate is the higher-fidelity path so we’d rather
surface the microVM.
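The two discovery passes and the collision rule can be sketched as a pure function over directory and file names. The function name discover and its slice-based signature are hypothetical; the real registry walks the filesystem and also records paths.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum BackendKind {
    Jail,
    Bhyve,
}

/// Build a name -> backend map from the two passes.
/// `jail_dirs`: entry names under the templates root.
/// `bhyve_files`: entry names under the bhyve templates root, or None
/// when the flag is unset (the bhyve pass is then a no-op).
pub fn discover(
    jail_dirs: &[&str],
    bhyve_files: Option<&[&str]>,
) -> BTreeMap<String, BackendKind> {
    let mut registry = BTreeMap::new();

    // Pass 1: every "<name>-template" directory is a jail template.
    for dir in jail_dirs {
        if let Some(name) = dir.strip_suffix("-template") {
            if !name.is_empty() {
                registry.insert(name.to_string(), BackendKind::Jail);
            }
        }
    }

    // Pass 2: every "<name>.img" file is a bhyve template. Running this
    // pass second means insert() overwrites a colliding jail entry, so
    // the bhyve entry wins.
    for file in bhyve_files.unwrap_or(&[]) {
        if let Some(name) = file.strip_suffix(".img") {
            if !name.is_empty() {
                registry.insert(name.to_string(), BackendKind::Bhyve);
            }
        }
    }

    registry
}
```

With a python-bhyve-template directory and a python-bhyve.img file both present, the map ends up with a single python-bhyve entry tagged Bhyve.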
GET /templates serialises the backend field alongside name and
path, so coppice tpl list shows it and the demo-portal UI can key
off it. The registry is otherwise unchanged — hot-reload
(POST /templates/reload) rescans both roots.
Dispatch
POST /sandboxes {templateID: "python-bhyve"} or
{templateID: "windows-server-2025"} lands on the same
create_sandbox handler as before. The handler calls
state.backend_for(&req.template_id), which returns the right
Arc<dyn Backend> based on the registry’s backend field, and then
calls create_with_limits on it. From there down it’s the backend’s
problem: the jail backend does its ZFS clone + epair + jail -c
dance; the bhyve backend reads the template’s .conf sidecar and
chooses between two subpaths:
- gateway_mode=ssh shells out to /usr/local/sbin/coppice-bhyve-pool-ctl.sh checkout <template>, parses the one-line JSON, and returns the allocated 10.77.0.X IP.
- gateway_mode=host-vnc shells out to /usr/local/sbin/coppice-bhyve-pool-ctl.sh console-start <template> <sandbox-id>, parses the returned VNC host/port, and stores that under coppice_console_vnc_* metadata on the sandbox.
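As a rough sketch of the subpath choice, the snippet below reads gateway_mode from a sidecar and builds the matching pool-ctl arguments. It assumes the .conf sidecar is plain key=value lines with # comments and that a missing key falls back to ssh; neither detail is confirmed here, and the function names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum GatewayMode {
    Ssh,
    HostVnc,
}

/// Read gateway_mode from sidecar text (assumed key=value format).
pub fn gateway_mode(conf: &str) -> Result<GatewayMode, String> {
    for line in conf.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "gateway_mode" {
                return match value.trim() {
                    "ssh" => Ok(GatewayMode::Ssh),
                    "host-vnc" => Ok(GatewayMode::HostVnc),
                    other => Err(format!("unknown gateway_mode: {other}")),
                };
            }
        }
    }
    // Assumption: templates without the key take the original SSH path.
    Ok(GatewayMode::Ssh)
}

/// Arguments passed to coppice-bhyve-pool-ctl.sh for each subpath.
pub fn pool_ctl_args(mode: &GatewayMode, template: &str, sandbox_id: &str) -> Vec<String> {
    let args: Vec<&str> = match mode {
        GatewayMode::Ssh => vec!["checkout", template],
        GatewayMode::HostVnc => vec!["console-start", template, sandbox_id],
    };
    args.into_iter().map(String::from).collect()
}
```

Rejecting an unrecognised mode outright, rather than silently treating it as ssh, keeps a typo in a Windows template's sidecar from sending it down the checkout path.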
DELETE /sandboxes/:id, /exec, /pause, /resume, and /connect
all route through the same backend_for helper using the live
sandbox’s template_id. Pause/resume is owner-preserving: SSH guests
call pool-ctl pause/resume <entry-id>, host-console guests call
console-pause/console-resume <sandbox-id>, and the pool keeps paused
entries out of checkout by marking them paused rather than
available. Snapshot / fork paths stay on the jail backend — bhyve v1
doesn’t implement them and the HTTP layer maps those calls to the same
501-style Other("…not supported on bhyve backend v1") response every
other unsupported-on-bhyve op returns.
The host-console metadata merge is the key UI bridge. SandboxCreated.ip is now
optional; host-console guests leave it unset and use metadata instead,
which the routes layer persists so GET /sandboxes and
GET /sandboxes/:id can hand it back to the portal.
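A minimal sketch of how a consumer might tell the two guest shapes apart from the create result. The struct fields other than ip and metadata, the helper names, and the concrete key coppice_console_vnc_port used in the usage note are illustrative assumptions; only the coppice_console_vnc_ prefix comes from the text above.

```rust
use std::collections::BTreeMap;

/// Prefix under which host-console details are stored.
pub const CONSOLE_VNC_PREFIX: &str = "coppice_console_vnc_";

/// Assumed, trimmed-down shape of the create result.
#[derive(Debug, Clone, Default)]
pub struct SandboxCreated {
    pub id: String,
    /// Set for SSH guests, left unset for host-console guests.
    pub ip: Option<String>,
    pub metadata: BTreeMap<String, String>,
}

/// True when there is no guest IP and console metadata is present.
pub fn is_host_console(sb: &SandboxCreated) -> bool {
    sb.ip.is_none()
        && sb
            .metadata
            .keys()
            .any(|key| key.starts_with(CONSOLE_VNC_PREFIX))
}

/// The console-related metadata entries, in key order.
pub fn console_metadata(sb: &SandboxCreated) -> Vec<(&str, &str)> {
    sb.metadata
        .iter()
        .filter(|(key, _)| key.starts_with(CONSOLE_VNC_PREFIX))
        .map(|(key, value)| (key.as_str(), value.as_str()))
        .collect()
}
```

For a sandbox with ip unset and a (hypothetical) coppice_console_vnc_port entry, is_host_console returns true and console_metadata returns just that entry; an SSH guest with an IP and no console keys yields false and an empty list.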
The SSH-exec shim (SSH-mode guests)
Bhyve guests don’t host the gateway’s envd (a fully-fledged in-guest gRPC service is future work). In v1 the gateway reaches in over SSH:
ssh -i /root/coppice-signing/pool-key \
    -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    -o ConnectTimeout=5 \
    -o BatchMode=yes \
    <template-user>@10.77.0.<N> -- <escaped cmd>
The pool image bakes the public half of the key into the guest’s SSH
trust path and sets PerSourcePenalties no in sshd_config — without
that second edit OpenSSH 9.8’s per-source-ip lockout trips on the
probe loop the smoke rig hammers during warm. The login user now comes
from the template’s optional .conf sidecar rather than being
hardcoded to root. Commands are single-quote-escaped before being
concatenated; the guest shell handles the rest.
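The single-quote escaping step can be sketched as below. This is the standard POSIX-shell technique (close the quote, emit an escaped quote, reopen), shown here under the assumption that each argument is quoted separately and the results are joined with spaces; the function names are hypothetical and the shim's exact concatenation may differ.

```rust
/// Wrap one argument in single quotes for a POSIX shell.
/// An embedded ' becomes '\'' (end quote, literal quote, start quote).
pub fn shell_single_quote(arg: &str) -> String {
    let mut out = String::with_capacity(arg.len() + 2);
    out.push('\'');
    for ch in arg.chars() {
        if ch == '\'' {
            out.push_str("'\\''");
        } else {
            out.push(ch);
        }
    }
    out.push('\'');
    out
}

/// Quote every argument and join with spaces, giving one string that
/// can be passed as the remote command after `--`.
pub fn escape_command(argv: &[&str]) -> String {
    argv.iter()
        .map(|arg| shell_single_quote(arg))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Inside single quotes the guest shell performs no expansion at all, so `$HOME`, backticks, and semicolons in caller input arrive as literal text.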
Cost: one SSH connect per /exec, somewhere between 20 and 60 ms
depending on whether the guest has already warmed its host key cache.
Acceptable for demo-portal latency; comfortably below the 147 ms
the REST → ready path spends on checkout.
Host-console guests do not use this path. For
gateway_mode=host-vnc, /exec and websocket shell startup return a
clear “host-console bhyve guest” error until in-guest bootstrap exists.
That is deliberate: a Windows console guest that does not yet have
OpenSSH enabled should not pretend the SSH shim can reach it.
The numbers
Two wall times to keep straight:
- Checkout → SSH-ready: 147 ms on honor. This is what the pool substrate reports from coppice-bhyve-pool-ctl.sh checkout. The gateway’s POST /sandboxes adds one shell-out plus whatever the SDK round-trip costs, so end-to-end REST create → SSH works is ~180 ms.
- vCPU resume: 17 ms, reported in snapshot-cloning. That’s the bhyve kernel-level SIGCONT → vCPU-runtime-advances number, and it’s a lower bound on anything the substrate can do. The gap between 17 ms and 147 ms is SSH handshake plus readiness probes.
coppice_bhyve_checkout_ns_sum (divided by
coppice_bhyve_checkouts_total) gives the running mean in
production. coppice_bhyve_pool_available{template} and
coppice_bhyve_pool_in_use{template} are refreshed every 10 s from
pool-ctl list --json. Entries in paused state count as in-use
capacity, not available capacity.
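The metric arithmetic above is simple enough to sketch directly. The helper names and the EntryState enum are hypothetical; the only rules taken from the text are that the mean is the nanosecond sum divided by the checkout count, and that paused entries count toward in-use rather than available.

```rust
/// Running mean checkout time in milliseconds, from the two counters.
/// Returns None before the first checkout to avoid dividing by zero.
pub fn mean_checkout_ms(checkout_ns_sum: u64, checkouts_total: u64) -> Option<f64> {
    if checkouts_total == 0 {
        return None;
    }
    Some(checkout_ns_sum as f64 / checkouts_total as f64 / 1_000_000.0)
}

/// Assumed per-entry states as they might appear in the pool listing.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EntryState {
    Available,
    InUse,
    Paused,
}

/// Compute (available, in_use) gauge values for one template.
/// Paused entries are owned by a caller, so they count as in-use.
pub fn pool_gauges(entries: &[EntryState]) -> (usize, usize) {
    let available = entries
        .iter()
        .filter(|state| **state == EntryState::Available)
        .count();
    (available, entries.len() - available)
}
```

Three checkouts totalling 441,000,000 ns give a mean of 147 ms, which leaves 147 - 17 = 130 ms attributable to the SSH handshake and readiness probes.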
What this v1 doesn’t do
- Durable pause across host reboot. Gateway pause/resume is process-level SIGSTOP/SIGCONT, the same live-memory hot-tier mechanism as checkout. It preserves guest RAM only while the host stays up. Durable bhyve checkpoints remain the bhyvectl --suspend path documented under snapshot-cloning.
- snapshot/fork/delete_snapshot. Needs per-entry ZFS snapshots on the pool’s image directory, which is Session C on the substrate side. For v1 the jail path is the only durable-snapshot surface.
- Template signing. The jail backend verifies a signify(1) signature over the snapshot’s ZFS guid before cloning (see /appendix/image-signing); bhyve templates are raw images and have no equivalent gate in v1. The COPPICE_REQUIRE_SIGNED_TEMPLATES=1 environment knob still applies to jails and only jails.
- CPU / memory limits. The pool entry’s CPU and memory are baked into the image (bhyve -c N -m XG). The HTTP layer accepts cpuCount/memoryMB on create but the bhyve backend ignores them for now. The jail backend honours them via rctl(8) as before.
- Host-console promotion into the SSH pool. Windows can launch from /ui/ today, but it is not yet a warm/checkout/return guest because readiness is still gated on SSH. That becomes real only after the guest installs OpenSSH and consumes the per-clone seed disk.
- Per-jail ifconfig lo0 up and friends. The pool image handles interface bring-up internally on boot; the gateway doesn’t touch guest-side networking.
Pool lifecycle
Warm happens at gateway startup when
--bhyve-pool-size N is set. The gateway picks the first
bhyve template the registry lists, shells out to
coppice-bhyve-pool-ctl.sh warm <tpl> --count N, and
blocks startup until it returns. N=2 is ~15 s on honor. Operators
who want multiple templates warm at boot should run additional
POST /pool/<name>/warm (or the shell equivalent)
after startup — the gateway’s startup warm is a default, not a
policy.
Drain happens on SIGINT (Ctrl-C). The signal handler walks every
bhyve template in the registry and runs … drain <tpl>
best-effort before exit. A second Ctrl-C short-circuits.
Cross-references
- /appendix/bhyve-substrate — the pool-ctl script, template image build, and the 147 ms receipt on honor.
- /appendix/windows-bhyve-template — the current Windows Server 2025 host-console path.
- /appendix/snapshot-cloning — the kernel-level 17 ms number this sits on top of.
- /appendix/image-signing — what the jail backend does that bhyve v1 doesn’t.
- e2b-compat/src/backend/bhyve.rs: the code.
- e2b-compat/src/state.rs::AppState::backend_for: the per-call dispatcher.