bhyve backend

The jail backend is fast and cheap and has been the gateway’s only substrate since the repo existed. The bhyve backend is the second, wired up after the Session A substrate work landed: same REST surface, same SDK shape, different isolation boundary. This appendix is the editorial companion to that wiring — why two backends coexist, how the gateway picks between them, and what the v1 shim does and doesn’t do.

Why two backends

Jails are the right answer for most of the demo surface. They start in tens of milliseconds, share a kernel, and the existing VNET/pf plumbing puts every sandbox on its own 10.78.0.X address with full-stack network isolation. For code-interpreter workloads — run a cell, read a file, kill — the isolation ceiling jails offer is plenty.

bhyve is the answer when the threat model demands a full guest kernel. The vmm-vnode patch lets one host hold a thousand 256 MiB microVMs without the memory arithmetic falling apart, and the pool-ctl substrate (see /appendix/bhyve-substrate) already keeps a handful of SIGSTOPped guests ready to go. The gateway’s job is to hand one of those pool slots to an SDK caller on demand and SSH into it when the caller asks to run something.

Both backends serve the same Backend trait. AppState stashes one Arc<dyn Backend> for the jail path (always present) and an Option<Arc<dyn Backend>> for the bhyve path (present only when the operator opts in). Per-call dispatch picks the right one via a backend_for(&template) helper that consults the template registry.
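A minimal sketch of that dispatch shape; everything beyond the names `Backend`, `AppState`, `BackendKind`, and `backend_for` (the trait’s methods, the registry being a plain map) is assumed for illustration:

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Copy)]
enum BackendKind { Jail, Bhyve }

// Stand-in trait: the real one carries create/exec/destroy methods.
trait Backend: Send + Sync {
    fn name(&self) -> &'static str;
}

struct JailBackend;
impl Backend for JailBackend { fn name(&self) -> &'static str { "jail" } }

struct BhyveBackend;
impl Backend for BhyveBackend { fn name(&self) -> &'static str { "bhyve" } }

struct AppState {
    jail: Arc<dyn Backend>,                 // always present
    bhyve: Option<Arc<dyn Backend>>,        // present only when the operator opts in
    registry: HashMap<String, BackendKind>, // template name -> flavour
}

impl AppState {
    // Consult the registry; an unknown template or a missing bhyve
    // backend both surface as None here (an error in the real gateway).
    fn backend_for(&self, template: &str) -> Option<Arc<dyn Backend>> {
        match self.registry.get(template)? {
            BackendKind::Jail => Some(self.jail.clone()),
            BackendKind::Bhyve => self.bhyve.clone(),
        }
    }
}
```

The important property is that handlers never branch on backend kind themselves; they ask the registry once and call through the trait object.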

Template flavouring

The registry grows two discovery passes. The jail pass is unchanged: every directory matching <templates_root>/<name>-template becomes a BackendKind::Jail entry. The bhyve pass is added when the operator passes --bhyve-templates-root <path> (default unset): every file at <path>/<name>.img becomes a BackendKind::Bhyve entry. The honor box has /vms/templates/python-bhyve.img; dev laptops leave the flag unset and the bhyve pass is a no-op.

If a name collides across the two passes — say someone builds a python-bhyve-template jail and a python-bhyve.img image — the bhyve entry wins. This is an operator mistake, not a data condition; the bhyve substrate is the higher-fidelity path so we’d rather surface the microVM.
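The two passes and the bhyve-wins rule reduce to map inserts in order. A sketch under the assumption that discovery has already listed the directory and file names (the real code walks the filesystem); `build_registry` is a hypothetical name:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendKind { Jail, Bhyve }

fn build_registry(jail_dirs: &[&str], bhyve_imgs: &[&str]) -> HashMap<String, BackendKind> {
    let mut reg = HashMap::new();
    // Jail pass: every <name>-template directory becomes a Jail entry.
    for d in jail_dirs {
        if let Some(name) = d.strip_suffix("-template") {
            reg.insert(name.to_string(), BackendKind::Jail);
        }
    }
    // bhyve pass (only when --bhyve-templates-root is set): every
    // <name>.img becomes a Bhyve entry. Running second means a name
    // collision overwrites the jail entry — the bhyve entry wins.
    for f in bhyve_imgs {
        if let Some(name) = f.strip_suffix(".img") {
            reg.insert(name.to_string(), BackendKind::Bhyve);
        }
    }
    reg
}
```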

GET /templates serialises the backend field alongside name and path, so coppice tpl list shows it and the demo-portal UI can key off it. The registry is otherwise unchanged — hot-reload (POST /templates/reload) rescans both roots.

Dispatch

POST /sandboxes {templateID: "python-bhyve"} lands on the same create_sandbox handler as before. The handler calls state.backend_for(&req.template_id), which returns the right Arc<dyn Backend> based on the registry’s backend field, and then calls create_with_limits on it. From there down it’s the backend’s problem: the jail backend does its ZFS clone + epair + jail -c dance; the bhyve backend shells out to /usr/local/sbin/coppice-bhyve-pool-ctl.sh checkout <template>, parses the one-line JSON, and returns the allocated 10.77.0.X IP.
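The one-line JSON from checkout only needs one field pulled out. A hedged sketch — the payload’s exact shape isn’t specified here, so assume a flat object like `{"ip":"10.77.0.12"}` and scan for the key (the real gateway would use a proper JSON parser; `json_str_field` is a hypothetical helper):

```rust
// Extract a string-valued field from a single-line, unescaped JSON
// object, e.g. the "ip" the pool-ctl checkout prints. Assumes the
// exact byte shape "key":"value" with no interior quotes or spaces.
fn json_str_field(line: &str, key: &str) -> Option<String> {
    let needle = format!("\"{}\":\"", key);
    let start = line.find(&needle)? + needle.len();
    let rest = &line[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}
```

In the backend this sits right after the `std::process::Command` call that runs `coppice-bhyve-pool-ctl.sh checkout <template>` and captures its stdout.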

DELETE /sandboxes/:id, /exec, /pause, /resume, and /connect all route through the same backend_for helper using the live sandbox’s template_id. Snapshot / fork paths stay on the jail backend — bhyve v1 doesn’t implement them and the HTTP layer maps those calls to the same 501-style Other("…not supported on bhyve backend v1") response every other unsupported-on-bhyve op returns.

The SSH-exec shim

Bhyve guests don’t host the gateway’s envd (a fully-fledged in-guest gRPC service is future work). In v1 the gateway reaches in over SSH:

ssh -i /root/coppice-signing/pool-key \
    -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    -o ConnectTimeout=5 \
    -o BatchMode=yes \
    root@10.77.0.<N> -- <escaped cmd>

The pool image bakes the public half of the key into /root/.ssh/authorized_keys and sets PerSourcePenalties no in sshd_config — without that second edit OpenSSH 9.8’s per-source-ip lockout trips on the probe loop the smoke rig hammers during warm. Commands are single-quote-escaped before being concatenated; the pool template’s /bin/sh handles the rest.
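The quoting rule is the standard POSIX one: wrap in single quotes and rewrite each embedded `'` as `'\''`. A sketch of the escape plus the argv the shim builds — the flags and key path are copied from the invocation above, the argv layout and function names are assumptions:

```rust
// POSIX single-quote escaping: 'it's' is invalid, 'it'\''s' is not.
fn shell_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

// Build the ssh argv for guest 10.77.0.<n>; each command word is
// quoted before concatenation, the pool template's /bin/sh does the rest.
fn ssh_argv(n: u8, cmd: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = [
        "ssh", "-i", "/root/coppice-signing/pool-key",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=5",
        "-o", "BatchMode=yes",
    ].iter().map(|s| s.to_string()).collect();
    v.push(format!("root@10.77.0.{}", n));
    v.push("--".to_string());
    v.push(cmd.iter().map(|a| shell_quote(a)).collect::<Vec<_>>().join(" "));
    v
}
```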

Cost: one SSH connect per /exec, somewhere between 20 and 60 ms depending on whether the guest has already warmed its host key cache. Acceptable for demo-portal latency; comfortably below the 147 ms the REST → ready path spends on checkout.

The numbers

Two wall times to keep straight: the per-/exec SSH connect (the 20–60 ms above) and the pool checkout on the REST → ready path (~147 ms). coppice_bhyve_checkout_ns_sum divided by coppice_bhyve_checkouts_total gives the running mean of the latter in production. coppice_bhyve_pool_available{template} and coppice_bhyve_pool_in_use{template} are refreshed every 10 s from pool-ctl list --json.
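Assuming the two counters are plain integers, the division (and the zero-checkouts guard) is:

```rust
// Running mean checkout time in milliseconds from the sum/count pair;
// None until at least one checkout has happened.
fn checkout_mean_ms(ns_sum: u64, total: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    Some(ns_sum as f64 / total as f64 / 1e6) // ns -> ms
}
```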

What this v1 doesn’t do

Pool lifecycle

Warm happens at gateway startup when --bhyve-pool-size N is set. The gateway picks the first bhyve template the registry lists, shells out to coppice-bhyve-pool-ctl.sh warm <tpl> --count N, and blocks startup until it returns. N=2 is ~15 s on honor. Operators who want multiple templates warm at boot should run additional POST /pool/<name>/warm (or the shell equivalent) after startup — the gateway’s startup warm is a default, not a policy.
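The startup warm reduces to building one argv and blocking on the child. A sketch with the argv layout mirroring the flags above; `warm_argv` and the caller passing in the first bhyve template are assumptions:

```rust
// Argv for the blocking startup warm; the real gateway hands this to
// std::process::Command and waits for the child before serving traffic.
fn warm_argv(first_bhyve_template: &str, n: u32) -> Vec<String> {
    vec![
        "/usr/local/sbin/coppice-bhyve-pool-ctl.sh".to_string(),
        "warm".to_string(),
        first_bhyve_template.to_string(),
        "--count".to_string(),
        n.to_string(),
    ]
}
```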

Drain happens on SIGINT (Ctrl-C). The signal handler walks every bhyve template in the registry and runs … drain <tpl> best-effort before exit. A second Ctrl-C short-circuits.

Cross-references