bhyve backend

The jail backend is fast and cheap and has been the gateway’s only substrate since the repo existed. The bhyve backend is the second, wired up after the Session A substrate work landed: same REST surface, same SDK shape, different isolation boundary. This appendix is the editorial companion to that wiring — why two backends coexist, how the gateway picks between them, and what the v1 shim does and doesn’t do.

Why two backends

Jails are the right answer for most of the demo surface. They start in tens of milliseconds, share a kernel, and the existing VNET/pf plumbing puts every sandbox on its own 10.78.0.X address with full-stack network isolation. For code-interpreter workloads — run a cell, read a file, kill — the isolation ceiling jails offer is plenty.

bhyve is the answer when the threat model demands a full guest kernel. The vmm-vnode patch lets one host hold a thousand 256 MiB microVMs without the memory arithmetic falling apart, and the pool-ctl substrate (see /appendix/bhyve-substrate) already keeps a handful of SIGSTOPped guests ready to go. The gateway’s job is to hand one of those pool slots to an SDK caller on demand and SSH into it when the caller asks to run something.

Both backends serve the same Backend trait. AppState stashes one Arc<dyn Backend> for the jail path (always present) and an Option<Arc<dyn Backend>> for the bhyve path (present only when the operator opts in). Per-call dispatch picks the right one via a backend_for(&template) helper that consults the template registry.
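A minimal sketch of that dispatch shape; everything beyond the names `Backend`, `AppState`, `BackendKind`, and `backend_for` (the trait’s methods, the registry being a plain map) is assumed for illustration:

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Copy)]
enum BackendKind { Jail, Bhyve }

// Stand-in trait: the real one carries create/exec/destroy methods.
trait Backend: Send + Sync {
    fn name(&self) -> &'static str;
}

struct JailBackend;
impl Backend for JailBackend { fn name(&self) -> &'static str { "jail" } }

struct BhyveBackend;
impl Backend for BhyveBackend { fn name(&self) -> &'static str { "bhyve" } }

struct AppState {
    jail: Arc<dyn Backend>,                 // always present
    bhyve: Option<Arc<dyn Backend>>,        // present only when the operator opts in
    registry: HashMap<String, BackendKind>, // template name -> flavour
}

impl AppState {
    // Consult the registry; an unknown template or a missing bhyve
    // backend both surface as None here (an error in the real gateway).
    fn backend_for(&self, template: &str) -> Option<Arc<dyn Backend>> {
        match self.registry.get(template)? {
            BackendKind::Jail => Some(self.jail.clone()),
            BackendKind::Bhyve => self.bhyve.clone(),
        }
    }
}
```

The important property is that handlers never branch on backend kind themselves; they ask the registry once and call through the trait object.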

Template flavouring

The registry grows two discovery passes. The jail pass is unchanged: every directory matching <templates_root>/<name>-template becomes a BackendKind::Jail entry. The bhyve pass is added when the operator passes --bhyve-templates-root <path> (default unset): every file at <path>/<name>.img becomes a BackendKind::Bhyve entry. The honor box has /vms/templates/python-bhyve.img; dev laptops leave the flag unset and the bhyve pass is a no-op.

If a name collides across the two passes — say someone builds a python-bhyve-template jail and a python-bhyve.img image — the bhyve entry wins. This is an operator mistake, not a data condition; the bhyve substrate is the higher-fidelity path so we’d rather surface the microVM.
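The two passes and the bhyve-wins rule reduce to map inserts in order. A sketch under the assumption that discovery has already listed the directory and file names (the real code walks the filesystem); `build_registry` is a hypothetical name:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum BackendKind { Jail, Bhyve }

fn build_registry(jail_dirs: &[&str], bhyve_imgs: &[&str]) -> HashMap<String, BackendKind> {
    let mut reg = HashMap::new();
    // Jail pass: every <name>-template directory becomes a Jail entry.
    for d in jail_dirs {
        if let Some(name) = d.strip_suffix("-template") {
            reg.insert(name.to_string(), BackendKind::Jail);
        }
    }
    // bhyve pass (only when --bhyve-templates-root is set): every
    // <name>.img becomes a Bhyve entry. Running second means a name
    // collision overwrites the jail entry — the bhyve entry wins.
    for f in bhyve_imgs {
        if let Some(name) = f.strip_suffix(".img") {
            reg.insert(name.to_string(), BackendKind::Bhyve);
        }
    }
    reg
}
```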

GET /templates serialises the backend field alongside name and path, so coppice tpl list shows it and the demo-portal UI can key off it. The registry is otherwise unchanged — hot-reload (POST /templates/reload) rescans both roots.

Dispatch

POST /sandboxes {templateID: "python-bhyve"} lands on the same create_sandbox handler as before. The handler calls state.backend_for(&req.template_id), which returns the right Arc<dyn Backend> based on the registry’s backend field, and then calls create_with_limits on it. From there down it’s the backend’s problem: the jail backend does its ZFS clone + epair + jail -c dance; the bhyve backend shells out to /usr/local/sbin/coppice-bhyve-pool-ctl.sh checkout <template>, parses the one-line JSON, and returns the allocated 10.77.0.X IP.
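The one-line JSON from checkout only needs one field pulled out. A hedged sketch — the payload’s exact shape isn’t specified here, so assume a flat object like `{"ip":"10.77.0.12"}` and scan for the key (the real gateway would use a proper JSON parser; `json_str_field` is a hypothetical helper):

```rust
// Extract a string-valued field from a single-line, unescaped JSON
// object, e.g. the "ip" the pool-ctl checkout prints. Assumes the
// exact byte shape "key":"value" with no interior quotes or spaces.
fn json_str_field(line: &str, key: &str) -> Option<String> {
    let needle = format!("\"{}\":\"", key);
    let start = line.find(&needle)? + needle.len();
    let rest = &line[start..];
    let end = rest.find('"')?;
    Some(rest[..end].to_string())
}
```

In the backend this sits right after the `std::process::Command` call that runs `coppice-bhyve-pool-ctl.sh checkout <template>` and captures its stdout.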

DELETE /sandboxes/:id, /exec, /pause, /resume, and /connect all route through the same backend_for helper using the live sandbox’s template_id. Snapshot / fork paths stay on the jail backend — bhyve v1 doesn’t implement them and the HTTP layer maps those calls to the same 501-style Other("…not supported on bhyve backend v1") response every other unsupported-on-bhyve op returns.

The SSH-exec shim

Bhyve guests don’t host the gateway’s envd (a fully-fledged in-guest gRPC service is future work). In v1 the gateway reaches in over SSH:

ssh -i /root/coppice-signing/pool-key \
    -o StrictHostKeyChecking=no \
    -o UserKnownHostsFile=/dev/null \
    -o ConnectTimeout=5 \
    -o BatchMode=yes \
    root@10.77.0.<N> -- <escaped cmd>

The pool image bakes the public half of the key into /root/.ssh/authorized_keys and sets PerSourcePenalties no in sshd_config — without that second edit OpenSSH 9.8’s per-source-ip lockout trips on the probe loop the smoke rig hammers during warm. Commands are single-quote-escaped before being concatenated; the pool template’s /bin/sh handles the rest.
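The quoting rule is the standard POSIX one: wrap in single quotes and rewrite each embedded `'` as `'\''`. A sketch of the escape plus the argv the shim builds — the flags and key path are copied from the invocation above, the argv layout and function names are assumptions:

```rust
// POSIX single-quote escaping: 'it's' is invalid, 'it'\''s' is not.
fn shell_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

// Build the ssh argv for guest 10.77.0.<n>; each command word is
// quoted before concatenation, the pool template's /bin/sh does the rest.
fn ssh_argv(n: u8, cmd: &[&str]) -> Vec<String> {
    let mut v: Vec<String> = [
        "ssh", "-i", "/root/coppice-signing/pool-key",
        "-o", "StrictHostKeyChecking=no",
        "-o", "UserKnownHostsFile=/dev/null",
        "-o", "ConnectTimeout=5",
        "-o", "BatchMode=yes",
    ].iter().map(|s| s.to_string()).collect();
    v.push(format!("root@10.77.0.{}", n));
    v.push("--".to_string());
    v.push(cmd.iter().map(|a| shell_quote(a)).collect::<Vec<_>>().join(" "));
    v
}
```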

Cost: one SSH connect per /exec, somewhere between 20 and 60 ms depending on whether the guest has already warmed its host key cache. Acceptable for demo-portal latency; comfortably below the 147 ms the REST → ready path spends on checkout.

The numbers

Two wall times to keep straight: the per-/exec SSH connect (the 20–60 ms above) and the pool checkout on the REST → ready path (~147 ms). coppice_bhyve_checkout_ns_sum divided by coppice_bhyve_checkouts_total gives the running mean of the latter in production. coppice_bhyve_pool_available{template} and coppice_bhyve_pool_in_use{template} are refreshed every 10 s from pool-ctl list --json.
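Assuming the two counters are plain integers, the division (and the zero-checkouts guard) is:

```rust
// Running mean checkout time in milliseconds from the sum/count pair;
// None until at least one checkout has happened.
fn checkout_mean_ms(ns_sum: u64, total: u64) -> Option<f64> {
    if total == 0 {
        return None;
    }
    Some(ns_sum as f64 / total as f64 / 1e6) // ns -> ms
}
```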

What this v1 doesn’t do

Pool lifecycle

Warm happens at gateway startup when --bhyve-pool-size N is set. The gateway picks the first bhyve template the registry lists, shells out to coppice-bhyve-pool-ctl.sh warm <tpl> --count N, and blocks startup until it returns. N=2 is ~15 s on honor. Operators who want multiple templates warm at boot should run additional POST /pool/<name>/warm (or the shell equivalent) after startup — the gateway’s startup warm is a default, not a policy.
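The startup warm reduces to building one argv and blocking on the child. A sketch with the argv layout mirroring the flags above; `warm_argv` and the caller passing in the first bhyve template are assumptions:

```rust
// Argv for the blocking startup warm; the real gateway hands this to
// std::process::Command and waits for the child before serving traffic.
fn warm_argv(first_bhyve_template: &str, n: u32) -> Vec<String> {
    vec![
        "/usr/local/sbin/coppice-bhyve-pool-ctl.sh".to_string(),
        "warm".to_string(),
        first_bhyve_template.to_string(),
        "--count".to_string(),
        n.to_string(),
    ]
}
```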

Drain happens on SIGINT (Ctrl-C). The signal handler walks every bhyve template in the registry and runs … drain <tpl> best-effort before exit. A second Ctrl-C short-circuits.

Cross-references