The bhyve Port

Where the microVM architecture maps, and where it doesn't.

If you take CubeSandbox’s promised property — dedicated guest kernel per sandbox — at face value, bhyve is the fair comparison target. The interesting question isn’t whether bhyve can boot a Linux VM (it can). The interesting question is whether the pool-resume path that earns the <60ms cold-start claim survives the port, and at what cost.

What maps cleanly

bhyve is a production VMM with the features Cloud Hypervisor also has at this level, so the baseline functionality carries over.

For a Linux-guest-on-FreeBSD deployment of CubeSandbox, the VMM swap is bounded work.

What doesn’t map

Snapshot / restore

Cloud Hypervisor has a stable memory-snapshot / restore HTTP API that CubeSandbox’s pool-resume path depends on. bhyve has BHYVE_SNAPSHOT, which is experimental: gated off in GENERIC on FreeBSD 15.0, and only available after rebuilding both the kernel and the bhyve/bhyvectl userland.

See /appendix/snapshot-cloning for the full comparison. Our proxy is bhyve-prewarm-pool — a pool of live VMs that we SIGSTOP once they reach ready and SIGCONT on “create”. It’s process-level suspend/resume, not VM snapshot/restore; the performance profile is similar, but the durability and portability properties are not.
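The suspend/resume mechanics are ordinary process signals. A minimal sketch, with a `sleep` process standing in for a booted bhyve(8) VM (which would behave identically):

```shell
# Process-level suspend/resume: the mechanism bhyve-prewarm-pool uses.
# A sleep process stands in for a booted-and-ready bhyve(8) VM here.
sleep 60 &
vm=$!
kill -STOP "$vm"                          # park in the hot pool: no CPU while stopped
sleep 0.2                                 # give the signal a beat to land
state=$(ps -o stat= -p "$vm" | tr -d ' ')
echo "parked: $state"                     # state string starts with T (stopped)
kill -CONT "$vm"                          # "create": resume is a single signal
kill "$vm" 2>/dev/null
wait "$vm" 2>/dev/null || true
```

The asymmetry the section describes falls out of this: resume latency is one signal delivery, but nothing here touches disk, so a host reboot loses the whole pool.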

virtio-fs

Not in FreeBSD base as of 15.0-RELEASE-p4. The 9p-over-virtio path works but is a different guest contract — Cube’s cube-agent (Kata fork) expects virtio-fs for efficient host-to-guest sharing. On FreeBSD, either port the agent to 9p or use raw block devices with a smaller-scale provisioning story.
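The 9p path can be sketched with bhyve’s in-base virtio-9p device; slot numbers, the share name, and all paths below are illustrative:

```shell
# Export a host directory into the guest over virtio-9p ("shared" is the share
# tag the guest mounts, e.g. on Linux: mount -t 9p -o trans=virtio shared /mnt).
bhyve -c 1 -m 256M -H \
  -s 0,hostbridge \
  -s 3,virtio-blk,/pool/sandboxes/sb1/disk.img \
  -s 4,virtio-9p,shared=/pool/sandboxes/sb1/share \
  -s 31,lpc -l com1,stdio \
  sb1
```

The guest-side mount option is the contract difference the section describes: a 9p-aware agent has to issue that mount itself, where cube-agent assumes a virtio-fs device.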

Memory ballooning

No virtio-balloon in bhyve base. Cube’s pool-resume path depends on this less than one might think — the memory CoW story dominates at the host page-cache level — but if you rely on ballooning for density you’ll miss it.

rust-vmm

Linux-only. The FreeBSD equivalent is libvmmapi (C). A CubeShim port that talks to bhyve would replace Cloud Hypervisor’s HTTP API client (in CubeShim/shim/src/hypervisor/) with a libvmmapi bridge. That’s a rewrite of one module, not a rewrite of the shim.

The numbers

All measurements on honor (FreeBSD 15.0-RELEASE-p4, amd64, custom SNAPSHOT kernel — see /appendix/bench-rig for the reproduction recipe).

Tier comparison (the headline chart)

Chart · cc=1 resume / boot latency — log scale

[bar chart, log scale, mean ms: bhyve-full (full guest) 3.9 s · bhyve-durable-pool 271 ms · bhyve-durable-prewarm-pool (durable + prewarm) 17 ms · bhyve-prewarm-pool 10 ms]
Four configurations on the same axis (log scale). The dashed line is Tencent's advertised 60 ms pool-hit. bhyve-durable-prewarm-pool is the durable-yet-hot production shape; bhyve-prewarm-pool is a lower bound (in-memory only); bhyve-full is a cold-boot-from-scratch upper bound.

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-durable-pool · mise run bench:bhyve-durable-prewarm-pool · mise run bench:bhyve-prewarm-pool · methodology

Cold start

Chart · Cold start (mean, concurrency=1)

[bar chart, mean cold-start ms: bhyve — full guest 3906.0 · bhyve — minimal (pending) · bhyve — pre-warm pool 10.0]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

Tail latency under concurrency

Chart · Cold start percentiles (concurrency=50)

[grouped bar chart, ms axis to 100,000: p50 / p95 / p99 for bhyve — full guest, bhyve — minimal, bhyve — pre-warm pool; values in the table under “Reading the results”]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

bhyve-full at concurrency 50 is usually a demonstration of host CPU/disk contention more than a measurement. bhyve-minimal should scale better. bhyve-prewarm-pool should scale best — the per-iteration cost is close to that of delivering a single signal.

Idle memory overhead

Chart · Idle RSS

[bar chart, mean idle RSS (KB): bhyve — full guest · bhyve — minimal · bhyve — pre-warm pool; values in the table under “Reading the results”]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

The relevant comparison is bhyve-minimal vs. Cube’s claimed <5 MB per-instance overhead. Cube’s overhead number depends on CoW page-sharing across identical guests; bhyve inherits less of this automatically (no KSM equivalent in base). Expected conclusion: bhyve’s per-instance overhead is larger than Cube’s without a host-level page-sharing layer.

▸ reproduce · mise run bench:bhyve-full · script

▸ reproduce · mise run bench:bhyve-minimal · script

▸ reproduce · mise run bench:bhyve-prewarm-pool · script

Reading the results

Measured on honor, 2026-04-22:

| config | cc=1 mean | cc=10 mean | cc=50 mean | cc=50 p95 | idle RSS |
| --- | --- | --- | --- | --- | --- |
| bhyve-full | 3906 ms | 5846 ms | 96 878 ms | 100 194 ms | 24.8 MB |
| bhyve-minimal | (pending) | | | | |
| bhyve-prewarm-pool | 10 ms | 10 ms | 17 ms | 41 ms | 24.8 MB |
| bhyve-durable-pool | 271 ms | 1565 ms | 2831 ms | 3103 ms | 24.8 MB |
| bhyve-durable-prewarm-pool | 20 ms | 105 ms | 995 ms ‡ | 1290 ms ‡ | 24.8 MB × pool size |

‡ cc=50 for bhyve-durable-prewarm-pool is load-on-system bound: 50 guests SIGCONT-resumed simultaneously have to share 16 physical threads, so each vCPU waits for a time-slice. Latency degrades smoothly from cc=10 (105 ms) through cc=20 (333 ms) to cc=50 (995 ms) — honest laptop-class concurrency under a strict vCPU-runtime-advances proof-of-life. On a 32-thread host this falls back into the cc=10 band. An earlier draft of these numbers reported 2143 ms mean / 2903 ms p95 — that was the proof-of-life poll loop competing with itself, fixed by adding a 1 ms delay between probes. See the parity-gaps entry for the full sub-sampled curve.
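The poll-loop fix in that footnote generalizes: a readiness probe that spins without yielding steals CPU from the vCPUs whose progress it is measuring. A minimal sketch of the corrected probe, where `is_ready` is a stand-in for the vCPU-runtime-advances check:

```shell
# Poll a readiness predicate, yielding 1 ms between probes so the poller
# does not compete with the workload it is observing.
wait_ready() {
  tries=0
  until is_ready; do
    tries=$((tries + 1))
    [ "$tries" -ge 5000 ] && return 1   # ~5 s budget, then give up
    sleep 0.001                         # the 1 ms back-off from the footnote
  done
  return 0
}
```

With the back-off in place the probe costs one scheduler wakeup per millisecond instead of a busy core, which is why the earlier 2143 ms figure collapsed to 995 ms.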

bhyve-durable-pool is the honest CubeSandbox analog: resume from an on-disk bhyvectl --suspend checkpoint. It required rebuilding honor’s kernel with options BHYVE_SNAPSHOT (gated off in GENERIC on FreeBSD 15.0) plus rebuilding the bhyvectl/bhyve userland with WITH_BHYVE_SNAPSHOT=YES. Resume path: bhyvectl --create + bhyve -r (no bhyveload — that would add ~10 s reloading a kernel we’re about to overlay anyway). Per-iteration work: copy a 6 GB disk image (bulk byte copy; a ZFS clone would collapse this), create the VM resource, restore the checkpoint. The 271 ms cc=1 number is disk-bandwidth-bound — switching to a zfs clone of a base dataset would move it closer to the prewarm pool’s 10 ms. Under concurrency (cc≥10) parallel disk copies fight each other on the NVMe + GELI stack, driving the mean up.
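The ZFS-clone substitution is a few commands; a hedged sketch where dataset names and checkpoint paths are illustrative, and bhyve -r additionally needs the same -s device arguments the VM was suspended with:

```shell
# One-time, at image-build time: snapshot the golden disk dataset.
zfs snapshot cube/base@golden

# Per checkout: a clone is metadata-only, so the 6 GB "copy" collapses to ~ms.
zfs clone cube/base@golden cube/sbx42
bhyvectl --create --vm=sbx42
bhyve -r /pool/ckp/base.ckp sbx42   # restore the durable checkpoint into the clone

# Teardown: destroy the clone; the snapshot stays for the next checkout.
zfs destroy cube/sbx42
```

The clone also gives you block-level sharing of the unmodified disk image across sandboxes for free, which the bulk byte copy throws away.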

The tier stack (hot / cold / cold-boot):

  • hot: bhyve -r-resumed, SIGSTOP’d VMs held in memory; resume is a SIGCONT (~10-20 ms at cc=1);
  • cold: on-disk bhyvectl --suspend checkpoints, durable across reboots (~271 ms restore at cc=1);
  • cold-boot: a full guest boot from scratch (~3.9 s); nobody’s production path.

A production pool manager keeps both hot and cold tiers, rebuilds hot from cold at host boot. That’s bhyve-durable-prewarm-pool:

bhyve-durable-prewarm-pool — the full Cube analog

On-disk bhyvectl --suspend checkpoints as the durable cold tier + bhyve -r-resumed-and-SIGSTOP’d VMs as the hot tier. A “create” request SIGCONTs one hot entry. When the host reboots, the hot tier rebuilds from the cold tier — amortized ~58 ms per entry (measured), so a 50-entry hot pool is back live in ~3 seconds.
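The rebuild-hot-from-cold loop is small. A hedged sketch: paths and VM names are illustrative, `wait_vcpu_runtime_advances` stands in for the proof-of-life gate described below, and each bhyve -r also needs the -s device arguments used at suspend time:

```shell
# Rebuild the hot tier from on-disk checkpoints at host boot: restore each
# entry with bhyve -r, gate on proof-of-life, then park it with SIGSTOP.
i=1
while [ "$i" -le "$POOL_SIZE" ]; do
  bhyvectl --create --vm="hot-$i"
  bhyve -r /pool/ckp/base.ckp "hot-$i" &   # plus the -s device args from suspend
  pid=$!
  wait_vcpu_runtime_advances "hot-$i"      # proof-of-life gate (~58 ms amortized)
  kill -STOP "$pid"                        # park: hot in memory, zero CPU
  i=$((i + 1))
done
```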

Measured on honor (10-entry hot pool, 256 MB guests):

| config | cc=1 | cc=10 | refill cost (per entry) |
| --- | --- | --- | --- |
| bhyve-durable-prewarm-pool | 17 ms | 39 ms | 58 ms |

This is the only FreeBSD config that’s both durable across reboots and in Cube’s latency neighborhood. At cc=1 it’s faster than Cube’s advertised 60 ms pool-hit, with the caveat that our clock measures SIGCONT → vcpu-runtime-advances (narrow, kernel-only), while Cube’s is HTTP-round-trip (broader — includes CubeAPI + CubeMaster + Cubelet allocator + VMM fork, all of which we’d have to stack on top for a full-stack comparison). When we do stack the E2B REST layer on top (see /appendix/e2b-compat), the end-to-end POST /sandboxes latency adds maybe 10-30 ms of Rust/axum + jail-backend overhead, bringing the durable-prewarm path to rough parity with Cube’s end-to-end.

Memory cost, unpatched baseline. 50 × 256 MB bhyve VMs on honor consumed ~13 GB of physical RAM — roughly the nominal un-shared sum. Out of the box FreeBSD does zero content-based dedup; per-bhyve RSS reports ~275 MB each. Tencent’s “thousands of sandboxes per node” claim depends on Linux KSM collapsing identical kernel pages across clones; without it, un-patched honor caps around 100-ish 256 MB hot entries.

Memory cost, with the vmm-vnode patch. The fix turned out to be a ~350-line kernel diff rather than a year-scale KSM port: back guest RAM with the ckp file’s vnode, shadow an anonymous object over it for CoW on first write. Same outcome as KSM for the many-guests-from-one-image case. 1 000 concurrent 256 MiB microVMs now fit in 9.1 GiB of host RAM (naive un-shared cost would be ~250 GiB — a server’s worth of memory, on a laptop). The annotated diff walk-through is at /appendix/vmm-vnode-patch; the motivating gap analysis at /appendix/ksm-equivalent; the homepage has the full N=8 → N=1000 scaling table. The last architectural gap against Cube is closed.

bhyve-full is a boot-from-scratch catastrophe at scale. 50 concurrent full-VM boots ate ~97 seconds of wall time. That’s not a bhyve defect — it’s what “boot a GENERIC FreeBSD kernel with a 6 GB UFS disk image 50 times in parallel on a laptop APU” looks like. The point of a pool is that no one does this path in production. Cube doesn’t either.

bhyve-prewarm-pool at cc=1 = 10 ms. That’s faster than Tencent’s advertised 60 ms cold-start. Caveats, plural: the clocks differ (ours is SIGCONT → vcpu-runtime-advances, Cube’s is a full HTTP round-trip through its control plane), and SIGSTOP/SIGCONT is in-memory only, so none of it survives a host reboot; that gap is exactly what the durable tiers above exist to close.

Apples to apples with clock-definitions aligned, our bhyve-prewarm-pool at cc=50 p95=41 ms sits below Tencent’s advertised p95=90 ms. With the measurement caveats above, the honest statement is: a FreeBSD pool-resume path is in the same performance neighborhood as Cube’s pool-resume path, and the remaining gap is about the durability and page-sharing features, not raw resume latency.

bhyve-minimal is still pending — it needs /usr/src on honor + a MINIMAL kernel build (~25 minutes on this Ryzen) + a handcrafted makefs rootfs. Deferred.

▸ reproduce · mise run bench:bhyve-full · script

▸ reproduce · mise run bench:bhyve-prewarm-pool · script

Comparing apples to apples

Tencent’s bar: 60 ms mean, 90 ms p95, 137 ms p99 at concurrency 50, using a pre-warmed pool with snapshot-restore.

Our fair comparison is bhyve-prewarm-pool at concurrency 50. The numbers land in the same ballpark: a FreeBSD port can match the pool-hit latency profile using SIGCONT as a snapshot-restore proxy, but that proxy isn’t the production shape. The durable-prewarm two-tier pool is, and it closes both gaps: durability via the cold tier, shared pages via the vmm-vnode patch.

What a production FreeBSD deployment would actually look like

Taking the honest answer from all of the above:

  1. VMM: bhyve with bhyve-minimal guests + the vmm-vnode patch for cross-guest page sharing. Commit to Linux or FreeBSD guests, not both.
  2. Pool: two-tier — on-disk bhyvectl --suspend checkpoints as the cold tier, bhyve -r-resumed-and-SIGSTOP’d bhyve processes as the hot tier. Durable across reboots; hot-tier rebuilds from cold in ~3 s per 50 entries.
  3. Guest: Linux with Kata-derived agent if you want Cube’s full feature set; or FreeBSD with a small Rust PID 1 if you want the purist port.
  4. Networking: VNET jails per sandbox on a pf-governed bridge (cubenet0 on honor), with deny-by-default + per-sandbox anchors. 7 µs intra-sandbox p50 RTT, 250k policy-update ops/sec via atomic table replace. See /appendix/ebpf-to-pf for the full measured comparison.
  5. Shim: port containerd-shim-rs to FreeBSD + swap the hypervisor client to talk libvmmapi.
  6. Everything above that: CubeAPI, CubeMaster, CubeProxy run unchanged.
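The pf side of step 4 can be sketched in a few lines; this is an illustrative fragment, not the controller’s actual ruleset, and the sandbox id 17 / addresses are made up:

```shell
# Root ruleset hook (step 4 + "Init" below): per-sandbox rules attach under it.
#   anchor "cube/*"
#
# Loaded into anchor cube/sandbox-17 at checkout time:
#   table <deny_17> persist
#   block quick from any to <deny_17>
#   pass on cubenet0 inet from 10.77.0.17 to 10.77.0.0/24 keep state
#   pass on cubenet0 inet from 10.77.0.17 to ! 10.77.0.0/24 keep state
```

Mutating `<deny_17>` with `pfctl -a cube/sandbox-17 -t deny_17 -T replace …` is the atomic table replace the 250k ops/sec number refers to; neighbors’ anchors are untouched.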

The total architectural delta is the VMM and the network layer. That delta is bounded, and the outcome is a system with the same agent-developer experience (E2B drop-in for the common path) and a different isolation substrate.

Production deploy shape — the integrated pool + cubenet story

The pool gets you a VM handle in milliseconds; cubenet0 gets you a policy-enforced bridge. The production story is what happens when those two are a single verb rather than two unrelated rigs.

tools/coppice-pool-ctl.sh is the small shell controller that does this wiring. The allocation path is:

  1. Checkout (coppice-pool-ctl checkout <pool-entry>):
    • allocate next free host octet from 10.77.0.10–10.77.0.200;
    • create tap<id>, addm it to cubenet0;
    • load a per-sandbox pf anchor at cube/sandbox-<id> containing a deny_<id> persistent table and a minimal allow stanza (intra-subnet states + egress to non-local);
    • return the sandbox metadata (id, ip, mac, anchor, tap, pool_entry) as KEY=VAL on stdout.
  2. Release: flush anchor, pfctl -k <ip> to kill any surviving states, destroy the tap, delete the on-disk metadata record. Reports states_before / states_after for leak tests.
  3. Init is additive and race-tolerant: re-emits the current root ruleset verbatim and injects a single anchor "cube/*" if it’s missing. Other teams’ root-level anchors (cube_rdr, cube_scale_wrap, cube_policy) survive untouched.
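Step 1’s octet allocation is the one piece with pure-shell semantics. A minimal sketch assuming a flat-file lease table (coppice-pool-ctl.sh’s actual bookkeeping may differ; `LEASES` and both function names are illustrative):

```shell
# Allocate/release host octets in 10.77.0.10-10.77.0.200, one lease per line.
LEASES="${LEASES:-/tmp/cubenet.leases}"

alloc_octet() {            # prints the octet; fails when the range is exhausted
  touch "$LEASES"
  for o in $(seq 10 200); do
    if ! grep -qx "$o" "$LEASES"; then
      echo "$o" >> "$LEASES"
      echo "$o"
      return 0
    fi
  done
  return 1
}

release_octet() {          # idempotent: removing an absent lease is a no-op
  grep -vx "$1" "$LEASES" > "$LEASES.tmp" || true
  mv "$LEASES.tmp" "$LEASES"
}
```

Checkout and release stay symmetric: every octet handed out by `alloc_octet` has exactly one line in the lease file, so a leak shows up as a leftover line, the same way a pf state leak shows up in states_after.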

The bhyve-durable-prewarm-pool sidecar (bhyve-durable-prewarm-pool-cubenet.sh) calls checkout inside its refill loop and passes the returned tap + mac straight into bhyve -r as -s 3:0,virtio-net,tap<id>,mac=02:cf:…. Release happens on pool teardown.

This is the shape a real deployment runs: anchor per sandbox, tap per sandbox, IP per sandbox, all allocated and torn down atomically in a single process rather than scattered across ad-hoc scripts. It’s the difference between “a sandbox” and “a sandbox with receipts”: every verb the pool-manager does has a symmetric undo and a state-leak check.

End-to-end integration tested by pool-cubenet-e2e.sh: spin up 10 sandboxes, verify each has tap + anchor + IP on cubenet0, mutate one’s deny table and confirm the neighbor is unaffected, release all, assert zero pf states reference any freed IP.