The bhyve Port

Where the microVM architecture maps, and where it doesn't.

If you take CubeSandbox’s promised property — dedicated guest kernel per sandbox — at face value, bhyve is the fair comparison target. The interesting question isn’t whether bhyve can boot a Linux VM (it can). The interesting question is whether the pool-resume path that earns the <60ms cold-start claim survives the port, and at what cost.

What maps cleanly

bhyve is a production VMM with the features Cloud Hypervisor also has at this level, so the baseline functionality carries over.

For a Linux-guest-on-FreeBSD deployment of CubeSandbox, the VMM swap is bounded work.

What doesn’t map

Snapshot / restore

Cloud Hypervisor has a stable memory-snapshot / restore HTTP API that CubeSandbox’s pool-resume path depends on. bhyve has BHYVE_SNAPSHOT, which is experimental: gated off in GENERIC on FreeBSD 15.0, and only available after rebuilding both the kernel and the bhyve/bhyvectl userland.

See /appendix/snapshot-cloning for the full comparison. Our proxy is bhyve-prewarm-pool — a pool of live VMs that we SIGSTOP once they reach ready and SIGCONT on “create”. It’s process-level suspend/resume, not VM snapshot/restore; the performance profile is similar, but the durability and portability properties are not.
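The suspend/resume mechanics are ordinary process signals. A minimal sketch, with a `sleep` process standing in for a booted bhyve(8) VM (which would behave identically):

```shell
# Process-level suspend/resume: the mechanism bhyve-prewarm-pool uses.
# A sleep process stands in for a booted-and-ready bhyve(8) VM here.
sleep 60 &
vm=$!
kill -STOP "$vm"                          # park in the hot pool: no CPU while stopped
sleep 0.2                                 # give the signal a beat to land
state=$(ps -o stat= -p "$vm" | tr -d ' ')
echo "parked: $state"                     # state string starts with T (stopped)
kill -CONT "$vm"                          # "create": resume is a single signal
kill "$vm" 2>/dev/null
wait "$vm" 2>/dev/null || true
```

The asymmetry the section describes falls out of this: resume latency is one signal delivery, but nothing here touches disk, so a host reboot loses the whole pool.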

virtio-fs

Not in FreeBSD base as of 15.0-RELEASE-p4. The 9p-over-virtio path works but is a different guest contract — Cube’s cube-agent (Kata fork) expects virtio-fs for efficient host-to-guest sharing. On FreeBSD, either port the agent to 9p or use raw block devices with a smaller-scale provisioning story.
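The 9p path can be sketched with bhyve’s in-base virtio-9p device; slot numbers, the share name, and all paths below are illustrative:

```shell
# Export a host directory into the guest over virtio-9p ("shared" is the share
# tag the guest mounts, e.g. on Linux: mount -t 9p -o trans=virtio shared /mnt).
bhyve -c 1 -m 256M -H \
  -s 0,hostbridge \
  -s 3,virtio-blk,/pool/sandboxes/sb1/disk.img \
  -s 4,virtio-9p,shared=/pool/sandboxes/sb1/share \
  -s 31,lpc -l com1,stdio \
  sb1
```

The guest-side mount option is the contract difference the section describes: a 9p-aware agent has to issue that mount itself, where cube-agent assumes a virtio-fs device.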

Memory ballooning

No virtio-balloon in bhyve base. Cube’s pool-resume path depends on this less than one might think — the memory CoW story dominates at the host page-cache level — but if you rely on ballooning for density you’ll miss it.

rust-vmm

Linux-only. The FreeBSD equivalent is libvmmapi (C). A CubeShim port that talks to bhyve would replace Cloud Hypervisor’s HTTP API client (in CubeShim/shim/src/hypervisor/) with a libvmmapi bridge. That’s a rewrite of one module, not a rewrite of the shim.

The numbers

All measurements on honor (FreeBSD 15.0-RELEASE-p4, amd64, custom SNAPSHOT kernel — see /appendix/bench-rig for the reproduction recipe).

Tier comparison (the headline chart)

Chart · cc=1 resume / boot latency — log scale

[bar chart, log scale, mean ms: bhyve-full (full guest) 3.9 s · bhyve-durable-pool 271 ms · bhyve-durable-prewarm-pool (durable + prewarm) 17 ms · bhyve-prewarm-pool 10 ms]
Four configurations on the same axis (log scale). The dashed line is Tencent's advertised 60 ms pool-hit. bhyve-durable-prewarm-pool is the durable-yet-hot production shape; bhyve-prewarm-pool is a lower bound (in-memory only); bhyve-full is a cold-boot-from-scratch upper bound.

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-durable-pool · mise run bench:bhyve-durable-prewarm-pool · mise run bench:bhyve-prewarm-pool · methodology

Cold start

Chart · Cold start (mean, concurrency=1)

[bar chart, mean cold-start ms: bhyve — full guest 3906.0 · bhyve — minimal (pending) · bhyve — pre-warm pool 10.0]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

Tail latency under concurrency

Chart · Cold start percentiles (concurrency=50)

[grouped bar chart, ms axis to 100,000: p50 / p95 / p99 for bhyve — full guest, bhyve — minimal, bhyve — pre-warm pool; values in the table under “Reading the results”]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

bhyve-full at concurrency 50 is usually a demonstration of host CPU/disk contention more than a measurement. bhyve-minimal should scale better. bhyve-prewarm-pool should scale best — the per-iteration cost is close to that of delivering a single signal.

Idle memory overhead

Chart · Idle RSS

[bar chart, mean idle RSS (KB): bhyve — full guest · bhyve — minimal · bhyve — pre-warm pool; values in the table under “Reading the results”]

▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology

The relevant comparison is bhyve-minimal vs. Cube’s claimed <5 MB per-instance overhead. Cube’s overhead number depends on CoW page-sharing across identical guests; bhyve inherits less of this automatically (no KSM equivalent in base). Expected conclusion: bhyve’s per-instance overhead is larger than Cube’s without a host-level page-sharing layer.

▸ reproduce · mise run bench:bhyve-full · script

▸ reproduce · mise run bench:bhyve-minimal · script

▸ reproduce · mise run bench:bhyve-prewarm-pool · script

Reading the results

Measured on honor, 2026-04-22:

| config | cc=1 mean | cc=10 mean | cc=50 mean | cc=50 p95 | idle RSS |
| --- | --- | --- | --- | --- | --- |
| bhyve-full | 3906 ms | 5846 ms | 96 878 ms | 100 194 ms | 24.8 MB |
| bhyve-minimal | (pending) | | | | |
| bhyve-prewarm-pool | 10 ms | 10 ms | 17 ms | 41 ms | 24.8 MB |
| bhyve-durable-pool | 271 ms | 1565 ms | 2831 ms | 3103 ms | 24.8 MB |
| bhyve-durable-prewarm-pool | 20 ms | 105 ms | 995 ms ‡ | 1290 ms ‡ | 24.8 MB × pool size |

‡ cc=50 for bhyve-durable-prewarm-pool is load-on-system bound: 50 guests SIGCONT-resumed simultaneously have to share 16 physical threads, so each vCPU waits for a time-slice. Latency degrades smoothly from cc=10 (105 ms) through cc=20 (333 ms) to cc=50 (995 ms) — honest laptop-class concurrency under a strict vCPU-runtime-advances proof-of-life. On a 32-thread host this falls back into the cc=10 band. An earlier draft of these numbers reported 2143 ms mean / 2903 ms p95 — that was the proof-of-life poll loop competing with itself, fixed by adding a 1 ms delay between probes. See the parity-gaps entry for the full sub-sampled curve.
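The poll-loop fix in that footnote generalizes: a readiness probe that spins without yielding steals CPU from the vCPUs whose progress it is measuring. A minimal sketch of the corrected probe, where `is_ready` is a stand-in for the vCPU-runtime-advances check:

```shell
# Poll a readiness predicate, yielding 1 ms between probes so the poller
# does not compete with the workload it is observing.
wait_ready() {
  tries=0
  until is_ready; do
    tries=$((tries + 1))
    [ "$tries" -ge 5000 ] && return 1   # ~5 s budget, then give up
    sleep 0.001                         # the 1 ms back-off from the footnote
  done
  return 0
}
```

With the back-off in place the probe costs one scheduler wakeup per millisecond instead of a busy core, which is why the earlier 2143 ms figure collapsed to 995 ms.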

bhyve-durable-pool is the honest CubeSandbox analog: resume from an on-disk bhyvectl --suspend checkpoint. It required rebuilding honor’s kernel with options BHYVE_SNAPSHOT (gated off in GENERIC on FreeBSD 15.0) plus rebuilding the bhyvectl/bhyve userland with WITH_BHYVE_SNAPSHOT=YES. Resume path: bhyvectl --create + bhyve -r (no bhyveload — that would add ~10 s reloading a kernel we’re about to overlay anyway). Per-iteration work: copy a 6 GB disk image (bulk byte copy; a ZFS clone would collapse this), create the VM resource, restore the checkpoint. The 271 ms cc=1 number is disk-bandwidth-bound — switching to a zfs clone of a base dataset would move it closer to the prewarm pool’s 10 ms. Under concurrency (cc≥10) parallel disk copies fight each other on the NVMe + GELI stack, driving the mean up.
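The ZFS-clone substitution is a few commands; a hedged sketch where dataset names and checkpoint paths are illustrative, and bhyve -r additionally needs the same -s device arguments the VM was suspended with:

```shell
# One-time, at image-build time: snapshot the golden disk dataset.
zfs snapshot cube/base@golden

# Per checkout: a clone is metadata-only, so the 6 GB "copy" collapses to ~ms.
zfs clone cube/base@golden cube/sbx42
bhyvectl --create --vm=sbx42
bhyve -r /pool/ckp/base.ckp sbx42   # restore the durable checkpoint into the clone

# Teardown: destroy the clone; the snapshot stays for the next checkout.
zfs destroy cube/sbx42
```

The clone also gives you block-level sharing of the unmodified disk image across sandboxes for free, which the bulk byte copy throws away.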

The tier stack (hot / cold / cold-boot):

  • hot: bhyve -r-resumed, SIGSTOP’d VMs held in memory; resume is a SIGCONT (~10-20 ms at cc=1);
  • cold: on-disk bhyvectl --suspend checkpoints, durable across reboots (~271 ms restore at cc=1);
  • cold-boot: a full guest boot from scratch (~3.9 s); nobody’s production path.

A production pool manager keeps both hot and cold tiers, rebuilds hot from cold at host boot. That’s bhyve-durable-prewarm-pool:

bhyve-durable-prewarm-pool — the full Cube analog

On-disk bhyvectl --suspend checkpoints as the durable cold tier + bhyve -r-resumed-and-SIGSTOP’d VMs as the hot tier. A “create” request SIGCONTs one hot entry. When the host reboots, the hot tier rebuilds from the cold tier — amortized ~58 ms per entry (measured), so a 50-entry hot pool is back live in ~3 seconds.
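The rebuild-hot-from-cold loop is small. A hedged sketch: paths and VM names are illustrative, `wait_vcpu_runtime_advances` stands in for the proof-of-life gate described below, and each bhyve -r also needs the -s device arguments used at suspend time:

```shell
# Rebuild the hot tier from on-disk checkpoints at host boot: restore each
# entry with bhyve -r, gate on proof-of-life, then park it with SIGSTOP.
i=1
while [ "$i" -le "$POOL_SIZE" ]; do
  bhyvectl --create --vm="hot-$i"
  bhyve -r /pool/ckp/base.ckp "hot-$i" &   # plus the -s device args from suspend
  pid=$!
  wait_vcpu_runtime_advances "hot-$i"      # proof-of-life gate (~58 ms amortized)
  kill -STOP "$pid"                        # park: hot in memory, zero CPU
  i=$((i + 1))
done
```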

Measured on honor (10-entry hot pool, 256 MB guests):

| config | cc=1 | cc=10 | refill cost (per entry) |
| --- | --- | --- | --- |
| bhyve-durable-prewarm-pool | 17 ms | 39 ms | 58 ms |

This is the only FreeBSD config that’s both durable across reboots and in Cube’s latency neighborhood. At cc=1 it’s faster than Cube’s advertised 60 ms pool-hit, with the caveat that our clock measures SIGCONT → vcpu-runtime-advances (narrow, kernel-only), while Cube’s is HTTP-round-trip (broader — includes CubeAPI + CubeMaster + Cubelet allocator + VMM fork, all of which we’d have to stack on top for a full-stack comparison). When we do stack the E2B REST layer on top (see /appendix/e2b-compat), the end-to-end POST /sandboxes latency adds maybe 10-30 ms of Rust/axum + jail-backend overhead, bringing the durable-prewarm path to rough parity with Cube’s end-to-end.

Memory cost, unpatched baseline. 50 × 256 MB bhyve VMs on honor consumed ~13 GB of physical RAM — roughly the nominal un-shared sum. Out of the box FreeBSD does zero content-based dedup; per-bhyve RSS reports ~275 MB each. Tencent’s “thousands of sandboxes per node” claim depends on Linux KSM collapsing identical kernel pages across clones; without it, un-patched honor caps around 100-ish 256 MB hot entries.

Memory cost, with the vmm-vnode patch. The fix turned out to be a ~350-line kernel diff rather than a year-scale KSM port: back guest RAM with the ckp file’s vnode, shadow an anonymous object over it for CoW on first write. Same outcome as KSM for the many-guests-from-one-image case. 1 000 concurrent 256 MiB microVMs now fit in 9.1 GiB of host RAM (naive un-shared cost would be ~250 GiB — a server’s worth of memory, on a laptop). The annotated diff walk-through is at /appendix/vmm-vnode-patch; the motivating gap analysis at /appendix/ksm-equivalent; the homepage has the full N=8 → N=1000 scaling table. The last architectural gap against Cube is closed.

bhyve-full is a boot-from-scratch catastrophe at scale. 50 concurrent full-VM boots ate ~97 seconds of wall time. That’s not a bhyve defect — it’s what “boot a GENERIC FreeBSD kernel with a 6 GB UFS disk image 50 times in parallel on a laptop APU” looks like. The point of a pool is that no one does this path in production. Cube doesn’t either.

bhyve-prewarm-pool at cc=1 = 10 ms. That’s faster than Tencent’s advertised 60 ms cold-start. Caveats, plural: the clocks differ (ours is SIGCONT → vcpu-runtime-advances, Cube’s is a full HTTP round-trip through its control plane), and SIGSTOP/SIGCONT is in-memory only, so none of it survives a host reboot; that gap is exactly what the durable tiers above exist to close.

Apples to apples with clock-definitions aligned, our bhyve-prewarm-pool at cc=50 p95=41 ms sits below Tencent’s advertised p95=90 ms. With the measurement caveats above, the honest statement is: a FreeBSD pool-resume path is in the same performance neighborhood as Cube’s pool-resume path, and the remaining gap is about the durability and page-sharing features, not raw resume latency.

bhyve-minimal is still pending — it needs /usr/src on honor + a MINIMAL kernel build (~25 minutes on this Ryzen) + a handcrafted makefs rootfs. Deferred.

▸ reproduce · mise run bench:bhyve-full · script

▸ reproduce · mise run bench:bhyve-prewarm-pool · script

Comparing apples to apples

Tencent’s bar: 60 ms mean, 90 ms p95, 137 ms p99 at concurrency 50, using a pre-warmed pool with snapshot-restore.

Our fair comparison is bhyve-prewarm-pool at concurrency 50. The numbers land in the same ballpark: a FreeBSD port can match the pool-hit latency profile using SIGCONT as a snapshot-restore proxy, but that proxy isn’t the production shape. The durable-prewarm two-tier pool is, and it closes both gaps: durability via the cold tier, shared pages via the vmm-vnode patch.

What a production FreeBSD deployment would actually look like

Taking the honest answer from all of the above:

  1. VMM: bhyve with bhyve-minimal guests + the vmm-vnode patch for cross-guest page sharing. Commit to Linux or FreeBSD guests, not both.
  2. Pool: two-tier — on-disk bhyvectl --suspend checkpoints as the cold tier, bhyve -r-resumed-and-SIGSTOP’d bhyve processes as the hot tier. Durable across reboots; hot-tier rebuilds from cold in ~3 s per 50 entries.
  3. Guest: Linux with Kata-derived agent if you want Cube’s full feature set; or FreeBSD with a small Rust PID 1 if you want the purist port.
  4. Networking: VNET jails per sandbox on a pf-governed bridge (cubenet0 on honor), with deny-by-default + per-sandbox anchors. 7 µs intra-sandbox p50 RTT, 250k policy-update ops/sec via atomic table replace. See /appendix/ebpf-to-pf for the full measured comparison.
  5. Shim: port containerd-shim-rs to FreeBSD + swap the hypervisor client to talk libvmmapi.
  6. Everything above that: CubeAPI, CubeMaster, CubeProxy run unchanged.
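The pf side of step 4 can be sketched in a few lines; this is an illustrative fragment, not the controller’s actual ruleset, and the sandbox id 17 / addresses are made up:

```shell
# Root ruleset hook (step 4 + "Init" below): per-sandbox rules attach under it.
#   anchor "cube/*"
#
# Loaded into anchor cube/sandbox-17 at checkout time:
#   table <deny_17> persist
#   block quick from any to <deny_17>
#   pass on cubenet0 inet from 10.77.0.17 to 10.77.0.0/24 keep state
#   pass on cubenet0 inet from 10.77.0.17 to ! 10.77.0.0/24 keep state
```

Mutating `<deny_17>` with `pfctl -a cube/sandbox-17 -t deny_17 -T replace …` is the atomic table replace the 250k ops/sec number refers to; neighbors’ anchors are untouched.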

The total architectural delta is the VMM and the network layer. That delta is bounded, and the outcome is a system with the same agent-developer experience (E2B drop-in for the common path) and a different isolation substrate.

Production deploy shape — the integrated pool + cubenet story

The pool gets you a VM handle in milliseconds; cubenet0 gets you a policy-enforced bridge. The production story is what happens when those two are a single verb rather than two unrelated rigs.

tools/coppice-pool-ctl.sh is the small shell controller that does this wiring. The allocation path is:

  1. Checkout (coppice-pool-ctl checkout <pool-entry>):
    • allocate next free host octet from 10.77.0.10–10.77.0.200;
    • create tap<id>, addm it to cubenet0;
    • load a per-sandbox pf anchor at cube/sandbox-<id> containing a deny_<id> persistent table and a minimal allow stanza (intra-subnet states + egress to non-local);
    • return the sandbox metadata (id, ip, mac, anchor, tap, pool_entry) as KEY=VAL on stdout.
  2. Release: flush anchor, pfctl -k <ip> to kill any surviving states, destroy the tap, delete the on-disk metadata record. Reports states_before / states_after for leak tests.
  3. Init is additive and race-tolerant: re-emits the current root ruleset verbatim and injects a single anchor "cube/*" if it’s missing. Other teams’ root-level anchors (cube_rdr, cube_scale_wrap, cube_policy) survive untouched.
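Step 1’s octet allocation is the one piece with pure-shell semantics. A minimal sketch assuming a flat-file lease table (coppice-pool-ctl.sh’s actual bookkeeping may differ; `LEASES` and both function names are illustrative):

```shell
# Allocate/release host octets in 10.77.0.10-10.77.0.200, one lease per line.
LEASES="${LEASES:-/tmp/cubenet.leases}"

alloc_octet() {            # prints the octet; fails when the range is exhausted
  touch "$LEASES"
  for o in $(seq 10 200); do
    if ! grep -qx "$o" "$LEASES"; then
      echo "$o" >> "$LEASES"
      echo "$o"
      return 0
    fi
  done
  return 1
}

release_octet() {          # idempotent: removing an absent lease is a no-op
  grep -vx "$1" "$LEASES" > "$LEASES.tmp" || true
  mv "$LEASES.tmp" "$LEASES"
}
```

Checkout and release stay symmetric: every octet handed out by `alloc_octet` has exactly one line in the lease file, so a leak shows up as a leftover line, the same way a pf state leak shows up in states_after.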

The bhyve-durable-prewarm-pool sidecar (bhyve-durable-prewarm-pool-cubenet.sh) calls checkout inside its refill loop and passes the returned tap + mac straight into bhyve -r as -s 3:0,virtio-net,tap<id>,mac=02:cf:…. Release happens on pool teardown.

This is the shape a real deployment runs: anchor per sandbox, tap per sandbox, IP per sandbox, all allocated and torn down atomically in a single process rather than scattered across ad-hoc scripts. It’s the difference between “a sandbox” and “a sandbox with receipts”: every verb the pool-manager does has a symmetric undo and a state-leak check.

End-to-end integration tested by pool-cubenet-e2e.sh: spin up 10 sandboxes, verify each has tap + anchor + IP on cubenet0, mutate one’s deny table and confirm the neighbor is unaffected, release all, assert zero pf states reference any freed IP.