If you take CubeSandbox’s promised property — dedicated guest kernel per sandbox — at face value, bhyve is the fair comparison target. The interesting question isn’t whether bhyve can boot a Linux VM (it can). The interesting question is whether the pool-resume path that earns the <60ms cold-start claim survives the port, and at what cost.
What maps cleanly
bhyve is a production VMM with the features Cloud Hypervisor also has at this level:
- vCPU model — both use KVM-style per-vCPU threads with hardware virtualization extensions (VT-x / AMD-V). bhyve drives them through its vmm(4) kernel module; Cloud Hypervisor uses KVM ioctls. The abstractions over vCPU state (run/pause/read-registers) are essentially isomorphic.
- Paravirt devices — both speak virtio. bhyve ships virtio-blk, virtio-net, virtio-console, virtio-9p, virtio-rnd, virtio-scsi. Cloud Hypervisor ships all of those plus virtio-fs, virtio-balloon, virtio-iommu, and vhost-user frontends. Feature parity on the hot path; feature gap on advanced device offload.
- PCI passthrough — bhyve’s ppt(4) + host-device fencing maps to Cloud Hypervisor’s VFIO-based passthrough. Comparable capability, different plumbing.
- UEFI / Linux direct boot — bhyve boots UEFI guests and Linux kernels via grub-bhyve or bhyveload. Cloud Hypervisor boots PVH + bzImage directly. Different boot protocols, same outcome.
For a Linux-guest-on-FreeBSD deployment of CubeSandbox, the VMM swap is bounded work.
What doesn’t map
Snapshot / restore
Cloud Hypervisor has a stable memory-snapshot / restore HTTP API that
CubeSandbox’s pool-resume path depends on. bhyve has BHYVE_SNAPSHOT,
which is:
- off in GENERIC on FreeBSD 15.0
- documented as experimental with partial device-state coverage
- not something we can benchmark honestly without either enabling it (and validating the device-state restore for each virtio device in use) or building a proxy that’s not actually the same thing.
See /appendix/snapshot-cloning for the full
comparison. Our proxy is bhyve-prewarm-pool — a pool of live VMs that we
SIGSTOP after they reach ready, SIGCONT on “create”. It’s
process-level suspend/resume, not VM snapshot/restore; the performance
profile is similar, the durability and portability properties are not.
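The process-level suspend/resume proxy is easy to see with plain job control. A minimal sketch, using a sleep process as a stand-in for a pooled bhyve process (illustrative only, not the benchmark harness): SIGSTOP parks every thread of the process, SIGCONT wakes them, and the kernel reports the stopped state as T.

```shell
# Stand-in for a pooled bhyve process: any long-lived PID behaves the
# same under SIGSTOP/SIGCONT job control.
sleep 300 &
vm_pid=$!

kill -STOP "$vm_pid"
# A stopped process shows state 'T' in ps
state_stopped=$(ps -o stat= -p "$vm_pid" | tr -d ' ' | cut -c1)

kill -CONT "$vm_pid"
sleep 0.1   # give the scheduler a moment to run it again
state_running=$(ps -o stat= -p "$vm_pid" | tr -d ' ' | cut -c1)

kill "$vm_pid"
echo "stopped=$state_stopped resumed=$state_running"
```

The narrow clock in bhyve-prewarm-pool measures essentially the gap between the SIGCONT above and the first scheduled vCPU time-slice.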
virtio-fs
Not in FreeBSD base as of 15.0-RELEASE-p4. The 9p-over-virtio path works but is a different guest contract — Cube’s cube-agent (Kata fork) expects virtio-fs for efficient host-to-guest sharing. On FreeBSD, either port the agent to 9p or use raw block devices with a smaller-scale provisioning story.
Memory ballooning
No virtio-balloon in bhyve base. Cube’s pool-resume path depends on this less than one might think — the memory CoW story dominates at the host page-cache level — but if you rely on ballooning for density you’ll miss it.
rust-vmm
Linux-only. The FreeBSD equivalent is libvmmapi (C). A CubeShim
port that talks to bhyve would replace Cloud Hypervisor’s HTTP API client
(in CubeShim/shim/src/hypervisor/) with a libvmmapi
bridge. That’s a rewrite of one module, not a rewrite of the shim.
The numbers
All measurements on honor (FreeBSD 15.0-RELEASE-p4, amd64,
custom SNAPSHOT kernel — see
/appendix/bench-rig for the
reproduction recipe).
Tier comparison (the headline chart)
Chart · cc=1 resume / boot latency — log scale
▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-durable-pool · mise run bench:bhyve-durable-prewarm-pool · mise run bench:bhyve-prewarm-pool · methodology
Cold start
Chart · Cold start (mean, concurrency=1)
▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology
- bhyve-full boots a GENERIC FreeBSD guest from a full base.txz image. The naïve baseline — no one would ship a microVM service this way.
- bhyve-minimal uses a stripped MINIMAL kernel built from /usr/src plus a makefs-generated tiny UFS rootfs. The apples-to-apples “Firecracker-class minimal guest” config.
- bhyve-prewarm-pool pre-boots POOL_SIZE minimal VMs, SIGSTOPs them at ready, SIGCONT-resumes on “create”. This is our fair analog to Cube’s snapshot-resume pool.
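The prewarm-pool mechanics reduce to a few lines of job control. A toy skeleton, with sleep processes standing in for booted minimal guests (the FIFO discipline mirrors the description above; the real bench script adds readiness probes and timing):

```shell
# Toy prewarm pool: pre-start POOL_SIZE stand-in "VMs", park them with
# SIGSTOP, and serve a "create" request by SIGCONT-ing the head entry.
POOL_SIZE=3
pool=""
for _ in $(seq "$POOL_SIZE"); do
    sleep 300 &
    kill -STOP "$!"
    pool="$pool $!"
done

# "create": pop the head of the pool and wake it
set -- $pool
vm=$1; shift
pool="$*"
kill -CONT "$vm"

remaining=$(echo "$pool" | wc -w)
echo "handed out pid=$vm; $remaining entries still parked"

# teardown (SIGKILL terminates even still-stopped entries)
kill -9 $vm $pool 2>/dev/null
```

A production pool manager would refill the pool asynchronously after each checkout; the toy above only drains.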
Tail latency under concurrency
Chart · Cold start percentiles (concurrency=50)
▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology
bhyve-full at concurrency 50 is usually a demonstration of host
CPU/disk contention more than a measurement. bhyve-minimal should scale
better. bhyve-prewarm-pool should scale best — the per-iteration cost
is close to a kill-signal.
Idle memory overhead
Chart · Idle RSS
▸ reproduce · mise run bench:bhyve-full · mise run bench:bhyve-minimal · mise run bench:bhyve-prewarm-pool · methodology
The relevant comparison is bhyve-minimal vs. Cube’s claimed <5MB-per-
instance overhead. Cube’s overhead number depends on CoW page-sharing
across identical guests; bhyve inherits less of this automatically (no
KSM equivalent in base). Expected conclusion: bhyve’s per-instance
overhead is larger than Cube’s without a host-level page-sharing layer.
Reading the results
Measured on honor, 2026-04-22:
| config | cc=1 mean | cc=10 mean | cc=50 mean | cc=50 p95 | idle RSS |
|---|---|---|---|---|---|
| bhyve-full | 3906 ms | 5846 ms | 96 878 ms | 100 194 ms | 24.8 MB |
| bhyve-minimal | — | — | — | — | — |
| bhyve-prewarm-pool | 10 ms | 10 ms | 17 ms | 41 ms | 24.8 MB |
| bhyve-durable-pool | 271 ms | 1565 ms | 2831 ms | 3103 ms | 24.8 MB |
| bhyve-durable-prewarm-pool | 20 ms | 105 ms | 995 ms ‡ | 1290 ms ‡ | 24.8 MB × pool size |
‡ cc=50 for bhyve-durable-prewarm-pool is load-on-system
bound: 50 guests SIGCONT-resumed simultaneously have to share 16
physical threads, so each vCPU waits for a time-slice. Latency
degrades smoothly from cc=10 (105 ms) through cc=20 (333 ms) to
cc=50 (995 ms) — honest laptop-class concurrency under a strict
vCPU-runtime-advances proof-of-life. On a 32-thread host this falls
back into the cc=10 band. An earlier draft of these numbers
reported 2143 ms mean / 2903 ms p95 — that was the proof-of-life
poll loop competing with itself, fixed by adding a 1 ms delay
between probes. See the
parity-gaps entry for the full
sub-sampled curve.
bhyve-durable-pool is the honest CubeSandbox analog: resume from an
on-disk bhyvectl --suspend checkpoint. It required rebuilding
honor’s kernel with options BHYVE_SNAPSHOT (gated off in
GENERIC on FreeBSD 15.0) plus rebuilding bhyvectl/bhyve
userspace with WITH_BHYVE_SNAPSHOT=YES. Resume path:
bhyvectl --create + bhyve -r (no
bhyveload — that’d add ~10 s reloading a kernel we’re
about to overlay anyway). Per-iteration work: copy a 6 GB disk image
(bulk byte copy; a ZFS clone would collapse this), create VM resource,
restore checkpoint. The 271 ms cc=1 number is disk-bandwidth-bound —
switching to zfs clone of a base dataset would move it
closer to the prewarm-pool’s 10 ms. Under concurrency (cc≥10)
parallel disk copies fight each other on the NVMe + GELI stack,
driving the mean up.
The tier stack (hot / cold / cold-boot):
- bhyve-prewarm-pool (SIGCONT, in-memory) = hot tier ≈ 10 ms, dies on host reboot.
- bhyve-durable-pool (bhyve -r, on-disk) = cold tier ≈ 270 ms, survives reboot.
- bhyve-full (full kernel boot) = cold-boot ≈ 4 s single, ~97 s at cc=50.
A production pool manager keeps both hot and cold tiers, rebuilds hot
from cold at host boot. That’s
bhyve-durable-prewarm-pool:
bhyve-durable-prewarm-pool — the full Cube analog
On-disk bhyvectl --suspend checkpoints as the durable cold
tier + bhyve -r-resumed-and-SIGSTOP’d VMs as the hot tier.
A “create” request SIGCONTs one hot entry. When the host
reboots, the hot tier rebuilds from the cold tier — amortized
~58 ms per entry (measured), so a 50-entry hot pool is back live in
~3 seconds.
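The ~3 s rebuild figure is just the amortized per-entry cost multiplied out; a quick sanity check on the arithmetic:

```shell
# Hot-tier rebuild time = amortized per-entry refill cost × pool size.
per_entry_ms=58      # measured amortized cost per resumed entry
pool_entries=50
rebuild_ms=$((per_entry_ms * pool_entries))
echo "${pool_entries}-entry hot pool rebuilds in ~${rebuild_ms} ms"   # ~2.9 s
```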
Measured on honor (10-entry hot pool, 256 MB guests):
| config | cc=1 | cc=10 | refill cost (per entry) |
|---|---|---|---|
| bhyve-durable-prewarm-pool | 17 ms | 39 ms | 58 ms |
This is the only FreeBSD config that’s both durable across reboots
and in Cube’s latency neighborhood. At cc=1 it’s faster than
Cube’s advertised 60 ms pool-hit, with the caveat that our clock
measures SIGCONT → vcpu-runtime-advances (narrow, kernel-only), while
Cube’s is HTTP-round-trip (broader — includes CubeAPI +
CubeMaster + Cubelet allocator + VMM fork, all of which we’d have to
stack on top for a full-stack comparison). When we do stack the E2B
REST layer on top (see
/appendix/e2b-compat), the
end-to-end POST /sandboxes latency adds maybe 10-30 ms of
Rust/axum + jail-backend overhead, bringing the durable-prewarm path
to rough parity with Cube’s end-to-end.
Memory cost, unpatched baseline. 50 × 256 MB bhyve VMs on honor consumed ~13 GB of physical RAM — roughly the nominal un-shared sum. Out of the box FreeBSD does zero content-based dedup; per-bhyve RSS reports ~275 MB each. Tencent’s “thousands of sandboxes per node” claim depends on Linux KSM collapsing identical kernel pages across clones; without it, un-patched honor caps around 100-ish 256 MB hot entries.
Memory cost, with the vmm-vnode patch. The fix turned out to be a ~350-line kernel diff rather than a year-scale KSM port: back guest RAM with the ckp file’s vnode, shadow an anonymous object over it for CoW on first write. Same outcome as KSM for the many-guests-from-one-image case. 1 000 concurrent 256 MiB microVMs now fit in 9.1 GiB of host RAM (naive un-shared cost would be ~250 GiB — a server’s worth of memory, on a laptop). The annotated diff walk-through is at /appendix/vmm-vnode-patch; the motivating gap analysis at /appendix/ksm-equivalent; the homepage has the full N=8 → N=1000 scaling table. The last architectural gap against Cube is closed.
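The headline savings follow directly from the sharing model: the naive footprint is the un-shared sum, while the patched footprint is roughly one image plus per-guest dirty pages. The arithmetic behind the two figures quoted above:

```shell
# Naive (un-shared) host RAM for N identical guests vs. the measured
# figure with the vmm-vnode patch (9.1 GiB at N=1000).
guests=1000
guest_mib=256
naive_gib=$((guests * guest_mib / 1024))
# integer approximation of naive_gib / 9.1
reduction=$((naive_gib * 10 / 91))
echo "naive: ${naive_gib} GiB; with patch: ~9.1 GiB (~${reduction}x reduction)"
```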
bhyve-full is a boot-from-scratch catastrophe at scale. 50 concurrent full-VM boots ate ~97 seconds of wall time. That’s not a bhyve defect — it’s what “boot a GENERIC FreeBSD kernel with a 6 GB UFS disk image 50 times in parallel on a laptop APU” looks like. The point of a pool is that no one does this path in production. Cube doesn’t either.
bhyve-prewarm-pool at cc=1 = 10 ms. That’s faster than Tencent’s advertised 60 ms cold-start. Caveats, plural:
- Our SIGCONT proxy measures narrow: VCPU-unfreeze plus a tiny readiness settle. No memory restoration (the VM’s pages are already resident). No device re-init. No network reconfigure.
- Cube’s 60 ms is an HTTP round-trip that includes CubeAPI parsing, CubeMaster scheduling, Cubelet allocator checkout, snapshot restore of VMM memory-file, CubeVS network-agent plumbing, and response write. That’s a much broader clock.
- Cube’s warm pool survives reboots via durable snapshots. Ours does not — SIGSTOP’d bhyve processes die with the host.
- Cube’s 5 MB per-instance overhead is an ensemble property (CoW page-sharing across identical guests). Our 24.8 MB per idle bhyve is un-shared, because FreeBSD doesn’t ship a KSM equivalent.
On the raw numbers, our bhyve-prewarm-pool at cc=50 shows p95 = 41 ms, below Tencent’s advertised p95 = 90 ms. But the clocks are not truly aligned, so with the measurement caveats above, the honest statement is: a FreeBSD pool-resume path is in the same performance neighborhood as Cube’s pool-resume path, and the remaining gap is about durability and page-sharing features, not raw resume latency.
bhyve-minimal is still pending — it needs /usr/src on honor +
a MINIMAL kernel build (~25 minutes on this Ryzen) + a handcrafted
makefs rootfs. Deferred.
Comparing apples to apples
Tencent’s bar: 60 ms mean, 90 ms p95, 137 ms p99 at concurrency 50, using a pre-warmed pool with snapshot-restore.
Our fair comparison is bhyve-prewarm-pool at concurrency 50. The numbers land in the same ballpark: a FreeBSD port can match the pool-hit latency profile using SIGCONT as a snapshot-restore proxy, though not the production shape. The durable-prewarm two-tier pool replaces that proxy and solves both concerns: durability via the cold tier, shared pages via the vmm-vnode patch.
What a production FreeBSD deployment would actually look like
Taking the honest answer from all of the above:
- VMM: bhyve with bhyve-minimal guests + the vmm-vnode patch for cross-guest page sharing. Commit to Linux or FreeBSD guests, not both.
- Pool: two-tier — on-disk bhyvectl --suspend checkpoints as the cold tier, bhyve -r-resumed-and-SIGSTOP’d bhyve processes as the hot tier. Durable across reboots; hot-tier rebuilds from cold in ~3 s per 50 entries.
- Guest: Linux with a Kata-derived agent if you want Cube’s full feature set; or FreeBSD with a small Rust PID 1 if you want the purist port.
- Networking: VNET jails per sandbox on a pf-governed bridge (cubenet0 on honor), with deny-by-default + per-sandbox anchors. 7 µs intra-sandbox p50 RTT, 250k policy-update ops/sec via atomic table replace. See /appendix/ebpf-to-pf for the full measured comparison.
- Shim: port containerd-shim-rs to FreeBSD + swap the hypervisor client to talk libvmmapi.
- Everything above that: CubeAPI, CubeMaster, CubeProxy run unchanged.
The total architectural delta is: VMM and network. The delta is bounded, the outcome is a system with the same agent-developer experience (E2B drop-in for the common path) and a different isolation substrate.
Production deploy shape — the integrated pool + cubenet story
The pool gets you a VM handle in milliseconds; cubenet0 gets you a
policy-enforced bridge. The production story is what happens when
those two are a single verb rather than two unrelated rigs.
tools/coppice-pool-ctl.sh
is the small shell controller that does this wiring. The allocation
path is:
- Checkout (coppice-pool-ctl checkout <pool-entry>):
  - allocate the next free host octet from 10.77.0.10–10.77.0.200;
  - create tap<id>, add it to cubenet0;
  - load a per-sandbox pf anchor at cube/sandbox-<id> containing a deny_<id> persistent table and a minimal allow stanza (intra-subnet states + egress to non-local);
  - return the sandbox metadata (id, ip, mac, anchor, tap, pool_entry) as KEY=VAL on stdout.
- Release: flush the anchor, pfctl -k <ip> to kill any surviving states, destroy the tap, delete the on-disk metadata record. Reports states_before/states_after for leak tests.
- Init is additive and race-tolerant: re-emits the current root ruleset verbatim and injects a single anchor "cube/*" if it’s missing. Other teams’ root-level anchors (cube_rdr, cube_scale_wrap, cube_policy) survive untouched.
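The allocation step in checkout is a linear scan for the first unused host octet in the 10.77.0.10–.200 range. A sketch of that step in isolation (the function name and the in-memory used-list argument are illustrative; the real controller consults its on-disk metadata records):

```shell
# Return the first host octet in 10..200 not present in the argument
# list of already-allocated octets; non-zero exit if the range is full.
next_free_octet() {
    used=" $* "
    o=10
    while [ "$o" -le 200 ]; do
        case "$used" in
            *" $o "*) ;;            # taken, keep scanning
            *) echo "$o"; return 0 ;;
        esac
        o=$((o + 1))
    done
    return 1                        # host-octet range exhausted
}

octet=$(next_free_octet 10 11 13)
echo "allocated 10.77.0.$octet"
```

With 10 and 11 taken and 13 taken, the scan skips the gap at 12 correctly rather than appending past the highest octet in use.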
The bhyve-durable-prewarm-pool sidecar
(bhyve-durable-prewarm-pool-cubenet.sh)
calls checkout inside its refill loop and passes the
returned tap + mac straight into bhyve -r as
-s 3:0,virtio-net,tap<id>,mac=02:cf:…. Release
happens on pool teardown.
This is the shape a real deployment runs: anchor per sandbox, tap per sandbox, IP per sandbox, all allocated and torn down atomically in a single process rather than scattered across ad-hoc scripts. It’s the difference between “a sandbox” and “a sandbox with receipts”: every verb the pool-manager does has a symmetric undo and a state-leak check.
End-to-end integration tested by
pool-cubenet-e2e.sh:
spin up 10 sandboxes, verify each has tap + anchor + IP on cubenet0,
mutate one’s deny table and confirm the neighbor is unaffected,
release all, assert zero pf states reference any freed IP.