The “sub-60 ms cold start” claim lives or dies on this appendix. The pool-hit path is a snapshot restore, not a boot. Three mechanisms were on the table, none directly comparable. We combined approaches 2 and 3 and landed at 17 ms: ahead of Cube’s advertised 60 ms, with real durability. Walk-through below.
Approach 1: Cloud Hypervisor memory snapshot + CoW restore
What it does. PUT /api/v1/vm.snapshot writes a snapshot bundle to
disk: a memory dump, vCPU register state, device state (virtio queues,
serial buffers, clock), and a config manifest. PUT /api/v1/vm.restore
loads it back.
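Concretely, both endpoints take a single URL in a JSON body over the API socket. A minimal sketch with `curl` (the socket path and snapshot directory here are illustrative assumptions, and Cloud Hypervisor expects the VM to be paused before `vm.snapshot`):

```shell
#!/bin/sh
# Hedged sketch: drive vm.snapshot / vm.restore over the Cloud Hypervisor
# API socket. CH_SOCK and SNAP_DIR are illustrative, not from the rig.
CH_SOCK=${CH_SOCK:-/tmp/ch-vm1.sock}
SNAP_DIR=${SNAP_DIR:-/snapshots/vm1}

ch_api() {  # PUT an endpoint with an optional JSON body
    endpoint=$1; body=${2:-}
    curl -s --unix-socket "$CH_SOCK" -X PUT \
        "http://localhost/api/v1/$endpoint" \
        -H 'Content-Type: application/json' ${body:+-d "$body"}
}

ch_snapshot() {
    ch_api vm.pause                                             # snapshot wants a paused VM
    ch_api vm.snapshot "{\"destination_url\":\"file://$SNAP_DIR\"}"
}

ch_restore() {
    ch_api vm.restore "{\"source_url\":\"file://$SNAP_DIR\"}"   # then vm.resume if needed
}
```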
The CoW trick. If the memory dump lives on a filesystem that supports
mmap + CoW semantics (tmpfs, overlayfs-on-tmpfs, ZFS with dnode CoW),
restoring N microVMs from the same snapshot shares physical pages until
someone writes. This is what makes the “thousands of sandboxes
overhead <5MB each” property real: most pages are the kernel
text/rodata and the post-boot page cache, and they stay shared.
What gets preserved: memory (modulo CoW), vCPU registers, virtio ring positions, clock. What gets lost: non-virtio device state if the device doesn’t implement snapshot (in CH 28, most do), any network state held outside the VM boundary (pf state, neighbor tables).
Cost. Restore time is dominated by mmap of the memory file + one
IPI per vCPU. CH measurements put this at a few ms on modern hardware.
Approach 2: bhyve BHYVE_SNAPSHOT
What it is. A FreeBSD-tree feature, gated behind the
options BHYVE_SNAPSHOT kernel option (not in GENERIC).
When enabled, bhyvectl --suspend=<file> writes three files —
file.ckp (guest memory), file.ckp.kern (CPU + device state), and
file.ckp.meta (metadata for restore) — and powers the VM off.
bhyve -r <file> resumes a VM from that checkpoint.
Lineage (FreeBSD reviews D19495, D26387, D35454): the suspend/resume skeleton was merged around FreeBSD 13; live migration (D30954, 2021) is still in review in 2026.
Status on 15.0-RELEASE-p4. Off in GENERIC. Enabling means building
a custom kernel from /usr/src with the option added. Known
limitations:
- No disk-device snapshots. The feature “only supports virtual
machine suspend and resume due to a lack of support for disk device
snapshots” (the canonical caveat repeated across bhyve docs and
quarterly reports). You’re responsible for snapshotting the VM’s disk
image at the same moment you `--suspend` the memory, or the restored VM
will see a disk that has moved on.
- File format is unstable. Backward compatibility between FreeBSD versions is not guaranteed.
- Intel-first. Our AMD host (Ryzen 9 5900HX) works, but AMD-V edge cases have shown up historically.
How you’d make it durable for an agent pool on FreeBSD. Concrete
recipe, requires /usr/src on the host:
- Author a minimal kernel config that includes GENERIC plus
  `options BHYVE_SNAPSHOT`, then build and install it:

  ```sh
  cp /usr/src/sys/amd64/conf/GENERIC /usr/src/sys/amd64/conf/SNAPSHOT
  echo 'options BHYVE_SNAPSHOT' >> /usr/src/sys/amd64/conf/SNAPSHOT
  cd /usr/src && sudo make -j$(sysctl -n hw.ncpu) buildkernel KERNCONF=SNAPSHOT
  sudo make installkernel KERNCONF=SNAPSHOT && sudo reboot
  ```

- For each pool-worthy VM:

  ```sh
  sudo bhyvectl --suspend=/pool/<id>.ckp --vm=<id>
  # immediately snapshot the disk dataset atomically with the ckp file
  sudo zfs snapshot zroot/jails/<id>@suspended
  ```

- At next host boot, for each pool entry:

  ```sh
  sudo zfs clone zroot/jails/<id>@suspended zroot/jails/<id>-live
  sudo bhyve -r /pool/<id>.ckp ... <id>-live   # resumes from memory checkpoint
  sudo kill -STOP $(pgrep -xn bhyve)           # re-pause for pool
  ```
Costs: each ckp file is guest-memory-sized (e.g., 512 MB for a
512 MB VM) — no CoW-deduplication across pool entries without manual
effort (see next section). Restore latency per entry is dominated by
the mmap of the ckp memory file plus vCPU rehydration — in the
millisecond range when the file is on NVMe.
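The disk math is easy to sanity-check. Under the stated assumption of one guest-memory-sized `.ckp` per entry with no cross-entry sharing, a 50-entry pool of 512 MB guests needs:

```shell
#!/bin/sh
# Cold-tier footprint: one .ckp per entry, each the size of guest memory.
entries=50
guest_mb=512
total_mb=$((entries * guest_mb))
echo "cold tier: ${total_mb} MB of .ckp files"   # 25600 MB, i.e. 25 GiB
```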
Hybrid pool design. What a real FreeBSD pool manager might do:
- Keep the hot pool in memory (SIGSTOP’d bhyve processes) for
<10 ms resumes. This is what `bhyve-prewarm-pool` measures.
- Periodically evict cold pool entries to disk via
`bhyvectl --suspend` + ZFS snapshot. Evicted entries resume in ~50-100 ms instead.
- Rebuild the in-memory hot pool on host start by thawing cold entries.
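The two tier transitions can be sketched as a pair of shell helpers. A hedged sketch only: the `/pool` layout, `zroot/jails` dataset names, and VM ids are assumptions, not the rig’s actual scripts, and error handling is omitted:

```shell
#!/bin/sh
# Hedged sketch of the two-tier pool transitions described above.
# Paths and dataset names are illustrative assumptions.

evict_to_cold() {   # hot (SIGSTOP'd) entry -> durable cold tier
    id=$1
    kill -CONT "$(pgrep -f "bhyve.*$id")"           # must run to be suspendable
    bhyvectl --vm="$id" --suspend="/pool/$id.ckp"   # memory + device state
    zfs snapshot "zroot/jails/$id@suspended"        # disk state, same instant
}

thaw_to_hot() {     # cold tier -> hot entry (after a host reboot)
    id=$1
    zfs clone "zroot/jails/$id@suspended" "zroot/jails/$id-live"
    bhyve -r "/pool/$id.ckp" "$id" &                # resume from checkpoint
    kill -STOP $!                                   # park it in the hot pool
}
```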
That matches Cube’s architecture qualitatively — they keep a hot snapshot-clone pool in front of durable snapshot storage — at the cost of the kernel rebuild and managing two-tier pool state.
What we did, 2026-04-22. Built a SNAPSHOT kernel on honor
(FreeBSD 15.0-RELEASE + options BHYVE_SNAPSHOT), wired
up the suspend/resume pipeline (`bhyvectl --suspend` +
`bhyve -r`), and stood up a two-tier pool: the durable
cold tier on disk, the SIGSTOP’d hot tier in memory, with a rebuild
path on host start. bhyve-durable-pool resumes in
~271 ms (per-entry, from disk);
bhyve-durable-prewarm-pool resumes in 17 ms from
the hot tier and a 50-entry pool rebuilds from cold in ~3 s on
boot. See /appendix/bench-rig for
the kernel-build recipe and the two-tier rig scripts.
Approach 3: SIGSTOP/SIGCONT process suspend (our proxy)
What it is. Boot a bhyve VM to the “ready” state, then send
SIGSTOP to the bhyve process. The kernel freezes the process (and the
vCPUs go idle), but memory stays resident. To “resume,” send
SIGCONT — vCPUs run again from where they stopped.
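The mechanics are plain POSIX job control. A runnable demo, with an ordinary `sleep` process standing in for bhyve:

```shell
#!/bin/sh
# SIGSTOP/SIGCONT round trip on an ordinary process (stand-in for bhyve).
sleep 30 &
pid=$!
kill -STOP "$pid"
sleep 0.2                                        # let the state settle
stop_state=$(ps -o stat= -p "$pid" | cut -c1)    # 'T' = stopped, memory resident
kill -CONT "$pid"
sleep 0.2
cont_state=$(ps -o stat= -p "$pid" | cut -c1)    # back to 'S' (sleeping)
echo "stopped=$stop_state resumed=$cont_state"
kill "$pid" 2>/dev/null
```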
What gets preserved. Everything the bhyve process had: guest memory, vCPU state, virtio queues, network connections (since the kernel socket structures are still alive). Host-visible network state (pf states, arp/neighbor) is preserved because we never tore it down.
What doesn’t. Anything time-dependent inside the guest that notices wall-clock jumps (NTP-synced applications may see a big time discontinuity when resumed; the guest kernel’s idle accounting gets confused). Not a big deal for short-lived agent sandboxes.
What it isn’t. A portable snapshot. You can’t suspend on host A and resume on host B. You can’t survive a host reboot. It’s process-level state, not VM-level state.
Why we use it. It gives us a fair apples-to-apples analog for the “warm pool + resume” path without requiring experimental kernel features. The performance profile (resume latency ≈ wakeup latency ≈ microseconds to low ms) is comparable to a CH snapshot restore. The security posture is actually slightly stronger (the VM has been running all along; there’s no post-restore window where device state might be stale).
Why we flag it. It doesn’t match what Cube does. Cube’s pool survives
reboots (snapshots are durable on disk). Our SIGSTOP pool does not. In an
outage-recovery scenario, Cube re-warms a pool from saved snapshots;
bhyve-prewarm-pool has to re-boot VMs.
ZFS clones for the rootfs (bonus)
What it does. If each sandbox’s rootfs is a ZFS clone of a template
snapshot, per-instance rootfs provisioning is a metadata operation (a few
ms) regardless of rootfs size. We use this in jail-zfs-clone to isolate
the effect of rootfs-creation time from the jail-create latency itself.
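A minimal sketch of that provisioning path, assuming a template dataset named `zroot/templates/rootfs` (a placeholder, not the rig’s actual layout):

```shell
#!/bin/sh
# Per-sandbox rootfs as a metadata-only clone of a golden template snapshot.
TEMPLATE=${TEMPLATE:-zroot/templates/rootfs@golden}

new_rootfs() {      # O(1) regardless of rootfs size: the clone shares all blocks
    id=$1
    zfs clone "$TEMPLATE" "zroot/jails/$id"
}

destroy_rootfs() {  # only blocks that diverged since the clone are freed
    id=$1
    zfs destroy "zroot/jails/$id"
}
```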
What it doesn’t do. Memory sharing. Two jails running the same binary still have their own page cache entries in most cases; OpenZFS does not participate in the same kind of KSM-style memory de-dup that KVM benefits from for identical guests. For jails the win is disk-layer copy-elision; the memory-sharing win is separate and smaller.
Summary table
| Property | CH snapshot | BHYVE_SNAPSHOT | SIGSTOP proxy | ZFS clone |
|---|---|---|---|---|
| Preserves memory | ✅ (CoW-shareable) | ✅ (in principle) | ✅ (in-process, not shareable) | N/A |
| Preserves vCPU state | ✅ | ✅ | ✅ | N/A |
| Preserves device state | ✅ | Partial | ✅ (always live) | N/A |
| Portable across hosts | ✅ | Claims to | ❌ | ✅ (as dataset) |
| Survives host reboot | ✅ | ✅ | ❌ | ✅ |
| In-tree FreeBSD | ❌ | Experimental | ✅ | ✅ |
| Restore latency | ~ms | ~ms (when it works) | μs | ~ms |
| Scales to thousands | ✅ (CoW shares pages) | Probably | Memory-bound | ✅ (metadata only) |
What we did (2026-04-22 update)
The plan above was written before we built it. Current state:
- (a) is done. BHYVE_SNAPSHOT is compiled and running on honor.
The `bhyve-durable-pool` path (`bhyvectl --suspend` to disk, `bhyve -r` to
restore) measures 271 ms per-entry restore on a 256 MiB guest.
- (b) is done, in a different shape. Instead of a userspace pool
manager that saves VM state, we built `bhyve-durable-prewarm-pool`:
cold entries on disk, a hot tier of SIGSTOP’d `bhyve -r`’d processes in
front. 17 ms resume at cc=1; a 50-entry pool rebuilds from the cold
tier in ~3 s after host boot.
- (c) is the old fallback and it’s what `bhyve-prewarm-pool` still
measures; we keep it in the rig suite as the in-memory-only lower
bound, not the recommendation.
Separately, the vmm-vnode patch closes the CoW-page-sharing story for identical-guests-from-one-ckp: 1000 × 256 MiB microVMs fit in 9.1 GiB of host RAM. The original “losing the CoW page-sharing story” caveat above is gone.
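That figure implies roughly a 27x sharing factor. The arithmetic, with nominal = entries × guest size:

```shell
#!/bin/sh
# Sharing factor implied by the vmm-vnode result: 1000 x 256 MiB guests
# resident in 9.1 GiB of host RAM.
entries=1000
guest_mib=256
nominal_gib=$((entries * guest_mib / 1024))   # 250 GiB if every page were private
echo "nominal ${nominal_gib} GiB vs measured 9.1 GiB"
# 250 / 9.1 ~= 27x effective page sharing, ~9.3 MiB resident per guest
```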