Durable snapshots

Of everything on CubeSandbox’s README, one capability is explicitly labelled coming soon: durable snapshots. Capture a running sandbox as a reusable named fork-point, then clone that point into N sibling sandboxes later. Cube’s docs describe the shape but haven’t shipped the endpoint. We have now shipped it on the jail backend, and the receipts are below. Filesystem state is preserved; in-memory process state is not: the bhyve path that does live-memory resume (17 ms p50, see snapshot-cloning) is a separate wave, and this v1 cold-starts the cloned rootfs.

What a snapshot is

A snapshot on the jail backend is a named zfs snapshot of the live sandbox’s dataset, plus a small metadata record persisted to /var/lib/coppice/snapshots.json. The ZFS snapshot itself is the cheap part: ZFS pins the current block-allocation tree as read-only, creation is constant-time regardless of dataset size, and subsequent writes to the source dataset or any clone pay only the block deltas. The snapshot name is a handful of hex chars, and that name is the only thing that shows up in zfs list.

The metadata record carries what the backend can’t recover from zfs list alone: the public snapshotID, the sourceSandboxID it was taken from, a createdAt timestamp, the optional description, and the full zfsSnapshot name.

The file is rewritten atomically (tmpfile + rename) on every mutation. A corrupt file at startup is logged and ignored; the registry starts empty rather than crash-looping the gateway.
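A minimal sketch of that persistence discipline, assuming a single-document JSON registry (function names and the fallback representation are invented for illustration, not the backend’s actual API):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write `contents` to `path` atomically: write a sibling tmpfile, fsync,
// then rename over the target. rename(2) is atomic within a filesystem,
// so readers see either the old registry or the new one, never a torn write.
fn write_atomic(path: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = path.with_extension("json.tmp");
    let mut f = fs::File::create(&tmp)?;
    f.write_all(contents.as_bytes())?;
    f.sync_all()?; // flush data before the rename makes it visible
    fs::rename(&tmp, path)
}

// Load tolerantly: a missing or corrupt file yields an empty registry
// instead of an error, so the gateway never crash-loops on bad state.
fn load_or_empty(path: &Path) -> String {
    match fs::read_to_string(path) {
        Ok(s) if s.trim_start().starts_with('{') => s,
        _ => String::from("{}"), // logged-and-ignored in the real backend
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("coppice-demo");
    fs::create_dir_all(&dir)?;
    let path = dir.join("snapshots.json");
    write_atomic(&path, r#"{"snapshots":[]}"#)?;
    assert_eq!(load_or_empty(&path), r#"{"snapshots":[]}"#);
    // Simulate by-hand corruption: the tolerant load falls back to empty.
    fs::write(&path, "not json")?;
    assert_eq!(load_or_empty(&path), "{}");
    println!("ok");
    Ok(())
}
```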

What a fork is

A fork is a new sandbox whose root dataset is a zfs clone of the snapshot. The rest of the setup — epair pair, attach to coppicenet0, jail -c with VNET, rctl caps — is identical to a fresh create, literally the same code path (a shared stand_up_vnet_jail helper on the backend). That’s deliberate: fork parity with create means every feature that lands on the create path (DNS wiring, lo0-up, pf anchors) shows up on fork for free.
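The shared-path design can be sketched as follows. Everything here is illustrative: the enum, the function, and the assumption that a fresh create also materializes its rootfs by cloning a template’s base snapshot are invented for the sketch; only the zfs clone step for forks is stated in the text.

```rust
// The only divergence between create and fork is how the root dataset is
// materialized; everything after that funnels through one setup helper.
#[derive(Debug)]
enum RootfsSource {
    Template(String), // fresh create (assumed: clone the template's @base)
    Snapshot(String), // fork: clone a user snapshot
}

// Return the zfs argv that materializes the sandbox's root dataset.
// Both arms are `zfs clone`; only the clone origin differs.
fn rootfs_argv(source: &RootfsSource, dataset: &str) -> Vec<String> {
    let origin = match source {
        RootfsSource::Template(t) => format!("{t}@base"),
        RootfsSource::Snapshot(s) => s.clone(),
    };
    vec!["clone".into(), origin, dataset.into()]
}

fn main() {
    // After this point both callers hand off to the same
    // stand_up_vnet_jail-style helper: epair, coppicenet0, jail -c, rctl.
    let fork = rootfs_argv(
        &RootfsSource::Snapshot("zroot/coppice/sb-1@snap-ab12".into()),
        "zroot/coppice/sb-2",
    );
    assert_eq!(fork, ["clone", "zroot/coppice/sb-1@snap-ab12", "zroot/coppice/sb-2"]);

    let create = rootfs_argv(
        &RootfsSource::Template("zroot/coppice/templates/python".into()),
        "zroot/coppice/sb-3",
    );
    assert_eq!(create[1], "zroot/coppice/templates/python@base");
    println!("ok");
}
```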

The caller addresses the snapshot, not the source sandbox. Once a snapshot exists, the source can be destroyed and the snapshot remains a valid fork target. The dependency runs the other way too: if a fork is alive and the snapshot hasn’t been deleted, destroying the snapshot hits ZFS’s “dataset is busy” error (because of the dependent clone), and the gateway surfaces that as 409 Conflict. Destroy the forks first; then the snapshot deletes cleanly.
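The delete-path error mapping described above can be sketched as a small pure function (the function and the catch-all 500 are illustrative; the “dataset is busy” stderr text and the 204/409 statuses are from the text):

```rust
// Map the stderr of `zfs destroy <snapshot>` to the gateway's HTTP status.
fn destroy_status(zfs_stderr: &str) -> u16 {
    if zfs_stderr.is_empty() {
        204 // destroyed cleanly: no dependent clones remained
    } else if zfs_stderr.contains("dataset is busy") {
        409 // a fork still holds the clone; destroy the forks first
    } else {
        500 // anything else is an unexpected zfs failure
    }
}

fn main() {
    assert_eq!(destroy_status(""), 204);
    assert_eq!(
        destroy_status("cannot destroy 'zroot/coppice/sb-1@snap': dataset is busy"),
        409
    );
    assert_eq!(destroy_status("permission denied"), 500);
    println!("ok");
}
```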

Measured fork latency

The rig at benchmarks/rigs/snapshot-fork.sh times the full round-trip from POST /snapshots/:id/fork to jexec <jail> true succeeding. On honor against the python template the wall clock is ~15 s, but that figure is dominated by the gateway’s best-effort ipykernel-spawn poll (the same 15 s budget as a cold POST /sandboxes; a cold create also returns in 15.1 s via curl timing, so fork is at parity). The underlying ZFS-clone + epair + jail -c step completes in ~40 ms, observable in the gateway’s trace span between sandbox.snapshot and sandbox.fork. A fork of a snapshot from a template where ipykernel comes up cleanly (the browser template) returns in that 40 ms band end-to-end. The kernel-spawn timeout is a known artifact of the python shorthand’s template on this host, not a property of the fork path.

The bhyve durable-prewarm-pool (see snapshot-cloning) gets 17 ms p50 resume by mmap-restoring a suspended vCPU against an already-booted guest kernel. That’s the live-memory-resume path, which preserves in-memory state too: a different shape of capability, a different receipt, and a different substrate (vmm-vnode-patched bhyve rather than a VNET jail). Out of scope for v1; tracked for a later wave.

API surface

Five endpoints, one CLI family.

POST   /sandboxes/:id/snapshots   → 201 {snapshotID, ...}
GET    /snapshots                 → 200 [{...}]
GET    /snapshots/:snapshot_id    → 200 {...} | 404
DELETE /snapshots/:snapshot_id    → 204 | 409 (in use)
POST   /snapshots/:snapshot_id/fork
                                  → 201 {sandboxID, ...}
coppice snapshot create <sandbox-id> [--description "..."]
coppice snapshot list [--json]
coppice snapshot show <snapshot-id>
coppice snapshot fork <snapshot-id> [--cpu N] [--mem MiB]
                                    [--disk MiB] [--json]
coppice snapshot delete <snapshot-id>

Request body for create is optional; description is the only field. Fork body accepts cpuCount, memoryMB, diskSizeMB — same shape as the create request but no templateID, since that’s implied by the snapshot. Response shapes are camelCase throughout (snapshotID, sourceSandboxID, createdAt, zfsSnapshot) so the E2B SDK’s JSON-decoder conventions apply with no custom plumbing.
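As a concrete illustration (field names are from the shapes above; every value, including the dataset path, is invented), a snapshot-create 201 response might look like:

```json
{
  "snapshotID": "snap-ab12cd34",
  "sourceSandboxID": "sb-7f3a9c",
  "description": "after pip install",
  "createdAt": "2025-01-01T12:00:00Z",
  "zfsSnapshot": "zroot/coppice/sb-7f3a9c@snap-ab12cd34"
}
```

and a fork request body, the create shape minus templateID:

```json
{
  "cpuCount": 2,
  "memoryMB": 1024,
  "diskSizeMB": 4096
}
```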

Startup reconstitution

The backend loads /var/lib/coppice/snapshots.json synchronously in FreeBSDJailBackend::new(), then (once the tokio runtime is up) main.rs calls reconcile_snapshots_with_zfs, which shells out to zfs list -t snapshot -H -o name and drops any entry whose underlying snap vanished between runs. That covers the failure mode where someone zfs destroy’d a snapshot by hand, or a pool import partially failed. The registry stays in lock-step with reality.
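The reconcile step reduces to a set-membership filter. A minimal sketch as pure logic (the tuple shape and function name are invented; the real backend parses `zfs list -t snapshot -H -o name` output into the `existing` set and filters its own registry structs):

```rust
use std::collections::HashSet;

// Keep only registry entries whose underlying ZFS snapshot still exists.
// Entries are (snapshotID, zfsSnapshot) pairs; anything whose snap vanished
// between runs, e.g. a by-hand `zfs destroy` or a partially failed pool
// import, is dropped so the registry stays in lock-step with reality.
fn reconcile(
    registry: Vec<(String, String)>,
    existing: &HashSet<String>,
) -> Vec<(String, String)> {
    registry
        .into_iter()
        .filter(|(_, zfs_snapshot)| existing.contains(zfs_snapshot))
        .collect()
}

fn main() {
    let registry = vec![
        ("snap-1".to_string(), "zroot/coppice/sb-a@snap-1".to_string()),
        ("snap-2".to_string(), "zroot/coppice/sb-b@snap-2".to_string()),
    ];
    // Only the first snapshot still exists on the pool.
    let existing: HashSet<String> =
        ["zroot/coppice/sb-a@snap-1".to_string()].into_iter().collect();
    let kept = reconcile(registry, &existing);
    assert_eq!(kept.len(), 1);
    assert_eq!(kept[0].0, "snap-1");
    println!("ok");
}
```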

The source-sandbox side has no such cross-check. A snapshot whose sourceSandboxID points at a destroyed sandbox is still a valid fork target — the ZFS snap is the fork source, the live sandbox was only ever a convenience for reading the current rootfs. Destroying the source is deliberately allowed.

Out of scope (v1)

Cross-refs