Of everything on CubeSandbox’s README, one capability is explicitly labelled coming soon: durable snapshots. Capture a running sandbox as a reusable named fork-point, then clone that point into N sibling sandboxes later. Cube’s docs describe the shape but haven’t shipped the endpoint. We now have, on the jail backend, and the receipts are below. Filesystem state is preserved; in-memory process state is not — the bhyve path that does live-memory resume (17 ms p50, see snapshot-cloning) is a separate wave, and this v1 cold-starts the cloned rootfs.
What a snapshot is
A snapshot on the jail backend is a named zfs
snapshot of the live sandbox’s dataset, plus a small metadata
record persisted to /var/lib/coppice/snapshots.json.
The ZFS snapshot itself is the cheap part: ZFS marks the current
block-allocation tree read-only, creation is constant-time
regardless of dataset size, and subsequent writes to the source
dataset or any clone pay only the block-deltas. A handful of hex
chars in the snapshot name is the only thing that shows up in
zfs list.
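The naming scheme can be sketched as a tiny helper. The dataset path and the 12-hex short id follow the conventions described on this page, but the helper itself is illustrative, not the shipped code:

```rust
// Hypothetical sketch: the ZFS snapshot name is the sandbox's dataset plus
// the 12-hex snapshot id after the `@`. This string is all that appears in
// `zfs list -t snapshot`.
fn zfs_snapshot_name(dataset: &str, snapshot_id: &str) -> String {
    format!("{dataset}@{snapshot_id}")
}

fn main() {
    // `zfs snapshot <name>` on the result is constant-time regardless of
    // dataset size; only later block-deltas cost anything.
    let name = zfs_snapshot_name("zroot/coppice/sb-abc", "deadbeef1234");
    println!("{name}");
}
```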
The metadata record carries what the backend can’t recover from
zfs list alone:
- snapshotID — 12-hex short uuid, the caller’s handle.
- sourceSandboxID — for provenance. The referenced sandbox may since have been destroyed, but the snapshot lives on.
- templateID — the registry entry the source was cloned from. Forks inherit it so GET /sandboxes/:id returns the right template string.
- createdAt, zfsSnapshot, description — plain bookkeeping.
The file is rewritten atomically (write to a tmpfile, then rename) on every mutation. A corrupt file at startup is logged and ignored: the registry starts empty rather than crash-looping the gateway.
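The persistence discipline above can be sketched in a few lines. Paths and the corruption check are simplified (the real code would attempt a full JSON parse, and the record set is richer than a plain string), but the tmpfile-plus-rename and log-and-start-empty shapes are the point:

```rust
use std::fs;
use std::path::Path;

// Atomic rewrite: write the whole file under a tmp name, then rename over
// the old copy. rename(2) is atomic within a filesystem, so readers see
// either the old registry or the new one, never a half-written file.
fn persist_registry(dir: &Path, contents: &str) -> std::io::Result<()> {
    let tmp = dir.join("snapshots.json.tmp");
    let dst = dir.join("snapshots.json");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, &dst)
}

// Startup: a missing or corrupt file is logged and ignored; the registry
// starts empty instead of crash-looping the gateway. (A real check would
// parse the JSON; the starts_with test here is a stand-in.)
fn load_registry(dir: &Path) -> String {
    match fs::read_to_string(dir.join("snapshots.json")) {
        Ok(s) if s.starts_with('[') || s.starts_with('{') => s,
        _ => {
            eprintln!("snapshots.json unreadable or corrupt; starting empty");
            String::from("[]")
        }
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    persist_registry(&dir, "[{\"snapshotID\":\"deadbeef1234\"}]")?;
    println!("{}", load_registry(&dir));
    Ok(())
}
```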
What a fork is
A fork is a new sandbox whose root dataset is a
zfs clone of the snapshot. The rest of the setup —
epair pair, attach to coppicenet0, jail -c
with VNET, rctl caps — is identical to a fresh create, literally
the same code path (a shared stand_up_vnet_jail
helper on the backend). That’s deliberate: fork parity with create
means every feature that lands on the create path (DNS wiring,
lo0-up, pf anchors) shows up on fork for free.
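The parity can be pictured as a single divergence point. Everything here except the shared helper's name (stand_up_vnet_jail, from the text above) is a hypothetical shape — the enum, the command strings, and the template-materialization mechanic are illustrative assumptions:

```rust
// Illustrative shape of create/fork parity: the only divergence is how the
// rootfs dataset comes into being. After that, both paths would call the
// same stand_up_vnet_jail(new_dataset) for epair, coppicenet0, jail -c
// with VNET, and rctl caps.
enum RootfsSource<'a> {
    FreshFromTemplate(&'a str), // template dataset (materialization elided)
    CloneOfSnapshot(&'a str),   // existing zfs snapshot name
}

fn rootfs_command(src: &RootfsSource, new_dataset: &str) -> String {
    match src {
        // Fresh create: hypothetical clone-from-template-base mechanic.
        RootfsSource::FreshFromTemplate(tpl) => {
            format!("zfs clone {tpl}@base {new_dataset}")
        }
        // Fork: clone the named snapshot directly.
        RootfsSource::CloneOfSnapshot(snap) => {
            format!("zfs clone {snap} {new_dataset}")
        }
    }
}

fn main() {
    println!("{}", rootfs_command(
        &RootfsSource::CloneOfSnapshot("zroot/coppice/sb-abc@deadbeef1234"),
        "zroot/coppice/sb-fork1",
    ));
}
```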
The caller addresses the snapshot, not the source sandbox. Once
a snapshot exists, the source can be destroyed and the snapshot
remains a valid fork target. The dependency runs the other way
too: while a fork is alive and the snapshot hasn’t been deleted,
destroying the snapshot hits ZFS’s “dataset is busy” error
(because of the dependent clone) and the gateway surfaces that as
409 Conflict. Destroy the forks first; then the
snapshot deletes cleanly.
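The error mapping is small enough to sketch. The stderr substring matches the wording above; the function and the status choices are an assumed shape, not the shipped gateway code:

```rust
// Hypothetical mapping from a `zfs destroy` attempt to an HTTP status.
fn destroy_status(zfs_exit_ok: bool, stderr: &str) -> u16 {
    if zfs_exit_ok {
        204 // snapshot gone; drop the registry entry
    } else if stderr.contains("dataset is busy") {
        409 // a dependent clone (a live fork) still references the snapshot
    } else {
        500 // anything else is an unexpected backend failure
    }
}

fn main() {
    // A live fork blocks deletion:
    println!("{}", destroy_status(false, "cannot destroy: dataset is busy"));
}
```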
Measured fork latency
The rig at benchmarks/rigs/snapshot-fork.sh times the
full POST /snapshots/:id/fork → jexec <jail>
true-reachable round-trip. On honor against the python
template the wall clock is ~15 s — but that figure is dominated by
the gateway’s best-effort ipykernel-spawn poll (same 15 s budget as
a cold POST /sandboxes, and a cold create also returns
in 15.1 s via curl timing, so fork is at parity). The underlying
ZFS-clone + epair + jail -c step completes in
~40 ms, observable in the gateway’s trace span
between sandbox.snapshot and sandbox.fork.
A fork of a snapshot from a template where ipykernel comes up
cleanly (the browser template) returns in that 40 ms band
end-to-end. The kernel-spawn timeout is a known artifact of the
template behind the python shorthand on this host, not a property
of the fork path.
The bhyve durable-prewarm-pool (see snapshot-cloning) gets 17 ms p50 resume by mmap-restoring a suspended vCPU against an already-booted guest kernel. That’s the live-memory-resume path, which preserves in-memory state too — a different shape of capability, a different receipt, and a different substrate (vmm-vnode-patched bhyve rather than a VNET jail). Out of scope for v1; tracked for a later wave.
API surface
Five endpoints, one CLI family.
POST   /sandboxes/:id/snapshots       → 201 {snapshotID, ...}
GET    /snapshots                     → 200 [{...}]
GET    /snapshots/:snapshot_id        → 200 {...} | 404
DELETE /snapshots/:snapshot_id        → 204 | 409 (in use)
POST   /snapshots/:snapshot_id/fork   → 201 {sandboxID, ...}
coppice snapshot create <sandbox-id> [--description "..."]
coppice snapshot list [--json]
coppice snapshot show <snapshot-id>
coppice snapshot fork <snapshot-id> [--cpu N] [--mem MiB] [--disk MiB] [--json]
coppice snapshot delete <snapshot-id>
Request body for create is optional; description is the
only field. Fork body accepts cpuCount,
memoryMB, diskSizeMB — same shape as the
create request but no templateID, since that’s
implied by the snapshot. Response shapes are camelCase throughout
(snapshotID, sourceSandboxID,
createdAt, zfsSnapshot) so the E2B SDK’s
JSON-decoder conventions apply with no custom plumbing.
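A hand-rolled sketch of the fork body makes the shape concrete. The field names (cpuCount, memoryMB, diskSizeMB) come from the text; the optionality handling — omitted fields falling back to the source sandbox's settings — is an assumption:

```rust
// Build a fork request body with only the overridden fields present.
// Serialization is hand-rolled here to keep the sketch dependency-free;
// real code would use a serde-style encoder.
fn fork_body(cpu: Option<u32>, mem_mb: Option<u32>, disk_mb: Option<u32>) -> String {
    let mut fields = Vec::new();
    if let Some(c) = cpu { fields.push(format!("\"cpuCount\":{c}")); }
    if let Some(m) = mem_mb { fields.push(format!("\"memoryMB\":{m}")); }
    if let Some(d) = disk_mb { fields.push(format!("\"diskSizeMB\":{d}")); }
    format!("{{{}}}", fields.join(","))
}

fn main() {
    // POST /snapshots/:snapshot_id/fork with only memory overridden.
    // No templateID field: the snapshot implies it.
    println!("{}", fork_body(None, Some(1024), None));
}
```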
Startup reconstitution
The backend loads /var/lib/coppice/snapshots.json
synchronously in FreeBSDJailBackend::new(), then (once
the tokio runtime is up) main.rs calls
reconcile_snapshots_with_zfs, which shells out to
zfs list -t snapshot -H -o name and drops any entry
whose underlying snap vanished between runs. That covers the
failure mode where someone zfs destroy’d a snapshot by
hand, or a pool import partially failed. The registry stays in
lock-step with reality.
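The reconciliation step reduces to a set-difference over snapshot names. A minimal sketch, assuming the registry record is simplified to (snapshotID, zfsSnapshot) pairs and the `zfs list -t snapshot -H -o name` output is one name per line:

```rust
use std::collections::HashSet;

// Drop registry entries whose underlying ZFS snapshot vanished between
// runs (hand-run `zfs destroy`, partial pool import, etc.).
fn reconcile(
    registry: Vec<(String, String)>,
    zfs_list_output: &str,
) -> Vec<(String, String)> {
    let live: HashSet<&str> = zfs_list_output.lines().map(str::trim).collect();
    registry
        .into_iter()
        .filter(|(id, snap)| {
            let ok = live.contains(snap.as_str());
            if !ok {
                eprintln!("dropping {id}: {snap} no longer exists on the pool");
            }
            ok
        })
        .collect()
}

fn main() {
    let registry: Vec<(String, String)> = vec![
        ("deadbeef1234".into(), "zroot/coppice/sb-a@deadbeef1234".into()),
        ("cafebabe5678".into(), "zroot/coppice/sb-b@cafebabe5678".into()),
    ];
    // Simulated `zfs list` output: the second snapshot is gone.
    let out = "zroot/coppice/sb-a@deadbeef1234\n";
    for (id, _) in reconcile(registry, out) {
        println!("{id}");
    }
}
```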
The source-sandbox side has no such cross-check. A snapshot whose
sourceSandboxID points at a destroyed sandbox is
still a valid fork target — the ZFS snap is the fork source, the
live sandbox was only ever a convenience for reading the current
rootfs. Destroying the source is deliberately allowed.
Out of scope (v1)
- In-memory resume.
jail -cstarts a fresh process table in the forked jail. The python or chromium processes that were running in the source at snapshot time do not come back. Every practical agent workflow we’ve measured treats the sandbox’s filesystem state as the useful carrier — the file the model wrote, the venv that got populated — and spawns fresh processes on fork. When live-memory resume matters (interactive REPL paused mid-computation), use the bhyve path. - Cross-host snapshot transfer.
zfs send | ssh honor2 zfs recvis the obvious mechanic and the registry is JSON so the metadata is portable. Not wired today. Tracked in the cluster-overlay row on cubesandbox-feature-audit.
Cross-refs
- snapshot-cloning — the bhyve live-memory-resume path (17 ms p50).
- vnet-jail — the per-sandbox VNET layout the fork path reuses verbatim.
- cubesandbox-feature-audit — the row that moved from open to closed with this page’s shipping.