Cube’s docs call out signed OCI images as the supply-chain story for
their template pipeline. On Coppice the analogue is signify(1),
OpenBSD’s signing tool (in base there, audited, and what that OS
uses for its release sets; available on FreeBSD from ports),
pointed at the thing that uniquely names a ZFS snapshot: its guid.
Every template
@base snapshot has a companion <name>.sig
on disk; the gateway reads it on the create hot path and refuses to
zfs clone anything whose signature doesn’t match.
Why sign the guid, not the send stream
The tempting thing is to zfs send a template dataset,
sign that stream, and ship .zfs.sig alongside the
.zfs file. We tried that first. The problems:
- It’s slow. Verifying means re-reading the whole dataset through zfs send to feed signify -V. For the chromium template that’s ~2 GB per sandbox create: the create hot path would grow a minute of disk I/O for a property that costs microseconds on paper.
- It’s brittle. ZFS send streams aren’t stable across minor version differences in the encoder; a host upgrade can invalidate every sig without touching the underlying data.
The guid is the right thing to sign instead. Every ZFS snapshot has
a 64-bit guid that’s assigned at creation and preserved across
zfs send | zfs receive. zfs get -Hpo value guid
<snap> returns it in under a millisecond. Signing the
guid binds the signature to a specific on-disk tree without
re-reading that tree, so verify is two shell-outs:
zfs get guid + signify -V. Both finish
well under 10 ms.
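As a rough sketch of the first shell-out, the guid lookup might look like this in Rust. The helper names parse_guid and live_guid are hypothetical (they are not taken from the Coppice source), and the sketch assumes zfs get -Hpo value guid prints the guid as a single decimal number on one line.

```rust
use std::process::Command;

/// Parse the stdout of `zfs get -Hpo value guid <snap>`.
/// Assumption: the value is one unsigned decimal number plus a newline.
fn parse_guid(stdout: &str) -> Option<u64> {
    stdout.trim().parse::<u64>().ok()
}

/// Hypothetical helper: ask zfs(8) for a snapshot's guid.
/// Returns Ok(None) when zfs exits non-zero or prints something unparseable.
fn live_guid(snapshot: &str) -> std::io::Result<Option<u64>> {
    let out = Command::new("zfs")
        .args(&["get", "-Hpo", "value", "guid", snapshot])
        .output()?;
    if !out.status.success() {
        return Ok(None);
    }
    Ok(parse_guid(&String::from_utf8_lossy(&out.stdout)))
}
```

Keeping the parse step separate from the process spawn means the parsing can be unit-tested on a machine with no ZFS pool at all.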
The trust chain is: operator generates a keypair on their laptop;
pubkey gets installed at /etc/coppice/pubkey on the
gateway; privkey never leaves the laptop. Every template mutation
(build a new chromium, patch python) ends with one local
coppice tpl sign <name> that writes the new
.sig file. The gateway’s view of which templates
exist is unchanged; the only new artefact is the sig next to the
dataset.
The wire
Sig files are keyed on the full snapshot identity — template name and the ZFS snapshot name the signature covers:
// Sig file layout (new scheme)
/var/db/coppice/sigs/
_template@base-dns-20260422.sig # dated @base snapshot for the default
browser@base.sig
vscode@base.sig
python@base.sig
// Legacy layout (still honored as a fallback)
_template.sig
browser.sig
// Pubkey (verify side)
/etc/coppice/pubkey
Keying on <name>@<snap> fixes a gotcha that
used to bite us when a template re-cut its @base with a
dated suffix (e.g. @base-dns-20260422). Under the old
name-only layout the sig on disk still read _template.sig,
held the previous snapshot’s guid, and the next
zfs get guid zroot/jails/_template@base-dns-20260422
returned a different value — so sandbox-create 403’d with a confusing
guid-mismatch even though the dataset was healthy. The new keying
lets every dated snapshot carry its own sig side-by-side.
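To make the keying concrete, here is a minimal sketch of how a new-scheme sig file name could be derived from a template name and the snapshot about to be cloned. The function name sig_file_name is hypothetical, and the sketch assumes the clone source is always written as dataset@snapshot.

```rust
/// Build the new-scheme sig file name, for example
/// ("_template", "zroot/jails/_template@base-dns-20260422")
///     -> "_template@base-dns-20260422.sig".
/// Returns None when `clone_source` has no snapshot component.
fn sig_file_name(template: &str, clone_source: &str) -> Option<String> {
    // Everything after the first '@' is the snapshot name.
    let (_dataset, snap) = clone_source.split_once('@')?;
    if snap.is_empty() {
        return None;
    }
    Some(format!("{}@{}.sig", template, snap))
}
```

Because the snapshot name is part of the file name, a re-cut dated snapshot resolves to a different sig file than the previous @base did, which is the property the paragraph above relies on.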
The sig file is a signify(1) embedded-message signature
(signify -S -e), which means the signed payload (the
guid) is carried inside the sig envelope and recovered on verify
with signify -V -e. The double-check — signify’s own
cryptographic verify plus our equality check of recovered-guid vs
live-guid — catches both “wrong pubkey / tampered sig” and “signed
a different snapshot than the one we’re about to clone” in a single
pass.
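The second half of that double-check, the equality test that runs only after signify -V -e has already succeeded, can be sketched as follows. The type and function names are hypothetical, and the sketch assumes the embedded payload is the guid written as decimal text; the real payload format is not shown in this post.

```rust
#[derive(Debug, PartialEq)]
enum GuidCheck {
    /// Recovered guid equals the live snapshot's guid.
    Match,
    /// Signature was valid, but it covers a different snapshot.
    Mismatch { signed: u64, live: u64 },
    /// Payload was not a decimal guid (assumed format).
    Unparseable,
}

/// Compare the payload recovered from the sig envelope with the live guid.
fn check_recovered_guid(recovered_payload: &str, live_guid: u64) -> GuidCheck {
    match recovered_payload.trim().parse::<u64>() {
        Ok(signed) if signed == live_guid => GuidCheck::Match,
        Ok(signed) => GuidCheck::Mismatch { signed, live: live_guid },
        Err(_) => GuidCheck::Unparseable,
    }
}
```

Returning both guids in the mismatch case makes it possible to log which snapshot the sig was actually cut for, rather than a bare "guid mismatch".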
The create-time gate
FreeBSDJailBackend::create_with_limits runs the verify before
the zfs clone:
// e2b-compat/src/backend/freebsd_jail.rs (condensed)
match self.verify_template(&template_name, &clone_source).await {
    VerifyOutcome::Ok => {}
    VerifyOutcome::Missing => {
        if signify::require_signed_env() {
            return Err(BackendError::Unauthorized(...)); // → 403
        }
        // else: warn, proceed
    }
    VerifyOutcome::Invalid(reason) => {
        return Err(BackendError::Unauthorized(...)); // → 403
    }
}
self.run("zfs", &["clone", &clone_source, &ds]).await?;
The policy is deliberately asymmetric. A missing sig is a soft
warning by default (development convenience: a freshly-built
template works without a signing round-trip); set
COPPICE_REQUIRE_SIGNED_TEMPLATES=1 in the gateway’s
environment to harden it into a rejection. An invalid sig
is always fatal — a tampered sig is strictly worse than a missing
one because someone tried to lie.
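The asymmetric policy reduces to a small decision table, sketched below as a pure function. The VerifyOutcome enum mirrors the variants in the condensed excerpt above; Decision, decide, and require_signed_from_env are hypothetical names, and treating only the exact value "1" as enabled is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum VerifyOutcome {
    Ok,
    Missing,
    Invalid(String),
}

#[derive(Debug, PartialEq)]
enum Decision {
    Proceed,
    ProceedWithWarning,
    Reject, // surfaces as a 403 in the gateway
}

/// Map a verify outcome plus the hardening flag to a create-time decision.
fn decide(outcome: &VerifyOutcome, require_signed: bool) -> Decision {
    match outcome {
        VerifyOutcome::Ok => Decision::Proceed,
        // Missing sig: fatal only when the operator opted in.
        VerifyOutcome::Missing if require_signed => Decision::Reject,
        VerifyOutcome::Missing => Decision::ProceedWithWarning,
        // Invalid sig: always fatal, regardless of the flag.
        VerifyOutcome::Invalid(_) => Decision::Reject,
    }
}

/// Assumption: only the literal value "1" switches hardening on.
fn require_signed_from_env() -> bool {
    std::env::var("COPPICE_REQUIRE_SIGNED_TEMPLATES")
        .map(|v| v == "1")
        .unwrap_or(false)
}
```

Separating the decision from the environment read keeps all four combinations testable without mutating process state.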
Every verify bumps
coppice_template_verifications_total{template,status}
on /metrics. Status is one of ok /
ok_legacy / missing / invalid;
the Prometheus side can alert on
rate(…{status="invalid"}[5m]) > 0 as a
first-class tampering signal, and
…{status="ok_legacy"} > 0 as a soft prompt
to re-sign a template with the new-scheme filename.
Migration from name-only keying
Existing <name>.sig files are still honored — the
gateway and coppice-verify-template.sh both try the
new-scheme path first and fall back to the legacy one, so a gateway
rolled forward before an operator has re-signed sees no service
disruption. The only visible difference is that
/metrics reports status="ok_legacy" for
those templates, and the gateway logs a
falling back to legacy-named sig; re-sign with new-scheme filename
warning once per create.
To clear that warning, re-sign each template against its current canonical snapshot:
coppice tpl sign _template@base-dns-20260422
coppice tpl sign browser # default @base
coppice tpl sign vscode
# Optional rollback for the cautious operator:
coppice-sign-template.sh --legacy-filename <name>
# → writes the old <name>.sig path. Useful only for migration testing.
Leave the legacy <name>.sig files on disk while the
new ones roll out — they’re the fallback safety net. Once
ok_legacy has stayed at zero across a few days of
deploys, the old sigs are an operator chore to delete.
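The new-then-legacy lookup described in this section could be sketched like this. All names here (SigLookup, find_sig, status_on_success) are hypothetical; the existence check is passed in as a closure so the lookup order can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum SigLookup {
    NewScheme(PathBuf),
    Legacy(PathBuf),
    Missing,
}

/// Try <template>@<snap>.sig first, then fall back to <template>.sig.
fn find_sig(
    sigs_dir: &Path,
    template: &str,
    snap: &str,
    exists: impl Fn(&Path) -> bool,
) -> SigLookup {
    let new_scheme = sigs_dir.join(format!("{}@{}.sig", template, snap));
    if exists(&new_scheme) {
        return SigLookup::NewScheme(new_scheme);
    }
    let legacy = sigs_dir.join(format!("{}.sig", template));
    if exists(&legacy) {
        return SigLookup::Legacy(legacy);
    }
    SigLookup::Missing
}

/// Metrics status label to report if verification of the found sig succeeds.
fn status_on_success(lookup: &SigLookup) -> Option<&'static str> {
    match lookup {
        SigLookup::NewScheme(_) => Some("ok"),
        SigLookup::Legacy(_) => Some("ok_legacy"),
        SigLookup::Missing => None,
    }
}
```

In real use the closure would be something like |p| p.exists(); once the new-scheme file is present, the legacy file is never consulted, which is why leaving old sigs on disk during the rollout is harmless.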
Operator workflow
One-time setup on the operator’s laptop:
# Laptop (privkey never leaves this machine)
signify -G -c "coppice template signer" \
-p /tmp/coppice-pub \
-s /tmp/coppice-priv
scp /tmp/coppice-pub honor:/tmp/
ssh honor "sudo install -o root -m 0644 /tmp/coppice-pub /etc/coppice/pubkey"
Per-template signing (runs on any host with the privkey + the template dataset):
export COPPICE_SIGN_PRIVKEY=/tmp/coppice-priv
coppice tpl sign browser
coppice tpl sign vscode
coppice tpl sign _template
# Which shells out to:
# tools/coppice-sign-template.sh <name>
#
# which writes /var/db/coppice/sigs/<name>@<snap>.sig
# (or the old <name>.sig path when run with --legacy-filename).
coppice tpl verify browser
# signify -V success; guid match; exit 0
The sign tool is deliberately a shell wrapper rather than a Rust
binary so the signify command line stays audit-visible. The verify
tool exists mostly for operators — the gateway has its own
signify.rs module that does the same check on the create
hot path.
Threat model
What this closes:
- Tampered on-disk template. An attacker who gets write access to zroot/jails/browser-template (bad ZFS ACL, compromised backup restore) changes the dataset’s contents and re-cuts the snapshot so clones would pick the change up. The new snapshot has a new guid; the signed guid in the sig file does not change. Verify fails; the gateway refuses to clone; alert fires.
- Swapped sig file. Attacker replaces browser.sig with a sig from a different template (or an older version of the same template). signify’s verify still succeeds (the pubkey is right), but the recovered guid doesn’t match the live snapshot’s guid. Verify fails; the gateway refuses to clone.
- Registry spoofing. An operator accidentally publishes a template under a name that already exists. The existing sig still binds to the old guid; the new dataset’s guid is different; verify fails.
What it doesn’t close:
- Signed bad content. If the operator signs a template that already has a backdoor, signify says yes. Supply-chain hygiene up to the sign step is the operator’s problem; the gateway only asserts identity after signing.
- Stolen privkey. If the laptop is compromised and the privkey leaks, the attacker can sign their own templates and the gateway believes them. Rotate with a new keypair + install -m 0644 of the replacement pubkey; pre-existing sigs fail verify on the next create.
- Live-dataset mutation after verify. We verify the guid, not the blocks. In practice nothing writes to a template dataset between verify and zfs clone (clones land under zroot/jails/e2b-*, a separate namespace), but a sufficiently motivated attacker with root on honor could race the two. The honest answer is that if you have root on the gateway, the sig check is the least of your problems.
Audit: row flipped
The feature audit’s
“Image signing / template provenance” row flips from open to
closed. The receipt: the signify roundtrip lives in
e2b-compat/src/backend/signify.rs (happy + tamper
tests), the create gate in freebsd_jail.rs, the
operator CLI at coppice tpl sign|verify, and the
/metrics counter for monitoring. #74-sign.