Image signing

Cube’s docs call out signed OCI images as the supply-chain story for their template pipeline. On Coppice the analogue is FreeBSD’s signify(1) — already in base, already audited, already what the OS uses for release sets — pointed at the thing that uniquely names a ZFS snapshot: its guid. Every template @base snapshot has a companion <name>.sig on disk; the gateway reads it on the create hot path and refuses to zfs clone anything whose signature doesn’t match.

Why sign the guid, not the send stream

The tempting thing is to zfs send a template dataset, sign that stream, and ship .zfs.sig alongside the .zfs file. We tried that first. The problems:

The guid is the right thing to sign instead. Every ZFS snapshot has a 64-bit guid that’s assigned at creation and preserved across zfs send | zfs receive. zfs get -Hpo value guid <snap> returns it in under a millisecond. Signing the guid binds the signature to a specific on-disk tree without re-reading that tree, so verify is two shell-outs: zfs get guid + signify -V. Both finish well under 10 ms.
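The guid lookup can be sketched in Rust as a single shell-out plus a parse. This is an illustration, not the gateway's code: `parse_guid` and `live_guid` are hypothetical names, and the sketch assumes `zfs get -Hpo value guid` prints the guid as one decimal integer followed by a newline.

```rust
use std::process::Command;

/// Parse the stdout of `zfs get -Hpo value guid <snap>`.
/// Assumption: a single decimal u64, optionally surrounded by whitespace.
fn parse_guid(stdout: &str) -> Result<u64, String> {
    let trimmed = stdout.trim();
    trimmed
        .parse::<u64>()
        .map_err(|e| format!("unexpected guid output {trimmed:?}: {e}"))
}

/// Shell out to zfs(8) and return the snapshot's guid.
/// Only metadata is read; the snapshot's data blocks are never touched.
fn live_guid(snapshot: &str) -> Result<u64, String> {
    let out = Command::new("zfs")
        .args(["get", "-Hpo", "value", "guid", snapshot])
        .output()
        .map_err(|e| format!("failed to run zfs: {e}"))?;
    if !out.status.success() {
        let stderr = String::from_utf8_lossy(&out.stderr);
        return Err(format!("zfs get guid failed: {}", stderr.trim()));
    }
    parse_guid(&String::from_utf8_lossy(&out.stdout))
}
```

A caller would use it as `live_guid("zroot/jails/_template@base-dns-20260422")` and compare the result with the guid recovered from the sig.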

The trust chain is: operator generates a keypair on their laptop; pubkey gets installed at /etc/coppice/pubkey on the gateway; privkey never leaves the laptop. Every template mutation (build a new chromium, patch python) ends with one local coppice tpl sign <name> that writes the new .sig file. The gateway’s view of which templates exist is unchanged; the only new artefact is the sig next to the dataset.

The wire

Sig files are keyed on the full snapshot identity — template name and the ZFS snapshot name the signature covers:

// Sig file layout (new scheme)
/var/db/coppice/sigs/
_template@base-dns-20260422.sig   # dated @base snapshot for the default
browser@base.sig
vscode@base.sig
python@base.sig

// Legacy layout (still honored as a fallback)
_template.sig
browser.sig

// Pubkey (verify side)
/etc/coppice/pubkey

Keying on <name>@<snap> fixes a gotcha that used to bite us when a template re-cut its @base with a dated suffix (e.g. @base-dns-20260422). Under the old name-only layout the sig on disk still read _template.sig, held the previous snapshot’s guid, and the next zfs get guid zroot/jails/_template@base-dns-20260422 returned a different value — so sandbox-create 403’d with a confusing guid-mismatch even though the dataset was healthy. The new keying lets every dated snapshot carry its own sig side-by-side.
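A minimal sketch of the filename keying, assuming a bare template name means the default `@base` snapshot (as the `coppice tpl sign browser` usage suggests) and that legacy sigs live in the same directory. The helper names `split_target`, `new_scheme_sig_path`, and `legacy_sig_path` are hypothetical.

```rust
use std::path::PathBuf;

const SIG_DIR: &str = "/var/db/coppice/sigs";

/// Split "browser" or "_template@base-dns-20260422" into (name, snap).
/// Assumption: a target without '@' refers to the default "base" snapshot.
fn split_target(target: &str) -> (&str, &str) {
    match target.split_once('@') {
        Some((name, snap)) => (name, snap),
        None => (target, "base"),
    }
}

/// New scheme: one sig per <name>@<snap>, so dated snapshots sit side by side.
fn new_scheme_sig_path(target: &str) -> PathBuf {
    let (name, snap) = split_target(target);
    PathBuf::from(SIG_DIR).join(format!("{name}@{snap}.sig"))
}

/// Legacy scheme: one sig per template name, whichever snapshot it covered.
fn legacy_sig_path(target: &str) -> PathBuf {
    let (name, _snap) = split_target(target);
    PathBuf::from(SIG_DIR).join(format!("{name}.sig"))
}
```

Under this keying, `_template@base` and `_template@base-dns-20260422` map to different files, while both collapse onto `_template.sig` in the legacy layout. That collision is what produced the guid mismatch.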

The sig file is a signify(1) embedded-message signature (signify -S -e), which means the signed payload (the guid) is carried inside the sig envelope and recovered on verify with signify -V -e. The double-check — signify’s own cryptographic verify plus our equality check of recovered-guid vs live-guid — catches both “wrong pubkey / tampered sig” and “signed a different snapshot than the one we’re about to clone” in a single pass.
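The second half of the double-check, the equality test, might look like the sketch below. It assumes signify has already verified the envelope and handed back the embedded payload, and that the payload is the guid written as decimal text. The `GuidCheck` enum and `check_recovered_guid` are illustrative names, not the types in signify.rs.

```rust
#[derive(Debug, PartialEq)]
enum GuidCheck {
    /// The sig covers exactly the snapshot about to be cloned.
    Match,
    /// The sig is cryptographically valid but covers another snapshot.
    Mismatch { signed: u64, live: u64 },
    /// The payload verified but is not a guid at all.
    Garbled(String),
}

/// Compare the payload recovered from the sig with the live snapshot guid.
fn check_recovered_guid(recovered: &[u8], live: u64) -> GuidCheck {
    let text = match std::str::from_utf8(recovered) {
        Ok(t) => t.trim(),
        Err(_) => return GuidCheck::Garbled("payload is not UTF-8".to_string()),
    };
    match text.parse::<u64>() {
        Ok(signed) if signed == live => GuidCheck::Match,
        Ok(signed) => GuidCheck::Mismatch { signed, live },
        Err(e) => GuidCheck::Garbled(format!("payload {text:?} is not a guid: {e}")),
    }
}
```

Keeping `Mismatch` separate from `Garbled` lets the rejection message say which guid was signed and which one is live, which is the detail an operator needs when a template has been re-cut.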

The create-time gate

FreeBSDJailBackend::create_with_limits runs the verify before the zfs clone:

// e2b-compat/src/backend/freebsd_jail.rs (condensed)
match self.verify_template(&template_name, &clone_source).await {
    VerifyOutcome::Ok => {}
    VerifyOutcome::Missing => {
        if signify::require_signed_env() {
            return Err(BackendError::Unauthorized(...)); // → 403
        }
        // else: warn, proceed
    }
    VerifyOutcome::Invalid(reason) => {
        return Err(BackendError::Unauthorized(...)); // → 403
    }
}
self.run("zfs", &["clone", &clone_source, &ds]).await?;

The policy is deliberately asymmetric. A missing sig is a soft warning by default (development convenience: a freshly-built template works without a signing round-trip); set COPPICE_REQUIRE_SIGNED_TEMPLATES=1 in the gateway’s environment to harden it into a rejection. An invalid sig is always fatal — a tampered sig is strictly worse than a missing one because someone tried to lie.
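The asymmetry can be written down as a small decision table. The sketch below is a simplified stand-in for the gateway's logic: `Verdict` mirrors `VerifyOutcome` without the reason payload, `Decision` and `decide` are hypothetical, and treating only the literal value `1` as "enabled" is an assumption about how the environment variable is parsed.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Verdict {
    Ok,
    Missing,
    Invalid,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Clone,
    CloneWithWarning,
    Reject, // surfaces as a 403
}

/// Assumption: only the exact value "1" hardens the policy.
fn require_signed(env_value: Option<&str>) -> bool {
    env_value == Some("1")
}

/// Missing is soft unless hardened; Invalid is fatal regardless.
fn decide(verdict: Verdict, require_signed: bool) -> Decision {
    match (verdict, require_signed) {
        (Verdict::Ok, _) => Decision::Clone,
        (Verdict::Missing, false) => Decision::CloneWithWarning,
        (Verdict::Missing, true) => Decision::Reject,
        (Verdict::Invalid, _) => Decision::Reject,
    }
}
```

In a running process the flag would come from something like `require_signed(std::env::var("COPPICE_REQUIRE_SIGNED_TEMPLATES").ok().as_deref())`.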

Every verify bumps coppice_template_verifications_total{template,status} on /metrics. Status is one of ok / ok_legacy / missing / invalid; the Prometheus side can alert on rate(…{status="invalid"}[5m]) > 0 as a first-class tampering signal, and …{status="ok_legacy"} > 0 as a soft prompt to re-sign a template with the new-scheme filename.
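For illustration, a labelled counter of this shape can be kept in a map and rendered in the Prometheus text exposition format. This is a dependency-free sketch with a hypothetical `VerifyCounter` type; the real gateway may well use a metrics crate, and label-value escaping is omitted here on the assumption that template names contain no quotes or backslashes.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct VerifyCounter {
    // Keyed on (template, status); BTreeMap keeps the output order stable.
    counts: BTreeMap<(String, &'static str), u64>,
}

impl VerifyCounter {
    /// Record one verification with status "ok", "ok_legacy", "missing", or "invalid".
    fn bump(&mut self, template: &str, status: &'static str) {
        *self.counts.entry((template.to_string(), status)).or_insert(0) += 1;
    }

    /// Render every series as one line of Prometheus text format.
    fn render(&self) -> String {
        let mut out = String::from("# TYPE coppice_template_verifications_total counter\n");
        for ((template, status), n) in &self.counts {
            out.push_str(&format!(
                "coppice_template_verifications_total{{template=\"{template}\",status=\"{status}\"}} {n}\n"
            ));
        }
        out
    }
}
```

Because each (template, status) pair is its own series, an alert on the `invalid` status also names the affected template.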

Migration from name-only keying

Existing <name>.sig files are still honored: the gateway and coppice-verify-template.sh both try the new-scheme path first and fall back to the legacy one, so a gateway rolled forward before an operator has re-signed sees no service disruption. The only visible differences are that /metrics reports status="ok_legacy" for those templates, and that the gateway logs a "falling back to legacy-named sig; re-sign with new-scheme filename" warning once per create.
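The lookup order described above can be sketched as follows. `SigLookup` and `find_sig` are hypothetical names; the existence check is passed in as a closure so the ordering can be exercised without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum SigLookup {
    /// Found <name>@<snap>.sig; would be reported as status "ok" if it verifies.
    NewScheme(PathBuf),
    /// Fell back to <name>.sig; would be reported as "ok_legacy" if it verifies.
    Legacy(PathBuf),
    /// Neither file exists.
    Missing,
}

/// Try the new-scheme path first, then the legacy name-only path.
fn find_sig(dir: &Path, name: &str, snap: &str, exists: impl Fn(&Path) -> bool) -> SigLookup {
    let new_path = dir.join(format!("{name}@{snap}.sig"));
    if exists(&new_path) {
        return SigLookup::NewScheme(new_path);
    }
    let legacy_path = dir.join(format!("{name}.sig"));
    if exists(&legacy_path) {
        return SigLookup::Legacy(legacy_path);
    }
    SigLookup::Missing
}
```

Against the real directory the call would be `find_sig(Path::new("/var/db/coppice/sigs"), "browser", "base", |p| p.exists())`. A legacy hit still has to pass the signature and guid checks; the lookup only decides which file is read.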

To clear that warning, re-sign each template against its current canonical snapshot:

coppice tpl sign _template@base-dns-20260422
coppice tpl sign browser        # default @base
coppice tpl sign vscode

# Optional rollback for the cautious operator:
coppice-sign-template.sh --legacy-filename <name>
# → writes the old <name>.sig path. Useful only for migration testing.

Leave the legacy <name>.sig files on disk while the new ones roll out; they are the fallback safety net. Once ok_legacy has stayed at zero across a few days of deploys, the old sigs can be deleted as routine operator cleanup.

Operator workflow

One-time setup on the operator’s laptop:

# Laptop (privkey never leaves this machine)
signify -G -c "coppice template signer" \
  -p /tmp/coppice-pub \
  -s /tmp/coppice-priv
scp /tmp/coppice-pub honor:/tmp/
ssh honor "sudo install -o root -m 0644 /tmp/coppice-pub /etc/coppice/pubkey"

Per-template signing (runs on any host with the privkey + the template dataset):

export COPPICE_SIGN_PRIVKEY=/tmp/coppice-priv
coppice tpl sign browser
coppice tpl sign vscode
coppice tpl sign _template

# Which shells out to:
#   tools/coppice-sign-template.sh <name>
#
# which writes /var/db/coppice/sigs/<name>@<snap>.sig (snap defaults to base).

coppice tpl verify browser
# signify -V success; guid match; exit 0

The sign tool is deliberately a shell wrapper rather than a Rust binary so the signify command line stays audit-visible. The verify tool exists mostly for operators — the gateway has its own signify.rs module that does the same check on the create hot path.

Threat model

What this closes:

What it doesn’t close:

Audit: row flipped

The feature audit’s “Image signing / template provenance” row flips from open to closed. The receipt: the signify roundtrip lives in e2b-compat/src/backend/signify.rs (happy + tamper tests), the create gate in freebsd_jail.rs, the operator CLI at coppice tpl sign|verify, and the /metrics counter for monitoring. #74-sign.