Appendix · Methodology

Bench rig

Every number on the site traces to a script in benchmarks/rigs/ invoked by a mise task. This page shows both, read directly from the filesystem at build time.

The chain is simple. A mise run bench:<config> invocation SCPs benchmark shell scripts to $HONOR_HOST, runs them over SSH under sudo, captures TSV samples, and wraps them in a typed BenchmarkRun JSON file in benchmarks/results/. The Chart component reads those JSON files at build time. Reproducibility is one shell command per chart.
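The artifact at the end of that chain is compact. A minimal BenchmarkRun payload, with field names matching what benchmarks/rigs/summarize.py emits and every value invented for illustration, looks like:

```python
import json

# Illustrative BenchmarkRun; field names follow summarize.py's output,
# but all of the values here are made up for the example.
run = {
    "config": "jail-raw",
    "host": {"hostname": "honor", "kernel": "FreeBSD", "release": "15.0-RELEASE",
             "cpuModel": "unknown", "cpuCount": 16, "memGB": 64.0},
    "metric": "cold-start-ms",
    "concurrency": 10,
    "samples": [41.0, 44.0, 39.0],
    "summary": {"mean": 41.33, "p50": 41.0, "p95": 44.0, "p99": 44.0,
                "min": 39.0, "max": 44.0, "n": 3},
    "scriptPath": "benchmarks/rigs/jail-raw.sh",
    "runAt": "2026-04-22T10:00:00Z",
}
print(json.dumps(run, indent=2))
```

The Chart component consumes this shape at build time; the Zod schema in src/data/benchmarks.ts is the source of truth for the exact field set.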

Host comparison: honor vs. Tencent's reference

Tencent's disclosed host

From the repo at pinned commit c439bb5, the whole catalog of hardware disclosure for the published <60ms / p95=90ms / p99=137ms figures amounts to the word "bare-metal". README.md:94: "Cold start benchmarked on bare-metal. 60ms at single concurrency; under 50 concurrent creations, avg 67ms, P95 90ms, P99 137ms — consistently sub-150ms." That sentence is not elaborated anywhere in README.md, README_zh.md, docs/**, any closed/open GitHub issue, or the v0.1.0 release notes. The only external data point — an aibase.com summary of the Tencent blog — mentions a "96-core physical server" in the density context ("2000+ sandboxes on one machine"), not the cold-start measurements. CPU vendor/model/clock, RAM size/type, storage medium, host kernel, guest vmlinux, and guest rootfs byte size are never published.

More consequential: the in-tree benchmark at CubeAPI/benchmark/runner.go:25-88 times a single HTTP POST /sandboxes round trip — clock starts right before client.Do(req), stops when the response headers return. The published 60ms is create-request latency from an HTTP client, not guest userspace readiness. It includes CubeAPI parsing, CubeMaster scheduling, Cubelet snapshot-clone + VMM fork, CubeVS network-agent plumbing, and the API's response write. It does not wait for a guest-side /ready probe or an exec check. The number is also heavily assisted by "resource pool pre-provisioning and snapshot cloning" (README.md:73) — the instance isn't booting a kernel; it's cloning a pre-warmed VMM snapshot. In other words, it is not an apples-to-apples "time-to-boot" measurement.
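To make that boundary concrete, a toy sketch (all durations invented, unrelated to the real control plane) of why the two clocks diverge:

```python
import threading, time

# Toy model of the two clocks. Every duration here is invented and has
# no relation to CubeSandbox's code; the point is only that the
# "create" clock can stop long before the guest is usable.
ready = threading.Event()

def control_plane_create():
    time.sleep(0.005)  # parse + schedule + snapshot-clone fork (made up)
    # Guest userspace comes up asynchronously, after the response is sent.
    threading.Thread(target=lambda: (time.sleep(0.05), ready.set())).start()
    return 201         # response headers written here; the create clock stops

t0 = time.monotonic()
control_plane_create()
create_ms = (time.monotonic() - t0) * 1000  # what a runner.go-style clock sees
ready.wait(timeout=5)
ready_ms = (time.monotonic() - t0) * 1000   # guest-userspace readiness

print(f"create-request ~{create_ms:.0f} ms, time-to-ready ~{ready_ms:.0f} ms")
```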

honor

From ssh honor 'sysctl -n hw.model hw.ncpu hw.physmem; zpool list zroot':

What we can conclude

Three overlapping caveats make any direct comparison apples-to-kumquats:

  1. Unknown hardware: Tencent's disclosure for the published figures is the single word "bare-metal".
  2. Different clocks: their 60ms stops when HTTP response headers return, with no guest-readiness check on the measured path.
  3. Different work: their create path clones a pre-warmed VMM snapshot from a pre-provisioned pool, so nothing boots a kernel inside the timed window.

The ethical comparison isn't a head-to-head table. It's "here's what honor does under our clearly-defined methodology" next to "here's what Tencent claims under their under-specified methodology" — which is what /freebsd-jails and /claims do, with this caveat linked prominently.

References: README.md:94, README.md:142, README.md:73, CubeAPI/benchmark/runner.go:25-88, deploy/guest-image/Dockerfile:1, deploy/one-click/build-vm-assets.sh:219-355, deploy/one-click/assets/kernel-artifacts/README.md.

Tasks

The canonical task list is .mise.toml at the repo root. Install mise, then from the project directory:

# One config at a time, end-to-end (sync rigs, capture host info, run all concurrencies + RSS):
mise run bench:jail-raw
mise run bench:jail-vnet-pf
mise run bench:jail-zfs-clone

# All implemented configs:
mise run bench:all-jails

Each bench:* task depends on bench:sync (scp the rigs to honor:/tmp/bench-rigs/) and bench:host-info (capture kernel/release/CPU/RAM so the summary JSON embeds a full host record). The internal bench:_run helper iterates over concurrencies 1/10/50 for cold-start and one RSS pass at cc=32.

.mise.toml 316 lines mise tasks
# .mise.toml — task runner for this research notebook.
#
# Invoke with: `mise run <task>` or `mise run <task-a> <task-b>`.
# Reproducibility is the point — if a number is on the site, the mise
# task that produced it is in this file, and the underlying shell rig
# is in benchmarks/rigs/. The site reads both at build time.

[tools]
node = "22"
pnpm = "10"
python = "3.12"

[env]
CUBESANDBOX_COMMIT = "c439bb513f5124d4d9389451b31b8aeb87ab539c"
HONOR_HOST         = "honor"
HONOR_RIG_DIR      = "/tmp/bench-rigs"
# Shallow defaults for first-pass signal. Bump these (200/100/50) once
# we like the shape. Jail creates aren't cheap — cc=1 at ~1s each.
BENCH_ITERS_CC1    = "30"
BENCH_ITERS_CC10   = "30"
BENCH_ITERS_CC50   = "50"

# ──────────────────────────── site ─────────────────────────────

[tasks.dev]
description = "Run the Astro dev server"
run = "pnpm dev"

[tasks.build]
description = "Build the static site"
run = "pnpm build"

[tasks.test]
description = "Run component + schema tests"
run = "pnpm test"

[tasks.check]
description = "Run astro check"
run = "pnpm check"

[tasks.links]
description = "Check built site for broken links"
run = ["pnpm build", "pnpm links"]

# ──────────────────────────── research ──────────────────────────

[tasks."research:clone"]
description = "Clone CubeSandbox at the pinned commit into /tmp/cubesandbox-research"
run = '''
set -eu
mkdir -p /tmp/cubesandbox-research
cd /tmp/cubesandbox-research
[ -d CubeSandbox/.git ] || git clone https://github.com/TencentCloud/CubeSandbox.git
cd CubeSandbox
git fetch origin main
git checkout "$CUBESANDBOX_COMMIT"
echo "Pinned at $(git rev-parse HEAD)"
'''

# ──────────────────────────── benchmarks ────────────────────────

[tasks."bench:sync"]
description = "Sync benchmarks/rigs to $HONOR_HOST:$HONOR_RIG_DIR"
run = 'ssh "$HONOR_HOST" "mkdir -p $HONOR_RIG_DIR" && scp -r benchmarks/rigs/* "$HONOR_HOST:$HONOR_RIG_DIR/"'

[tasks."bench:setup-pf"]
description = "Apply the benchmark pf ruleset on $HONOR_HOST with a dead-man switch that auto-disables pf after DMS_TIMEOUT seconds if we lose control."
depends = ["bench:sync"]
run = '''
set -eu
: "${DMS_TIMEOUT:=60}"
ssh "$HONOR_HOST" "sudo sh -c 'DMS_TIMEOUT=$DMS_TIMEOUT sh $HONOR_RIG_DIR/setup-pf.sh $HONOR_RIG_DIR/pf.bench.conf'"
# Independent verification from the dev machine — if this fails, the
# dead-man is running and pf will self-disable within DMS_TIMEOUT.
sleep 2
ssh "$HONOR_HOST" 'echo "SSH still alive: $(hostname)"'
'''

[tasks."bench:host-info"]
description = "Capture $HONOR_HOST's kernel/release/CPU/RAM into /tmp/honor-host.json"
run = '''
ssh "$HONOR_HOST" 'python3 - <<PY
import json, platform, subprocess
def s(k): return subprocess.run(["sysctl","-n",k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
  "hostname": platform.node(), "kernel": platform.system(),
  "release": platform.release(), "cpuModel": s("hw.model"),
  "cpuCount": int(s("hw.ncpu") or 0),
  "memGB": round(int(s("hw.physmem") or 0) / (1024**3), 2),
}))
PY' > /tmp/honor-host.json
cat /tmp/honor-host.json
'''

# Internal helper: execute one config × (cc1/10/50 cold-start + idle RSS).
# Takes $CONFIG as env var. RSS concurrency defaults to 32 (jails); set
# $RSS_CC for configs where 32 is impractical (bhyve VMs at 512MB each
# would reserve 16GB — we use cc=8 there). Cold-start iteration counts
# per concurrency may also be overridden via $ITERS_CC1/10/50 for configs
# where a boot takes seconds, not milliseconds.
[tasks."bench:_run"]
description = "Internal: run one config's full sweep (cold-start cc1/10/50 + idle RSS)"
hide = true
run = '''
set -eu
: "${CONFIG:?must set CONFIG=jail-raw|jail-vnet-pf|jail-zfs-clone|bhyve-*}"
: "${RSS_CC:=32}"
: "${ITERS_CC1:=$BENCH_ITERS_CC1}"
: "${ITERS_CC10:=$BENCH_ITERS_CC10}"
: "${ITERS_CC50:=$BENCH_ITERS_CC50}"
: "${RIG_ENV:=}"
mkdir -p benchmarks/results
HOST_JSON=$(cat /tmp/honor-host.json)

for CC in 1 10 50; do
  case $CC in 1) ITERS=$ITERS_CC1;; 10) ITERS=$ITERS_CC10;; 50) ITERS=$ITERS_CC50;; esac
  echo "▸ $CONFIG cold-start @ cc=$CC (iters=$ITERS)"
  ssh "$HONOR_HOST" "cd $HONOR_RIG_DIR && sudo sh -c '$RIG_ENV sh $CONFIG.sh $CC $ITERS'" > "/tmp/$CONFIG-cc$CC.tsv"
  python3 benchmarks/rigs/summarize.py \
    --config "$CONFIG" --metric cold-start-ms --concurrency "$CC" \
    --script "benchmarks/rigs/$CONFIG.sh" --input "/tmp/$CONFIG-cc$CC.tsv" \
    --output "benchmarks/results/${CONFIG}_cold-start-ms_cc${CC}.json" \
    --host-info-json "$HOST_JSON"
done

echo "▸ $CONFIG idle RSS @ cc=$RSS_CC"
ssh "$HONOR_HOST" "cd $HONOR_RIG_DIR && sudo sh $CONFIG-rss.sh" > "/tmp/$CONFIG-rss.tsv"
python3 benchmarks/rigs/summarize.py \
  --config "$CONFIG" --metric rss-kb-idle-1s --concurrency "$RSS_CC" \
  --script "benchmarks/rigs/$CONFIG-rss.sh" --input "/tmp/$CONFIG-rss.tsv" \
  --output "benchmarks/results/${CONFIG}_rss-kb-idle-1s_cc${RSS_CC}.json" \
  --host-info-json "$HOST_JSON"
'''

[tasks."bench:jail-raw"]
description = "jail-raw: cold-start @ cc=1,10,50 + idle RSS @ cc=32"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-raw" }
run = "mise run bench:_run"

[tasks."bench:jail-vnet-pf"]
description = "jail-vnet-pf (VNET + pf egress filter): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-vnet-pf" }
run = "mise run bench:_run"

[tasks."bench:jail-zfs-clone"]
description = "jail-zfs-clone (per-jail ZFS clone rootfs): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-zfs-clone" }
run = "mise run bench:_run"

[tasks."bench:all-jails"]
description = "All three jail configurations"
depends = ["bench:jail-raw", "bench:jail-vnet-pf", "bench:jail-zfs-clone"]

# bhyve tasks. Each config follows the same pattern as the jail tasks:
# bench:_run runs cold-start at cc=1/10/50 plus an idle-RSS sweep.
# bhyve VMs dominate the runtime budget — cc=50 cold-start can take
# minutes at full guest memory. See the rig scripts for iter counts.

[tasks."bench:bhyve-full"]
description = "bhyve-full (full FreeBSD 15 GENERIC guest per iter): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-full", RSS_CC = "8", ITERS_CC1 = "10", ITERS_CC10 = "10", ITERS_CC50 = "50", RIG_ENV = "TIMEOUT_SEC=180" }
run = "mise run bench:_run"

[tasks."bench:bhyve-minimal"]
description = "bhyve-minimal (MINIMAL-BHYVE kernel — STUB: guest rc.conf not yet trimmed)"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-minimal", RSS_CC = "8", ITERS_CC1 = "10", ITERS_CC10 = "10", ITERS_CC50 = "50", RIG_ENV = "TIMEOUT_SEC=60" }
run = "mise run bench:_run"

[tasks."bench:bhyve-prewarm-pool"]
description = "bhyve-prewarm-pool (SIGSTOP/SIGCONT proxy for CH snapshot-restore): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-prewarm-pool", RSS_CC = "8", ITERS_CC1 = "30", ITERS_CC10 = "30", ITERS_CC50 = "50", RIG_ENV = "POOL_SIZE=50 BOOT_TIMEOUT=240" }
run = "mise run bench:_run"

[tasks."bench:bhyve-durable-pool-setup"]
description = "One-time pool-setup: boot N VMs, bhyvectl --suspend each to /vms/pool/ (requires SNAPSHOT kernel)"
depends = ["bench:sync"]
run = 'ssh "$HONOR_HOST" "sudo sh $HONOR_RIG_DIR/bhyve-durable-pool-setup.sh"'

[tasks."bench:bhyve-durable-pool"]
description = "bhyve-durable-pool: resume from on-disk bhyvectl --suspend checkpoint (SNAPSHOT kernel)"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-durable-pool" }
run = "mise run bench:_run"

[tasks."bench:all"]
description = "Run every bench config that's implemented"
depends = ["bench:all-jails"]

[tasks."bench:clean-remote"]
description = "Remove /tmp/bench-rigs and bench tempfiles on $HONOR_HOST"
run = 'ssh "$HONOR_HOST" "rm -rf $HONOR_RIG_DIR /tmp/*.tsv /tmp/bhyve-*.log" || true'

# ─────────────────────────── e2b-compat ─────────────────────────

[tasks."e2b:build"]
description = "Build the e2b-compat binary (cargo build --release)"
run = "cargo build --release --manifest-path e2b-compat/Cargo.toml"

[tasks."e2b:check"]
description = "cargo check the e2b-compat crate"
run = "cargo check --manifest-path e2b-compat/Cargo.toml"

[tasks."e2b:serve"]
description = "Run e2b-compat locally (requires root or sudoers for zfs/jail/jls/ps/kill/jexec)"
run = '''
cargo run --release --manifest-path e2b-compat/Cargo.toml -- \
  --listen 127.0.0.1:3000 \
  --zfs-pool zroot/jails \
  --template-snapshot zroot/jails/_template@base \
  --jails-root /jails
'''

[tasks."e2b:sync-honor"]
description = "rsync e2b-compat sources to $HONOR_HOST:/tmp/e2b-compat-src/"
run = '''
ssh "$HONOR_HOST" 'mkdir -p /tmp/e2b-compat-src'
rsync -az --delete --exclude=target/ --exclude=Cargo.lock e2b-compat/ "$HONOR_HOST:/tmp/e2b-compat-src/"
'''

[tasks."e2b:build-honor"]
description = "Build the e2b-compat binary on $HONOR_HOST (FreeBSD-native)"
depends = ["e2b:sync-honor"]
run = 'ssh "$HONOR_HOST" "cd /tmp/e2b-compat-src && cargo build --release"'

[tasks."e2b:serve-honor"]
description = "Run e2b-compat on $HONOR_HOST as root, listen 127.0.0.1:3000"
depends = ["e2b:build-honor"]
run = 'ssh "$HONOR_HOST" "sudo /tmp/e2b-compat-src/target/release/e2b-compat --listen 127.0.0.1:3000 --zfs-pool zroot/jails --template-snapshot zroot/jails/_template@base --jails-root /jails"'

[tasks."e2b:smoke"]
description = "Hit the running e2b-compat with the example smoke-test script"
run = 'sh e2b-compat/examples/smoke-test.sh "${E2B_COMPAT_URL:-http://honor:3000}"'

# ──────────────────────────── demos ────────────────────────────

[tasks."demo:notebook"]
description = "Execute examples/notebook-demo.ipynb against the Coppice gateway"
run = '''
set -eu
HONOR="${HONOR_HOST:-honor}"

# Gateway binds to 127.0.0.1 on honor. Tunnel API + envd through SSH
# so the local SDK reaches them as localhost:3000 / localhost:49999
# — which is what E2B_DEBUG=true hardcodes anyway.
echo "opening SSH tunnel: $HONOR → localhost 3000 + 49999"
ssh -fN -L 3000:127.0.0.1:3000 -L 49999:127.0.0.1:49999 "$HONOR"
ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
cleanup() { [ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true; }
trap cleanup EXIT

export E2B_API_URL="http://localhost:3000"
export E2B_DEBUG="true"
export E2B_API_KEY="${E2B_API_KEY:-local}"
echo "gateway: $E2B_API_URL (tunneled from $HONOR)"

# uv run with inline deps — no pip, no venv to manage.
uv run --with jupyter --with nbclient --with e2b-code-interpreter \
  jupyter nbconvert \
    --to notebook \
    --execute examples/notebook-demo.ipynb \
    --output notebook-demo.executed.ipynb \
    --ExecutePreprocessor.timeout=120
echo
echo "executed: examples/notebook-demo.executed.ipynb"
echo "render to HTML:  mise run demo:notebook:html"
echo "open live:       mise run demo:notebook:view"
'''

[tasks."demo:notebook:view"]
description = "Open the notebook live in nbclassic, tunnel + deps wired up"
run = '''
set -eu
HONOR="${HONOR_HOST:-honor}"

# Same tunnel as demo:notebook — keeps localhost:3000/:49999 pointed at
# the gateway on honor while nbclassic is running.
ssh -fN -L 3000:127.0.0.1:3000 -L 49999:127.0.0.1:49999 "$HONOR"
ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
cleanup() { [ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true; }
trap cleanup EXIT

export E2B_API_URL="http://localhost:3000"
export E2B_DEBUG="true"
export E2B_API_KEY="${E2B_API_KEY:-local}"
echo "gateway: $E2B_API_URL (tunneled from $HONOR)"

# Run both the server and the in-notebook kernel under the same uv
# environment so `from e2b_code_interpreter import Sandbox` resolves.
# The source ipynb is the live version — open that, not the
# .executed one (nbconvert-rendered outputs confuse readers trying
# to re-run cells).
exec uv run \
  --with nbclassic \
  --with e2b-code-interpreter \
  --with matplotlib --with pandas --with numpy \
  jupyter nbclassic examples/notebook-demo.ipynb
'''

[tasks."demo:notebook:html"]
description = "Render the executed notebook to a static HTML page"
depends = ["demo:notebook"]
run = '''
uv run --with jupyter --with nbconvert \
  jupyter nbconvert \
    --to html \
    examples/notebook-demo.executed.ipynb \
    --output notebook-demo.html
echo "rendered: examples/notebook-demo.html"
'''

Driver helpers

common.sh is sourced by every per-config rig. It provides timestamp_ms (via Python for portability — FreeBSD date lacks %N), a run_concurrent wrapper, and a host_info_json probe.

benchmarks/rigs/common.sh 41 lines bash
#!/bin/sh
# Shared helpers for all rigs. Sourced, not run directly.
set -eu

timestamp_ms() {
  # Base FreeBSD date(1) has no %N (that's GNU date/gdate); use python for portability.
  python3 -c 'import time; print(int(time.time()*1000))'
}

host_info_json() {
  python3 - <<'PY'
import json, platform, subprocess
def sysctl(k):
    return subprocess.run(['sysctl','-n',k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
    'hostname': platform.node(),
    'kernel': platform.system(),
    'release': platform.release(),
    'cpuModel': sysctl('hw.model'),
    'cpuCount': int(sysctl('hw.ncpu') or 0),
    'memGB': round(int(sysctl('hw.physmem') or 0) / (1024**3), 2),
}))
PY
}

run_concurrent() {
  # Usage: run_concurrent N CMD...
  # Runs CMD N times in parallel, prints per-iteration elapsed_ms TSV.
  # The loop index (0..N-1) is appended as an extra argument so each
  # worker can form a unique name even though $$ is shared by subshells.
  # _rc_n/_rc_j are intentionally prefixed to avoid clobbering the
  # caller's loop variable (sh functions share scope with their callers).
  _rc_n=$1; shift
  _rc_j=0
  while [ $_rc_j -lt $_rc_n ]; do
    ( s=$(timestamp_ms); "$@" "$_rc_j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$_rc_j" "$((e - s))" ) &
    _rc_j=$((_rc_j + 1))
  done
  wait
}

summarize.py reads the TSV a rig emits and writes a BenchmarkRun JSON file validated by the Zod schema in src/data/benchmarks.ts.
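A rig's TSV is one index<TAB>elapsed-ms line per iteration; summarize.py keeps only the last tab-separated field and skips blank lines, so extra leading columns are harmless. A minimal sketch of that parse:

```python
# Same parsing rule as summarize.py's main(): last field wins, blanks skipped.
tsv = "0\t41\n1\t44\n\n2\t39\n"   # invented three-iteration run

samples = []
for line in tsv.splitlines():
    line = line.strip()
    if not line:
        continue
    samples.append(float(line.split('\t')[-1]))

print(samples)  # → [41.0, 44.0, 39.0]
```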

benchmarks/rigs/summarize.py 65 lines python
#!/usr/bin/env python3
"""Wrap raw TSV samples into the BenchmarkRun JSON schema."""
import argparse, json, statistics, datetime, sys, subprocess

def host_info():
    r = subprocess.run(['python3', '-c', '''
import json, platform, subprocess
def s(k): return subprocess.run(["sysctl","-n",k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
    "hostname": platform.node(),
    "kernel": platform.system(),
    "release": platform.release(),
    "cpuModel": s("hw.model") or "unknown",
    "cpuCount": int(s("hw.ncpu") or 0),
    "memGB": round(int(s("hw.physmem") or 0) / (1024**3), 2),
}))
'''], capture_output=True, text=True)
    return json.loads(r.stdout)

def summarize(samples):
    ss = sorted(samples)
    def pct(p): return ss[min(len(ss) - 1, int(len(ss) * p / 100))]
    return {
        'mean': statistics.mean(ss),
        'p50': pct(50), 'p95': pct(95), 'p99': pct(99),
        'min': min(ss), 'max': max(ss), 'n': len(ss),
    }

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--config', required=True)
    ap.add_argument('--metric', required=True)
    ap.add_argument('--concurrency', type=int, required=True)
    ap.add_argument('--script', required=True)
    ap.add_argument('--input', required=True)
    ap.add_argument('--output', required=True)
    ap.add_argument('--host-info-json', help='Pre-collected host_info_json; if omitted, shells out to collect')
    args = ap.parse_args()

    samples = []
    with open(args.input) as f:
        for line in f:
            line = line.strip()
            if not line: continue
            parts = line.split('\t')
            samples.append(float(parts[-1]))

    host = json.loads(args.host_info_json) if args.host_info_json else host_info()

    out = {
        'config': args.config,
        'host': host,
        'metric': args.metric,
        'concurrency': args.concurrency,
        'samples': samples,
        'summary': summarize(samples),
        'scriptPath': args.script,
        'runAt': datetime.datetime.now(datetime.timezone.utc).isoformat().replace('+00:00', 'Z'),
    }
    with open(args.output, 'w') as f:
        json.dump(out, f, indent=2)

if __name__ == '__main__':
    main()
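One property of the pct() helper worth knowing: it is an index-based nearest-rank-style estimator that can round upward, so p50 of the samples 1..100 comes back as 51 rather than 50.5. A standalone check, restating summarize() so it runs on its own:

```python
import statistics

# Restated from summarize.py so this block runs standalone.
def summarize(samples):
    ss = sorted(samples)
    def pct(p): return ss[min(len(ss) - 1, int(len(ss) * p / 100))]
    return {
        'mean': statistics.mean(ss),
        'p50': pct(50), 'p95': pct(95), 'p99': pct(99),
        'min': min(ss), 'max': max(ss), 'n': len(ss),
    }

s = summarize(list(range(1, 101)))   # 1..100 "ms" samples
print(s)  # → {'mean': 50.5, 'p50': 51, 'p95': 96, 'p99': 100, 'min': 1, 'max': 100, 'n': 100}
```

The upward bias is at most one rank and vanishes for the tail percentiles the site actually charts.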

Jail configurations

Jail — raw jail-raw

Plain jail, per-jail rootfs copied from the template with cp -R, no VNET.

▸ reproduce ·  mise run bench:jail-raw

benchmarks/rigs/jail-raw.sh 46 lines cold-start rig
#!/bin/sh
# jail-raw.sh — plain jail with cp -R rootfs, no VNET, no pf.
# Usage: jail-raw.sh <concurrency> <total-iterations>
#
# Runs EXACTLY <total-iterations> jail create/destroy cycles, dispatched
# across a <concurrency>-sized worker pool. Emits one TSV line per
# iteration: <global-index>\t<elapsed-ms>.

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
TEMPLATE=${TEMPLATE:-/jails/_template}
[ -d "$TEMPLATE" ] || { echo "missing $TEMPLATE" >&2; exit 2; }

create_one() {
  id="bench-$$-$1"
  path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  rm -rf "$path"
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-raw-rss.sh 38 lines idle-RSS rig
#!/bin/sh
# Start 32 jails, wait 1s, sum RSS of their init processes.
set -eu
. "$(dirname "$0")/common.sh"

TEMPLATE=${TEMPLATE:-/jails/_template}
N=32

i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"; path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"; path="/jails/$id"
  jail -r "$id" >/dev/null 2>&1 || true
  rm -rf "$path" 2>/dev/null || true
  i=$((i + 1))
done

Jail — VNET + pf jail-vnet-pf

Jail with VNET (per-jail epair network stack) + pf egress filter.

▸ reproduce ·  mise run bench:jail-vnet-pf

benchmarks/rigs/jail-vnet-pf.sh 50 lines cold-start rig
#!/bin/sh
# jail-vnet-pf.sh — jail with VNET (epair) + active pf.bench.conf.
# Usage: jail-vnet-pf.sh <concurrency> <total-iterations>
#
# Requires: if_epair loaded, pf active with safe ruleset (mise run bench:setup-pf).

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
TEMPLATE=${TEMPLATE:-/jails/_template}
[ -d "$TEMPLATE" ] || { echo "missing $TEMPLATE" >&2; exit 2; }

create_one() {
  id="benchvp-$$-$1"
  path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  oct=$(( $1 % 250 + 2 ))
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$oct.2/24 up" \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  ifconfig "$epair_a" destroy 2>/dev/null || true
  rm -rf "$path"
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-vnet-pf-rss.sh 45 lines idle-RSS rig
#!/bin/sh
# Start 32 VNET jails, wait 1s, sum RSS of their init processes.
set -eu
. "$(dirname "$0")/common.sh"

TEMPLATE=${TEMPLATE:-/jails/_template}
N=32

i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"; path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  epair=$(ifconfig epair create)
  epb=$(echo "$epair" | sed 's/a$/b/')
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epb" \
       exec.prestart="ifconfig $epb inet 10.88.$(( $i % 250 )).2/24 up" \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"; path="/jails/$id"
  jail -r "$id" >/dev/null 2>&1 || true
  # destroy epair interfaces (they were moved to jails)
  ifconfig "epair${i}a" destroy 2>/dev/null || true
  rm -rf "$path" 2>/dev/null || true
  i=$((i + 1))
done

Jail — ZFS clone jail-zfs-clone

Jail with per-instance rootfs via ZFS clone of the template snapshot.

▸ reproduce ·  mise run bench:jail-zfs-clone

benchmarks/rigs/jail-zfs-clone.sh 47 lines cold-start rig
#!/bin/sh
# jail-zfs-clone.sh — per-iteration rootfs via ZFS clone of the template
# snapshot. No VNET, no pf.
# Usage: jail-zfs-clone.sh <concurrency> <total-iterations>
#
# Requires: zroot/jails/_template@base snapshot.

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}

zfs list "$SNAP" >/dev/null 2>&1 || { echo "missing snapshot $SNAP" >&2; exit 2; }

create_one() {
  id="benchzfs-$$-$1"
  path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-zfs-clone-rss.sh 38 lines idle-RSS rig
#!/bin/sh
# Start 32 ZFS-clone jails, wait 1s, sum RSS of their init processes.
set -eu
. "$(dirname "$0")/common.sh"

POOL=zroot/jails; SNAP=${POOL}/_template@base
N=32

i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"; path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"
  jail -r "$id" >/dev/null 2>&1 || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
  i=$((i + 1))
done

Jail — VNET + pf + ZFS clone jail-vnet-zfs-clone

The fair VNET + pf comparison: ZFS-clone rootfs + per-jail VNET stack + active pf egress filter.

▸ reproduce ·  mise run bench:jail-vnet-zfs-clone

benchmarks/rigs/jail-vnet-zfs-clone.sh 56 lines cold-start rig
#!/bin/sh
# jail-vnet-zfs-clone.sh — jail with VNET (epair) + active pf.bench.conf,
# but per-iteration rootfs via ZFS clone of the template snapshot. This
# is the fair comparison for "network-isolated jail with dynamic egress
# policy", since jail-vnet-pf uses cp -R and is dominated by rootfs cost.
# Usage: jail-vnet-zfs-clone.sh <concurrency> <total-iterations>
#
# Requires: if_epair loaded; zroot/jails/_template@base snapshot;
# pf active with safe ruleset (`mise run bench:setup-pf`).

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}

zfs list "$SNAP" >/dev/null 2>&1 || { echo "missing snapshot $SNAP" >&2; exit 2; }

create_one() {
  id="benchvpz-$$-$1"
  path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  oct=$(( $1 % 250 + 2 ))
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$oct.2/24 up" \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  ifconfig "$epair_a" destroy 2>/dev/null || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-vnet-zfs-clone-rss.sh 46 lines idle-RSS rig
#!/bin/sh
# Start 32 VNET+ZFS-clone jails, wait 1s, sum RSS of their init processes.
set -eu
. "$(dirname "$0")/common.sh"

POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}
N=32

i=0
while [ $i -lt $N ]; do
  id="benchvpz-rss-$$-$i"; path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$(( i + 2 )).2/24 up" \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchvpz-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
for d in /jails/benchvpz-rss-$$-*; do
  [ -d "$d" ] || continue
  name=$(basename "$d")
  jail -r "$name" 2>/dev/null || true
  zfs list "$POOL/$name" >/dev/null 2>&1 && zfs destroy -f "$POOL/$name" 2>/dev/null || rm -rf "$d" 2>/dev/null
done
for ep in $(ifconfig -l | tr ' ' '\n' | grep '^epair'); do
  ifconfig "$ep" destroy 2>/dev/null || true
done

bhyve configurations

All bhyve configs share a FreeBSD 15 VM image fetched once:

ssh honor 'mkdir -p /tmp/bhyve-images
cd /tmp/bhyve-images && fetch https://download.freebsd.org/releases/VM-IMAGES/15.0-RELEASE/amd64/Latest/FreeBSD-15.0-RELEASE-amd64-ufs.raw.xz && xz -d FreeBSD-15.0-RELEASE-amd64-ufs.raw.xz'

Durable bhyve configs (bhyve-durable-pool and its one-time bhyve-durable-pool-setup) additionally require a host kernel compiled with options BHYVE_SNAPSHOT, plus a bhyve + bhyvectl userspace built with WITH_BHYVE_SNAPSHOT=YES. The option is not in GENERIC on FreeBSD 15.0-RELEASE, so we built it from source. The full kernel reproduction:

# 1. source (if not already installed)
sudo fetch -o /tmp/src.txz https://download.freebsd.org/releases/amd64/15.0-RELEASE/src.txz
sudo tar -C /usr/src -xf /tmp/src.txz   # roughly 250 MB

# 2. author SNAPSHOT config (GENERIC + options BHYVE_SNAPSHOT)
sudo tee /usr/src/sys/amd64/conf/SNAPSHOT > /dev/null <<EOF
include GENERIC
ident   SNAPSHOT
options BHYVE_SNAPSHOT
EOF

# 3. kernel build — ~5 min on a Ryzen 9 5900HX with -j16
sudo make -C /usr/src -j16 buildkernel KERNCONF=SNAPSHOT

# 4. SAFETY NET: create a ZFS boot environment snapshot before swapping kernels
sudo bectl create pre-snapshot-kernel-$(date +%Y-%m-%d)

# 5. install kernel (current kernel moves to /boot/kernel.old/)
sudo make -C /usr/src DESTDIR=/ installkernel KERNCONF=SNAPSHOT

# 6. reboot
sudo shutdown -r now

# 7. post-reboot: rebuild bhyve + bhyvectl userspace with the option
sudo make -C /usr/src/usr.sbin/bhyvectl WITH_BHYVE_SNAPSHOT=YES MK_BHYVE_SNAPSHOT=yes all install
sudo make -C /usr/src/usr.sbin/bhyve    WITH_BHYVE_SNAPSHOT=YES MK_BHYVE_SNAPSHOT=yes all install

# 8. verify
bhyvectl 2>&1 | grep -E '\-\-suspend|\-\-checkpoint'   # should now appear
sysctl kern.ident                                            # SNAPSHOT

The bectl create line is the recovery safety net for the kernel swap. If the new kernel fails to boot, press 8 at the loader menu (Boot Environments), pick the pre-kernel BE, press enter. The SNAPSHOT swap booted clean in our case, so we never invoked the rollback. (The earlier pf lockout below was recovered via physical console + pfctl -d, before the BE-safety-net pattern landed in this workflow.)
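When the new kernel boots but misbehaves (so SSH still works), the loader menu is not the only path back. A hedged sketch using bectl's temporary activation; the BE name in the example is illustrative, and `bectl activate -t` marks the environment for the next boot only, so a wrong pick self-corrects on the boot after that:

```shell
# rollback_be -- boot the named boot environment once, without making it
# the permanent default. Hedged sketch; requires root and a ZFS-on-root host.
rollback_be() {
  be=${1:?usage: rollback_be <boot-environment>}
  bectl activate -t "$be" && shutdown -r now
}

# e.g. rollback_be "pre-snapshot-kernel-2026-04-21"
```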

Legacy caveat — the following note from an earlier pass still applies to the bhyve-minimal rig, which remains a stub:

for bhyve-full and bhyve-prewarm-pool a FreeBSD VM image must be fetched from freebsd.org; for bhyve-minimal a custom MINIMAL kernel needs to be built from /usr/src. The mise tasks exist as placeholders; see .mise.toml.

pf lockout safety

On first run, the jail-vnet-pf setup phase locked honor out of SSH for ~35 minutes at 2026-04-21T22:54. The ruleset was block out all / pass out on lo0 all with no explicit SSH pass rules; applying it with pfctl -e dropped the outbound half of the active SSH session and the host went dark until it was physically recovered.

The fix is layered. The committed ruleset at benchmarks/rigs/pf.bench.conf has:

  1. set skip on lo0 — pf never filters loopback, period.
  2. pass in quick proto tcp to port 22 keep state and pass out quick proto tcp from port 22 keep state — management SSH is permitted in both directions regardless of any later block rule.
  3. pass quick proto icmp — ICMP stays open for quick sanity probes.
  4. block out all — default-deny egress, which is the actual experimental rule we're measuring the cost of.

But even a correct ruleset has a bootstrap problem: if you typo the next version of it, you still lock yourself out. The committed setup-pf.sh wraps pfctl -f + pfctl -e in a daemon(8)-spawned dead-man switch. If the script doesn't reach its final kill line within DMS_TIMEOUT seconds (default 60), pfctl -d fires from the dead-man child and pf disables itself. The mise task bench:setup-pf drives this end-to-end and performs an independent SSH-reachability check from the dev machine post-apply — if that check fails, the dead-man handles the cleanup.
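The independent reachability check can be sketched from the dev-machine side; the host alias, timeout, and messages below are illustrative, not lifted from the bench:setup-pf task:

```shell
# post-apply-probe.sh -- hedged sketch of the dev-machine SSH check.
probe_ssh() {
  # 0 iff one SSH exec round-trips within 5 s; BatchMode forbids prompts,
  # so a wedged host fails fast instead of hanging on auth.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

if probe_ssh honor; then
  echo "post-apply: SSH alive; ruleset stands"
else
  echo "post-apply: SSH dead; dead-man will revert pf within DMS_TIMEOUT" >&2
fi
```

If the probe fails, nothing more is needed from the dev machine: the dead-man on the host disables pf on its own.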

benchmarks/rigs/pf.bench.conf 31 lines pf ruleset
# pf.bench.conf — ruleset applied on honor for the jail-vnet-pf rig.
#
# IMPORTANT: designed to NEVER lock out the SSH control plane. The
# earlier version of this ruleset (`block out all` with no management
# pass rules) wedged honor at 2026-04-21T22:54 because the outbound
# half of the active SSH session got dropped. Do not reintroduce that
# shape. Both `pass in` and `pass out` for TCP/22 are explicit here,
# and `set skip on lo0` guarantees loopback is never filtered.
#
# Behavior: default-deny egress on the external NIC, except the SSH
# management traffic and ICMP. This matches the posture CubeVS
# enforces per-sandbox — block all except explicitly allowed.

# Loopback — never filter.
set skip on lo0

# SSH management plane, both directions. Keep-state required so once a
# session is established, reply traffic is authorized from the state
# entry rather than needing its own rule.
pass in  quick proto tcp to port 22 keep state
pass out quick proto tcp from port 22 keep state

# ICMP / ICMPv6 — lets ping-based health checks keep working.
pass quick proto icmp
pass quick proto icmp6

# Default egress: block everything else. This is the part we are
# actually measuring — the cost of per-packet pf filtering on jail
# egress traffic.
block out all
benchmarks/rigs/setup-pf.sh 52 lines dead-man-guarded applier
#!/bin/sh
# setup-pf.sh — apply pf.bench.conf with a dead-man's-switch that auto-
# reverts if SSH isn't confirmed alive within DMS_TIMEOUT seconds.
#
# Usage (as root, typically via `mise run bench:setup-pf`):
#     sh setup-pf.sh /path/to/pf.bench.conf
#
# Safety model:
#   1. Spawn a detached child via daemon(8) that sleeps DMS_TIMEOUT
#      seconds and then calls `pfctl -d`. Its PID is written to a
#      pidfile so we can cancel it.
#   2. Apply the ruleset with pfctl -f and enable pf.
#   3. If we reach the end of the script without the shell dying, cancel
#      the dead-man.
#
# If SSH goes away during (2) — e.g. because the rules were wrong — the
# dead-man fires, pf is disabled, and the host becomes reachable again.

set -eu

RULES=${1:?usage: setup-pf.sh /path/to/pf.bench.conf}
DMS_TIMEOUT=${DMS_TIMEOUT:-60}
DMS_PIDFILE=/tmp/bench-pf-deadman.pid

if [ ! -f "$RULES" ]; then
  echo "setup-pf: rules file not found: $RULES" >&2
  exit 2
fi

# Ensure pf kernel module is loaded.
kldload pf 2>/dev/null || true

# Fire the dead-man via daemon(8) so it survives our shell exiting.
# daemon -f detaches, -p writes the child's pid.
daemon -f -p "$DMS_PIDFILE" /bin/sh -c "sleep $DMS_TIMEOUT; pfctl -d; logger 'bench-safe: pf disabled by deadman after ${DMS_TIMEOUT}s'"
sleep 0.2  # let daemon write the pidfile

# Apply ruleset and enable pf. `pfctl -f` preserves the existing state
# table, so an active SSH session is kept alive across reload.
pfctl -f "$RULES"
pfctl -e 2>/dev/null || true
pfctl -s info | head -1

# Cancel the dead-man — we're alive and rules applied cleanly.
if [ -f "$DMS_PIDFILE" ]; then
  DMS_PID=$(cat "$DMS_PIDFILE")
  kill -TERM "$DMS_PID" 2>/dev/null || true
  rm -f "$DMS_PIDFILE"
fi

echo "setup-pf: bench ruleset active; deadman cancelled"
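The dead-man shape can be drilled without pf at all. A hedged sketch of the cancel path, with a background subshell standing in for daemon(8) and a marker file standing in for pfctl -d; safe to run anywhere:

```shell
# deadman-drill.sh -- exercise the dead-man cancel path locally.
MARKER=$(mktemp -u)             # path only; the deadman would create it
( sleep 2; touch "$MARKER" ) &  # arm: "fires" in 2 s unless cancelled
DM_PID=$!

# ... applying rules would happen here ...

kill "$DM_PID" 2>/dev/null || true   # cancel, as setup-pf.sh does on success
sleep 3                              # wait past the deadline
if [ -e "$MARKER" ]; then
  DM_RESULT=fired
else
  DM_RESULT=cancelled
fi
echo "deadman $DM_RESULT"
# → deadman cancelled
rm -f "$MARKER"
```

Killing the spawned shell before its sleep expires means the touch (the stand-in for pfctl -d) never runs, which is exactly the property setup-pf.sh relies on when it kills the daemon(8) child.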

Known caveats