Appendix · Methodology

Bench rig

Every number on the site traces to a script in benchmarks/rigs/ invoked by a mise task. This page shows both, read directly from the filesystem at build time.

The chain is simple. A mise run bench:<config> invocation SCPs benchmark shell scripts to $HONOR_HOST, runs them over SSH under sudo, captures TSV samples, and wraps them in a typed BenchmarkRun JSON file in benchmarks/results/. The Chart component reads those JSON files at build time. Reproducibility is one shell command per chart.
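
As a rough illustration of the last link in that chain, the sketch below wraps tab-separated samples in a JSON record. The field names (config, metric, concurrency, host, samples), the assumption that the measured value sits in the last column, and the sample values are all hypothetical; the real BenchmarkRun schema and summarize.py are not reproduced on this page.

```python
import json


def wrap_samples(tsv_text, config, metric, concurrency, host):
    """Wrap TSV samples in a BenchmarkRun-like dict (hypothetical schema)."""
    samples = []
    for line in tsv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Assumption: the measured value is the last tab-separated column.
        samples.append(float(line.split("\t")[-1]))
    return {
        "config": config,
        "metric": metric,
        "concurrency": concurrency,
        "host": host,
        "samples": samples,
    }


# Made-up input: two iterations with invented timings.
run = wrap_samples(
    "1\t12.5\n2\t14.0\n",
    config="jail-raw",
    metric="cold-start-ms",
    concurrency=1,
    host={"hostname": "honor"},
)
print(json.dumps(run, indent=2))
```

Keeping the raw samples in the record, rather than only summary statistics, is one way to let a chart recompute percentiles later without rerunning the rig.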

Host comparison: honor vs. Tencent's reference

Tencent's disclosed host

From the repo at pinned commit c439bb5, the whole catalog of hardware disclosure for the published <60ms / p95=90ms / p99=137ms figures amounts to the word "bare-metal". README.md:94: "Cold start benchmarked on bare-metal. 60ms at single concurrency; under 50 concurrent creations, avg 67ms, P95 90ms, P99 137ms — consistently sub-150ms." That sentence is not elaborated anywhere in README.md, README_zh.md, docs/**, any closed/open GitHub issue, or the v0.1.0 release notes. The only external data point — an aibase.com summary of the Tencent blog — mentions a "96-core physical server" in the density context ("2000+ sandboxes on one machine"), not the cold-start measurements. CPU vendor/model/clock, RAM size/type, storage medium, host kernel, guest vmlinux, and guest rootfs byte size are never published.
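
For readers who want to check figures of the "avg / P95 / P99" kind against raw samples, here is a minimal sketch of one common percentile definition, nearest-rank. Neither the README nor this page states which definition Tencent's runner or our summarize.py uses, so treat the choice as an assumption. The sample values are invented.

```python
import math


def nearest_rank(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct * n / 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = [12, 15, 11, 30, 14, 13, 18, 16, 45, 17]  # invented values
print("avg", sum(latencies_ms) / len(latencies_ms))
print("p50", nearest_rank(latencies_ms, 50))
print("p95", nearest_rank(latencies_ms, 95))
```

With only ten samples, p95 and p99 both land on the single slowest sample, which is one reason the iteration count matters when comparing tail figures.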

More consequential: the in-tree benchmark at CubeAPI/benchmark/runner.go:25-88 times a single HTTP POST /sandboxes round trip. The clock starts right before client.Do(req) and stops when the response headers return. The published 60ms is therefore create-request latency as seen by an HTTP client, not guest userspace readiness. It includes CubeAPI parsing, CubeMaster scheduling, Cubelet snapshot-clone + VMM fork, CubeVS network-agent plumbing, and the API's response write. It does not wait for a guest-side /ready probe or an exec check. The number is also heavily assisted by "resource pool pre-provisioning and snapshot cloning" (README.md:73): the instance isn't booting a kernel; it's cloning a pre-warmed VMM snapshot. Whatever this number measures, it is not "time-to-boot", so it cannot be compared apples-to-apples against one.
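
The gap between create-request latency and guest readiness can be made concrete with a small timing sketch. The create and is_ready callables are hypothetical stand-ins (for an HTTP POST /sandboxes and a guest-side probe, respectively), and the demo runs on a simulated clock with invented delays. Nothing here reproduces runner.go or our rigs.

```python
import time


def time_create_and_ready(create, is_ready, clock=time.monotonic,
                          sleep=time.sleep, poll_s=0.005, timeout_s=30.0):
    """Time one sandbox creation two ways, from the same starting point.

    create   -- hypothetical callable; returns once the create request is answered
    is_ready -- hypothetical callable; True once guest userspace responds
    Returns (create_ms, ready_ms).
    """
    start = clock()
    handle = create()
    create_ms = (clock() - start) * 1000.0
    # Keep polling until the guest answers or the timeout expires.
    while not is_ready(handle):
        if clock() - start > timeout_s:
            raise TimeoutError("guest never became ready")
        sleep(poll_s)
    ready_ms = (clock() - start) * 1000.0
    return create_ms, ready_ms


# Simulated demo: the API answers after 5 ms; the guest responds from 50 ms on.
now_ms = [0]


def fake_create():
    now_ms[0] += 5
    return "sandbox-1"


def fake_sleep(seconds):
    now_ms[0] += round(seconds * 1000)


def fake_ready(_handle):
    return now_ms[0] >= 50


create_ms, ready_ms = time_create_and_ready(
    fake_create, fake_ready,
    clock=lambda: now_ms[0] / 1000, sleep=fake_sleep, poll_s=0.010,
)
print(round(create_ms), round(ready_ms))
```

A benchmark that stops the clock at create_ms and one that stops it at ready_ms are measuring different things, even when they run against the same sandbox.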

honor

From ssh honor 'sysctl -n hw.model hw.ncpu hw.physmem; zpool list zroot':

What we can conclude

Three overlapping caveats make any direct comparison apples-to-kumquats:

The ethical comparison isn't a head-to-head table. It's "here's what honor does under our clearly defined methodology" next to "here's what Tencent claims under their under-specified methodology", which is what /freebsd-jails and /claims do, with this caveat linked prominently.

References: README.md:94, README.md:142, README.md:73, CubeAPI/benchmark/runner.go:25-88, deploy/guest-image/Dockerfile:1, deploy/one-click/build-vm-assets.sh:219-355, deploy/one-click/assets/kernel-artifacts/README.md.

Tasks

The canonical task list is .mise.toml at the repo root. Install mise, then from the project directory:

# Single config, end-to-end (sync rigs, capture host info, run all concurrencies + RSS):
mise run bench:jail-raw
mise run bench:jail-vnet-pf
mise run bench:jail-zfs-clone

# All implemented configs:
mise run bench:all-jails

Each bench:* task depends on bench:sync (scp the rigs to honor:/tmp/bench-rigs/) and bench:host-info (capture kernel/release/CPU/RAM so the summary JSON embeds a full host record). The internal bench:_run helper then runs cold-start at concurrencies 1, 10, and 50, followed by one idle-RSS pass at cc=32. That RSS concurrency is the default; the bhyve-full, bhyve-minimal, and bhyve-prewarm-pool tasks set RSS_CC=8.
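
To make the sweep concrete, the sketch below lists the result files one bench:_run pass would write. The file-name pattern and the default iteration counts are copied from .mise.toml; the sweep_outputs function itself is a hypothetical helper, not part of the repo.

```python
DEFAULT_ITERS = {1: 30, 10: 30, 50: 50}  # BENCH_ITERS_CC1/10/50 in .mise.toml


def sweep_outputs(config, rss_cc=32, iters=None):
    """List (metric, concurrency, iterations, output path) for one config."""
    iters = iters or DEFAULT_ITERS
    plan = []
    for cc in (1, 10, 50):
        path = f"benchmarks/results/{config}_cold-start-ms_cc{cc}.json"
        plan.append(("cold-start-ms", cc, iters[cc], path))
    # The idle-RSS pass takes no iteration count, so it is recorded as None.
    path = f"benchmarks/results/{config}_rss-kb-idle-1s_cc{rss_cc}.json"
    plan.append(("rss-kb-idle-1s", rss_cc, None, path))
    return plan


for row in sweep_outputs("jail-raw"):
    print(row)
```

Calling sweep_outputs("bhyve-full", rss_cc=8) mirrors the RSS_CC override that the bhyve-full task sets in its env table.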

.mise.toml 843 lines mise tasks
# .mise.toml — task runner for this research notebook.
#
# Invoke with: `mise run <task>` or `mise run <task-a> <task-b>`.
# Reproducibility is the point — if a number is on the site, the mise
# task that produced it is in this file, and the underlying shell rig
# is in benchmarks/rigs/. The site reads both at build time.

[tools]
node = "22"
go = "1.25"
pnpm = "10"
python = "3.12"

[env]
CUBESANDBOX_COMMIT = "c439bb513f5124d4d9389451b31b8aeb87ab539c"
HONOR_HOST         = "honor"
HONOR_RIG_DIR      = "/tmp/bench-rigs"
# Shallow defaults for first-pass signal. Bump these (200/100/50) once
# we like the shape. Jail creates aren't cheap — cc=1 at ~1s each.
BENCH_ITERS_CC1    = "30"
BENCH_ITERS_CC10   = "30"
BENCH_ITERS_CC50   = "50"

# ──────────────────────────── site ─────────────────────────────

[tasks.dev]
description = "Run the Astro dev server"
run = "pnpm dev"

[tasks."admin:dev-honor"]
description = "Run the Astro admin dev server on :4327 with honor's gateway tunnel and bearer token"
run = '''
set -eu
if ! curl -fsS http://127.0.0.1:3001/health >/dev/null 2>&1; then
  ssh -fN -o ExitOnForwardFailure=yes -L 127.0.0.1:3001:127.0.0.1:3000 "$HONOR_HOST"
fi
COPPICE_ADMIN_TOKEN="$(ssh "$HONOR_HOST" 'sudo cat /var/lib/coppice/bench-token')"
export COPPICE_ADMIN_TOKEN
exec pnpm dev -- --host 0.0.0.0 --port 4327
'''

[tasks.build]
description = "Build the static site"
run = "pnpm build"

[tasks.test]
description = "Run component + schema tests"
run = "pnpm test"

[tasks.check]
description = "Run astro check"
run = "pnpm check"

[tasks."site:deploy-honor"]
description = "Build the static site and rsync dist/ to $HONOR_HOST:/usr/local/share/coppice-site/. Used for in-progress UX review at http://honor:4322/admin/."
depends = ["build"]
run = """
set -eu
ssh "$HONOR_HOST" 'sudo mkdir -p /usr/local/share/coppice-site && sudo chown $(id -un):$(id -gn) /usr/local/share/coppice-site'
rsync -a --delete dist/ "$HONOR_HOST":/usr/local/share/coppice-site/
echo "deployed dist → $HONOR_HOST:/usr/local/share/coppice-site/"
ssh "$HONOR_HOST" 'pgrep -f "http.server 4322" >/dev/null || (cd /usr/local/share/coppice-site && daemon -f -o /tmp/coppice-site.log /usr/local/bin/python3 -m http.server 4322 --bind 127.0.0.1)'
echo "site live: ssh -L 4322:localhost:4322 $HONOR_HOST → http://localhost:4322/admin/"
"""

[tasks.links]
description = "Check built site for broken links"
run = ["pnpm build", "pnpm links"]

# ──────────────────────────── research ──────────────────────────

[tasks."research:clone"]
description = "Clone CubeSandbox at the pinned commit into /tmp/cubesandbox-research"
run = '''
set -eu
mkdir -p /tmp/cubesandbox-research
cd /tmp/cubesandbox-research
[ -d CubeSandbox/.git ] || git clone https://github.com/TencentCloud/CubeSandbox.git
cd CubeSandbox
git fetch origin main
git checkout "$CUBESANDBOX_COMMIT"
echo "Pinned at $(git rev-parse HEAD)"
'''

# ──────────────────────────── benchmarks ────────────────────────

[tasks."bench:sync"]
description = "Sync benchmarks/rigs to $HONOR_HOST:$HONOR_RIG_DIR"
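# Clear the old copy first: scp -r into an existing directory nests the rigs under $HONOR_RIG_DIR/rigs and leaves stale scripts in place.
run = 'ssh "$HONOR_HOST" "rm -rf $HONOR_RIG_DIR" && scp -r benchmarks/rigs "$HONOR_HOST:$HONOR_RIG_DIR"'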

[tasks."bench:setup-pf"]
description = "Apply the benchmark pf ruleset on $HONOR_HOST with a dead-man switch that auto-disables pf after DMS_TIMEOUT seconds if we lose control."
depends = ["bench:sync"]
run = '''
set -eu
: "${DMS_TIMEOUT:=60}"
ssh "$HONOR_HOST" "sudo sh -c 'DMS_TIMEOUT=$DMS_TIMEOUT sh $HONOR_RIG_DIR/setup-pf.sh $HONOR_RIG_DIR/pf.bench.conf'"
# Independent verification from the dev machine — if this fails, the
# dead-man is running and pf will self-disable within DMS_TIMEOUT.
sleep 2
ssh "$HONOR_HOST" 'echo "SSH still alive: $(hostname)"'
'''

[tasks."bench:host-info"]
description = "Capture $HONOR_HOST's kernel/release/CPU/RAM into /tmp/honor-host.json"
run = '''
ssh "$HONOR_HOST" 'python3 - <<PY
import json, platform, subprocess
def s(k): return subprocess.run(["sysctl","-n",k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
  "hostname": platform.node(), "kernel": platform.system(),
  "release": platform.release(), "cpuModel": s("hw.model"),
  "cpuCount": int(s("hw.ncpu") or 0),
  "memGB": round(int(s("hw.physmem") or 0) / (1024**3), 2),
}))
PY' > /tmp/honor-host.json
cat /tmp/honor-host.json
'''

# Internal helper: execute one config × (cc1/10/50 cold-start + idle RSS).
# Takes $CONFIG as env var. RSS concurrency defaults to 32 (jails); set
# $RSS_CC for configs where 32 is impractical (bhyve VMs at 512MB each
# would reserve 16GB — we use cc=8 there). Cold-start iteration counts
# per concurrency may also be overridden via $ITERS_CC1/10/50 for configs
# where a boot takes seconds, not milliseconds.
[tasks."bench:_run"]
description = "Internal: run one config's full sweep (cold-start cc1/10/50 + idle RSS)"
hide = true
run = '''
set -eu
: "${CONFIG:?must set CONFIG=jail-raw|jail-vnet-pf|jail-zfs-clone|bhyve-*}"
: "${RSS_CC:=32}"
: "${ITERS_CC1:=$BENCH_ITERS_CC1}"
: "${ITERS_CC10:=$BENCH_ITERS_CC10}"
: "${ITERS_CC50:=$BENCH_ITERS_CC50}"
: "${RIG_ENV:=}"
mkdir -p benchmarks/results
HOST_JSON=$(cat /tmp/honor-host.json)

for CC in 1 10 50; do
  case $CC in 1) ITERS=$ITERS_CC1;; 10) ITERS=$ITERS_CC10;; 50) ITERS=$ITERS_CC50;; esac
  echo "▸ $CONFIG cold-start @ cc=$CC (iters=$ITERS)"
  ssh "$HONOR_HOST" "cd $HONOR_RIG_DIR && sudo sh -c '$RIG_ENV sh $CONFIG.sh $CC $ITERS'" > "/tmp/$CONFIG-cc$CC.tsv"
  python3 benchmarks/rigs/summarize.py \
    --config "$CONFIG" --metric cold-start-ms --concurrency "$CC" \
    --script "benchmarks/rigs/$CONFIG.sh" --input "/tmp/$CONFIG-cc$CC.tsv" \
    --output "benchmarks/results/${CONFIG}_cold-start-ms_cc${CC}.json" \
    --host-info-json "$HOST_JSON"
done

echo "▸ $CONFIG idle RSS @ cc=$RSS_CC"
ssh "$HONOR_HOST" "cd $HONOR_RIG_DIR && sudo sh $CONFIG-rss.sh" > "/tmp/$CONFIG-rss.tsv"
python3 benchmarks/rigs/summarize.py \
  --config "$CONFIG" --metric rss-kb-idle-1s --concurrency "$RSS_CC" \
  --script "benchmarks/rigs/$CONFIG-rss.sh" --input "/tmp/$CONFIG-rss.tsv" \
  --output "benchmarks/results/${CONFIG}_rss-kb-idle-1s_cc${RSS_CC}.json" \
  --host-info-json "$HOST_JSON"
'''

[tasks."bench:jail-raw"]
description = "jail-raw: cold-start @ cc=1,10,50 + idle RSS @ cc=32"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-raw" }
run = "mise run bench:_run"

[tasks."bench:jail-vnet-pf"]
description = "jail-vnet-pf (VNET + pf egress filter): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-vnet-pf" }
run = "mise run bench:_run"

[tasks."bench:jail-zfs-clone"]
description = "jail-zfs-clone (per-jail ZFS clone rootfs): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "jail-zfs-clone" }
run = "mise run bench:_run"

[tasks."bench:all-jails"]
description = "All three jail configurations"
depends = ["bench:jail-raw", "bench:jail-vnet-pf", "bench:jail-zfs-clone"]

# bhyve tasks. Each config follows the same pattern as the jail tasks:
# bench:_run runs cold-start at cc=1/10/50 plus an idle-RSS sweep.
# bhyve VMs dominate the runtime budget — cc=50 cold-start can take
# minutes at full guest memory. See the rig scripts for iter counts.

[tasks."bench:bhyve-full"]
description = "bhyve-full (full FreeBSD 15 GENERIC guest per iter): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-full", RSS_CC = "8", ITERS_CC1 = "10", ITERS_CC10 = "10", ITERS_CC50 = "50", RIG_ENV = "TIMEOUT_SEC=180" }
run = "mise run bench:_run"

[tasks."bench:bhyve-minimal"]
description = "bhyve-minimal (MINIMAL-BHYVE kernel — STUB: guest rc.conf not yet trimmed)"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-minimal", RSS_CC = "8", ITERS_CC1 = "10", ITERS_CC10 = "10", ITERS_CC50 = "50", RIG_ENV = "TIMEOUT_SEC=60" }
run = "mise run bench:_run"

[tasks."bench:bhyve-prewarm-pool"]
description = "bhyve-prewarm-pool (SIGSTOP/SIGCONT proxy for CH snapshot-restore): cold-start + idle RSS"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-prewarm-pool", RSS_CC = "8", ITERS_CC1 = "30", ITERS_CC10 = "30", ITERS_CC50 = "50", RIG_ENV = "POOL_SIZE=50 BOOT_TIMEOUT=240" }
run = "mise run bench:_run"

[tasks."bench:bhyve-durable-pool-setup"]
description = "One-time pool-setup: boot N VMs, bhyvectl --suspend each to /vms/pool/ (requires SNAPSHOT kernel)"
depends = ["bench:sync"]
run = 'ssh "$HONOR_HOST" "sudo sh $HONOR_RIG_DIR/bhyve-durable-pool-setup.sh"'

[tasks."bench:bhyve-durable-pool"]
description = "bhyve-durable-pool: resume from on-disk bhyvectl --suspend checkpoint (SNAPSHOT kernel)"
depends = ["bench:sync", "bench:host-info"]
env = { CONFIG = "bhyve-durable-pool" }
run = "mise run bench:_run"

[tasks."bench:all"]
description = "Run every bench config that's implemented"
depends = ["bench:all-jails"]

[tasks."bench:clean-remote"]
description = "Remove /tmp/bench-rigs and bench tempfiles on $HONOR_HOST"
run = 'ssh "$HONOR_HOST" "rm -rf $HONOR_RIG_DIR /tmp/*.tsv /tmp/bhyve-*.log" || true'

[tasks."sdk:node-roundtrip"]
description = "Run the Node SDK round-trip rig against honor and capture a receipt"
run = 'sh benchmarks/rigs/sdk-node-roundtrip.sh'

[tasks."sdk:go-roundtrip"]
description = "Run the Go round-trip rig against honor and capture a receipt"
run = 'sh benchmarks/rigs/sdk-go-roundtrip.sh'

[tasks."sdk:roundtrip"]
description = "Run both Node and Go SDK round-trip rigs and capture a combined receipt"
run = 'sh benchmarks/rigs/sdk-roundtrip.sh'

[tasks."sdk:files-watch"]
description = "Run the filesystem watch SDK rig against honor and capture a receipt"
run = 'sh benchmarks/rigs/files-watch.sh'

[tasks."agents:openai-smoke"]
description = "Run the OpenAI Agents SDK + Coppice session/tool smoke receipt"
run = 'sh benchmarks/rigs/openai-agents-coppice-smoke.sh'

[tasks."agents:openai-e2b-client-smoke"]
description = "Run the competitor-shaped OpenAI Agents SDK + E2B client receipt"
run = 'sh benchmarks/rigs/openai-agents-e2b-client-smoke.sh'

[tasks."agents:openai-code-interpreter-smoke"]
description = "Run the competitor-shaped OpenAI Agents code-interpreter receipt"
run = 'sh benchmarks/rigs/openai-agents-code-interpreter-smoke.sh'

[tasks."auth:api-key-smoke"]
description = "Start a local gateway with COPPICE_API_KEYS and prove API/envd auth enforcement"
run = 'sh benchmarks/rigs/api-key-auth-smoke.sh'

[tasks."auth:cli-smoke"]
description = "Run the coppice CLI login/whoami/logout credential-flow receipt"
run = 'sh benchmarks/rigs/cli-auth-smoke.sh'

[tasks."agents:mini-rl-smoke"]
description = "Run the SWE/mini-RL agent-loop smoke receipt"
run = 'sh benchmarks/rigs/mini-rl-training-smoke.sh'

[tasks."agents:python-decorator-smoke"]
description = "Run the Modal-style Python decorator ergonomics smoke receipt"
run = 'sh benchmarks/rigs/python-decorator-smoke.sh'

[tasks."agents:codex-cli-smoke"]
description = "Run a local Codex CLI subscription-backed Coppice lifecycle receipt"
run = 'sh benchmarks/rigs/codex-cli-coppice-smoke.sh'

[tasks."agents:demos"]
description = "Run OpenAI Agents SDK and mini-RL/SWE-style demo receipts"
run = 'sh benchmarks/rigs/agent-demos.sh'

[tasks."backend:parity-honor"]
description = "Run backend/API parity smoke receipts on $HONOR_HOST; BACKEND_PARITY_SUITE=quick|full|recovery"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
: "${BACKEND_PARITY_SUITE:=quick}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo GATEWAY=http://127.0.0.1:3000 BACKEND_PARITY_SUITE='$BACKEND_PARITY_SUITE' sh benchmarks/rigs/backend-parity-honor.sh '$BACKEND_PARITY_SUITE'"
'''

[tasks."limits:live-resize-honor"]
description = "Create a fresh jail on honor and prove PATCH /sandboxes/:id/limits updates rctl + ZFS quota live"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/live-limits-resize.sh"
'''

[tasks."lifecycle:auto-suspend-resume-honor"]
description = "Create a fresh jail on honor and prove idle pause + SDK traffic auto-resume"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/auto-suspend-resume-smoke.sh"
'''

[tasks."lifecycle:bhyve-pause-resume-honor"]
description = "Create a fresh bhyve sandbox on honor and prove gateway pause/resume preserves the owned pool slot"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/bhyve-pause-resume-smoke.sh"
'''

[tasks."lifecycle:events-honor"]
description = "Create a fresh jail on honor and prove the E2B-shaped lifecycle events API"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/lifecycle-events-smoke.sh"
'''

[tasks."lifecycle:webhooks-honor"]
description = "Create a fresh jail on honor and prove E2B-shaped signed lifecycle webhooks"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/lifecycle-webhooks-smoke.sh"
'''

[tasks."gateway:restart-recovery-honor"]
description = "Create a fresh jail on honor, restart e2b-compat, and prove the live jail is recovered into /sandboxes"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sh benchmarks/rigs/gateway-restart-recovery-smoke.sh"
'''

[tasks."gateway:fork-restart-recovery-honor"]
description = "Create a parent jail, snapshot+fork it, restart e2b-compat, and prove fork lineage survives recovery"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sh benchmarks/rigs/snapshot-fork-restart-recovery-smoke.sh"
'''

[tasks."snapshot:retained-source-cleanup-honor"]
description = "Create a fresh jail on honor and prove deleting the last durable snapshot removes the retained source dataset"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/snapshot-retained-source-cleanup-smoke.sh"
'''

# ─────────────────────────── e2b-compat ─────────────────────────

[tasks."e2b:build"]
description = "Build the e2b-compat binary (cargo build --release)"
run = "cargo build --release --manifest-path e2b-compat/Cargo.toml"

[tasks."e2b:check"]
description = "cargo check the e2b-compat crate"
run = "cargo check --manifest-path e2b-compat/Cargo.toml"

[tasks."e2b:serve"]
description = "Run e2b-compat locally (requires root or sudoers for zfs/jail/jls/ps/kill/jexec)"
run = '''
cargo run --release --manifest-path e2b-compat/Cargo.toml -- \
  --listen 127.0.0.1:3000 \
  --zfs-pool zroot/jails \
  --template-snapshot zroot/jails/_template@base \
  --jails-root /jails
'''

[tasks."ui:dev"]
description = "Vite dev server for the React demo portal (:5173, proxies API to :3000)"
# The gateway serves the production bundle at /ui/ from --ui-dir; this
# task is the React inner loop. Proxy target defaults to localhost:3000
# (the gateway). Override with VITE_API_TARGET=http://honor:3000 etc.
run = 'pnpm --dir e2b-compat/ui-src dev'

[tasks."ui:build"]
description = "Production build of the React demo portal into e2b-compat/ui/"
# Output path is pinned in vite.config.ts (../ui); this task is a thin
# wrapper so e2b:sync-honor can depend on it cleanly.
run = 'pnpm --dir e2b-compat/ui-src build'

[tasks."ui:install"]
description = "pnpm install inside e2b-compat/ui-src/"
run = 'pnpm --dir e2b-compat/ui-src install'

[tasks."e2b:sync-honor"]
description = "Build React UI, then rsync e2b-compat sources to $HONOR_HOST:/tmp/e2b-compat-src/"
# Depend on ui:build so the rsync always carries the latest Vite output
# under ui/. A sync without the prior build would push a stale bundle
# from the previous run, or no bundle at all on a fresh checkout.
depends = ["ui:build"]
run = '''
ssh "$HONOR_HOST" 'mkdir -p /tmp/e2b-compat-src'
rsync -az --delete --exclude=target/ --exclude=Cargo.lock --exclude=ui-src/node_modules/ e2b-compat/ "$HONOR_HOST:/tmp/e2b-compat-src/"
'''

[tasks."e2b:build-honor"]
description = "Build the e2b-compat binary on $HONOR_HOST (FreeBSD-native)"
depends = ["e2b:sync-honor"]
run = 'ssh "$HONOR_HOST" "cd /tmp/e2b-compat-src && cargo build --release"'

[tasks."e2b:serve-honor"]
description = "Run e2b-compat on $HONOR_HOST as root, listen 127.0.0.1:3000"
depends = ["e2b:build-honor"]
run = 'ssh "$HONOR_HOST" "sudo /tmp/e2b-compat-src/target/release/e2b-compat --listen 127.0.0.1:3000 --zfs-pool zroot/jails --template-snapshot zroot/jails/_template@base --jails-root /jails"'

[tasks."e2b:smoke"]
description = "Hit the running e2b-compat with the example smoke-test script"
run = 'sh e2b-compat/examples/smoke-test.sh "${E2B_COMPAT_URL:-http://honor:3000}"'

[tasks."e2b:install-service-honor"]
description = "Install e2b-compat as an rc.d service on $HONOR_HOST and refresh the bhyve pool helper. Does not start the service."
depends = ["e2b:build-honor"]
# Staging via /tmp lets the install helper run under `sudo sh` without
# needing scp over an authenticated root channel; the rc.d script +
# installer are small and get replaced every invocation. The helper is
# idempotent, so rerunning after a fresh cargo build refreshes the
# binary + UI tree without touching /etc/rc.conf.
run = '''
set -eu
scp tools/rc.d/e2b-compat "$HONOR_HOST:/tmp/e2b-compat"
scp tools/install-e2b-compat-service.sh "$HONOR_HOST:/tmp/install-e2b-compat-service.sh"
scp tools/coppice-bhyve-pool-ctl.sh "$HONOR_HOST:/tmp/coppice-bhyve-pool-ctl.sh"
ssh "$HONOR_HOST" "sudo sh /tmp/install-e2b-compat-service.sh /tmp/e2b-compat /tmp/e2b-compat-src/target/release/e2b-compat /tmp/e2b-compat-src/ui"
ssh "$HONOR_HOST" "sudo install -m 0755 /tmp/coppice-bhyve-pool-ctl.sh /usr/local/sbin/coppice-bhyve-pool-ctl.sh"
echo
echo "installed on $HONOR_HOST. to cut over from an existing nohup'd gateway:"
echo "  ssh $HONOR_HOST 'sudo pkill -f /tmp/e2b-compat-src/target/release/e2b-compat || true'"
echo "  ssh $HONOR_HOST 'sudo service e2b-compat start'"
echo "  ssh $HONOR_HOST 'sudo service e2b-compat status'"
'''

[tasks."bhyve-pool:install-helper-honor"]
description = "Install the repo bhyve pool helper on $HONOR_HOST"
run = '''
set -eu
scp tools/coppice-bhyve-pool-ctl.sh "$HONOR_HOST:/tmp/coppice-bhyve-pool-ctl.sh"
ssh "$HONOR_HOST" "sudo install -m 0755 /tmp/coppice-bhyve-pool-ctl.sh /usr/local/sbin/coppice-bhyve-pool-ctl.sh"
'''

[tasks."net:setup-honor"]
description = "Restore honor's sandbox bridges + pf/local_unbound wiring for jail and bhyve sandboxes"
run = '''
set -eu
ssh "$HONOR_HOST" "sudo env EXT_IF=vm-public sh -s" < tools/coppice-net-setup.sh
'''

[tasks."net:install-service-honor"]
description = "Install the boot-time rc.d service that restores honor's jail+bhyve sandbox networking before e2b-compat starts"
run = '''
set -eu
scp tools/coppice-net-setup.sh "$HONOR_HOST:/tmp/coppice-net-setup.sh"
scp tools/rc.d/coppice-net-setup "$HONOR_HOST:/tmp/coppice-net-setup"
ssh "$HONOR_HOST" "sudo install -d -m 0755 /usr/local/sbin /usr/local/etc/rc.d"
ssh "$HONOR_HOST" "sudo install -m 0755 /tmp/coppice-net-setup.sh /usr/local/sbin/coppice-net-setup.sh"
ssh "$HONOR_HOST" "sudo install -m 0755 /tmp/coppice-net-setup /usr/local/etc/rc.d/coppice-net-setup"
ssh "$HONOR_HOST" "sudo sh -c 'grep -qE \"^coppice_net_setup_enable=\" /etc/rc.conf || echo \"coppice_net_setup_enable=\\\"YES\\\"\" >> /etc/rc.conf'"
ssh "$HONOR_HOST" "sudo service coppice-net-setup start && sudo service coppice-net-setup status"
'''

[tasks."dump:install-service-honor"]
description = "Install the boot-time rc.d service that re-arms dumpon on $HONOR_HOST"
run = '''
set -eu
scp tools/rc.d/coppice-dumpon "$HONOR_HOST:/tmp/coppice-dumpon"
ssh "$HONOR_HOST" "sudo install -d -m 0755 /usr/local/etc/rc.d"
ssh "$HONOR_HOST" "sudo install -m 0755 /tmp/coppice-dumpon /usr/local/etc/rc.d/coppice-dumpon"
ssh "$HONOR_HOST" "sudo sysrc coppice_dumpon_enable=YES coppice_dumpon_device=/dev/nda0p3.eli"
ssh "$HONOR_HOST" "sudo service coppice-dumpon start && sudo service coppice-dumpon status"
'''

# ──────────────────────────── template builds ────────────────────────────
#
# One task per pre-baked template. Use the repo checkout on honor,
# not ad-hoc copied scripts, so `git pull` there is the source of
# truth.

[tasks."desktop-template:build-honor"]
description = "Build /jails/desktop-template on $HONOR_HOST (tigervnc-server + xrdp + openbox + firefox)"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/desktop-template-build.sh build"
echo
echo "built desktop-template on $HONOR_HOST. to pick it up without a gateway restart:"
echo "  curl -X POST http://$HONOR_HOST:3000/templates/reload"
'''

[tasks."desktop-template:refresh-honor"]
description = "Refresh /jails/desktop-template@base on $HONOR_HOST in place for new clones"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/desktop-template-build.sh refresh"
echo
echo "refreshed desktop-template on $HONOR_HOST. to pick it up without a gateway restart:"
echo "  curl -X POST http://$HONOR_HOST:3000/templates/reload"
'''

[tasks."nginx-byoi:smoke-honor"]
description = "Build and prove the cubesandbox-base-nginx jail template + <port>-<id>.coppice.lan routing on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/nginx-byoi-smoke.sh"
'''

[tasks."preview-url:smoke"]
description = "Run the coppiceproxy signed preview URL verifier/mint smoke"
run = 'sh benchmarks/rigs/signed-preview-url-smoke.sh'

[tasks."preview-url:gateway-honor"]
description = "Run the gateway signed preview URL mint smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/gateway-preview-url-smoke.sh"
'''

[tasks."ide:ssh-access-honor"]
description = "Run the bhyve Remote-SSH access smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/ide-ssh-access-smoke.sh"
'''

[tasks."scheduled-sandboxes:honor"]
description = "Run the gateway scheduled-sandbox smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/scheduled-sandboxes-smoke.sh"
'''

[tasks."live-volume-mount:honor"]
description = "Run the live ZFS volume hot-mount smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/live-volume-mount-smoke.sh"
'''

[tasks."directory-snapshot:honor"]
description = "Run the subtree directory snapshot/restore smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/directory-snapshot-smoke.sh"
'''

[tasks."secrets-store:honor"]
description = "Run the gateway secrets store smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/secrets-store-smoke.sh"
'''

[tasks."credential-proxy:honor"]
description = "Run the host-side request credential proxy smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/credential-proxy-smoke.sh"
'''

[tasks."git-checkout-api:honor"]
description = "Run the first-class git checkout API smoke on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
: "${GIT_CHECKOUT_TEMPLATE:=debian-12-bhyve}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && GATEWAY=http://127.0.0.1:3000 TEMPLATE='$GIT_CHECKOUT_TEMPLATE' sh benchmarks/rigs/git-checkout-api-smoke.sh"
'''

[tasks."windows11-template:prepare-honor"]
description = "Stage a Windows 11 bhyve template image + sidecar config on $HONOR_HOST"
run = '''
set -eu
: "${WINDOWS_ISO:?set WINDOWS_ISO to the ISO path on $HONOR_HOST}"
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo WINDOWS_ISO='$WINDOWS_ISO' sh benchmarks/rigs/windows11-eval-bhyve-template.sh prepare"
'''

[tasks."windows11-template:boot-install-honor"]
description = "Boot the staged Windows 11 install VM on $HONOR_HOST over bhyve VNC"
run = '''
set -eu
: "${WINDOWS_ISO:?set WINDOWS_ISO to the ISO path on $HONOR_HOST}"
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh -t "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo WINDOWS_ISO='$WINDOWS_ISO' sh benchmarks/rigs/windows11-eval-bhyve-template.sh boot-install"
'''

[tasks."windows-server-template:prepare-honor"]
description = "Download and import the official Windows Server 2025 eval VHDX on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/windows-server-eval-bhyve-template.sh prepare"
ssh "$HONOR_HOST" "curl -fsS -X POST http://127.0.0.1:3000/templates/reload >/dev/null || true"
'''

[tasks."windows-server-template:boot-console-honor"]
description = "Boot the imported Windows Server template on $HONOR_HOST over bhyve VNC"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh -t "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/windows-server-eval-bhyve-template.sh boot-console"
'''

[tasks."linux-template:prepare-honor"]
description = "Build the Debian Linux bhyve template on $HONOR_HOST from the repo checkout there"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/debian12-cloud-bhyve-template.sh prepare"
ssh "$HONOR_HOST" "curl -fsS -X POST http://127.0.0.1:3000/templates/reload >/dev/null || true"
'''

[tasks."linux-template:smoke-honor"]
description = "Warm, checkout, probe, and drain the Debian Linux bhyve template on $HONOR_HOST"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo sh benchmarks/rigs/linux-bhyve-smoke.sh"
'''

[tasks."gpu-template:smoke-honor"]
description = "Run the GPU passthrough smoke on $HONOR_HOST against the Debian GPU bhyve template"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
: "${GPU_BASE_TEMPLATE:=debian-12-bhyve-gpu}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && sudo BASE_TEMPLATE='$GPU_BASE_TEMPLATE' sh benchmarks/rigs/gpu-passthrough-smoke.sh"
'''

[tasks."host-network:smoke"]
description = "Assert the live gateway exposes FreeBSD interface, route, bridge, and pf state for the Proxmox-style node Network tab"
run = "sh benchmarks/rigs/host-network-smoke.sh"

[tasks."host-network:smoke-honor"]
description = "Run the host network diagnostics smoke on $HONOR_HOST against the local gateway"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && COPPICE_ADMIN_TOKEN=\"\$(sudo cat /var/lib/coppice/bench-token)\" GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/host-network-smoke.sh"
'''

[tasks."host-storage:smoke"]
description = "Assert the live gateway exposes FreeBSD ZFS, filesystem, and partition state for the Proxmox-style node Disks tab"
run = "sh benchmarks/rigs/host-storage-smoke.sh"

[tasks."host-storage:smoke-honor"]
description = "Run the host storage diagnostics smoke on $HONOR_HOST against the local gateway"
run = '''
set -eu
: "${HONOR_REPO_DIR:=/home/jadams/src/gitlab.daringbit.com/josh/coppice}"
ssh "$HONOR_HOST" "cd '$HONOR_REPO_DIR' && git pull --ff-only && COPPICE_ADMIN_TOKEN=\"\$(sudo cat /var/lib/coppice/bench-token)\" GATEWAY=http://127.0.0.1:3000 sh benchmarks/rigs/host-storage-smoke.sh"
'''

# ──────────────────────────── demos ────────────────────────────

[tasks."demo:notebook"]
description = "Execute examples/notebook-demo.ipynb against the Coppice gateway"
run = '''
set -eu
HONOR="${HONOR_HOST:-honor}"

# Gateway binds to 127.0.0.1 on honor. Tunnel API + envd through SSH
# so the local SDK reaches them as localhost:3000 / localhost:49999
# — which is what E2B_DEBUG=true hardcodes anyway.
echo "opening SSH tunnel: $HONOR → localhost 3000 + 49999"
ssh -fN -L 3000:127.0.0.1:3000 -L 49999:127.0.0.1:49999 "$HONOR"
ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
cleanup() { [ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true; }
trap cleanup EXIT

export E2B_API_URL="http://localhost:3000"
export E2B_DEBUG="true"
export E2B_API_KEY="${E2B_API_KEY:-local}"
echo "gateway: $E2B_API_URL (tunneled from $HONOR)"

# uv run with inline deps — no pip, no venv to manage.
uv run --with jupyter --with nbclient --with e2b-code-interpreter \
  jupyter nbconvert \
    --to notebook \
    --execute examples/notebook-demo.ipynb \
    --output notebook-demo.executed.ipynb \
    --ExecutePreprocessor.timeout=120
echo
echo "executed: examples/notebook-demo.executed.ipynb"
echo "render to HTML:  mise run demo:notebook:html"
echo "open live:       mise run demo:notebook:view"
'''

[tasks."demo:notebook:view"]
description = "Open the notebook live in nbclassic, tunnel + deps wired up"
run = '''
set -eu
HONOR="${HONOR_HOST:-honor}"

# Same tunnel as demo:notebook — keeps localhost:3000/:49999 pointed at
# the gateway on honor while nbclassic is running.
ssh -fN -L 3000:127.0.0.1:3000 -L 49999:127.0.0.1:49999 "$HONOR"
ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
cleanup() { [ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true; }
trap cleanup EXIT

export E2B_API_URL="http://localhost:3000"
export E2B_DEBUG="true"
export E2B_API_KEY="${E2B_API_KEY:-local}"
echo "gateway: $E2B_API_URL (tunneled from $HONOR)"

# Run both the server and the in-notebook kernel under the same uv
# environment so `from e2b_code_interpreter import Sandbox` resolves.
# The source ipynb is the live version — open that, not the
# .executed one (nbconvert-rendered outputs confuse readers trying
# to re-run cells).
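# No `exec` here: replacing the shell would skip the EXIT trap above and
# leave the SSH tunnel running after nbclassic exits.
uv run \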
  --with nbclassic \
  --with e2b-code-interpreter \
  --with matplotlib --with pandas --with numpy \
  jupyter nbclassic examples/notebook-demo.ipynb
'''

[tasks."demo:notebook:html"]
description = "Render the executed notebook to a static HTML page"
depends = ["demo:notebook"]
run = '''
uv run --with jupyter --with nbconvert \
  jupyter nbconvert \
    --to html \
    examples/notebook-demo.executed.ipynb \
    --output notebook-demo.html
echo "rendered: examples/notebook-demo.html"
'''

# ──────────────────────────── metrics ──────────────────────────

[tasks."metrics:scrape"]
description = "One-shot scrape of honor's e2b-compat /metrics to data/metrics/YYYY-MM-DD.jsonl"
run = '''
: "${METRICS_URL:=http://localhost:3000/metrics}"
HONOR="${HONOR_HOST:-honor}"
# If hitting localhost, we need a tunnel. Otherwise talk direct.
case "$METRICS_URL" in
	*localhost*|*127.0.0.1*)
		ssh -fN -L 3000:127.0.0.1:3000 "$HONOR" 2>/dev/null || true
		ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
		trap '[ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true' EXIT
		;;
esac
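# No `exec`: the EXIT trap that tears down the tunnel only runs if this
# shell survives until the scraper returns.
tools/metrics-scraper.sh "$METRICS_URL"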
'''

[tasks."metrics:watch"]
description = "Scrape every 30s until Ctrl-C — good enough for a demo day"
run = '''
: "${METRICS_URL:=http://localhost:3000/metrics}"
HONOR="${HONOR_HOST:-honor}"
case "$METRICS_URL" in
	*localhost*|*127.0.0.1*)
		ssh -fN -L 3000:127.0.0.1:3000 "$HONOR"
		ssh_pid=$(pgrep -f "ssh -fN -L 3000:127.0.0.1:3000.*$HONOR" | head -1 || true)
		trap '[ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true' EXIT
		;;
esac
echo "scraping $METRICS_URL every 30s → data/metrics/"
while :; do
	tools/metrics-scraper.sh "$METRICS_URL" || true
	sleep 30
done
'''

# ──────────────────────────── coppice CLI ───────────────────────

[tasks."coppice:install"]
description = "Build and install the `coppice` CLI (cubemastercli-equivalent) to ~/.cargo/bin"
# `--locked` pins to Cargo.lock so contributors get the same reqwest
# chain we test against; `--path` keeps the install local rather than
# from the registry. The binary lands at ~/.cargo/bin/coppice which is
# on $PATH in any standard rustup setup.
run = "cargo install --path e2b-compat --bin coppice --locked"


# ──────────────────────────── coppice MCP bridge ────────────────

[tasks."coppice:install-mcp-honor"]
description = "Build coppice-mcp on $HONOR_HOST and install it at /usr/local/bin/coppice-mcp"
# MCP clients (Claude Code, Desktop, Inspector) launch the bridge as
# `ssh honor /usr/local/bin/coppice-mcp` — having it on an absolute
# path the SSH default $PATH can find is the only install requirement.
# No rc.d service: MCP is stdio-only in v1, so the host spawns one
# process per session.
depends = ["e2b:build-honor"]
run = '''
set -eu
ssh "$HONOR_HOST" 'sudo install -m 0755 /tmp/e2b-compat-src/target/release/coppice-mcp /usr/local/bin/coppice-mcp'
ssh "$HONOR_HOST" '/usr/local/bin/coppice-mcp --version || true'
echo
echo "installed on $HONOR_HOST:/usr/local/bin/coppice-mcp"
echo "wire up Claude Code:"
echo "  claude mcp add coppice -- ssh $HONOR_HOST /usr/local/bin/coppice-mcp"
'''


# ──────────────────────────── UI deploy ─────────────────────────

[tasks."ui:deploy-honor"]
description = "Build + ship the /ui/ portal bundle to honor"
depends = ["ui:build"]
run = '''
set -eu
rsync -az --delete e2b-compat/ui/ "$HONOR_HOST:/tmp/ui/"
ssh "$HONOR_HOST" 'sudo rsync -a --delete /tmp/ui/ /usr/local/share/e2b-compat/ui/ && sudo rm -rf /usr/local/share/e2b-compat/ui-react'
echo "deployed ui → $HONOR_HOST:/usr/local/share/e2b-compat/ui/"
'''

[tasks."ui:windows-smoke"]
description = "Run the browser-driven Windows /ui smoke against a live gateway (defaults to localhost:3001)"
run = '''
set -eu
sh benchmarks/rigs/ui-windows-smoke.sh
'''

[tasks."ui:windows-smoke-honor"]
description = "Tunnel honor's gateway to localhost:3001 and run the Windows /ui smoke"
run = '''
set -eu
HONOR="${HONOR_HOST:-honor}"

ssh_pid=""
if ! curl -fsS http://127.0.0.1:3001/health >/dev/null 2>&1; then
  ssh -fN -L 3001:127.0.0.1:3000 "$HONOR"
  ssh_pid=$(pgrep -f "ssh -fN -L 3001:127.0.0.1:3000.*$HONOR" | head -1 || true)
fi
cleanup() { [ -n "${ssh_pid:-}" ] && kill "$ssh_pid" 2>/dev/null || true; }
trap cleanup EXIT

UI_BASE=http://127.0.0.1:3001/ui/ sh benchmarks/rigs/ui-windows-smoke.sh
'''
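
The tunnel-then-cleanup pattern in the demo, metrics, and UI smoke tasks above finds the backgrounded ssh by matching its command line with pgrep. A minimal sketch of the same idea with the tunnel held as a direct child process, so its PID is known without a lookup. The names tunnel_argv and ssh_tunnel are hypothetical and not part of the repo; the sketch also does not wait for the forward to come up, so a caller would still need to poll the local port before using it.

```python
import subprocess
from contextlib import contextmanager


def tunnel_argv(host, forwards):
    """Build an `ssh -N` argv with one -L forward per (local, remote) port pair."""
    argv = ["ssh", "-N"]  # -N: no remote command; no -f, so ssh stays our child
    for local_port, remote_port in forwards:
        argv += ["-L", f"{local_port}:127.0.0.1:{remote_port}"]
    argv.append(host)
    return argv


@contextmanager
def ssh_tunnel(host, forwards):
    """Keep the tunnel open for the duration of the with-block, then stop it."""
    proc = subprocess.Popen(tunnel_argv(host, forwards))
    try:
        yield proc
    finally:
        proc.terminate()  # plays the role of the shell tasks' EXIT trap
        proc.wait()
```

Usage would look like `with ssh_tunnel("honor", [(3000, 3000), (49999, 49999)]): ...`, which mirrors the forwards opened by demo:notebook.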

Driver helpers

common.sh is sourced by every per-config rig. It provides timestamp_ms (via Python for portability — FreeBSD date lacks %N), a run_concurrent wrapper, and a host_info_json probe.

benchmarks/rigs/common.sh 41 lines bash
#!/bin/sh
# Shared helpers for all rigs. Sourced, not run directly.
set -eu

timestamp_ms() {
  # FreeBSD date supports %N only through gdate; use python for portability.
  python3 -c 'import time; print(int(time.time()*1000))'
}

host_info_json() {
  python3 - <<'PY'
import json, platform, subprocess
def sysctl(k):
    return subprocess.run(['sysctl','-n',k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
    'hostname': platform.node(),
    'kernel': platform.system(),
    'release': platform.release(),
    'cpuModel': sysctl('hw.model'),
    'cpuCount': int(sysctl('hw.ncpu') or 0),
    'memGB': round(int(sysctl('hw.physmem') or 0) / (1024**3), 2),
}))
PY
}

run_concurrent() {
  # Usage: run_concurrent N CMD...
  # Runs CMD N times in parallel, prints per-iteration elapsed_ms TSV.
  # The loop index (0..N-1) is appended as an extra argument so each
  # worker can form a unique name even though $$ is shared by subshells.
  # _rc_n/_rc_j are intentionally prefixed to avoid clobbering the
  # caller's loop variable (sh functions share scope with their callers).
  _rc_n=$1; shift
  _rc_j=0
  while [ $_rc_j -lt $_rc_n ]; do
    ( s=$(timestamp_ms); "$@" "$_rc_j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$_rc_j" "$((e - s))" ) &
    _rc_j=$((_rc_j + 1))
  done
  wait
}
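
For readers more at home in Python than sh, the measurement that run_concurrent performs can be sketched as below. This is an illustrative equivalent, not code from the repo: it appends the worker index as the last argument, discards the command's output, and reports wall-clock milliseconds per worker, as the shell helper does.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timestamp_ms():
    """Same clock as common.sh: wall-clock milliseconds since the epoch."""
    return int(time.time() * 1000)


def run_concurrent(n, argv):
    """Run argv n times in parallel; return [(index, elapsed_ms), ...] in index order."""
    def one(idx):
        start = timestamp_ms()
        # The index is appended so each worker can derive a unique name.
        subprocess.run(argv + [str(idx)],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return idx, timestamp_ms() - start

    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(one, range(n)))


if __name__ == "__main__":
    # Stand-in workload: a Python interpreter that does nothing.
    for idx, ms in run_concurrent(4, [sys.executable, "-c", "pass"]):
        print(f"{idx}\t{ms}")
```

Each sample includes process start-up cost for the command itself, which is also true of the shell version.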

summarize.py reads the TSV a rig emits and writes a BenchmarkRun JSON file validated by the Zod schema in src/data/benchmarks.ts.

benchmarks/rigs/summarize.py 65 lines python
#!/usr/bin/env python3
"""Wrap raw TSV samples into the BenchmarkRun JSON schema."""
import argparse, json, statistics, datetime, subprocess

def host_info():
    r = subprocess.run(['python3', '-c', '''
import json, platform, subprocess
def s(k): return subprocess.run(["sysctl","-n",k], capture_output=True, text=True).stdout.strip()
print(json.dumps({
    "hostname": platform.node(),
    "kernel": platform.system(),
    "release": platform.release(),
    "cpuModel": s("hw.model") or "unknown",
    "cpuCount": int(s("hw.ncpu") or 0),
    "memGB": round(int(s("hw.physmem") or 0) / (1024**3), 2),
}))
'''], capture_output=True, text=True)
    return json.loads(r.stdout)

def summarize(samples):
    ss = sorted(samples)
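    # Percentile by index floor(n * p / 100), clamped to the last sample
    # (for n <= 100 this makes p99 equal to the max).
    def pct(p): return ss[min(len(ss) - 1, int(len(ss) * p / 100))]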
    return {
        'mean': statistics.mean(ss),
        'p50': pct(50), 'p95': pct(95), 'p99': pct(99),
        'min': min(ss), 'max': max(ss), 'n': len(ss),
    }

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument('--config', required=True)
    ap.add_argument('--metric', required=True)
    ap.add_argument('--concurrency', type=int, required=True)
    ap.add_argument('--script', required=True)
    ap.add_argument('--input', required=True)
    ap.add_argument('--output', required=True)
    ap.add_argument('--host-info-json', help='Pre-collected host_info_json; if omitted, shells out to collect')
    args = ap.parse_args()

    samples = []
    with open(args.input) as f:
        for line in f:
            line = line.strip()
            if not line: continue
            parts = line.split('\t')
            samples.append(float(parts[-1]))

    host = json.loads(args.host_info_json) if args.host_info_json else host_info()

    out = {
        'config': args.config,
        'host': host,
        'metric': args.metric,
        'concurrency': args.concurrency,
        'samples': samples,
        'summary': summarize(samples),
        'scriptPath': args.script,
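        # utcnow() is deprecated since Python 3.12; this yields the same naive-UTC string.
        'runAt': datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None).isoformat() + 'Z',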
    }
    with open(args.output, 'w') as f:
        json.dump(out, f, indent=2)

if __name__ == '__main__':
    main()
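
The percentile rule in summarize() is worth seeing on concrete sample counts, because it is an index rule rather than an interpolating one. The sketch below restates it in isolation; pct_index is a hypothetical helper name, and the formula is copied from pct() above.

```python
def pct_index(n, p):
    """Index that summarize.py's pct() reads from a sorted list of n samples."""
    return min(n - 1, int(n * p / 100))


def pct(samples, p):
    ss = sorted(samples)
    return ss[pct_index(len(ss), p)]


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, [pct_index(n, p) for p in (50, 95, 99)])
```

With the default 200 iterations the rule reads indices 100, 190, and 198 of the sorted samples. With 100 samples or fewer, p99 lands on the last index and is therefore the maximum, so small runs should be read with that in mind.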

Jail configurations

Jail — raw jail-raw

Plain jail, per-instance rootfs copied from the template (cp -R), no VNET.

▸ reproduce ·  mise run bench:jail-raw

benchmarks/rigs/jail-raw.sh 46 lines cold-start rig
#!/bin/sh
# jail-raw.sh — plain jail with cp -R rootfs, no VNET, no pf.
# Usage: jail-raw.sh <concurrency> <total-iterations>
#
# Runs EXACTLY <total-iterations> jail create/destroy cycles, dispatched
# in batches of <concurrency>: each batch is launched in parallel and
# waited on before the next one starts. Emits one TSV line per
# iteration: <global-index>\t<elapsed-ms>.

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
TEMPLATE=${TEMPLATE:-/jails/_template}
[ -d "$TEMPLATE" ] || { echo "missing $TEMPLATE" >&2; exit 2; }

create_one() {
  id="bench-$$-$1"
  path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  rm -rf "$path"
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
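
The dispatch loop at the bottom of the cold-start rig is shared by all four jail configs, so its arithmetic is worth spelling out. A minimal sketch of which iteration indices run together, assuming the loop above is read correctly; batches is a hypothetical name.

```python
def batches(iters, conc):
    """Yield the ranges of iteration indices the rig launches together."""
    conc = max(conc, 1)  # the rig runs strictly sequentially when CONC <= 1
    i = 0
    while i < iters:
        end = min(i + conc, iters)  # the final batch may be short
        yield range(i, end)
        i = end  # the shell loop does `wait` here before continuing


if __name__ == "__main__":
    print([list(b) for b in batches(10, 4)])
```

Because every batch waits for its slowest member, the number of jails in flight drops below the nominal concurrency toward the end of each batch. The per-iteration timings are unaffected, but total wall-clock time is not a clean throughput figure.
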
benchmarks/rigs/jail-raw-rss.sh 38 lines idle-RSS rig
#!/bin/sh
# Start 32 jails, wait 1s, then print each jail's summed process RSS as
# TSV: <index>\t<rss>.
set -eu
. "$(dirname "$0")/common.sh"

TEMPLATE=${TEMPLATE:-/jails/_template}
N=32

i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"; path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="bench-rss-$$-$i"; path="/jails/$id"
  jail -r "$id" >/dev/null 2>&1 || true
  rm -rf "$path" 2>/dev/null || true
  i=$((i + 1))
done
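
The idle-RSS rigs print one row per jail rather than a single figure. A small sketch of how those rows could be reduced to a total and a per-jail mean. summarize_rss is a hypothetical helper, and the unit is assumed to be KiB, which is what FreeBSD's ps documents for the rss keyword.

```python
def summarize_rss(tsv_text):
    """Parse '<index> TAB <rss>' rows; return (jail_count, total_kib, mean_kib)."""
    values = []
    for line in tsv_text.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 2 or not parts[1]:
            continue  # skip blank lines and rows where ps reported nothing
        values.append(int(parts[1]))
    total = sum(values)
    mean = total / len(values) if values else 0.0
    return len(values), total, mean


if __name__ == "__main__":
    print(summarize_rss("0\t2048\n1\t4096\n"))
```

Rows with an empty RSS column are dropped rather than counted as zero, so a jail whose process had already exited does not pull the mean down.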

Jail — VNET + pf jail-vnet-pf

Jail with VNET (per-jail epair network stack) + pf egress filter.

▸ reproduce ·  mise run bench:jail-vnet-pf

benchmarks/rigs/jail-vnet-pf.sh 50 lines cold-start rig
#!/bin/sh
# jail-vnet-pf.sh — jail with VNET (epair) + active pf.bench.conf.
# Usage: jail-vnet-pf.sh <concurrency> <total-iterations>
#
# Requires: if_epair loaded, pf active with safe ruleset (mise run bench:setup-pf).

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
TEMPLATE=${TEMPLATE:-/jails/_template}
[ -d "$TEMPLATE" ] || { echo "missing $TEMPLATE" >&2; exit 2; }

create_one() {
  id="benchvp-$$-$1"
  path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  oct=$(( $1 % 250 + 2 ))
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$oct.2/24 up" \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  ifconfig "$epair_a" destroy 2>/dev/null || true
  rm -rf "$path"
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-vnet-pf-rss.sh 45 lines idle-RSS rig
#!/bin/sh
# Start 32 VNET jails, wait 1s, then print each jail's summed process RSS
# as TSV: <index>\t<rss>.
set -eu
. "$(dirname "$0")/common.sh"

TEMPLATE=${TEMPLATE:-/jails/_template}
N=32

i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"; path="/jails/$id"
  cp -R "$TEMPLATE" "$path"
  epair=$(ifconfig epair create 2>/dev/null)
  epb=$(echo "$epair" | sed 's/a$/b/')
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epb" \
       exec.prestart="ifconfig $epb inet 10.88.$(( $i % 250 )).2/24 up" \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="benchvp-rss-$$-$i"; path="/jails/$id"
  jail -r "$id" >/dev/null 2>&1 || true
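  # Destroy the epair by its host-side "a" end. This assumes the run's
  # epairs were numbered epair0..epair(N-1), i.e. that no other epair
  # interfaces existed on the host when the rig started.
  ifconfig "epair${i}a" destroy 2>/dev/null || true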
  rm -rf "$path" 2>/dev/null || true
  i=$((i + 1))
done
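
Two small derivations in the VNET cold-start rig are easy to misread in sed and shell arithmetic: the mapping from an epair's "a" end to its "b" end, and the /24 assigned to each iteration. A sketch under the assumption that the rig's expressions are read correctly; both function names are hypothetical.

```python
def epair_peer(a_side):
    """Map the 'a' end printed by `ifconfig epair create` to its 'b' end."""
    if not a_side.endswith("a"):
        # The rig's sed would pass such a name through unchanged; fail loudly instead.
        raise ValueError(f"expected an epair 'a' interface, got {a_side!r}")
    return a_side[:-1] + "b"


def jail_cidr(index):
    """Address the cold-start rig configures for iteration `index`."""
    octet = index % 250 + 2  # third octet cycles through 2..251
    return f"10.88.{octet}.2/24"


if __name__ == "__main__":
    print(epair_peer("epair0a"), jail_cidr(0), jail_cidr(249), jail_cidr(250))
```

Addresses repeat every 250 iterations. That only matters if two iterations 250 apart are alive at once, which the batch dispatch rules out for any concurrency up to 250.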

Jail — ZFS clone jail-zfs-clone

Jail with per-instance rootfs via ZFS clone of the template snapshot.

▸ reproduce ·  mise run bench:jail-zfs-clone

benchmarks/rigs/jail-zfs-clone.sh 47 lines cold-start rig
#!/bin/sh
# jail-zfs-clone.sh — per-iteration rootfs via ZFS clone of the template
# snapshot. No VNET, no pf.
# Usage: jail-zfs-clone.sh <concurrency> <total-iterations>
#
# Requires: zroot/jails/_template@base snapshot.

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}

zfs list "$SNAP" >/dev/null 2>&1 || { echo "missing snapshot $SNAP" >&2; exit 2; }

create_one() {
  id="benchzfs-$$-$1"
  path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-zfs-clone-rss.sh 38 lines idle-RSS rig
#!/bin/sh
# Start 32 ZFS-clone jails, wait 1s, then print each jail's summed process
# RSS as TSV: <index>\t<rss>.
set -eu
. "$(dirname "$0")/common.sh"

POOL=zroot/jails; SNAP=${POOL}/_template@base
N=32

i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"; path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  jail -c name="$id" path="$path" host.hostname="$id" ip4=inherit persist \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
i=0
while [ $i -lt $N ]; do
  id="benchzfs-rss-$$-$i"
  jail -r "$id" >/dev/null 2>&1 || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
  i=$((i + 1))
done
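
The ZFS-clone cycle swaps cp -R and rm -rf for zfs clone and zfs destroy while leaving the jail invocation unchanged. The sketch below lists the commands one iteration issues, as argv lists, without running them. zfs_clone_cycle is a hypothetical name. The rig clones to POOL/<id> but starts the jail at /jails/<id>, so it appears to rely on the pool dataset being mounted at /jails; the sketch makes that an explicit jail_root parameter.

```python
def zfs_clone_cycle(jail_id, pool="zroot/jails", snap=None, jail_root="/jails"):
    """Return the command sequence for one create/destroy iteration (not executed)."""
    snap = snap or f"{pool}/_template@base"
    dataset = f"{pool}/{jail_id}"
    path = f"{jail_root}/{jail_id}"  # assumes `pool` is mounted at `jail_root`
    return [
        ["zfs", "clone", snap, dataset],
        ["jail", "-c", f"name={jail_id}", f"path={path}",
         f"host.hostname={jail_id}", "ip4=inherit", "persist",
         "exec.start=/bin/echo ready"],
        ["jail", "-r", jail_id],
        ["zfs", "destroy", dataset],
    ]


if __name__ == "__main__":
    for argv in zfs_clone_cycle("benchzfs-1234-0"):
        print(" ".join(argv))
```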

Jail — VNET + pf + ZFS clone jail-vnet-zfs-clone

The fair VNET + pf comparison: ZFS-clone rootfs + per-jail VNET stack + active pf egress filter.

▸ reproduce ·  mise run bench:jail-vnet-zfs-clone

benchmarks/rigs/jail-vnet-zfs-clone.sh 56 lines cold-start rig
#!/bin/sh
# jail-vnet-zfs-clone.sh — jail with VNET (epair) + active pf.bench.conf,
# but per-iteration rootfs via ZFS clone of the template snapshot. This
# is the fair comparison for "network-isolated jail with dynamic egress
# policy", since jail-vnet-pf uses cp -R and is dominated by rootfs cost.
# Usage: jail-vnet-zfs-clone.sh <concurrency> <total-iterations>
#
# Requires: if_epair loaded; zroot/jails/_template@base snapshot;
# pf active with safe ruleset (`mise run bench:setup-pf`).

set -eu
. "$(dirname "$0")/common.sh"

CONC=${1:-1}
ITERS=${2:-200}
POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}

zfs list "$SNAP" >/dev/null 2>&1 || { echo "missing snapshot $SNAP" >&2; exit 2; }

create_one() {
  id="benchvpz-$$-$1"
  path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  oct=$(( $1 % 250 + 2 ))
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$oct.2/24 up" \
       exec.start="/bin/echo ready" \
       >/dev/null
  jail -r "$id" >/dev/null 2>&1 || true
  ifconfig "$epair_a" destroy 2>/dev/null || true
  zfs destroy "$POOL/$id" 2>/dev/null || true
}

i=0
while [ $i -lt "$ITERS" ]; do
  if [ "$CONC" -gt 1 ]; then
    batch_end=$(( i + CONC ))
    [ "$batch_end" -gt "$ITERS" ] && batch_end=$ITERS
    j=$i
    while [ "$j" -lt "$batch_end" ]; do
      ( s=$(timestamp_ms); create_one "$j" >/dev/null 2>&1; e=$(timestamp_ms); printf "%d\t%d\n" "$j" "$((e - s))" ) &
      j=$(( j + 1 ))
    done
    wait
    i=$batch_end
  else
    s=$(timestamp_ms); create_one "$i"; e=$(timestamp_ms)
    printf "%d\t%d\n" "$i" "$((e - s))"
    i=$(( i + 1 ))
  fi
done
benchmarks/rigs/jail-vnet-zfs-clone-rss.sh 46 lines idle-RSS rig
#!/bin/sh
# Start 32 VNET+ZFS-clone jails, wait 1s, then print each jail's summed
# process RSS as TSV: <index>\t<rss>.
set -eu
. "$(dirname "$0")/common.sh"

POOL=${POOL:-zroot/jails}
SNAP=${SNAP:-${POOL}/_template@base}
N=32

i=0
while [ $i -lt $N ]; do
  id="benchvpz-rss-$$-$i"; path="/jails/$id"
  zfs clone "$SNAP" "$POOL/$id"
  epair_a=$(ifconfig epair create)
  epair_b=$(echo "$epair_a" | sed 's/a$/b/')
  jail -c name="$id" path="$path" host.hostname="$id" vnet persist \
       vnet.interface="$epair_b" \
       exec.prestart="ifconfig $epair_b inet 10.88.$(( i + 2 )).2/24 up" \
       exec.start="/usr/sbin/daemon -f /bin/sleep 60" >/dev/null
  i=$((i + 1))
done
wait

sleep 1
i=0
while [ $i -lt $N ]; do
  id="benchvpz-rss-$$-$i"
  jid=$(jls -j "$id" jid 2>/dev/null || echo "")
  if [ -n "$jid" ]; then
    rss=$(ps -J "$jid" -o rss= | awk '{s+=$1} END {print s}')
    printf "%d\t%s\n" "$i" "$rss"
  fi
  i=$((i + 1))
done

# teardown
for d in /jails/benchvpz-rss-$$-*; do
  [ -d "$d" ] || continue
  name=$(basename "$d")
  jail -r "$name" 2>/dev/null || true
  zfs list "$POOL/$name" >/dev/null 2>&1 && zfs destroy -f "$POOL/$name" 2>/dev/null || rm -rf "$d" 2>/dev/null
done
# NOTE: this sweeps every epair on the host, not only the ones this run created.
for ep in $(ifconfig -l | tr ' ' '\n' | grep '^epair'); do
  ifconfig "$ep" destroy 2>/dev/null || true
done

bhyve configurations

All bhyve configs share a FreeBSD 15 VM image fetched once:

ssh honor 'mkdir -p /tmp/bhyve-images
cd /tmp/bhyve-images && fetch https://download.freebsd.org/releases/VM-IMAGES/15.0-RELEASE/amd64/Latest/FreeBSD-15.0-RELEASE-amd64-ufs.raw.xz && xz -d FreeBSD-15.0-RELEASE-amd64-ufs.raw.xz'

Durable bhyve configs (bhyve-durable-pool and bhyve-durable-prewarm-pool) additionally require a host kernel compiled with options BHYVE_SNAPSHOT, plus a bhyve + bhyvectl userspace built with WITH_BHYVE_SNAPSHOT=YES. The option is not in GENERIC on FreeBSD 15.0-RELEASE, so we built it from source. The full kernel reproduction:

# 1. source (if not already installed)
sudo fetch -o /tmp/src.txz https://download.freebsd.org/releases/amd64/15.0-RELEASE/src.txz
sudo tar -C / -xf /tmp/src.txz   # archive paths begin with usr/src/; roughly 250 MB

# 2. author SNAPSHOT config (GENERIC + options BHYVE_SNAPSHOT)
sudo tee /usr/src/sys/amd64/conf/SNAPSHOT > /dev/null <<EOF
include GENERIC
ident   SNAPSHOT
options BHYVE_SNAPSHOT
EOF

# 3. kernel build — ~5 min on a Ryzen 9 5900HX with -j16
sudo make -C /usr/src -j16 buildkernel KERNCONF=SNAPSHOT

# 4. SAFETY NET: create a ZFS boot environment snapshot before swapping kernels
sudo bectl create pre-snapshot-kernel-$(date +%Y-%m-%d)

# 5. install kernel (current kernel moves to /boot/kernel.old/)
sudo make -C /usr/src DESTDIR=/ installkernel KERNCONF=SNAPSHOT

# 6. reboot
sudo shutdown -r now

# 7. post-reboot: rebuild bhyve + bhyvectl userspace with the option
sudo make -C /usr/src/usr.sbin/bhyvectl WITH_BHYVE_SNAPSHOT=YES MK_BHYVE_SNAPSHOT=yes install
sudo make -C /usr/src/usr.sbin/bhyve    WITH_BHYVE_SNAPSHOT=YES MK_BHYVE_SNAPSHOT=yes install

# 8. verify
bhyvectl 2>&1 | grep -E '\-\-suspend|\-\-checkpoint'   # should now appear
sysctl kern.ident                                            # SNAPSHOT

The bectl create line is the recovery safety net for the kernel swap. If the new kernel fails to boot, press 8 at the loader menu (Boot Environments), select the pre-snapshot-kernel-<date> boot environment created in step 4, and press Enter. The SNAPSHOT kernel booted cleanly in our case, so we never invoked the rollback. (The pf lockout described below predates this boot-environment safety net; it was recovered via physical console + pfctl -d.)

Legacy caveat: the following note from an earlier pass still applies to the bhyve-minimal rig, which remains a stub:

for bhyve-full and bhyve-prewarm-pool a FreeBSD VM image must be fetched from freebsd.org; for bhyve-minimal a custom MINIMAL kernel needs to be built from /usr/src. The mise tasks exist as placeholders; see .mise.toml.

pf lockout safety

On first run, the jail-vnet-pf setup phase locked honor out of SSH for ~35 minutes at 2026-04-21T22:54. The ruleset was block out all / pass out on lo0 all with no explicit SSH pass rules; applying it with pfctl -e dropped the outbound half of the active SSH session and the host went dark until it was physically recovered.

The fix is layered. The committed ruleset at benchmarks/rigs/pf.bench.conf has:

  1. set skip on lo0 — pf never filters loopback, period.
  2. pass in quick proto tcp to port 22 keep state and pass out quick proto tcp from port 22 keep state — management SSH is permitted in both directions regardless of any later block rule.
  3. pass quick proto icmp — ICMP stays open for quick sanity probes.
  4. block out all — default-deny egress, which is the actual experimental rule we're measuring the cost of.
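
Why the ordering works comes down to pf's evaluation rule: the last matching rule decides, unless a matching rule is marked quick, in which case evaluation stops there. A toy model of that rule, applied to the old and new rulesets. It is a deliberately simplified sketch: it ignores state tracking, addresses, and address families, and the interface name em0 is a made-up stand-in for honor's external NIC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                      # "pass" or "block"
    direction: Optional[str] = None  # "in", "out", or None for both
    quick: bool = False
    iface: Optional[str] = None
    proto: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


@dataclass(frozen=True)
class Packet:
    direction: str
    iface: str
    proto: str
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def matches(rule, pkt):
    pairs = [(rule.direction, pkt.direction), (rule.iface, pkt.iface),
             (rule.proto, pkt.proto), (rule.src_port, pkt.src_port),
             (rule.dst_port, pkt.dst_port)]
    return all(want is None or want == got for want, got in pairs)


def evaluate(rules, pkt, skip_ifaces=()):
    if pkt.iface in skip_ifaces:
        return "pass"      # `set skip on <iface>`: not filtered at all
    verdict = "pass"       # pf passes packets that match no rule
    for rule in rules:
        if matches(rule, pkt):
            verdict = rule.action
            if rule.quick:
                break      # quick: first match wins, stop evaluating
    return verdict


OLD = [Rule("block", "out"), Rule("pass", "out", iface="lo0")]
NEW = [
    Rule("pass", "in", quick=True, proto="tcp", dst_port=22),
    Rule("pass", "out", quick=True, proto="tcp", src_port=22),
    Rule("pass", quick=True, proto="icmp"),
    Rule("pass", quick=True, proto="icmp6"),
    Rule("block", "out"),
]

# Outbound half of an established SSH session (em0 is hypothetical).
ssh_reply = Packet("out", "em0", "tcp", src_port=22, dst_port=51515)
# Ordinary jail egress, the traffic the experiment is meant to block.
egress = Packet("out", "em0", "tcp", src_port=40000, dst_port=443)

if __name__ == "__main__":
    print("old ruleset, ssh reply:", evaluate(OLD, ssh_reply))
    print("new ruleset, ssh reply:", evaluate(NEW, ssh_reply, skip_ifaces=("lo0",)))
    print("new ruleset, egress:   ", evaluate(NEW, egress, skip_ifaces=("lo0",)))
```

In this model the old ruleset blocks the SSH reply because block out all is the only rule that matches it, while the new ruleset stops at the quick pass rule for source port 22 before the block is ever considered.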

But even a correct ruleset has a bootstrap problem: if you typo the next version of it, you still lock yourself out. The committed setup-pf.sh wraps pfctl -f + pfctl -e in a daemon(8)-spawned dead-man switch. If the script doesn't reach its final kill line within DMS_TIMEOUT seconds (default 60), pfctl -d fires from the dead-man child and pf disables itself. The mise task bench:setup-pf drives this end-to-end and performs an independent SSH-reachability check from the dev machine post-apply — if that check fails, the dead-man handles the cleanup.
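
The dead-man pattern itself is small enough to sketch in a few lines: arm a timer that runs the revert, do the risky step, and disarm only if that step returned normally. The sketch below uses an in-process timer purely for illustration. apply_guarded is a hypothetical name, and unlike the daemon(8)-spawned child in setup-pf.sh, a thread timer dies with its process, so this is not a substitute for the real script.

```python
import threading
import time


def apply_guarded(apply, revert, timeout):
    """Arm revert() to fire after `timeout` seconds, run apply(), then disarm."""
    timer = threading.Timer(timeout, revert)
    timer.daemon = True
    timer.start()    # armed BEFORE the risky step, as in setup-pf.sh
    apply()          # if this raises or hangs, the timer stays armed
    timer.cancel()   # reached only when apply() returned normally


if __name__ == "__main__":
    log = []
    # Healthy apply: finishes well inside the window, so no revert.
    apply_guarded(lambda: log.append("applied"), lambda: log.append("reverted"), 0.05)
    # Wedged apply: outlives the window, so the revert fires first.
    apply_guarded(lambda: time.sleep(0.3), lambda: log.append("reverted"), 0.05)
    print(log)
```

The ordering is the important part: the timer is started before the change is applied, so there is no window in which a bad ruleset is live without a pending revert.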

benchmarks/rigs/pf.bench.conf 31 lines pf ruleset
# pf.bench.conf — ruleset applied on honor for the jail-vnet-pf rig.
#
# IMPORTANT: designed to NEVER lock out the SSH control plane. The
# earlier version of this ruleset (`block out all` with no management
# pass rules) wedged honor at 2026-04-21T22:54 because the outbound
# half of the active SSH session got dropped. Do not reintroduce that
# shape. Both `pass in` and `pass out` for TCP/22 are explicit here,
# and `set skip on lo0` guarantees loopback is never filtered.
#
# Behavior: default-deny egress on the external NIC, except the SSH
# management traffic and ICMP. This matches the posture CubeVS
# enforces per-sandbox — block all except explicitly allowed.

# Loopback — never filter.
set skip on lo0

# SSH management plane, both directions. Keep-state required so once a
# session is established, reply traffic is authorized from the state
# entry rather than needing its own rule.
pass in  quick proto tcp to port 22 keep state
pass out quick proto tcp from port 22 keep state

# ICMP — lets ping-based health checks keep working.
pass quick proto icmp
pass quick proto icmp6

# Default egress: block everything else. This is the part we are
# actually measuring — the cost of per-packet pf filtering on jail
# egress traffic.
block out all
benchmarks/rigs/setup-pf.sh 52 lines dead-man-guarded applier
#!/bin/sh
# setup-pf.sh — apply pf.bench.conf with a dead-man's-switch that auto-
# reverts if SSH isn't confirmed alive within DMS_TIMEOUT seconds.
#
# Usage (as root, typically via `mise run bench:setup-pf`):
#     sh setup-pf.sh /path/to/pf.bench.conf
#
# Safety model:
#   1. Spawn a detached child via daemon(8) that sleeps DMS_TIMEOUT
#      seconds and then calls `pfctl -d`. Its PID is written to a
#      pidfile so we can cancel it.
#   2. Apply the ruleset with pfctl -f and enable pf.
#   3. If we reach the end of the script without the shell dying, cancel
#      the dead-man.
#
# If SSH goes away during (2) — e.g. because the rules were wrong — the
# dead-man fires, pf is disabled, and the host becomes reachable again.

set -eu

RULES=${1:?usage: setup-pf.sh /path/to/pf.bench.conf}
DMS_TIMEOUT=${DMS_TIMEOUT:-60}
DMS_PIDFILE=/tmp/bench-pf-deadman.pid

if [ ! -f "$RULES" ]; then
  echo "setup-pf: rules file not found: $RULES" >&2
  exit 2
fi

# Ensure pf kernel module is loaded.
kldload pf 2>/dev/null || true

# Fire the dead-man via daemon(8) so it survives our shell exiting.
# daemon -f detaches, -p writes the child's pid.
daemon -f -p "$DMS_PIDFILE" /bin/sh -c "sleep $DMS_TIMEOUT; pfctl -d; logger 'bench-safe: pf disabled by deadman after ${DMS_TIMEOUT}s'"
sleep 0.2  # let daemon write the pidfile

# Apply ruleset and enable pf. `pfctl -f` preserves the existing state
# table, so an active SSH session is kept alive across reload.
pfctl -f "$RULES"
pfctl -e 2>/dev/null || true
pfctl -s info | head -1

# Cancel the dead-man — we're alive and rules applied cleanly.
if [ -f "$DMS_PIDFILE" ]; then
  DMS_PID=$(cat "$DMS_PIDFILE")
  kill -TERM "$DMS_PID" 2>/dev/null || true
  rm -f "$DMS_PIDFILE"
fi

echo "setup-pf: bench ruleset active; deadman cancelled"

Known caveats