The CubeSandbox examples include agent-shaped demos rather than new substrate primitives: create a sandbox, write workspace files, run commands, inspect artifacts, and iterate. Coppice now has reproducible receipts for that exact path, plus a subscription-backed Codex CLI smoke and a Modal-style Python decorator demo that prove developer tooling can drive Coppice directly.
OpenAI Agents SDK
The receipt is benchmarks/rigs/openai-agents-coppice-smoke.sh.
It installs openai-agents with uv, creates a
real Agent with two function_tool tools, and
drives the official SDK runner with a deterministic local model so no
OpenAI API key is needed.
The tools are backed by a live Coppice sandbox:
- coppice_write_file writes through the gateway’s /sandboxes/:id/files route.
- coppice_exec runs through the envd /commands streaming route with a 49999-<sandbox>.coppice.lan Host header.
The run writes /tmp/openai-agents/input.txt, executes
Python inside the sandbox to produce
/tmp/openai-agents/out.txt, reads the artifact back, and
asserts the final output.
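The request shape behind the two tools can be sketched roughly as follows. This is an illustration, not the rig's code. The base URLs, the JSON body fields, and the helper names are assumptions. Only the /sandboxes/:id/files route, the /commands route, and the 49999-<sandbox>.coppice.lan Host header come from the description above. The requests are built but never sent.

```python
import json
import urllib.request

# Assumed local tunnel endpoints; adjust to your environment.
GATEWAY_URL = "http://127.0.0.1:3001"
ENVD_URL = "http://127.0.0.1:49999"


def envd_host(sandbox_id: str, port: int = 49999) -> str:
    """Per-sandbox Host header value of the form 49999-<sandbox>.coppice.lan."""
    return f"{port}-{sandbox_id}.coppice.lan"


def write_file_request(sandbox_id: str, path: str, content: str) -> urllib.request.Request:
    """Build a write through the gateway files route (body shape is hypothetical)."""
    body = json.dumps({"path": path, "content": content}).encode()
    return urllib.request.Request(
        f"{GATEWAY_URL}/sandboxes/{sandbox_id}/files",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def exec_request(sandbox_id: str, cmd: str) -> urllib.request.Request:
    """Build a command run through envd (body shape is hypothetical)."""
    body = json.dumps({"cmd": cmd}).encode()
    return urllib.request.Request(
        f"{ENVD_URL}/commands",
        data=body,
        # The Host header is what selects the sandbox on the envd side.
        headers={"Content-Type": "application/json", "Host": envd_host(sandbox_id)},
        method="POST",
    )
```

Keeping the Host computation in one helper means both a tool and a proxy can share it, so the routing rule lives in a single place.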
The session target is not hardcoded. By default the rig creates a fresh
sandbox from E2B_TEMPLATE; set
COPPICE_SANDBOX_ID=<id> to attach the same Agents SDK
tool session to an existing sandbox instead. Set
COPPICE_KEEP_SANDBOX=1 when you want a newly-created
sandbox to survive the receipt for manual inspection.
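The session-target rules above can be sketched as a small resolver. This is an assumption-laden illustration rather than the rig's logic: the SessionTarget type, the function name, the "python" default template, and the choice never to delete an attached sandbox are all hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SessionTarget:
    sandbox_id: Optional[str]  # None means "create a fresh sandbox"
    template: Optional[str]    # only meaningful when creating
    delete_on_exit: bool


def resolve_session_target(env: Mapping[str, str]) -> SessionTarget:
    """Decide between attaching and creating from the environment variables."""
    sandbox_id = env.get("COPPICE_SANDBOX_ID") or None
    if sandbox_id:
        # Assumption: an attached sandbox belongs to the caller, so leave it alone.
        return SessionTarget(sandbox_id, None, delete_on_exit=False)
    keep = env.get("COPPICE_KEEP_SANDBOX") == "1"
    # Assumption: "python" as the fallback template; the real default is not stated.
    template = env.get("E2B_TEMPLATE", "python")
    return SessionTarget(None, template, delete_on_exit=not keep)
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the decision easy to exercise without touching the process environment.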
Latest transcript:
benchmarks/results/agent-demos/latest-openai-agents.txt.
OpenAI Agents + E2B Client Shape
The competitor example does not use MCP. It wires the Agents SDK sandbox
abstraction directly to E2BSandboxClient:
SandboxAgent, SandboxRunConfig,
E2BSandboxClientOptions, and the bundled
Shell capability.
The receipt is
benchmarks/rigs/openai-agents-e2b-client-smoke.sh. In the
default client-smoke mode it uses the official
openai-agents[e2b] stack to create or attach a sandbox,
start the sandbox session, write coppice-agent.txt, run
uname -s plus cat through the E2B command
client, and shut the sandbox down. This mode does not need an
OPENAI_API_KEY; it proves the exact E2B client transport.
Run the keyless transport receipt:
mise run agents:openai-e2b-client-smoke
Run the real OpenAI agent path from a desktop with an API key:
OPENAI_API_KEY=... \
OPENAI_AGENTS_E2B_MODE=agent \
OPENAI_MODEL=gpt-4.1-mini \
mise run agents:openai-e2b-client-smoke
The rig opens or reuses the normal desktop tunnels
127.0.0.1:3001 -> honor:3000 and
127.0.0.1:49999 -> honor:49999. It also starts a tiny
local envd proxy that injects the per-sandbox Host header, so E2B SDK
traffic routes to the intended sandbox instead of relying on the
gateway’s most-recent-sandbox fallback.
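The header rewrite such a proxy performs might look like the sketch below. It assumes the Host value follows the 49999-<sandbox>.coppice.lan pattern used elsewhere on this page. The function name is hypothetical and the actual proxy in the rig may be structured differently.

```python
from typing import Dict, Mapping

ENVD_PORT = 49999


def inject_sandbox_host(headers: Mapping[str, str], sandbox_id: str) -> Dict[str, str]:
    """Return a copy of the headers with Host pinned to one sandbox."""
    # Header names are case-insensitive, so drop every existing Host variant.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "host"}
    forwarded["Host"] = f"{ENVD_PORT}-{sandbox_id}.coppice.lan"
    return forwarded
```

With the client's original Host (for example 127.0.0.1:49999) replaced on every forwarded request, routing no longer depends on which sandbox the gateway saw most recently.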
Useful knobs:
- E2B_TEMPLATE=python selects the sandbox template.
- COPPICE_SANDBOX_ID=<id> attaches the client-smoke mode to an existing sandbox.
- COPPICE_KEEP_SANDBOX=1 leaves a newly-created client-smoke sandbox running.
- OPENAI_AGENTS_E2B_QUESTION=… overrides the real-agent prompt.
Latest transcript:
benchmarks/results/agent-demos/latest-openai-agents-e2b-client.txt.
OpenAI Agents Code Interpreter Shape
The second Cube example layers custom sandbox capabilities on top of the
same E2B client: a shell-inspection tool, a Python runner tool, and a
Manifest that seeds workspace data before the agent starts.
Coppice ports that shape in
benchmarks/rigs/openai-agents-code-interpreter-smoke.sh.
The default deterministic mode does not require an
OPENAI_API_KEY. It still uses a real
SandboxAgent, SandboxRunConfig, custom
Capability classes, and official E2BSandboxClient
transport. The local deterministic model invokes run_python
against a manifest-seeded sales.csv, writes generated
artifacts under output/, then invokes shell to
inspect them.
Run the keyless receipt:
mise run agents:openai-code-interpreter-smoke
Run the real OpenAI version:
OPENAI_API_KEY=... \
OPENAI_AGENTS_CODE_MODE=agent \
OPENAI_MODEL=gpt-4.1 \
mise run agents:openai-code-interpreter-smoke
Both modes assert that every Python/shell tool call exits 0 and that
all of these sandbox artifacts exist:
output/monthly_revenue.csv,
output/monthly_revenue.svg,
output/top_products.md, and
output/summary.txt.
The data schema is pinned in the prompt and manifest as
date,product,units,unit_price. The top-products check reads
output/top_products.md directly and requires exactly three product
rows sorted by revenue; it does not ask the model for JSON or a
specific markdown-table style.
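As a worked example of the computation the check expects, the sketch below ranks products by revenue (units times unit_price) for the pinned schema. The sample rows are made up for illustration, and the function names are hypothetical. It does not reproduce how the rig parses output/top_products.md.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; not the contents of the real sales.csv.
SAMPLE = """date,product,units,unit_price
2024-01-05,widget,10,2.50
2024-01-20,gadget,3,20.00
2024-02-02,widget,4,2.50
2024-02-11,gizmo,7,4.00
2024-02-28,doohickey,1,9.99
"""


def top_products(csv_text, n=3):
    """Sum units * unit_price per product and return the n largest."""
    revenue = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        revenue[row["product"]] += int(row["units"]) * float(row["unit_price"])
    # Sort by revenue descending; break ties by product name for stable output.
    ranked = sorted(revenue.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def check_top_rows(rows):
    """Exactly three rows, non-increasing by revenue."""
    values = [value for _, value in rows]
    return len(rows) == 3 and all(a >= b for a, b in zip(values, values[1:]))
```

For the sample above, top_products(SAMPLE) yields gadget (60.0), widget (35.0), and gizmo (28.0), and check_top_rows accepts that list.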
The harness forces headless plotting (MPLBACKEND=Agg) and
the prompt tells the real model to generate the SVG directly so a GUI
backend cannot leak into the receipt.
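One way to generate an SVG directly, with no plotting library and therefore no backend to select, is to emit the markup as text. The sketch below is a hypothetical illustration of that idea. The layout constants, colours, and function names are assumptions, and the real receipt's SVG may look nothing like this.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart_svg(series, width=320, height=160):
    """Render (label, value) pairs with non-negative values as a bar chart."""
    peak = max((value for _, value in series), default=0) or 1
    slot = width / max(len(series), 1)
    parts = [f'<svg xmlns="{SVG_NS}" width="{width}" height="{height}">']
    for i, (label, value) in enumerate(series):
        bar_h = (height - 20) * value / peak  # reserve 20px for labels
        x = i * slot + slot * 0.1
        y = height - 20 - bar_h
        parts.append(
            f'<rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_h:.1f}" fill="steelblue"/>'
        )
        parts.append(f'<text x="{x:.1f}" y="{height - 5}" font-size="10">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


def count_bars(svg_text):
    """Parse the SVG as XML and count its <rect> elements."""
    root = ET.fromstring(svg_text)
    return len(root.findall(f"{{{SVG_NS}}}rect"))
```

Parsing the output back with an XML parser, as count_bars does, is a cheap way to assert the artifact is well-formed without rendering it.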
Latest transcript:
benchmarks/results/agent-demos/latest-openai-agents-code-interpreter.txt.
Codex CLI subscription smoke
The receipt is benchmarks/rigs/codex-cli-coppice-smoke.sh.
It runs the local codex exec binary non-interactively, so it
uses the desktop Codex login/subscription state rather than an
OPENAI_API_KEY. The nested Codex agent receives a constrained
prompt plus a JSON output schema, then drives the live Coppice gateway:
health check, create a sandbox, execute a command in it, and delete it.
This is deliberately separate from the OpenAI Agents SDK receipt above. The SDK receipt proves our tool/session shape against the SDK APIs; the Codex CLI receipt proves subscription-backed agent tooling can operate Coppice end to end through the public gateway surface.
Run it:
mise run agents:codex-cli-smoke
Useful knobs:
- E2B_API_URL=http://127.0.0.1:3001 targets an existing desktop SSH tunnel; if that URL is down, the rig opens 127.0.0.1:3001 → honor:127.0.0.1:3000.
- E2B_TEMPLATE=python selects the sandbox template.
- CODEX_MODEL=<model> overrides the local Codex model.
- CODEX_BYPASS_SANDBOX=0 uses Codex’s full-auto sandbox instead of the explicit bypass mode.
Latest transcript:
benchmarks/results/agent-demos/latest-codex-cli.txt.
The raw nested Codex event stream is preserved beside it as
benchmarks/results/agent-demos/latest-codex-cli.events.jsonl,
and the schema-validated final object is
benchmarks/results/agent-demos/latest-codex-cli.final.json.
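A consumer of these two files might load the event stream line by line and re-check the final object, as in the sketch below. The field names in REQUIRED_FIELDS are hypothetical, because the real schema is not shown on this page. They only mirror the four steps described above: health check, create, execute, delete.

```python
import json

# Hypothetical field names; the actual schema of the final object is not shown here.
REQUIRED_FIELDS = {
    "health_ok": bool,
    "sandbox_id": str,
    "command_exit_code": int,
    "sandbox_deleted": bool,
}


def validate_final(obj):
    """Return a list of problems; an empty list means the object passes."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            problems.append(f"missing field: {name}")
        elif type(obj[name]) is not expected:
            # Exact type match on purpose: bool is a subclass of int in Python.
            problems.append(f"wrong type for {name}: {type(obj[name]).__name__}")
    return problems


def parse_events(jsonl_text):
    """Parse a JSONL event stream, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
```

Returning a list of problems instead of raising on the first one keeps the transcript useful when several fields are wrong at once.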
Python decorator ergonomics
The receipt is benchmarks/rigs/python-decorator-smoke.sh.
It proves the Modal-shaped ergonomics row without adding a gateway
primitive: examples/16-python-decorator.py defines a
@sandboxed helper whose .remote(…) method
creates or attaches to a Coppice sandbox, ships the Python function
source plus JSON-native arguments through /sandboxes/:id/exec,
parses a JSON result marker, and tears down newly-created sandboxes.
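The ship-source-and-parse-marker mechanism can be sketched as below. This is not the code in examples/16-python-decorator.py: the marker string and helper names are hypothetical, and run_locally uses a local subprocess as a stand-in for the /sandboxes/:id/exec call, so the sketch runs without a gateway. Sandbox creation and teardown are left out.

```python
import inspect
import json
import subprocess
import sys
import textwrap

RESULT_MARKER = "__COPPICE_RESULT__"  # hypothetical marker string


def build_script(source, func_name, args, kwargs):
    """Compose a standalone script: function source, one call, marked JSON result."""
    payload = json.dumps({"args": list(args), "kwargs": kwargs})
    return "\n".join([
        "import json",
        textwrap.dedent(source),
        f"_call = json.loads({payload!r})",
        f"_result = {func_name}(*_call['args'], **_call['kwargs'])",
        f"print({RESULT_MARKER!r} + json.dumps(_result))",
    ])


def parse_result(stdout):
    """Find the last marker line and decode the JSON that follows it."""
    for line in reversed(stdout.splitlines()):
        if line.startswith(RESULT_MARKER):
            return json.loads(line[len(RESULT_MARKER):])
    raise RuntimeError("no result marker in command output")


def run_locally(script):
    """Stand-in for the sandbox exec route: run the script in a subprocess."""
    done = subprocess.run([sys.executable, "-c", script],
                          capture_output=True, text=True, check=True)
    return done.stdout


def sandboxed(func):
    """Attach a .remote(...) method that ships the function source for execution."""
    def remote(*args, **kwargs):
        lines = textwrap.dedent(inspect.getsource(func)).splitlines()
        # Skip decorator lines so the shipped source does not depend on this module.
        start = next(i for i, line in enumerate(lines) if line.startswith("def "))
        script = build_script("\n".join(lines[start:]), func.__name__, args, kwargs)
        return parse_result(run_locally(script))
    func.remote = remote
    return func
```

A function defined in a source file and decorated with @sandboxed can then be called as add.remote(2, 3). Because arguments and results cross the boundary as JSON, only JSON-native values survive the round trip, and the shipped function must import whatever it needs inside its own body.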
Run it:
mise run agents:python-decorator-smoke
Useful knobs:
- E2B_API_URL=http://127.0.0.1:3001 targets an existing desktop SSH tunnel.
- COPPICE_DECORATOR_TEMPLATE=python selects the template.
- COPPICE_SANDBOX_ID=<id> makes the decorator attach to an existing sandbox instead of creating a fresh one.
Latest transcript:
benchmarks/results/python-decorator/latest.txt.
Mini-RL / SWE-style loop
The receipt is benchmarks/rigs/mini-rl-training-smoke.sh.
It creates a sandbox workspace with a deliberately broken
policy.py and a tiny trainer/test harness. The first
command run fails with reward below threshold. The rig then patches the
policy, reruns the same command stream, and verifies the checkpoint:
best_arm=1, score=1.000.
This is intentionally small, but it exercises the same control loop as larger SWE-bench or RL demos: write workspace, run tests, inspect failure, patch, rerun, and preserve the receipt.
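The control loop can be illustrated locally with a deterministic two-armed bandit. Everything in this sketch is an assumption made for illustration: the reward table, the threshold, the policy sources, and the in-memory workspace dict stand in for the rig's real policy.py, trainer, and sandbox files.

```python
ARM_REWARDS = [0.0, 1.0]  # deterministic bandit; arm 1 is the optimal arm
THRESHOLD = 0.9           # assumed pass threshold for the mean reward

BROKEN_POLICY = "def choose_arm(step):\n    return 0\n"
PATCHED_POLICY = "def choose_arm(step):\n    return 1\n"


def run_trainer(policy_source, steps=10):
    """Load the policy source, pull arms, and return (passed, checkpoint)."""
    namespace = {}
    exec(policy_source, namespace)  # stands in for importing policy.py in the sandbox
    pulls = [namespace["choose_arm"](step) for step in range(steps)]
    score = sum(ARM_REWARDS[arm] for arm in pulls) / steps
    best_arm = max(set(pulls), key=pulls.count)  # most frequently chosen arm
    return score >= THRESHOLD, f"best_arm={best_arm}, score={score:.3f}"


def control_loop():
    """Write workspace, run, observe failure, patch, rerun the same command."""
    workspace = {"policy.py": BROKEN_POLICY}
    first = run_trainer(workspace["policy.py"])
    workspace["policy.py"] = PATCHED_POLICY
    second = run_trainer(workspace["policy.py"])
    return first, second
```

The first run returns a failing result with score=0.000, and the rerun after the patch passes with best_arm=1, score=1.000, matching the shape of the checkpoint the receipt verifies.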
Latest transcript:
benchmarks/results/agent-demos/latest-mini-rl.txt.
Run the keyless agent receipts, including the decorator and mini-RL receipts:
mise run agents:demos
Combined latest transcript:
benchmarks/results/agent-demos/latest-summary.txt.
What this closes
These receipts close the old “agent demos are blocked on commands” audit rows. The underlying gateway pieces were already closed separately: file operations, command streaming, logs, code execution, and sandbox lifecycle. This page ties them together in example-shaped flows that can be rerun against honor. The Codex CLI receipt is not a CubeSandbox feature row by itself; it is operational evidence that subscription-authenticated desktop agents can use the same public gateway surface without API-key plumbing.