run_code — the WebSocket E2B actually runs on

The REST surface is the skeleton. The nervous system — the thing that makes sandbox.run_code("print('hi')") feel like a live Jupyter cell — is a WebSocket upgrade that proxies a Jupyter-kernel messaging protocol. Our /appendix/e2b-compat page only names it. This page lays it out, because it's the biggest single piece of work in a real E2B port and the site under-counts it.

Where the clients actually connect

Two SDKs, two surfaces: the base e2b SDK covers sandbox lifecycle plus the filesystem and process APIs, while e2b-code-interpreter layers run_code on top of it.

The e2b-code-interpreter path is the one most agent developers consume — it's what maps directly to ChatGPT's "Code Interpreter" UX.

What the Jupyter messaging protocol is

The canonical reference is jupyter-client.readthedocs.io/en/latest/messaging.html. Over a WebSocket, each frame is a JSON envelope:

{
  "header":        { "msg_id": "...", "msg_type": "execute_request",
                     "session": "...", "username": "...", "version": "5.3",
                     "date": "2026-04-22T...Z" },
  "parent_header": {},
  "metadata":      {},
  "content":       { "code": "print('hi')", "silent": false, "store_history": true,
                     "user_expressions": {}, "allow_stdin": false,
                     "stop_on_error": true },
  "buffers":       []
}
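
Each frame also carries an HMAC signature computed over those four envelope parts, using the key from the kernel's connection file. A minimal sketch of that signing step, per the messaging spec (not E2B-specific code — a real client must serialize each part byte-for-byte the way the kernel's packer does, or the check fails):

```python
import hashlib
import hmac
import json

def sign_message(key: bytes, header: dict, parent_header: dict,
                 metadata: dict, content: dict) -> str:
    """Jupyter wire-protocol signature: HMAC-SHA256 over the JSON bytes
    of header, parent_header, metadata, content — in that exact order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for part in (header, parent_header, metadata, content):
        mac.update(json.dumps(part).encode("utf-8"))
    return mac.hexdigest()

sig = sign_message(b"secret-key-from-connection.json",
                   {"msg_id": "abc", "msg_type": "execute_request"},
                   {}, {}, {"code": "print('hi')"})
print(len(sig))  # 64 hex characters for SHA-256
```

The key comes from the `key` field of connection.json; a kernel started with an empty key skips signature checking entirely.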

The protocol is strongly asynchronous. The client sends one execute_request; the server emits a stream of reply messages tagged with a parent_header.msg_id referring back to the request. The canonical sequence for print('hi'); 1+1:

  1. status — execution_state: busy
  2. execute_input — echoes the code so the client can display it
  3. stream — name: stdout, text: "hi\n"
  4. execute_result — the value of the last expression as a MIME bundle (e.g. {"text/plain": "2"}) — the client picks the richest renderer it supports
  5. execute_reply — status: ok, execution_count: N
  6. status — execution_state: idle
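
Because several cells can be in flight on the same iopub stream, a client filters the stream down to the replies for one request by matching parent_header.msg_id. A minimal sketch on synthetic frames:

```python
def replies_for(request_id: str, frames: list) -> list:
    """Keep only the messages that answer one execute_request,
    matched on parent_header.msg_id."""
    return [f for f in frames
            if f.get("parent_header", {}).get("msg_id") == request_id]

stream = [
    {"header": {"msg_type": "status"}, "parent_header": {"msg_id": "req-1"},
     "content": {"execution_state": "busy"}},
    {"header": {"msg_type": "stream"}, "parent_header": {"msg_id": "req-2"},
     "content": {"name": "stdout", "text": "other cell\n"}},
    {"header": {"msg_type": "stream"}, "parent_header": {"msg_id": "req-1"},
     "content": {"name": "stdout", "text": "hi\n"}},
]
mine = replies_for("req-1", stream)
print([m["header"]["msg_type"] for m in mine])  # ['status', 'stream']
```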

For a cell that produces an image (matplotlib), a DataFrame (pandas), or a LaTeX expression, the stream adds display_data and execute_result messages whose MIME bundles carry image/png, text/html, or text/latex alongside the text/plain fallback.

The MIME bundle is where the richness lives. A competent client picks image/png over text/plain, renders text/html for a DataFrame as a real HTML table, and falls back to text/plain for everything else.
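
That renderer pick is a few lines; the preference order here is illustrative, not a spec:

```python
# Richest first. A real client orders this by what it can display.
RENDER_ORDER = ["image/png", "image/svg+xml", "text/html",
                "text/latex", "text/plain"]

def pick_renderer(bundle: dict):
    """Choose the richest MIME type in an execute_result/display_data
    bundle that we know how to render; fall back to whatever is there."""
    for mime in RENDER_ORDER:
        if mime in bundle:
            return mime, bundle[mime]
    # Unknown-only bundle: hand back the first entry untouched.
    mime, data = next(iter(bundle.items()))
    return mime, data

mime, data = pick_renderer({"text/plain": "   a  b\n0  1  3",
                            "text/html": "<table>...</table>"})
print(mime)  # text/html
```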

Why it’s structured this way

Because it's IPython. ipykernel is a thin framing layer on top of a Python interpreter that exposes REPL state over ZMQ sockets; Jupyter Server fronts that ZMQ with WebSockets for browser clients. e2b-code-interpreter is — at the SDK/client level — a Jupyter kernel client. E2B runs ipykernel inside the sandbox.

What a real FreeBSD backend has to do

Four pieces, in order of difficulty:

1. Run ipykernel in the jail

The sandbox template must include python3, ipython, and ipykernel — plus whatever packages the agent workflow needs (numpy / pandas / matplotlib for the common case) — all baked into a FreeBSD jail template that is a peer to E2B's default Python templates.

Rough template size: 500 MB to 1 GB after trim. Still ZFS-cloneable in milliseconds so per-sandbox provisioning stays cheap.

2. Manage the kernel lifecycle

Kernels must be started on sandbox create and torn down on kill. They can also die on their own — agent code might call os._exit(0), deadlock, or OOM — so the gateway has to detect a dead kernel (the heartbeat channel exists for exactly this) and restart it without losing the sandbox.
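
The restart loop can be sketched in a few lines; the command and backoff below are placeholders, not our actual gateway code (which would also watch the ZMQ heartbeat rather than only waiting on the process):

```python
import subprocess
import sys
import time

def supervise(cmd: list, max_restarts: int = 3) -> int:
    """Relaunch a kernel process each time it exits non-zero.
    Returns the number of deaths observed before a clean exit
    or before giving up."""
    deaths = 0
    while deaths <= max_restarts:
        proc = subprocess.Popen(cmd)
        proc.wait()
        if proc.returncode == 0:
            return deaths           # clean shutdown: stop supervising
        deaths += 1
        time.sleep(0.01)            # placeholder for exponential backoff
    return deaths

# A kernel stand-in that dies immediately with a non-zero exit code:
n = supervise([sys.executable, "-c", "raise SystemExit(1)"], max_restarts=2)
print(n)  # 3: three failed launches before the supervisor gives up
```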

3. Proxy Jupyter WebSocket ↔ ZMQ

The kernel itself speaks ZMQ over a connection.json file with five sockets (shell, iopub, stdin, control, heartbeat). Jupyter Server projects these onto a single WebSocket with a channel field on each message. The gateway does the protocol translation.

Rust has decent ZMQ support via the zeromq crate. The code is mechanical: for each sandbox.run_code(), open a WebSocket, attach the five ZMQ sockets to the in-jail kernel, and forward messages bidirectionally through the channel-name translation layer.
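
The channel projection itself is a pair of pure functions. A sketch in Python (the frame shape follows Jupyter Server's JSON WebSocket protocol, where each frame is the envelope plus a channel field; the gateway's real version would live in Rust):

```python
import json

def ws_to_zmq(ws_frame: str):
    """Inbound: strip the 'channel' field and return which ZMQ socket
    (shell / control / stdin) the remaining envelope should be sent on."""
    msg = json.loads(ws_frame)
    channel = msg.pop("channel")
    return channel, msg

def zmq_to_ws(channel: str, msg: dict) -> str:
    """Outbound: tag a message read from a ZMQ socket (iopub, shell, ...)
    with its channel name so the single WebSocket stays demultiplexable."""
    return json.dumps({**msg, "channel": channel})

channel, msg = ws_to_zmq(json.dumps({
    "channel": "shell",
    "header": {"msg_type": "execute_request"},
    "content": {"code": "1+1"},
}))
print(channel)              # shell
print("channel" in msg)     # False — the envelope goes to ZMQ unchanged
```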

4. Filesystem + process APIs

Separate WebSocket and REST endpoints (E2B uses both) cover the filesystem API — read, write, watch — and the process API behind commands.run.

On FreeBSD, kqueue covers the filesystem-watch case. commands.run is a jexec call with WS as the transport.

What we actually observed running the SDK against our MVP

On 2026-04-22 we pip-installed e2b-code-interpreter on honor and pointed it at our e2b-compat server. The transcript:

[1] Sandbox.create(template="default", timeout=30)
    OK: id=5dca34bc59fe436d816deac361577b8c
[2] run_code("print(1+1)")
    FAILED ConnectError: [Errno 8] Name does not resolve
[3] kill
    OK

Create + list + kill via the real SDK works. That’s a meaningful drop-in proof: the E2B Python Sandbox.create call reaches our axum server, our server creates a ZFS clone + jail, our response deserializes correctly, and the SDK constructs a Sandbox object it’s happy with.

run_code fails at DNS resolution because the SDK routes its envd traffic by sandbox-scoped hostname. From e2b.connection_config.ConnectionConfig.get_host:

def get_host(self, sandbox_id, sandbox_domain, port):
    if self.debug:
        return f"localhost:{port}"
    return f"{port}-{sandbox_id}.{sandbox_domain}"

For e2b-code-interpreter, port = JUPYTER_PORT = 49999, so a run_code call tries to open an HTTP connection to e.g. 49999-5dca34bc....honor/execute. That’s the same host-header-based routing scheme CubeProxy (nginx+lua) implements in CubeSandbox. We don’t have that proxy, so the hostname doesn’t resolve.

The gap is concrete and small. To make run_code actually work we need three pieces, in order:

  1. envd (or an equivalent) running inside the jail, listening on 49999 for POST /execute + POST /contexts + their friends. Inside e2b-code-interpreter, run_code is a plain HTTP call against the envd’s /execute endpoint with the code in the body and stdout/stderr streamed back. It is not a Jupyter-ZMQ-over-WebSocket call from the client side — E2B’s cloud hides that behind a REST façade. (This is a pleasant surprise; ZMQ-multiplexing would be harder.)
  2. A reverse proxy that routes <port>-<sandbox_id>.<domain> to the right jail’s envd on port <port>. This is ~30 lines of nginx config, or any Go/Rust HTTP router that parses the Host header. Cube does this in CubeProxy.
  3. DNS setup. Either a wildcard *.sandboxes.local → proxy host, or we have agents configure E2B_DOMAIN to something the client can resolve and set E2B_DEBUG=1 to make the SDK use localhost:49999 directly (fine for single-box experiments).
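
The routing rule in step 2 is a one-regex parse of the Host header. A sketch (the hostname scheme is the one from the transcript above; route is a hypothetical helper, not CubeProxy's actual code):

```python
import re

# <port>-<sandbox_id>.<domain>, e.g. 49999-5dca34bc....honor
HOST_RE = re.compile(r"^(?P<port>\d+)-(?P<sandbox_id>[0-9a-f]+)\.")

def route(host):
    """Parse a Host header into (sandbox_id, port) so the proxy can
    forward to the right jail's envd. Returns None for other hosts."""
    m = HOST_RE.match(host)
    if not m:
        return None
    return m.group("sandbox_id"), int(m.group("port"))

print(route("49999-5dca34bc59fe436d816deac361577b8c.honor"))
# ('5dca34bc59fe436d816deac361577b8c', 49999)
print(route("example.com"))  # None — fall through to normal vhosts
```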

What's inside envd: Looking at the E2B open-source runtime (e2b-dev/infra's envd/), it's a Go binary that embeds an IPython/Jupyter kernel, exposes /execute and /contexts as streaming HTTP endpoints, and handles filesystem + process APIs on sibling ports. So the "run_code" path on the E2B side is: client → REST /execute on envd → envd internally drives an IPython kernel → streams text/event-stream results back. The client parses SSE frames for stdout, stderr, and the final result bundle.
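
The SSE framing the client has to parse is simple. A minimal sketch of the generic text/event-stream format (event names here are placeholders, not E2B's exact ones; a real parser also handles id:, retry:, and comment lines):

```python
def parse_sse(raw: str):
    """Split a text/event-stream body into (event, data) pairs.
    Events are blank-line-delimited blocks of 'field: value' lines."""
    events = []
    for block in raw.split("\n\n"):
        event, data_lines = "message", []
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            events.append((event, "\n".join(data_lines)))
    return events

frames = parse_sse("event: stdout\ndata: hi\n\n"
                   "event: result\ndata: {\"text/plain\": \"2\"}\n\n")
print(frames)
# [('stdout', 'hi'), ('result', '{"text/plain": "2"}')]
```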

What a FreeBSD port owes: a FreeBSD-flavored envd equivalent — an in-jail daemon exposing the same /execute and /contexts streaming endpoints plus the sibling filesystem and process APIs.

None of this is hard; it's just plumbing. The earlier "weeks of ZMQ multiplexer work" framing in this appendix assumed WebSocket + ZMQ on the client path, which is a deeper stack. The reality — envd exposes REST, the client consumes SSE — is cheaper to port.

Update (2026-04-22): run_code works end-to-end

After the first pass of this appendix we built an envd-compat endpoint directly into e2b-compat — a second Axum listener on port 49999 that handles POST /execute, POST /contexts, etc., and streams the NDJSON protocol the SDK consumes. With E2B_DEBUG=true the SDK points its Jupyter URL at localhost:49999 and the pipe goes straight to our handler.

Transcript (2026-04-22, honor):

sandbox: 4dae9f7f532c4fc7b23bdff8500b5f47

[1] hello world (python)
    stdout: ['hello, world\n']

[2] arithmetic
    stdout: ['0\n', '1\n', '4\n', '9\n', '16\n']

[3] stderr
    stdout: ['and some stdout\n']
    stderr: ['an error line\n']

[4] NameError → SDK error
    stderr: ['Traceback (most recent call last):\n',
             '  File "<string>", line 1, in <module>\n',
             "NameError: name 'undefined_variable' is not defined\n"]
    error: NonZeroExit: exit code 1

[5] shell
    stdout: ['bash\n', 'DAEMON\n', 'FILESYSTEMS\n', 'LOGIN\n']

The first pass covers everything a stateless agent-code-interpreter demo needs: stdout streaming, stderr streaming, error surfacing with traceback, language switching. We installed Python 3.11 into the jail template (pkg -c /jails/_template install -y python311 + symlink /usr/local/bin/python3 → python3.11 + refresh the @base snapshot) so that ZFS clones of the template have Python already baked in.

Update (2026-04-22, later): persistent ipykernel, state across calls

The remaining gap in the first pass was that every run_code spawned a fresh jexec -l <jail> /usr/local/bin/python3 -u -c <code> subprocess. Variables, imports, open files — nothing survived across calls. That’s not a code-interpreter backend; it’s a one-shot.

Closing the gap was mostly plumbing, not protocol archaeology:

  1. Bake ipykernel into the template. pkg -r /jails/_template install -y py311-ipykernel py311-pyzmq py311-numpy py311-matplotlib py311-pandas + zfs snapshot zroot/jails/_template@base. All bindings clean out of the box on FreeBSD 15 — no pip-in-venv fallback needed. (pkg -c wants PROC_NO_NEW_PRIVS; we used pkg -r <rootdir> instead, which doesn’t require a chroot.)
  2. Spawn the kernel on sandbox create. jexec <jail> /usr/local/bin/python3 -m ipykernel_launcher -f /tmp/connection.json. Store its host-side PID in a HashMap<sandbox_id, KernelInfo> on AppState. On DELETE /sandboxes/:id we kill -TERM the PID before reaping the jail.
  3. Bridge to the kernel on /execute. An in-jail Python script (/usr/local/libexec/e2b-kernel-bridge.py) is spawned per request. It uses jupyter_client.BlockingKernelClient to load /tmp/connection.json, sends an execute_request on shell, and translates iopub messages into the envd NDJSON envelope format directly — stream → stdout/stderr, execute_result / display_data → result with MIME keys (text, html, png, svg, json, …), error → error with name/value/traceback.

Why a Python bridge instead of a Rust ZMQ client: on FreeBSD the zeromq-crate + libzmq combo is passable but noisy to build; the Python jupyter_client library already does exactly the framing we need (HMAC signature over the wire, multipart ZMQ envelopes, JSON payloads, …). The bridge is ~60 LoC. The gateway calls it with code on stdin and reads NDJSON on stdout — the same stream shape it already forwarded to the SDK.
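
The heart of that bridge — iopub message in, one NDJSON line out — can be sketched as a pure function. Output field names below are illustrative, not E2B's exact wire format:

```python
import json

def iopub_to_ndjson(msg: dict):
    """Translate one iopub message into one NDJSON line following the
    stream/result/error mapping described above. Returns None for
    message types the SDK doesn't need (status, execute_input, ...)."""
    mtype = msg["header"]["msg_type"]
    content = msg["content"]
    if mtype == "stream":                       # stdout / stderr chunks
        out = {"type": content["name"], "text": content["text"]}
    elif mtype in ("execute_result", "display_data"):
        out = {"type": "result", **content["data"]}   # MIME-keyed bundle
    elif mtype == "error":
        out = {"type": "error", "name": content["ename"],
               "value": content["evalue"], "traceback": content["traceback"]}
    else:
        return None
    return json.dumps(out)

line = iopub_to_ndjson({"header": {"msg_type": "stream"},
                        "content": {"name": "stdout", "text": "42\n"}})
print(line)
```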

Transcript (2026-04-22, honor, new kernel path):

sandbox: 5b71ba55c87e4826a366035c9db5d37d

[1] run_code("x = 42")
    stdout: []
    results: []

[2] run_code("print(x)")        # same kernel — state persists
    stdout: ['42\n']

[3] run_code("import numpy as np; np.array([1,2,3])")
    results: [Result(text='array([1, 2, 3])', is_main_result=True)]

[4] run_code("import pandas as pd; \
              from IPython.display import display; \
              display(pd.DataFrame({'a':[1,2], 'b':[3,4]}))")
    results: [Result(text='   a  b\\n0  1  3\\n1  2  4',
                     html='<div>...<table class="dataframe">...')]

[5] run_code("matplotlib plot → display(Image(png))")
    results: [Result(png='iVBORw0KGgoAAAANSUhEUgAA...'   # PNG magic, 20 KB base64
                     is_main_result=False)]

[6] run_code("undefined_variable")
    error: ExecutionError(
      name='NameError',
      value="name 'undefined_variable' is not defined",
      traceback='...NameError: name \\'undefined_variable\\' is not defined')

All six cases above run against the same ipykernel inside the sandbox. The state-set/state-read pair in [1]/[2] is the cheapest possible proof of a long-lived kernel. [3] shows the MIME bundle coming through with text/plain. [4] adds text/html from display(pandas.DataFrame). [5] is the real payoff: a matplotlib figure arrives as image/png base64 and verifies (raw[:8] == b"\x89PNG\r\n\x1a\n"). [6] exercises the error envelope so the SDK can raise ExecutionError with a proper .name / .value / .traceback.
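
The raw[:8] check in [5] is cheap to reproduce. A sketch (looks_like_png is a hypothetical helper name; the magic-byte comparison is the same one the rig performs):

```python
import base64

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(b64: str) -> bool:
    """Sanity-check that a base64 'png' result really decodes to a PNG
    by comparing the first eight bytes against the PNG signature."""
    try:
        raw = base64.b64decode(b64)
    except Exception:
        return False
    return raw[:8] == PNG_MAGIC

# Fabricated payload: the 8 magic bytes plus padding, base64-encoded.
fake = base64.b64encode(PNG_MAGIC + b"\x00" * 8).decode()
print(looks_like_png(fake))        # True
print(looks_like_png("aGVsbG8="))  # False — decodes to b'hello'
```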

The rig lives at benchmarks/rigs/jupyter-e2e.sh; it drives the official e2b-code-interpreter Python SDK against a local e2b-compat instance (see the file for env-var knobs). Seven checks, seven passes.

Minimal reproduction scripts for agent developers: examples/02-persistent-kernel.py and friends under examples/ — 30-line SDK-against-our-gateway demos covering hello, persistent state, rich output, error handling, network isolation, diagnose, and N-way fanout.

Where the remaining gaps are

With the current e2b-compat, the remaining gaps are small and well-understood — incremental features rather than protocol unknowns.

The shape of the port is now fully proven. Everything that remains is packaging or incremental feature work, not protocol archaeology.

Secondary hatches in e2b-compat

Alongside the /execute endpoint on :49999 there are two older, simpler shims that predate the envd-compat work and that we kept for smoke testing.

Production clients use the /execute endpoint described in the Update section above. The E2B Python SDK round-trip passes through that path exclusively.

Why the site’s port-sketch is still honest without this

Because the REST surface — the thing that makes the Sandbox.create → run_code → close loop work — is what our site's "drop-in" claim references. The REST surface is portable; we've verified it. The WebSocket Jupyter protocol is the next tier of work, not a fatal obstacle — roughly a week of Rust and a week of jail templating.