Wildcard DNS for SDK routing

The E2B SDKs construct envd URLs as <port>-<shortid>.<domain>. On the upstream service those hostnames resolve to per-sandbox IPs behind a fanout proxy. In Coppice there are no per-sandbox IPs — the gateway listens on one address for every sandbox, and when a request arrives it reads the Host header, extracts <shortid>, and jexecs into the right jail. So the only thing a dev machine has to do to use real SDK URLs (no E2B_DEBUG=true kludge) is resolve *.coppice.lan to wherever the gateway is listening. That’s it. This page is the recipe.

What the SDK wants

Once the lifecycle API hands back a sandboxID, the Python/Node/Go SDKs compute the envd URL as:

https://<port>-<shortid>.<domain>/...

The <port> in the subdomain is a label, not the TCP port the SDK connects to; the TCP port comes from the URL scheme (443 for https) or from whatever is appended to <domain>. We smuggle the gateway’s port through E2B_DOMAIN, e.g. E2B_DOMAIN=coppice.lan:49999, so the URL becomes http://49999-<id>.coppice.lan:49999/. The subdomain 49999- tells the gateway which envd handler set the client wants; the :49999 authority port tells the kernel which TCP listener to connect to.

The SDK does not look up a per-sandbox IP. It sends the request to whatever DNS says 49999-<shortid>.coppice.lan resolves to, with that exact name in the Host header.
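The shape of that URL can be sketched in a few lines of Python. This is an illustration of the construction described above, not the SDK’s actual code; envd_url is a hypothetical helper name:

```python
def envd_url(sandbox_id: str, domain: str = "coppice.lan:49999",
             port: int = 49999, scheme: str = "http") -> str:
    """Mirror the SDK's URL shape: <scheme>://<port>-<shortid>.<domain>.

    The <port> label in the subdomain tells the gateway which envd
    handler set to use; the TCP port the client actually dials comes
    from the authority part of `domain` (or the scheme's default).
    """
    return f"{scheme}://{port}-{sandbox_id}.{domain}"

print(envd_url("9f3a64b7d4c349cfba1e2f0c8a1b77d1"))
# -> http://49999-9f3a64b7d4c349cfba1e2f0c8a1b77d1.coppice.lan:49999
```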

One wrinkle in the Python SDK: _jupyter_url ties the scheme to debug mode (https when debug=False, http when debug=True). Our gateway serves plain HTTP (production-parity TLS is explicitly out of scope here), so the example code overrides _jupyter_url to keep debug=False while forcing http. See examples/02-persistent-kernel.py. Once the gateway gains a TLS frontend, this override goes away.
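The mechanics of that override look roughly like this. The base class here is a stand-in for the SDK’s Sandbox (which serves _jupyter_url as a property); see examples/02-persistent-kernel.py for the real thing:

```python
class _SdkSandbox:
    """Stand-in for the SDK's Sandbox: with debug=False, the real
    _jupyter_url property returns an https URL."""
    @property
    def _jupyter_url(self) -> str:
        return "https://49999-abc123.coppice.lan:49999"

class HttpSandbox(_SdkSandbox):
    """Keep debug=False (real Host-header routing) but force plain
    http, since the Coppice gateway has no TLS frontend yet."""
    @property
    def _jupyter_url(self) -> str:
        url = super()._jupyter_url
        # Swap only the scheme; host and port stay untouched.
        return "http://" + url.removeprefix("https://")

print(HttpSandbox()._jupyter_url)
# -> http://49999-abc123.coppice.lan:49999
```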

Why the gateway alone is sufficient

Our gateway already does the routing in-process. The relevant bit is sandbox_from_host() in e2b-compat/src/envd.rs:79:

fn sandbox_from_host(headers: &HeaderMap) -> Option<String> {
    let host = headers.get("host")?.to_str().ok()?;
    let (hostpart, _port) = host.split_once(':').unwrap_or((host, ""));
    let (_port_prefix, rest) = hostpart.split_once('-')?;
    let (sandbox_id, _tail) = rest.split_once('.')?;
    Some(sandbox_id.to_string())
}

Each envd handler calls pick_sandbox(), which first tries sandbox_from_host() and falls back to the most-recently-started sandbox only if the Host header can’t be parsed (that’s the E2B_DEBUG=true path, where the SDK sends Host: localhost:49999). With wildcard DNS the Host-header path always wins and per-sandbox routing Just Works.
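For reference, the same parse transliterated into Python (a direct mirror of the Rust above, returning None on exactly the inputs where the Rust returns None):

```python
from typing import Optional

def sandbox_from_host(host_header: Optional[str]) -> Optional[str]:
    """'49999-<shortid>.coppice.lan:49999' -> '<shortid>'."""
    if not host_header:
        return None
    hostpart = host_header.split(":", 1)[0]         # drop the authority port
    if "-" not in hostpart:
        return None                                 # e.g. Host: localhost
    _port_label, _, rest = hostpart.partition("-")  # drop the port label
    sandbox_id, sep, _tail = rest.partition(".")
    return sandbox_id if sep else None              # require a '.<domain>' tail

# Wildcard-DNS path: the Host header carries the shortid.
print(sandbox_from_host("49999-9f3a64b7.coppice.lan:49999"))  # -> 9f3a64b7
# E2B_DEBUG=true path: no parse, so the caller falls back to most-recent.
print(sandbox_from_host("localhost:49999"))                   # -> None
```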

Jails run with ip4=inherit and do not bind listeners of their own. The gateway owns all envd/files/execute traffic, so there is no per-sandbox IP to route to in the first place. An L7 reverse proxy like coppiceproxy is not in the loop for the SDK fanout case. (It exists as a fallback for exposing in-jail listeners — a user web app, Chromium’s CDP on :9222 — which need per-jail addressability via the VNET refactor tracked in #69.)

DNS setup on the three OSes we actually use

Linux dev laptop — NetworkManager + dnsmasq

If NetworkManager is running its own dnsmasq (stock on many distros; dns=dnsmasq under [main] in /etc/NetworkManager/NetworkManager.conf):

sudoedit /etc/NetworkManager/dnsmasq.d/coppice.conf
# add:
address=/coppice.lan/127.0.0.1

sudo nmcli general reload
getent hosts test.coppice.lan   # -> 127.0.0.1

Linux dev laptop — systemd-resolved

systemd-resolved has no equivalent of dnsmasq’s address=/zone/ wildcard synthesis, so the practical setup is: run a dnsmasq on 127.0.0.1:5353 for the zone, then tell systemd-resolved to route coppice.lan queries there (the host:port syntax in DNS= needs systemd 246 or newer).

sudo tee /etc/dnsmasq.d/coppice.conf <<'EOF'
port=5353
listen-address=127.0.0.1
bind-interfaces
no-resolv
address=/coppice.lan/127.0.0.1
EOF
sudo systemctl enable --now dnsmasq

sudo mkdir -p /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/coppice.conf <<'EOF'
[Resolve]
DNS=127.0.0.1:5353
Domains=~coppice.lan
EOF
sudo systemctl restart systemd-resolved
resolvectl query test.coppice.lan   # -> 127.0.0.1

macOS dev laptop

macOS honours per-zone resolvers via /etc/resolver/.

brew install dnsmasq
echo "address=/coppice.lan/127.0.0.1" \
  >> /opt/homebrew/etc/dnsmasq.conf   # /usr/local/etc on Intel Macs
sudo brew services start dnsmasq

sudo mkdir -p /etc/resolver
echo "nameserver 127.0.0.1" \
  | sudo tee /etc/resolver/coppice.lan

dscacheutil -q host -a name test.coppice.lan   # -> 127.0.0.1
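Whichever OS you set up, the same preflight works everywhere: resolve an arbitrary label under the wildcard and check it lands on the gateway address. A stdlib-only sketch (the function name is ours, not from the examples):

```python
import socket

def resolves_to(name: str) -> list[str]:
    """Return the IPv4 addresses `name` resolves to, or [] if it doesn't."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

# Any label under the wildcard should hit the gateway address:
print(resolves_to("probe.coppice.lan"))   # expect ['127.0.0.1'] after setup
print(resolves_to("no-such.invalid."))    # -> []  (.invalid never resolves)
```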

On-honor use

If you’re driving the SDK directly from honor itself rather than through an SSH tunnel, the options are:

  1. /etc/hosts lines per sandbox shortid. Cumbersome — you’d have to echo >> on every sandbox create and prune on every destroy. Workable for a one-off test, not for an agent loop.
  2. SDK in debug mode (E2B_DEBUG=true) against localhost:3000 / :49999. Routing collapses to “most-recently-started sandbox”, which is fine for single-sandbox scripts.
  3. SSH-forward the gateway ports to your laptop and run the SDK there with wildcard DNS. This is what we actually do.

Worked example

With wildcard DNS pointing coppice.lan at the gateway (or at a tunnel to it on 127.0.0.1), create two sandboxes, bind different Python state in each, and hit their envd surfaces by shortid:

$ curl -s http://localhost:3000/sandboxes -X POST \
    -H 'content-type: application/json' \
    -d '{"templateID":"default"}' | jq -r .sandboxID
# -> 9f3a64b7d4c349cfba1e2f0c8a1b77d1

$ curl -s http://localhost:3000/sandboxes -X POST \
    -H 'content-type: application/json' \
    -d '{"templateID":"default"}' | jq -r .sandboxID
# -> 5cd0e21b9e784a8c8b4a86d4f2c76a10

$ curl -s "http://49999-9f3a64b7d4c349cfba1e2f0c8a1b77d1.coppice.lan:49999/files/list?path=/root" \
    -H 'content-type: application/json'
{"entries":[...sandbox A rootfs...]}

$ curl -s "http://49999-5cd0e21b9e784a8c8b4a86d4f2c76a10.coppice.lan:49999/files/list?path=/root" \
    -H 'content-type: application/json'
{"entries":[...sandbox B rootfs — distinct from A...]}

The gateway’s Host: 49999-9f3a….coppice.lan:49999 parse picks out the shortid, looks up the jail, and serves against sandbox A’s ZFS clone. The second request hits a different clone.
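The same two-sandbox check as a Python sketch, stdlib only. It assumes the gateway endpoints and JSON field names shown in the curl transcript above (sandboxID, templateID) and the wildcard DNS setup from earlier; files_list_url is a helper name of ours:

```python
import json
import urllib.request

API = "http://localhost:3000"
DOMAIN = "coppice.lan:49999"

def files_list_url(sandbox_id: str, path: str = "/root") -> str:
    # The hostname does the routing: the gateway parses <shortid> out of Host.
    return f"http://49999-{sandbox_id}.{DOMAIN}/files/list?path={path}"

def create_sandbox() -> str:
    req = urllib.request.Request(
        f"{API}/sandboxes",
        data=json.dumps({"templateID": "default"}).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["sandboxID"]

def list_root(sandbox_id: str) -> dict:
    with urllib.request.urlopen(files_list_url(sandbox_id)) as resp:
        return json.load(resp)

if __name__ == "__main__":
    a, b = create_sandbox(), create_sandbox()
    print(a, "->", list_root(a))   # sandbox A's clone
    print(b, "->", list_root(b))   # sandbox B's clone, distinct from A
```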

Transcript from examples/02-persistent-kernel.py

This is the same example, minus E2B_DEBUG=true, relying on the wildcard-DNS setup described above plus an SSH tunnel localhost:49999 → honor:49999 and localhost:3000 → honor:3000. Full source: examples/02-persistent-kernel.py.

$ uv run --with e2b-code-interpreter examples/02-persistent-kernel.py
preflight: probe.coppice.lan -> 127.0.0.1
sandbox id: 95f0f655ac38447ea2538278b0fe2f54
[1] x = 42          stdout=[]  results=0
[2] print(x)        stdout='42'
[3] numpy result    text='array([1, 2, 3])'
[4] np still there  text='60'
[5] kernel restarted via Sandbox.restart_code_context(0da667ece45242e4b4896fd961661e48)
[6] print(x) after restart  error.name='NameError'
ok

The envd calls travelled to 49999-95f0….coppice.lan:49999, resolved to 127.0.0.1 (the local end of the SSH tunnel into honor’s gateway), and were jexec’d into the right jail by sandbox_from_host(). No E2B_DEBUG=true; the per-sandbox Host-header path won outright.

Out of scope

TLS termination. Coppice’s gateway serves plain HTTP. The wildcard-DNS trick is about routing, not confidentiality. Production-parity TLS would mean a cert with a *.coppice.lan SAN terminating on the gateway. Not done, not planned for this audit.

Per-sandbox IPs for user listeners. Hostname-only routing works because every sandbox surface we care about (envd, files, execute) is served by the gateway. A listener bound inside a jail on an arbitrary port — a user web app, a Jupyter-classic frontend, Chromium’s CDP on :9222 — is not routable this way, because the gateway has nothing to proxy to: jails share the host’s IP and bind nothing themselves. Fixing that requires the VNET refactor tracked in #69, at which point tools/coppiceproxy (the L7 Host-header splitter sibling to CubeSandbox’s CubeProxy) becomes the routing component.

Browser-sandbox specifically. Same story: Chromium’s CDP on :9222, like any other in-jail listener, waits on per-jail IPs via the VNET refactor, tracked separately.