Docker sandbox hangs on session.write()/apply_manifest() over a TLS DOCKER_HOST (DinD, remote daemon)

## Summary

`agents.sandbox` file materialization deadlocks whenever the Docker daemon is reached over **TLS**, e.g. a Docker-in-Docker sidecar or a remote `DOCKER_HOST=tcp://…:2376` with `DOCKER_TLS_VERIFY=1`. `session.write()` (and therefore `apply_manifest()` during workspace setup) never returns.

## Root cause

`DockerSandboxSession._stream_into_exec` (`src/agents/sandbox/sandboxes/docker.py`) writes the payload into a `docker exec` running `tar -x` / `cat` reading from **stdin**, then signals end-of-input with:

```python
try:
    if hasattr(raw_sock, "shutdown"):
        raw_sock.shutdown(socket.SHUT_WR)
    ...
except Exception:
    pass
```

Over a TLS transport, a half-close on the raw socket does **not** deliver a clean stdin EOF to the container: no TLS `close_notify` is sent, and any failure of the attempt is silently swallowed by the `except Exception: pass`. The in-container `tar -x` / `cat` therefore blocks waiting for an end-of-input that never arrives. As a result the exec never exits, the daemon never closes the hijacked stream, and the client's drain loop (`while raw_sock.recv(...)`) blocks indefinitely.

This is not hypothetical. It reproduces reliably against:
- a **Docker-in-Docker** sidecar exposing only TLS on `:2376` (common in CI / Kubernetes dev environments), and
- any **remote `DOCKER_HOST`** reached over TLS.

Over a unix socket the half-close works, which is why local runs don't hit it.
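The blocking behaviour can be illustrated locally without Docker or TLS. The sketch below is only an analogue, not the sandbox code: it starts a plain `cat` subprocess, writes a payload to its stdin, and deliberately withholds EOF, which is the state the in-container reader ends up in when the half-close is never delivered. The function name `reader_blocks_without_eof` and the `grace` timeout are illustrative assumptions, and a POSIX `cat` is assumed to be on `PATH`.

```python
import subprocess


def reader_blocks_without_eof(payload: bytes, grace: float = 0.5) -> bool:
    """Return True if `cat` is still running after `grace` seconds with stdin left open."""
    proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        proc.stdin.write(payload)
        proc.stdin.flush()
        # stdin is intentionally NOT closed here: no EOF reaches the reader.
        try:
            proc.wait(timeout=grace)
            return False  # reader exited on its own (not expected for cat)
        except subprocess.TimeoutExpired:
            return True  # reader is still blocked waiting for more input
    finally:
        # Deliver the EOF so the child can exit and nothing leaks.
        proc.stdin.close()
        proc.wait(timeout=5)
        proc.stdout.close()


if __name__ == "__main__":
    print(reader_blocks_without_eof(b"hello"))
```

Only closing stdin in the `finally` block lets `cat` exit; in the TLS case described above there is no equivalent signal, so the reader stays blocked.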

## Why not `put_archive()`

The obvious "use `docker cp`" fix is explicitly avoided in this file (see the comments in `read()`/`write()`): with volume-driver-backed mounts attached, daemon archive operations can re-run volume mount setup and some plugins reject the duplicate `Mount` call for the same container id. So the fix should keep the exec+stdin approach.

## Proposed fix

Make the in-container reader terminate on a **byte count** instead of a stdin half-close: measure the payload length and pipe the real command through `head -c <n>`:

```python
payload_length, stream = _measure_stream(stream)
framed_cmd = ["sh", "-c", 'n=$1; shift; head -c "$n" | "$@"', "sh", str(payload_length), *cmd]
```

`head -c <n>` stops after exactly `<n>` bytes and closes its stdout, so the downstream `tar`/`cat` gets EOF from the pipe regardless of whether the exec-stdin half-close is ever delivered. This works identically over unix sockets, TLS TCP, and DinD, and keeps the deliberate avoidance of `put_archive()`.
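A minimal local sketch of the byte-count framing follows, assuming a POSIX `sh` and a `head` that supports `-c`. It is not the sandbox implementation: `measure_stream`, `frame_command`, and `run_without_stdin_eof` are hypothetical names, and the body of `measure_stream` is a guess at what a helper like `_measure_stream` might do. The sketch runs the framed command as a local subprocess and never closes stdin, to show that the downstream command still terminates.

```python
import io
import subprocess
from typing import BinaryIO, List, Tuple

# Same shell framing as in the proposed fix: $1 is the byte count, the rest is the real command.
FRAME_SCRIPT = 'n=$1; shift; head -c "$n" | "$@"'


def measure_stream(stream: BinaryIO) -> Tuple[int, BinaryIO]:
    """Hypothetical helper: return (remaining byte count, readable stream)."""
    if stream.seekable():
        start = stream.tell()
        end = stream.seek(0, io.SEEK_END)
        stream.seek(start)
        return end - start, stream
    # Non-seekable input: buffer it so the length is known up front.
    data = stream.read()
    return len(data), io.BytesIO(data)


def frame_command(cmd: List[str], payload_length: int) -> List[str]:
    """Wrap cmd so it reads exactly payload_length bytes via head -c."""
    return ["sh", "-c", FRAME_SCRIPT, "sh", str(payload_length), *cmd]


def run_without_stdin_eof(cmd: List[str], stream: BinaryIO, timeout: float = 5.0) -> bytes:
    """Feed the stream to the framed command while leaving stdin open.

    Intended for small payloads: output is read only after the process exits.
    """
    payload_length, stream = measure_stream(stream)
    proc = subprocess.Popen(
        frame_command(cmd, payload_length),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    try:
        proc.stdin.write(stream.read())
        proc.stdin.flush()
        # No stdin close here: termination must come from the byte count alone.
        proc.wait(timeout=timeout)
        return proc.stdout.read()
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdin.close()
        proc.stdout.close()


if __name__ == "__main__":
    print(run_without_stdin_eof(["cat"], io.BytesIO(b"hello")))
```

Because the count is passed as a positional argument rather than interpolated into the script, the original command's arguments reach `"$@"` unmodified and need no extra quoting.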

## Repro (minimal)

```python
import io
from pathlib import Path

# DOCKER_HOST=tcp://<tls-daemon>:2376, DOCKER_TLS_VERIFY=1, DOCKER_CERT_PATH=...
session = ...  # bring up a DockerSandboxSession (setup elided)
await session.write(Path("/workspace/x"), io.BytesIO(b"hello"))  # hangs forever
```

## Environment

- `openai-agents` (reproduced on `main` @ current, and on `0.14.6` as pinned by downstream `strix-agent`)
- Docker daemon reached via TLS (`DOCKER_HOST=tcp://…:2376`, `DOCKER_TLS_VERIFY=1`)

