data-eng: eval.sh wrapper to keep /opt/env on PATH for agent self-evals by shauryr · Pull Request #7 · LLM360/PostTrainBench

shauryr · 2026-05-29T21:44:26Z

Problem

Inside the apptainer container for the data-eng prompt, run_task.sh passes PATH="/opt/env/local/bin:/opt/env/bin:..." through apptainer exec --env so the bind-mounted Python env's CLIs (including vllm) are reachable. The vllm binary itself is present at /opt/env/local/bin/vllm.

However, the codex CLI runs every shell command via bash -lc "..." (a login shell). A login shell sources /etc/profile and then the user's login files (~/.bash_profile or ~/.profile, which typically source ~/.bashrc). In this container those startup files reset PATH to the container's defaults, dropping /opt/env/local/bin. So the agent sees vllm: command not found even though the binary is mounted and executable.
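The PATH reset described above can be reproduced outside the container. The following is a minimal sketch, not the container's actual configuration: the throwaway ~/.bash_profile is an assumption standing in for whatever startup files the image really ships, and the `injected` value is only an illustrative PATH.

```shell
#!/usr/bin/env bash
# Sketch: show how a login shell can drop a PATH entry injected by the caller.
# Assumption: the temporary .bash_profile below imitates the container's
# startup files, which are not shown in this PR.
set -euo pipefail

demo_home="$(mktemp -d)"

# A profile that assigns PATH outright instead of appending to it.
printf 'PATH=/usr/local/bin:/usr/bin:/bin\n' > "${demo_home}/.bash_profile"

# Illustrative PATH in the spirit of what run_task.sh passes via --env.
injected="/opt/env/local/bin:/usr/local/bin:/usr/bin:/bin"

# Non-login shell: the injected PATH is inherited unchanged.
plain_path="$(env HOME="${demo_home}" PATH="${injected}" bash -c 'printf "%s" "$PATH"')"

# Login shell: startup files run first and replace PATH.
login_path="$(env HOME="${demo_home}" PATH="${injected}" bash -lc 'printf "%s" "$PATH"')"

echo "bash -c : ${plain_path}"
echo "bash -lc: ${login_path}"

rm -rf "${demo_home}"
```

Under these assumptions, the `bash -c` line still contains /opt/env/local/bin while the `bash -lc` line does not, which matches the behaviour the agents run into.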

Why it matters

The V2 discipline framework (separate, unmerged branch feature/v2-discipline) requires every experiment to self-eval via evaluate.py --json-output-file experiments/exp_<N>/eval_result.json. During that eval, inspect_ai spawns a local vllm server. If vllm isn't on PATH, the eval fails, the agent can't fill in ## Outcome: eval_after: <X>, and publish_experiment.py refuses the row.

Observed in both V1 (6h) and V2 (12h clean-slate) data-eng pilots: agents rediscover the issue and manually prefix with PATH=/opt/env/local/bin:$PATH python3 evaluate.py ..., burning ~5 min of exploration each time.

Fix

This PR adds a small eval.sh wrapper to src/eval/general/ that re-asserts the bind-mounted PATH before exec'ing evaluate.py:

export PATH="/opt/env/local/bin:/opt/env/bin:${PATH}"
exec python3 /home/ben/task/evaluate.py "$@"

run_task.sh copies the wrapper into ${JOB_DIR}/task/ only when POST_TRAIN_BENCH_PROMPT=data_eng_prompt, so default-prompt runs are unchanged.
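A prompt-gated copy of this kind might look roughly like the sketch below. It is not the actual run_task.sh diff: `copy_eval_wrapper`, its arguments, and the `default_prompt` value are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper; the real run_task.sh change is not shown in this description.
# Copies the wrapper into <job_dir>/task/ only for the data-eng prompt.
copy_eval_wrapper() {
  local prompt="$1" wrapper_src="$2" job_dir="$3"
  if [ "${prompt}" = "data_eng_prompt" ]; then
    mkdir -p "${job_dir}/task"
    cp "${wrapper_src}" "${job_dir}/task/eval.sh"
    chmod +x "${job_dir}/task/eval.sh"
  fi
}

# Demo in a scratch directory with a placeholder wrapper file.
scratch="$(mktemp -d)"
printf '#!/usr/bin/env bash\n' > "${scratch}/eval.sh"

# "default_prompt" is an assumed value for a non-data-eng run: nothing is copied.
copy_eval_wrapper "default_prompt" "${scratch}/eval.sh" "${scratch}/job_default"

# The data-eng prompt triggers the copy.
copy_eval_wrapper "data_eng_prompt" "${scratch}/eval.sh" "${scratch}/job_data_eng"
```

Gating on the prompt name keeps the task directory of every other run byte-for-byte identical to what it was before the wrapper existed.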

Agents call bash eval.sh ... instead of python3 evaluate.py ....
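To illustrate why calling through a wrapper survives the login-shell reset, here is a self-contained sketch that runs entirely in a temporary sandbox. Everything in it is an assumption for demonstration purposes: `fake-vllm` stands in for the bind-mounted vllm CLI, the .bash_profile imitates a PATH-resetting startup file, and `eval_demo.sh` execs its arguments rather than `python3 evaluate.py` as the real wrapper does.

```shell
#!/usr/bin/env bash
set -euo pipefail

sandbox="$(mktemp -d)"
mkdir -p "${sandbox}/env/local/bin" "${sandbox}/home" "${sandbox}/task"

# Stub standing in for the bind-mounted CLI (hypothetical name).
printf '#!/bin/sh\necho stub-vllm\n' > "${sandbox}/env/local/bin/fake-vllm"
chmod +x "${sandbox}/env/local/bin/fake-vllm"

# Startup file that resets PATH, imitating the container's login behaviour.
printf 'PATH=/usr/local/bin:/usr/bin:/bin\n' > "${sandbox}/home/.bash_profile"

# Wrapper in the style of eval.sh: re-assert PATH, then exec the payload.
cat > "${sandbox}/task/eval_demo.sh" <<EOF
#!/usr/bin/env bash
export PATH="${sandbox}/env/local/bin:\${PATH}"
exec "\$@"
EOF

# Direct call from a login shell: the stub is not on PATH.
without_wrapper="$(env HOME="${sandbox}/home" \
  bash -lc 'command -v fake-vllm || echo missing')"

# Same login shell, but going through the wrapper: the stub resolves.
with_wrapper="$(env HOME="${sandbox}/home" \
  bash -lc "bash '${sandbox}/task/eval_demo.sh' fake-vllm")"

echo "without wrapper: ${without_wrapper}"
echo "with wrapper:    ${with_wrapper}"
```

Because the wrapper sets PATH after the login shell has finished sourcing its startup files, the fix does not depend on what those files contain.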

Scope

New file: src/eval/general/eval.sh (executable)
src/run_task.sh: +6 lines inside the existing data-eng-gated cp block

Diff: 2 files, +24 insertions.

Out of scope

The data_eng_prompt.txt update telling the agent to USE the wrapper lives in the V2 branch and will land with that PR. This PR ships only the infrastructure piece so it's usable independently and can land before V2 is ready.

Draft

Marked draft because:

V1 pilots already work around the issue manually, so this is a quality-of-life fix, not a blocker.
The V2 branch (which would actually exercise the wrapper at scale via the locked Step 7 self-eval) is still under review/iteration.


shauryr wants to merge 1 commit into main from feature/eval-wrapper
