@olesho olesho commented Nov 11, 2025

No description provided.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 244 to 261
    def __init__(self, config_dir: Optional[Path] = None):
        """
        Initialize task loader.

        Args:
            config_dir: Path to WebArena config_files directory.
                Defaults to submodules/webarena/config_files/
        """
        if config_dir is None:
            # Go from evals/lib/ to project root, then to submodules/webarena/config_files
            project_root = Path(__file__).parent.parent.parent
            webarena_dir = project_root / 'submodules' / 'webarena'
            config_dir = webarena_dir / 'config_files'

        self.config_dir = Path(config_dir)

        if not self.config_dir.exists():
            raise FileNotFoundError(f"Config directory not found: {self.config_dir}")

P1: Update WebArenaTaskLoader default path to new data location

The new WebArena utilities default to loading tasks from submodules/webarena/config_files, but that directory no longer exists in the repo after the restructure (only evals/webarena/... was added). Instantiating WebArenaTaskLoader() immediately raises FileNotFoundError, so run_webarena.py, test_webarena_integration.py, and the list/run helpers cannot load any tasks. The loader should default to the new in-repo path (or accept a parameter) instead of referencing a removed submodule.
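
One way to address this, as a minimal sketch under stated assumptions: resolve the default from an ordered list of candidate directories, trying the new in-repo location first and the legacy submodule path second. The helper name `resolve_config_dir`, its `project_root` parameter, and the priority order are hypothetical; only the two directory paths come from this pull request.

```python
from pathlib import Path
from typing import Optional, Sequence

# Candidate locations relative to the project root, in priority order.
# Both paths appear in this PR; the ordering is an assumption.
DEFAULT_CANDIDATES = (
    Path("evals") / "webarena" / "config_files",
    Path("submodules") / "webarena" / "config_files",
)


def resolve_config_dir(
    project_root: Path,
    config_dir: Optional[Path] = None,
    candidates: Sequence[Path] = DEFAULT_CANDIDATES,
) -> Path:
    """Return the config directory to load tasks from (hypothetical helper)."""
    # An explicit argument always wins, but it must exist.
    if config_dir is not None:
        explicit = Path(config_dir)
        if not explicit.is_dir():
            raise FileNotFoundError(f"Config directory not found: {explicit}")
        return explicit

    # Otherwise return the first candidate that exists on disk.
    tried = [project_root / rel for rel in candidates]
    for path in tried:
        if path.is_dir():
            return path

    # Report every location that was checked to make the failure actionable.
    raise FileNotFoundError(
        "Config directory not found; tried: " + ", ".join(str(p) for p in tried)
    )
```

Listing every attempted path in the error message makes a missing checkout or an uninitialized submodule easier to diagnose than a single hard-coded path would.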


This restructuring organizes the repository to support different deployment types
and evaluation frameworks, with all git submodules in a dedicated submodules/ folder.

Changes:
- Moved all submodules to submodules/ directory:
  - kernel-images → submodules/kernel-images
  - browser-operator-core → submodules/browser-operator-core
  - webarena → submodules/webarena

- Updated .gitmodules to point all submodules to submodules/ directory

- Updated all Dockerfiles to use submodules/ paths:
  - Dockerfile.devtools
  - Dockerfile.kernel-cloud
  - deployments/cloudrun/Dockerfile
  - deployments/local/Dockerfile
  - deployments/local-webarena/Dockerfile

- Updated Makefiles to initialize submodules from submodules/ directory:
  - deployments/local/Makefile
  - deployments/local-webarena/Makefile

- Moved WebArena config files to evals/webarena/config_files/
  - Copied 812 benchmark task configs from submodule

- Fixed WebArenaTaskLoader to try new location first with fallback:
  - evals/lib/webarena_adapter.py

- Fixed EvalLoader to support evals/native/data/ structure:
  - evals/lib/eval_loader.py (path resolution for restructured evals)

- Updated documentation to reflect new structure:
  - CLAUDE.md (main technical docs)
  - evals/CLAUDE.md (evals-specific docs)

The restructuring supports three deployment types:
- deployments/local/ - Local development
- deployments/local-webarena/ - Local with WebArena
- deployments/cloudrun/ - Google Cloud Run

All submodules are now properly registered and will download to submodules/
when running 'git submodule update --init'.
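
The claim that every submodule now lives under submodules/ can be checked mechanically. The sketch below parses the text of a .gitmodules file and reports any entry whose path falls outside a given prefix. The function names are hypothetical, and the parser is deliberately simple: it only reads `[submodule "name"]` headers and `path = ...` lines, which is an assumption about how the file is laid out, not a full git-config parser.

```python
import re
from typing import Dict, Optional

# Matches section headers such as: [submodule "webarena"]
_SECTION = re.compile(r'^\s*\[submodule\s+"(?P<name>[^"]+)"\]\s*$')
# Matches lines such as: path = submodules/webarena
_PATH_LINE = re.compile(r"^\s*path\s*=\s*(?P<path>\S.*?)\s*$")


def submodule_paths(gitmodules_text: str) -> Dict[str, str]:
    """Map each submodule name to its configured path."""
    paths: Dict[str, str] = {}
    current: Optional[str] = None
    for line in gitmodules_text.splitlines():
        section = _SECTION.match(line)
        if section:
            current = section.group("name")
            continue
        entry = _PATH_LINE.match(line)
        if entry and current is not None:
            paths[current] = entry.group("path")
    return paths


def misplaced_submodules(
    gitmodules_text: str, prefix: str = "submodules/"
) -> Dict[str, str]:
    """Return the submodules whose path does not start with the prefix."""
    return {
        name: path
        for name, path in submodule_paths(gitmodules_text).items()
        if not path.startswith(prefix)
    }
```

Passing the contents of the repository's .gitmodules to `misplaced_submodules` should yield an empty dictionary if the restructuring is complete.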