ColBench by ahmedheakl · Pull Request #37 · parameterlab/MASEval

ahmedheakl · 2026-02-24T07:00:29Z

Description

Add ColBench (Collaborative Agent Bench).

Modules added (maseval/benchmark/colbench/):

ColBenchBenchmark: orchestrates the task loop
ColBenchUser: LLM-backed human simulator
ColBenchAgentAdapter / ColBenchAgentInner: agent under test
ColBenchEnvironment: task state holder
ColBenchCodeEvaluator: unit-test scoring with sandboxed execution
OpenAIModelAdapter: ModelAdapter implementation for OpenAI-compatible APIs (vLLM, TGI, etc.)

(examples/colbench_benchmark):

colbench.py: CLI runner matching the original sweet_rl workflow

Output format is backward-compatible with the original [sweet_rl](https://github.com/facebookresearch/sweet_rl) evaluation scripts.

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code quality improvement (refactoring, formatting, etc.)

Checklist

Contribution

I have read the [CONTRIBUTING.md](CONTRIBUTING.md) guide.
Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"

Documentation

Added/updated docstrings for new/modified functions as instructed [CONTRIBUTING.md](CONTRIBUTING.md)
Updated relevant documentation in docs/ (if applicable)
Tag github issue with this PR (if applicable)

Changelog

Added entry to CHANGELOG.md under [Unreleased] section
- Use Added section for new features
- Use Changed section for modifications to existing functionality
- Use Fixed section for bug fixes
- Use Removed section for deprecated/removed features
OR this is a documentation-only change (no changelog needed)

Example:
- Add ColBench benchmark for multi-turn collaborative agent evaluation

Architecture (if applicable)

Core/Interface separation: Changes in maseval/core/ do NOT import from maseval/interface/
Dependencies: New core dependencies added sparingly; framework integrations go to optional dependencies

Additional Notes

Requires openai package (already an optional dependency) and a running vLLM server.
Tested with meta-llama/Llama-3.1-8B-Instruct on both agent and simulator sides.

ahmedheakl added 2 commits February 14, 2026 23:22

added main code

f75faee

added colbench

a06db7b

ahmedheakl changed the title ~~Agent collab~~ ColBench Feb 24, 2026

cemde added enhancement New feature or request benchmarks regarding the `maseval/benchmark` package labels Mar 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ColBench#37

ColBench#37
ahmedheakl wants to merge 2 commits intoparameterlab:mainfrom
ahmedheakl:agent-collab

ahmedheakl commented Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ahmedheakl commented Feb 24, 2026

Description

Type of Change

Checklist

Contribution

Documentation

Changelog

Architecture (if applicable)

Additional Notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants