Conversation
Pull request overview
Adds Conbench-backed benchmarking to the Arrow JS CI pipeline, including scripts to run the existing perf/ benchmarks, adapt results to Conbench’s expected format, and upload results only from main.
Changes:
- Add a new GitHub Actions workflow to run benchmarks on PRs and upload results to Conbench on push to `main`.
- Introduce `ci/scripts/bench.sh` and `ci/scripts/bench_adapt.py` to run benchmarks and transform/submit results via `benchadapt`.
- Switch the `perf` runner to `tsx` and add `tsx` as a dev dependency.
Reviewed changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.
Summary per file:
| File | Description |
|---|---|
| package.json | Switch perf execution to tsx and add tsx dependency. |
| package-lock.json | Lockfile updates for newly added tsx dependency tree. |
| ci/scripts/bench_adapt.py | New Python adapter to transform perf --json output into Conbench/benchadapt results. |
| ci/scripts/bench.sh | New shell wrapper to run benchmarks (optionally emitting JSON to a file). |
| .gitignore | Ignore generated bench_stats.json and Python __pycache__/. |
| .github/workflows/benchmark.yaml | New CI workflow to run benchmarks and submit to Conbench on main. |
```yaml
CONBENCH_REF: ${{ github.ref_name }}
CONBENCH_MACHINE_INFO_NAME: amd64-ubuntu-24
run: |
  python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@main#subdirectory=benchadapt/python
```
The workflow installs benchadapt directly from the Conbench repo’s main branch. This is not reproducible and can cause CI breakages if upstream changes, and it also increases supply-chain risk compared to pinning. Consider pinning to a specific tag or commit SHA (and optionally using a constraints/requirements file) so benchmark submissions remain stable over time.
Suggested change:
```diff
-python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@main#subdirectory=benchadapt/python
+python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@0123456789abcdef0123456789abcdef01234567#subdirectory=benchadapt/python
```
```json
"rollup": "4.59.0",
"rxjs": "7.8.2",
"ts-jest": "29.1.4",
"tsx": "^4.19.3",
```
tsx is the only devDependency using a caret range (^4.19.3) while the rest of devDependencies are pinned to exact versions. This can cause the installed tsx version to drift over time (even with a lockfile, it makes version intent less clear and increases risk of breakage when regenerating the lockfile). Consider pinning tsx to an exact version to match the repository’s devDependency versioning pattern.
Suggested change:
```diff
-"tsx": "^4.19.3",
+"tsx": "4.19.3",
```
```python
# bench.sh writes bench_stats.json into the calling directory (repo root)
result_file = str(ARROW_ROOT / "bench_stats.json")
command = ["bash", str(SCRIPTS_PATH / "bench.sh"), str(ARROW_ROOT), "--json"]

def __init__(self, *args, **kwargs) -> None:
```
bench.sh writes bench_stats.json to the calling directory (${OLDPWD}), but JSAdapter.result_file always reads from ${ARROW_ROOT}/bench_stats.json. If bench_adapt.py is executed from a directory other than the repo root, the adapter will look in the wrong place and fail to parse results. Consider making bench.sh write to a path derived from source_dir (or pass an explicit output path) so production and local runs are consistent with result_file.
Suggested change:
```diff
-# bench.sh writes bench_stats.json into the calling directory (repo root)
-result_file = str(ARROW_ROOT / "bench_stats.json")
+# bench.sh writes bench_stats.json into the calling directory of this
+# process, so resolve the result file from the current working directory
+# at runtime instead of assuming the repo root is the caller's cwd.
 command = ["bash", str(SCRIPTS_PATH / "bench.sh"), str(ARROW_ROOT), "--json"]

 def __init__(self, *args, **kwargs) -> None:
+    self.result_file = str(Path.cwd() / "bench_stats.json")
```
```python
if os.environ.get("CONBENCH_REF") == "main":
    # Assume GitHub Actions CI. The environment variable lookups below are
    # expected to fail when not running in GitHub Actions.
    github_commit_info = {
        "repository": f'{os.environ["GITHUB_SERVER_URL"]}/{os.environ["GITHUB_REPOSITORY"]}',
        "commit": os.environ["GITHUB_SHA"],
        "pr_number": None,  # implying default branch
    }
```
The script decides it’s running in GitHub Actions based on CONBENCH_REF == "main" and then unconditionally requires GITHUB_SERVER_URL, GITHUB_REPOSITORY, and GITHUB_SHA. This can crash in local runs if someone sets CONBENCH_REF=main for labeling/submission purposes outside of GitHub Actions. Consider detecting CI via GITHUB_ACTIONS (or checking for required GITHUB_* vars) instead of using CONBENCH_REF as the switch.
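A minimal sketch of that alternative detection, assuming the same environment variables as the snippet above; `GITHUB_ACTIONS` is set to `"true"` by the GitHub runner, so it is a more direct switch than `CONBENCH_REF`:

```python
import os

def github_commit_info() -> dict:
    """Build commit info, detecting GitHub Actions via the GITHUB_ACTIONS
    variable instead of CONBENCH_REF, so local runs never crash on
    missing GITHUB_* variables."""
    if os.environ.get("GITHUB_ACTIONS") == "true":
        return {
            "repository": f'{os.environ["GITHUB_SERVER_URL"]}/{os.environ["GITHUB_REPOSITORY"]}',
            "commit": os.environ["GITHUB_SHA"],
            "pr_number": None,  # None implies the default branch
        }
    # Local run: no commit metadata, even if CONBENCH_REF=main was exported
    # for labeling purposes.
    return {"repository": "", "commit": None, "pr_number": None}
```

The returned shape for the local path is an assumption; the real adapter may prefer to omit keys entirely rather than return `None` values.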
```python
parsed.run_name = (
    f"{parsed.run_reason}: {github_commit_info.get('commit')}"
```
run_name is always set to include github_commit_info.get('commit'). In the local-dev path github_commit_info does not include a commit, so the run name becomes localdev: None, which is likely not intended and makes Conbench runs harder to read/filter. Consider only appending the commit when it’s present (or use a local identifier like the generated run_id).
Suggested change:
```diff
-parsed.run_name = (
-    f"{parsed.run_reason}: {github_commit_info.get('commit')}"
+commit = github_commit_info.get("commit")
+parsed.run_name = (
+    f"{parsed.run_reason}: {commit}" if commit else parsed.run_reason
```
```sh
if [[ "$2" = "--json" ]]; then
  npm run perf -- --json 2>"${OLDPWD}/bench_stats.json"
else
  npm run perf
fi
```
When --json is used, results are redirected to ${OLDPWD}/bench_stats.json (the directory where the script was invoked), not to ${source_dir}. Since other tooling (e.g. bench_adapt.py) expects the results file under the repo root, this can break if bench.sh is called from outside the repo root. Consider writing the results to a path based on source_dir (or accepting an explicit output file path) to make the output location deterministic.
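One way to make the location deterministic is to let the adapter choose the path and hand it to `bench.sh` explicitly. A hedged sketch of the adapter side, assuming `bench.sh` were updated to accept an output-file argument (it currently does not); `ARROW_ROOT` and `SCRIPTS_PATH` here are hypothetical stand-ins for the constants in `bench_adapt.py`:

```python
from pathlib import Path

# Hypothetical stand-ins for the constants defined in bench_adapt.py.
ARROW_ROOT = Path("/tmp/arrow/js")
SCRIPTS_PATH = ARROW_ROOT / "ci" / "scripts"

def build_bench_command() -> tuple[list[str], str]:
    """Pin the results file under the repo root and pass it to bench.sh,
    so output no longer depends on the caller's working directory."""
    result_file = str(ARROW_ROOT / "bench_stats.json")
    command = [
        "bash",
        str(SCRIPTS_PATH / "bench.sh"),
        str(ARROW_ROOT),
        "--json",
        result_file,  # explicit output path instead of relying on ${OLDPWD}
    ]
    return command, result_file
```

With this shape, local runs and CI runs always agree on where the results land, whatever directory the adapter is invoked from.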
You did this based on the Go workflow apache/arrow-go#250, right? @assignUser Could you set
What's Changed
This PR adds a GitHub Actions workflow to run benchmarks and upload them to Conbench.
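Gating the upload on pushes to `main` is typically expressed as a step-level condition in the workflow. A minimal sketch only; the step name and submit command are assumptions, not the PR's actual `benchmark.yaml`:

```yaml
# Sketch: submit results only on pushes to the default branch.
- name: Upload results to Conbench
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  env:
    CONBENCH_EMAIL: ${{ secrets.CONBENCH_EMAIL }}
    CONBENCH_PASS: ${{ secrets.CONBENCH_PASS }}
  run: python3 ci/scripts/bench_adapt.py
```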
Note: Results are only submitted to Conbench on push to `main`. This requires `CONBENCH_EMAIL` and `CONBENCH_PASS` to be configured as GitHub Actions secrets.

Test Plan
Ran against a locally running conbench container
Viewed runs in conbench UI


Closes #203