Skip to content

ci: add benchmarking via conbench#415

Open
kentkwu wants to merge 1 commit intoapache:mainfrom
kentkwu:gh-203
Open

ci: add benchmarking via conbench#415
kentkwu wants to merge 1 commit intoapache:mainfrom
kentkwu:gh-203

Conversation

@kentkwu
Copy link
Copy Markdown
Contributor

@kentkwu kentkwu commented Mar 17, 2026

What's Changed

This PR adds a github action workflow to run benchmarks and upload them to conbench.

Note: Results are only submitted to conbench on push to main. This requires CONBENCH_EMAIL and CONBENCH_PASS to be configured as GitHub Actions secrets

Test Plan

Ran against a locally running conbench container

bash ci/scripts/bench.sh $(pwd) --json

# Transform verification (without submitting):
python3.11 - <<'EOF'
import sys
sys.path.insert(0, '.')
from ci.scripts.bench_adapt import JSAdapter
adapter = JSAdapter(result_fields_override={"info": {}})
results = adapter._transform_results()
print(f"{len(results)} results parsed")
print("First result tags:", results[0].tags)
print("First result stats:", results[0].stats)
EOF

# Local conbench (Docker):
git clone https://github.com/conbench/conbench
cd conbench
# create buildinfo.json (required by the app)
echo '{"commit":"0000000000000000000000000000000000000000","branch_name":"main","build_time_rfc3339":"2026-03-10T00:00:00Z","build_hostname":"localhost","version_string"
:"local"}' > buildinfo.json
docker compose up --build -d
docker compose port app 5000  # get the port
# register at http://127.0.0.1:<port>/register?key=innocent-registration-key

# Full end-to-end submission:
CONBENCH_URL=http://127.0.0.1:<port> \
CONBENCH_EMAIL=<email> \
CONBENCH_PASSWORD=<password> \
CONBENCH_REF=main \
GITHUB_SERVER_URL=https://github.com \
GITHUB_REPOSITORY=apache/arrow-js \
GITHUB_SHA=<commit-sha> \
python3.11 ci/scripts/bench_adapt.py

Viewed runs in conbench UI
Screenshot 2026-03-16 at 8 24 25 PM
Screenshot 2026-03-16 at 8 24 41 PM

Closes #203

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds Conbench-backed benchmarking to the Arrow JS CI pipeline, including scripts to run the existing perf/ benchmarks, adapt results to Conbench’s expected format, and upload results only from main.

Changes:

  • Add a new GitHub Actions workflow to run benchmarks on PRs and upload results to Conbench on push to main.
  • Introduce ci/scripts/bench.sh and ci/scripts/bench_adapt.py to run benchmarks and transform/submit results via benchadapt.
  • Switch the perf runner to tsx and add tsx as a dev dependency.

Reviewed changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
package.json Switch perf execution to tsx and add tsx dependency.
package-lock.json Lockfile updates for newly added tsx dependency tree.
ci/scripts/bench_adapt.py New Python adapter to transform perf --json output into Conbench/benchadapt results.
ci/scripts/bench.sh New shell wrapper to run benchmarks (optionally emitting JSON to a file).
.gitignore Ignore generated bench_stats.json and Python __pycache__/.
.github/workflows/benchmark.yaml New CI workflow to run benchmarks and submit to Conbench on main.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

CONBENCH_REF: ${{ github.ref_name }}
CONBENCH_MACHINE_INFO_NAME: amd64-ubuntu-24
run: |
python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@main#subdirectory=benchadapt/python
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workflow installs benchadapt directly from the Conbench repo’s main branch. This is not reproducible and can cause CI breakages if upstream changes, and it also increases supply-chain risk compared to pinning. Consider pinning to a specific tag or commit SHA (and optionally using a constraints/requirements file) so benchmark submissions remain stable over time.

Suggested change
python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@main#subdirectory=benchadapt/python
python3 -m pip install benchadapt@git+https://github.com/conbench/conbench.git@0123456789abcdef0123456789abcdef01234567#subdirectory=benchadapt/python

Copilot uses AI. Check for mistakes.
Comment thread package.json
"rollup": "4.59.0",
"rxjs": "7.8.2",
"ts-jest": "29.1.4",
"tsx": "^4.19.3",
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tsx is the only devDependency using a caret range (^4.19.3) while the rest of devDependencies are pinned to exact versions. This can cause the installed tsx version to drift over time (even with a lockfile, it makes version intent less clear and increases risk of breakage when regenerating the lockfile). Consider pinning tsx to an exact version to match the repository’s devDependency versioning pattern.

Suggested change
"tsx": "^4.19.3",
"tsx": "4.19.3",

Copilot uses AI. Check for mistakes.
Comment thread ci/scripts/bench_adapt.py
Comment on lines +63 to +67
# bench.sh writes bench_stats.json into the calling directory (repo root)
result_file = str(ARROW_ROOT / "bench_stats.json")
command = ["bash", str(SCRIPTS_PATH / "bench.sh"), str(ARROW_ROOT), "--json"]

def __init__(self, *args, **kwargs) -> None:
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bench.sh writes bench_stats.json to the calling directory (${OLDPWD}), but JSAdapter.result_file always reads from ${ARROW_ROOT}/bench_stats.json. If bench_adapt.py is executed from a directory other than the repo root, the adapter will look in the wrong place and fail to parse results. Consider making bench.sh write to a path derived from source_dir (or pass an explicit output path) so production and local runs are consistent with result_file.

Suggested change
# bench.sh writes bench_stats.json into the calling directory (repo root)
result_file = str(ARROW_ROOT / "bench_stats.json")
command = ["bash", str(SCRIPTS_PATH / "bench.sh"), str(ARROW_ROOT), "--json"]
def __init__(self, *args, **kwargs) -> None:
# bench.sh writes bench_stats.json into the calling directory of this
# process, so resolve the result file from the current working directory
# at runtime instead of assuming the repo root is the caller's cwd.
command = ["bash", str(SCRIPTS_PATH / "bench.sh"), str(ARROW_ROOT), "--json"]
def __init__(self, *args, **kwargs) -> None:
self.result_file = str(Path.cwd() / "bench_stats.json")

Copilot uses AI. Check for mistakes.
Comment thread ci/scripts/bench_adapt.py
Comment on lines +42 to +49
if os.environ.get("CONBENCH_REF") == "main":
# Assume GitHub Actions CI. The environment variable lookups below are
# expected to fail when not running in GitHub Actions.
github_commit_info = {
"repository": f'{os.environ["GITHUB_SERVER_URL"]}/{os.environ["GITHUB_REPOSITORY"]}',
"commit": os.environ["GITHUB_SHA"],
"pr_number": None, # implying default branch
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script decides it’s running in GitHub Actions based on CONBENCH_REF == "main" and then unconditionally requires GITHUB_SERVER_URL, GITHUB_REPOSITORY, and GITHUB_SHA. This can crash in local runs if someone sets CONBENCH_REF=main for labeling/submission purposes outside of GitHub Actions. Consider detecting CI via GITHUB_ACTIONS (or checking for required GITHUB_* vars) instead of using CONBENCH_REF as the switch.

Copilot uses AI. Check for mistakes.
Comment thread ci/scripts/bench_adapt.py
Comment on lines +110 to +111
parsed.run_name = (
f"{parsed.run_reason}: {github_commit_info.get('commit')}"
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run_name is always set to include github_commit_info.get('commit'). In the local-dev path github_commit_info does not include a commit, so the run name becomes localdev: None, which is likely not intended and makes Conbench runs harder to read/filter. Consider only appending the commit when it’s present (or use a local identifier like the generated run_id).

Suggested change
parsed.run_name = (
f"{parsed.run_reason}: {github_commit_info.get('commit')}"
commit = github_commit_info.get("commit")
parsed.run_name = (
f"{parsed.run_reason}: {commit}" if commit else parsed.run_reason

Copilot uses AI. Check for mistakes.
Comment thread ci/scripts/bench.sh
Comment on lines +36 to +40
if [[ "$2" = "--json" ]]; then
npm run perf -- --json 2>"${OLDPWD}/bench_stats.json"
else
npm run perf
fi
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When --json is used, results are redirected to ${OLDPWD}/bench_stats.json (the directory where the script was invoked), not to ${source_dir}. Since other tooling (e.g. bench_adapt.py) expects the results file under the repo root, this can break if bench.sh is called from outside the repo root. Consider writing the results to a path based on source_dir (or accepting an explicit output file path) to make the output location deterministic.

Copilot uses AI. Check for mistakes.
@kou
Copy link
Copy Markdown
Member

kou commented Apr 14, 2026

You did this based on the Go's workflow apache/arrow-go#250 , right?

@assignUser Could you set secrets.CONBENCH_EMAIL and secrets.CONBENCH_PASS like you did in apache/arrow-go apache/arrow-go#250 (comment) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ci: Add benchmarking CI

3 participants