feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset by arekay-nv · Pull Request #389 · mlcommons/endpoints

arekay-nv · 2026-07-02T19:59:19Z

Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes it CURRENT. The registry now registers every known round by version, so both v5.1 and v6.1 are resolvable.

When a config carries submission_ref, _resolve_and_validate now pins the runtime RNG seeds from the selected ruleset during construction — before the config is dumped to the report dir, so the persisted config.yaml, RuntimeSettings, and report all reflect the seeds that actually ran. The ruleset value wins over any user-supplied seed (mirrors LoadGen locking core seeds from user.conf). Lenient by design: an unregistered ruleset leaves the config unchanged, so placeholder/non-submission configs are unaffected.

Duration/latency overrides and the runtime-path integration (surfacing ruleset latency targets in the report) are deferred to a follow-up.

What does this PR do?

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes it CURRENT. The registry now registers every known round by version, so both v5.1 and v6.1 are resolvable. When a config carries submission_ref, _resolve_and_validate now pins the runtime RNG seeds from the selected ruleset during construction — before the config is dumped to the report dir, so the persisted config.yaml, RuntimeSettings, and report all reflect the seeds that actually ran. The ruleset value wins over any user-supplied seed (mirrors LoadGen locking core seeds from user.conf). Lenient by design: an unregistered ruleset leaves the config unchanged, so placeholder/non-submission configs are unaffected. Duration/latency overrides and the runtime-path integration (surfacing ruleset latency targets in the report) are deferred to a follow-up. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-02T19:59:28Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist

Code Review

This pull request introduces the MLPerf Inference v6.1 ruleset, including its official RNG seeds, and updates the default submission template to reference v6.1. It also implements a mechanism in BenchmarkConfig to automatically override and pin runtime RNG seeds based on the selected submission ruleset, ensuring compliance. Feedback is provided regarding the definition of _v6_1, where copying the mutable benchmark_rulesets dictionary instead of sharing its reference with _v5_1 is recommended to prevent potential side effects.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-07-02T20:01:11Z

+    version="v6.1",
+    scheduler_rng_seed=3936089224930324775,
+    sample_index_rng_seed=14276810075590677512,
+    benchmark_rulesets=_v5_1.benchmark_rulesets,


Sharing the mutable benchmark_rulesets dictionary reference between _v5_1 and _v6_1 can lead to unexpected side effects if the rulesets for one round are modified or extended in the future. It is safer to copy the nested dictionary to ensure round isolation.

Suggested change

benchmark_rulesets=_v5_1.benchmark_rulesets,

benchmark_rulesets={model: rules.copy() for model, rules in _v5_1.benchmark_rulesets.items()},

arekay-nv

Review Council — Multi-AI Code Review

Reviewed by: Codex + Claude | Depth: thorough

Codex: no actionable correctness issues. Claude: 5 posted (1 shared-dict finding on rules.py:282 was dropped as a duplicate of the existing gemini-code-assist comment). See the summary comment below for the tiered breakdown.

arekay-nv · 2026-07-02T21:31:07Z

+            return
+
+        updates: dict[str, int] = {}
+        if ruleset.scheduler_rng_seed is not None:


[Claude] medium (testing): The two is not None seed branches and the if not updates: return early-exit are never exercised — both registered MLCommons rulesets set both seeds. ruleset_base.py documents None as a valid "unseeded randomization" value, so a partial ruleset (one seed None) or a no-op ruleset (both None) is a supported, reachable state. Add a test that registers a throwaway RoundRuleset(scheduler_rng_seed=None, sample_index_rng_seed=<int>) (and one with both None) and asserts only the set seed is pinned / the config is left unchanged.

arekay-nv · 2026-07-02T21:31:07Z

+    """Only the round seeds rotate; per-model targets are identical."""
+    v5_1 = get_ruleset("mlperf-inference-v5.1")
+    v6_1 = get_ruleset("mlperf-inference-v6.1")
+    assert v6_1.benchmark_rulesets == v5_1.benchmark_rulesets


[Claude] low (testing): v6_1.benchmark_rulesets == v5_1.benchmark_rulesets is identity-trivially true — _v6_1 is built with benchmark_rulesets=_v5_1.benchmark_rulesets (the same dict object), so this passes without actually verifying the per-model targets. (If the shared reference is replaced by a copy — see the existing suggestion on rules.py:282 — this becomes a real structural check.) Consider asserting a concrete expected value, e.g. a specific model's max_tpot_latency_ms, instead.

arekay-nv · 2026-07-02T21:31:07Z

-    # Also register as "mlcommons-current" for convenience
+    # Register every known round under its version-specific name
+    for ruleset in mlcommons_rounds:
+        register_ruleset(f"mlperf-inference-{ruleset.version}", ruleset)


[Claude] low (design): register_ruleset does _RULESET_REGISTRY[name] = ruleset with no duplicate guard, so if two ALL_ROUNDS entries ever share a version string the earlier one is silently overwritten — a subtle trap for whoever adds the next round. Consider raising/logging on an already-present key, and/or a test asserting len(ALL_ROUNDS) == len({r.version for r in ALL_ROUNDS}).

arekay-nv · 2026-07-02T21:31:07Z

+# which the sample-index shuffle already covers).
+_v6_1 = RoundRuleset(
+    version="v6.1",
+    scheduler_rng_seed=3936089224930324775,


[Claude] low (verify before merge): These seeds are compliance-locked, so it's worth confirming them against the tagged v6.1 loadgen/mlperf.conf. They match the repo's own docs/RANDOM_SEEDS.md table (sourced from mlperf.conf @ v6.0.0pre) and the schedule_rng_seed→scheduler / sample_index_rng_seed→dataloader mapping is consistent — but I can't verify from here that the round didn't rotate seeds between v6.0.0pre and v6.1. A maintainer should confirm against the final round config.

arekay-nv · 2026-07-02T21:31:07Z

+            "settings",
+            self.settings.model_copy(update={"runtime": new_runtime}),
+        )
+        logger.info(


[Claude] low (log noise): logger.info("Pinned RNG seeds ...") fires on every construction. with_updates re-runs all validators, and the CLI reconstructs submission configs at least twice (from_yaml_file, then with_updates(timeout=...)/with_updates(datasets=...)), so a single run logs this line multiple times. The override itself is correctly idempotent — consider demoting to debug to cut the noise.

arekay-nv · 2026-07-02T21:31:27Z

Review Council — Multi-AI Code Review

Reviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: thorough

Codex: "changes consistently register the new MLCommons round and apply the selected ruleset seeds into runtime configuration without introducing an evident functional regression. No actionable correctness issues."

Claude: 6 findings; 1 dropped as a duplicate of the existing gemini-code-assist comment on rules.py:282 (shared mutable-dict reference). 5 posted inline.

No critical/high issues. The core logic (frozen-model mutation via model_copy + object.__setattr__, lenient resolution, ruleset-wins seed lock, before-dump ordering) reviewed clean under both reviewers.

🟡 Should Fix (medium)

#	File	Line	Category	Reviewer	Summary
1	`config/schema.py`	1055	testing	Claude	`is not None` seed branches + `if not updates` early-exit are untested — a `None`-seed ruleset is a documented, supported state

🔵 Consider (low)

#	File	Line	Category	Reviewer	Summary
2	`.../mlcommons/test_rules.py`	147	testing	Claude	`benchmark_rulesets ==` assertion is identity-trivially true (same dict object) — verifies nothing
3	`config/ruleset_registry.py`	85	design	Claude	No duplicate-`version` guard — a future round sharing a version string would be silently overwritten
4	`.../mlcommons/rules.py`	280	docs/verify	Claude	Compliance-locked seeds match `docs/RANDOM_SEEDS.md` (@ `v6.0.0pre`) but should be confirmed against the tagged v6.1 `mlperf.conf`
5	`config/schema.py`	1068	error-handling	Claude	`logger.info("Pinned RNG seeds …")` fires multiple times per run (`with_updates` re-validates) — consider `debug`

Related: the existing gemini-code-assist comment on rules.py:282 (copy the shared benchmark_rulesets dict) would also resolve finding #2 if applied.

Commit hygiene: 1 commit, 0 fixups — clean.

arekay-nv requested a review from a team July 2, 2026 19:59

github-actions Bot requested a review from nvzhihanj July 2, 2026 19:59

gemini-code-assist Bot reviewed Jul 2, 2026

View reviewed changes

arekay-nv commented Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389

feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389
arekay-nv wants to merge 1 commit into
mainfrom
feat/random-seeds

arekay-nv commented Jul 2, 2026

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jul 2, 2026

Uh oh!

arekay-nv left a comment

Uh oh!

arekay-nv Jul 2, 2026

Uh oh!

arekay-nv Jul 2, 2026

Uh oh!

arekay-nv Jul 2, 2026

Uh oh!

arekay-nv Jul 2, 2026

Uh oh!

arekay-nv Jul 2, 2026

Uh oh!

arekay-nv commented Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	benchmark_rulesets=_v5_1.benchmark_rulesets,
	benchmark_rulesets={model: rules.copy() for model, rules in _v5_1.benchmark_rulesets.items()},

Uh oh!

Conversation

arekay-nv commented Jul 2, 2026

What does this PR do?

Type of change

Related issues

Testing

Checklist

Uh oh!

github-actions Bot commented Jul 2, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv left a comment

Choose a reason for hiding this comment

Review Council — Multi-AI Code Review

Uh oh!

arekay-nv Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

arekay-nv commented Jul 2, 2026

Review Council — Multi-AI Code Review

🟡 Should Fix (medium)

🔵 Consider (low)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant