feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389
feat(config): add v6.1 ruleset and pin RNG seeds from submission ruleset#389arekay-nv wants to merge 1 commit into
Conversation
Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes it CURRENT. The registry now registers every known round by version, so both v5.1 and v6.1 are resolvable. When a config carries submission_ref, _resolve_and_validate now pins the runtime RNG seeds from the selected ruleset during construction — before the config is dumped to the report dir, so the persisted config.yaml, RuntimeSettings, and report all reflect the seeds that actually ran. The ruleset value wins over any user-supplied seed (mirrors LoadGen locking core seeds from user.conf). Lenient by design: an unregistered ruleset leaves the config unchanged, so placeholder/non-submission configs are unaffected. Duration/latency overrides and the runtime-path integration (surfacing ruleset latency targets in the report) are deferred to a follow-up. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Code Review
This pull request introduces the MLPerf Inference v6.1 ruleset, including its official RNG seeds, and updates the default submission template to reference v6.1. It also implements a mechanism in BenchmarkConfig to automatically override and pin runtime RNG seeds based on the selected submission ruleset, ensuring compliance. Feedback is provided regarding the definition of _v6_1, where copying the mutable benchmark_rulesets dictionary instead of sharing its reference with _v5_1 is recommended to prevent potential side effects.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| version="v6.1", | ||
| scheduler_rng_seed=3936089224930324775, | ||
| sample_index_rng_seed=14276810075590677512, | ||
| benchmark_rulesets=_v5_1.benchmark_rulesets, |
There was a problem hiding this comment.
Sharing the mutable benchmark_rulesets dictionary reference between _v5_1 and _v6_1 can lead to unexpected side effects if the rulesets for one round are modified or extended in the future. It is safer to copy the nested dictionary to ensure round isolation.
| benchmark_rulesets=_v5_1.benchmark_rulesets, | |
| benchmark_rulesets={model: rules.copy() for model, rules in _v5_1.benchmark_rulesets.items()}, |
arekay-nv
left a comment
There was a problem hiding this comment.
Review Council — Multi-AI Code Review
Reviewed by: Codex + Claude | Depth: thorough
Codex: no actionable correctness issues. Claude: 5 posted (1 shared-dict finding on rules.py:282 was dropped as a duplicate of the existing gemini-code-assist comment). See the summary comment below for the tiered breakdown.
| return | ||
|
|
||
| updates: dict[str, int] = {} | ||
| if ruleset.scheduler_rng_seed is not None: |
There was a problem hiding this comment.
[Claude] medium (testing): The two is not None seed branches and the if not updates: return early-exit are never exercised — both registered MLCommons rulesets set both seeds. ruleset_base.py documents None as a valid "unseeded randomization" value, so a partial ruleset (one seed None) or a no-op ruleset (both None) is a supported, reachable state. Add a test that registers a throwaway RoundRuleset(scheduler_rng_seed=None, sample_index_rng_seed=<int>) (and one with both None) and asserts only the set seed is pinned / the config is left unchanged.
| """Only the round seeds rotate; per-model targets are identical.""" | ||
| v5_1 = get_ruleset("mlperf-inference-v5.1") | ||
| v6_1 = get_ruleset("mlperf-inference-v6.1") | ||
| assert v6_1.benchmark_rulesets == v5_1.benchmark_rulesets |
There was a problem hiding this comment.
[Claude] low (testing): v6_1.benchmark_rulesets == v5_1.benchmark_rulesets is identity-trivially true — _v6_1 is built with benchmark_rulesets=_v5_1.benchmark_rulesets (the same dict object), so this passes without actually verifying the per-model targets. (If the shared reference is replaced by a copy — see the existing suggestion on rules.py:282 — this becomes a real structural check.) Consider asserting a concrete expected value, e.g. a specific model's max_tpot_latency_ms, instead.
| # Also register as "mlcommons-current" for convenience | ||
| # Register every known round under its version-specific name | ||
| for ruleset in mlcommons_rounds: | ||
| register_ruleset(f"mlperf-inference-{ruleset.version}", ruleset) |
There was a problem hiding this comment.
[Claude] low (design): register_ruleset does _RULESET_REGISTRY[name] = ruleset with no duplicate guard, so if two ALL_ROUNDS entries ever share a version string the earlier one is silently overwritten — a subtle trap for whoever adds the next round. Consider raising/logging on an already-present key, and/or a test asserting len(ALL_ROUNDS) == len({r.version for r in ALL_ROUNDS}).
| # which the sample-index shuffle already covers). | ||
| _v6_1 = RoundRuleset( | ||
| version="v6.1", | ||
| scheduler_rng_seed=3936089224930324775, |
There was a problem hiding this comment.
[Claude] low (verify before merge): These seeds are compliance-locked, so it's worth confirming them against the tagged v6.1 loadgen/mlperf.conf. They match the repo's own docs/RANDOM_SEEDS.md table (sourced from mlperf.conf @ v6.0.0pre) and the schedule_rng_seed→scheduler / sample_index_rng_seed→dataloader mapping is consistent — but I can't verify from here that the round didn't rotate seeds between v6.0.0pre and v6.1. A maintainer should confirm against the final round config.
| "settings", | ||
| self.settings.model_copy(update={"runtime": new_runtime}), | ||
| ) | ||
| logger.info( |
There was a problem hiding this comment.
[Claude] low (log noise): logger.info("Pinned RNG seeds ...") fires on every construction. with_updates re-runs all validators, and the CLI reconstructs submission configs at least twice (from_yaml_file, then with_updates(timeout=...)/with_updates(datasets=...)), so a single run logs this line multiple times. The override itself is correctly idempotent — consider demoting to debug to cut the noise.
Review Council — Multi-AI Code ReviewReviewed by: Codex (gpt-5.5, xhigh) + Claude | Depth: thorough Codex: "changes consistently register the new MLCommons round and apply the selected ruleset seeds into runtime configuration without introducing an evident functional regression. No actionable correctness issues." Claude: 6 findings; 1 dropped as a duplicate of the existing No critical/high issues. The core logic (frozen-model mutation via 🟡 Should Fix (medium)
🔵 Consider (low)
Commit hygiene: 1 commit, 0 fixups — clean. |
Adds the MLPerf Inference v6.1 RoundRuleset (schedule/sample_index seeds from loadgen/mlperf.conf; per-model latency targets unchanged from v5.1) and makes it CURRENT. The registry now registers every known round by version, so both v5.1 and v6.1 are resolvable.
When a config carries submission_ref, _resolve_and_validate now pins the runtime RNG seeds from the selected ruleset during construction — before the config is dumped to the report dir, so the persisted config.yaml, RuntimeSettings, and report all reflect the seeds that actually ran. The ruleset value wins over any user-supplied seed (mirrors LoadGen locking core seeds from user.conf). Lenient by design: an unregistered ruleset leaves the config unchanged, so placeholder/non-submission configs are unaffected.
Duration/latency overrides and the runtime-path integration (surfacing ruleset latency targets in the report) are deferred to a follow-up.
What does this PR do?
Type of change
Related issues
Testing
Checklist