feat: add replay runner and scheduler with CLI integration #234

Open
JasonXuDeveloper wants to merge 5 commits into Azure:unstable-replay from JasonXuDeveloper:replay/pr5

@JasonXuDeveloper
Contributor

Summary

  • Runner: Worker pool with time-bucketed scheduling, per-worker metrics to avoid lock contention, pool-first WATCH connection assignment with overflow support
  • Scheduler: Orchestrator for both local multi-runner and distributed single-runner modes with configuration validation and warnings
  • CLI: kperf replay run command for local replay execution, kperf runner replay subcommand for distributed runner pods
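
The Runner bullet above combines two ideas: grouping requests into time buckets, and letting each worker record its own metrics so that no shared lock is needed. A minimal sketch of both follows. It is not the PR's implementation: `request`, `bucketize`, and `runBucket` are hypothetical names, and the sketch assumes non-negative offsets and a bucket width greater than zero.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// request is a hypothetical stand-in for one recorded API call.
type request struct {
	Offset time.Duration // time since the start of the recording
	Verb   string
}

// bucketize groups requests by Offset/width. The result is indexed by bucket
// number, so a bucket with no requests is a nil slice.
// Assumes width > 0 and non-negative offsets.
func bucketize(reqs []request, width time.Duration) [][]request {
	var buckets [][]request
	for _, r := range reqs {
		idx := int(r.Offset / width)
		for len(buckets) <= idx {
			buckets = append(buckets, nil)
		}
		buckets[idx] = append(buckets[idx], r)
	}
	return buckets
}

// runBucket fans one bucket out to `workers` goroutines. Each worker counts
// into its own slot of perWorker, so no mutex is needed; the slots are only
// summed after wg.Wait() returns.
func runBucket(bucket []request, workers int) int {
	perWorker := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := id; i < len(bucket); i += workers {
				perWorker[id]++ // a real worker would send bucket[i] here
			}
		}(w)
	}
	wg.Wait()

	total := 0
	for _, n := range perWorker {
		total += n
	}
	return total
}

func main() {
	reqs := []request{
		{Offset: 100 * time.Millisecond, Verb: "GET"},
		{Offset: 900 * time.Millisecond, Verb: "LIST"},
		{Offset: 2500 * time.Millisecond, Verb: "GET"},
	}
	for i, b := range bucketize(reqs, time.Second) {
		fmt.Printf("bucket %d: %d requests handled\n", i, runBucket(b, 2))
	}
}
```

Merging the per-worker slots once at the end keeps synchronization out of the request path, which is the stated goal of per-worker metrics.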

Test plan

  • go build ./... passes
  • go vet ./... passes
  • go test ./replay/... passes (runner, schedule tests)
  • go test ./cmd/kperf/commands/replay/... passes

Part 5 of 6 in the replay feature PR stack. Depends on PR #233.

- Fix "traget" → "target" typo in LoadProfileSpec comment
- Fix "letencies" → "latencies" typo in runner CLI flag description
- Add empty specs validation in loadConfig to prevent index-out-of-range
  panic when config file has no specs
- Preserve nodeAffinity from runnergroup spec when CLI --affinity flag
  is not provided
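
The empty-specs fix above amounts to checking the slice length before indexing it. A minimal sketch, assuming hypothetical `config` and `spec` types and a hypothetical helper `firstSpec` (the real `loadConfig` and its types are not shown on this page):

```go
package main

import (
	"errors"
	"fmt"
)

// spec and config are hypothetical stand-ins for the real config types.
type spec struct {
	Name string
}

type config struct {
	Specs []spec
}

// firstSpec returns the first spec, or an error when the list is empty,
// instead of letting cfg.Specs[0] panic with index out of range.
func firstSpec(cfg config) (spec, error) {
	if len(cfg.Specs) == 0 {
		return spec{}, errors.New("config must contain at least one spec")
	}
	return cfg.Specs[0], nil
}

func main() {
	if _, err := firstSpec(config{}); err != nil {
		fmt.Println("rejected:", err)
	}
	s, _ := firstSpec(config{Specs: []spec{{Name: "example"}}})
	fmt.Println("using spec:", s.Name)
}
```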

Signed-off-by: JasonXuDeveloper - 傑 <jason@xgamedev.net>

Add foundation types for the timeseries replay system:
- ReplayRequest, ReplayProfile, ReplayProfileSpec types with validation
- IsReplayMode() method on RunnerGroupSpec for detecting replay configs
- ReplayProfileSpec field in RunnerGroupSpec for distributed mode
- Sample replay profile and runner group config test data

Signed-off-by: JasonXuDeveloper - 傑 <jason@xgamedev.net>

Extract shared report-building logic into metrics.BuildPercentileLatenciesReport()
to avoid duplication between runner and replay report builders. Refactor
buildRunnerMetricReport() to use the shared utility.
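
As an illustration of the kind of logic a shared percentile report builder could hold, here is a minimal sketch using the nearest-rank method. The function `percentileLatencies`, its signature, and the choice of nearest-rank are assumptions; the actual `metrics.BuildPercentileLatenciesReport()` may differ.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentileLatencies returns {percentile, latency} pairs computed with the
// nearest-rank method. Percentiles are fractions in (0, 1]. The input slice
// is copied before sorting, so the caller's data is left untouched.
func percentileLatencies(latencies, percentiles []float64) [][2]float64 {
	if len(latencies) == 0 {
		return nil
	}
	sorted := append([]float64(nil), latencies...)
	sort.Float64s(sorted)

	out := make([][2]float64, 0, len(percentiles))
	for _, p := range percentiles {
		// Nearest rank: the smallest rank covering at least p of the samples.
		rank := int(math.Ceil(p * float64(len(sorted))))
		if rank < 1 {
			rank = 1
		}
		if rank > len(sorted) {
			rank = len(sorted)
		}
		out = append(out, [2]float64{p, sorted[rank-1]})
	}
	return out
}

func main() {
	latencies := []float64{30, 10, 50, 20, 40, 100, 70, 60, 90, 80}
	for _, row := range percentileLatencies(latencies, []float64{0.5, 0.99}) {
		fmt.Printf("p%.0f = %.0f\n", row[0]*100, row[1])
	}
}
```

Keeping this in one function means the runner and replay report builders can both call it rather than each sorting and indexing on their own.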

Signed-off-by: JasonXuDeveloper - 傑 <jason@xgamedev.net>

Core replay data processing (no execution engine yet):
- Loader: YAML/gzip profile loading from file or URL
- Partition: request distribution across runners using object-key
  consistent hashing to preserve per-object ordering
- Builder: HTTP request construction with URL building, verb mapping,
  and URL masking for metrics aggregation
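
The partitioning idea above can be sketched as follows: hash each request's object key, and send every request with the same key to the same runner, so that requests for one object keep their original relative order. This sketch uses plain FNV-1a hash-modulo rather than a full consistent-hashing ring, and `keyedRequest`, `runnerFor`, and `partition` are hypothetical names, not the PR's API. It assumes at least one runner.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// keyedRequest is a hypothetical recorded request; ObjectKey identifies the
// object it touches and Seq is its position in the recording.
type keyedRequest struct {
	ObjectKey string
	Seq       int
}

// runnerFor maps an object key to a runner index with FNV-1a hash-modulo.
// Assumes runners > 0.
func runnerFor(key string, runners int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(runners))
}

// partition walks the requests in recorded order and appends each one to its
// runner's slice, so requests sharing a key stay in their original order.
func partition(reqs []keyedRequest, runners int) [][]keyedRequest {
	parts := make([][]keyedRequest, runners)
	for _, r := range reqs {
		i := runnerFor(r.ObjectKey, runners)
		parts[i] = append(parts[i], r)
	}
	return parts
}

func main() {
	reqs := []keyedRequest{
		{ObjectKey: "default/pods/a", Seq: 1},
		{ObjectKey: "default/pods/b", Seq: 2},
		{ObjectKey: "default/pods/a", Seq: 3},
	}
	for i, p := range partition(reqs, 4) {
		fmt.Printf("runner %d: %v\n", i, p)
	}
}
```

With hash-modulo, changing the number of runners remaps most keys; a consistent-hashing ring limits that remapping, which matters only if the runner count changes between partitioning and execution.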

Signed-off-by: JasonXuDeveloper - 傑 <jason@xgamedev.net>

Execution engine for the timeseries replay system:
- Runner: worker pool with time-bucketed scheduling, per-worker metrics,
  pool-first WATCH connection assignment with overflow support
- Scheduler: orchestrator for both local multi-runner and distributed
  single-runner modes with configuration validation and warnings
- CLI: 'kperf replay run' command for local replay execution and
  'kperf runner replay' subcommand for distributed runner pods

Signed-off-by: JasonXuDeveloper - 傑 <jason@xgamedev.net>