🧪 Experiment Campaign: go-logger
Workflow file: .github/workflows/go-logger.md
Selected dimension: max_turns
Triggered by: ab-testing-advisor on 2026-06-04
Background
The go-logger workflow analyzes Go source files in pkg/ and adds meaningful debug logging statements, capping each run at 5 files per PR. Its dense prompt involves multi-step discovery, per-file analysis, editing, build validation, and PR creation, so each run has substantial sequential depth. Testing max_turns is warranted for two reasons: the current timeout-minutes: 15 cap may be masking turn exhaustion on complex files, and, conversely, a generous turn budget may let the agent over-deliberate on simple logging decisions, wasting tokens without quality gains.
Hypothesis
H0 (null): Changing the maximum turn budget does not affect the number of files successfully logged per run, output quality (checklist compliance), or token cost.
H1 (alternative): A tighter turn budget (conservative) reduces token cost by 20-30% with minimal quality loss on straightforward files, while a larger budget (generous) improves checklist compliance on complex files at higher cost.
Experiment Configuration
Add the following experiments: block to the workflow frontmatter:
experiments:
  max_turns_budget:
    variants: [conservative, standard, generous]
    description: "Tests whether the agent turn budget affects logging quality and token cost. Conservative caps deliberation early; generous allows deeper analysis of complex files."
    hypothesis: "H0: no change in files-logged-per-run or checklist compliance across turn budgets. H1: conservative reduces cost 20-30% with <5% quality loss; generous improves complex-file compliance by >10%."
    metric: files_successfully_logged_per_run
    secondary_metrics: [token_cost_per_file, checklist_compliance_rate, run_duration_ms]
    guardrail_metrics:
      - name: empty_pr_rate
        direction: min
        threshold: 0.10
      - name: build_failure_rate
        direction: min
        threshold: 0.05
    min_samples: 15
    weight: [33, 34, 33]
    start_date: "2026-06-04"
    analysis_type: mann_whitney
    tags: [cost-efficiency, agent-turns, go-logging]
    notify:
      discussion: 0
      issue: 0
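A quick sanity check on this configuration can be sketched in Python. The two rules it applies (one weight per variant, weights summing to 100) are assumptions made for illustration and may not match what the gh-aw compiler actually enforces; the dict mirrors only the fields the check needs.

```python
# Hypothetical sanity check; these rules are assumptions, not gh-aw's validator.
experiment = {
    "variants": ["conservative", "standard", "generous"],
    "weight": [33, 34, 33],
    "min_samples": 15,
}


def check_experiment(cfg):
    """Return a list of problems found in an experiment config dict."""
    problems = []
    if len(cfg["weight"]) != len(cfg["variants"]):
        problems.append("weight must have one entry per variant")
    if sum(cfg["weight"]) != 100:
        problems.append("weights should sum to 100 (assumed convention)")
    if cfg["min_samples"] < 1:
        problems.append("min_samples must be positive")
    return problems


problems = check_experiment(experiment)
# Minimum total runs needed before analysis: min_samples for each variant.
total_runs = experiment["min_samples"] * len(experiment["variants"])
print(problems, total_runs)
```

With min_samples: 15 and three variants, the minimum total comes to 45 runs.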
Variant descriptions:
conservative: Sets max_turns: 20, a tight budget that forces the agent to commit to logging decisions quickly without excessive deliberation. Expected to reduce token usage by 20-30% on simple files.
standard: Sets max_turns: 35, the baseline, chosen to match currently observed behavior given the 15-minute timeout and typical file complexity.
generous: Sets max_turns: 55, an expanded budget that allows deeper analysis of complex multi-function files. Expected to improve checklist compliance at higher token cost.
Workflow Changes Required
The max_turns field must be added to the agent: configuration block in the compiled lock file. The experiment controls this via conditional blocks in the workflow markdown.
Before (no max_turns configured, runtime default applies):
engine: claude
timeout-minutes: 15
After (add to frontmatter, inside the agent prompt region):
engine: claude
timeout-minutes: 15
{{#if experiments.max_turns_budget == "conservative" }}
max_turns: 20
{{else if experiments.max_turns_budget == "generous" }}
max_turns: 55
{{else}}
max_turns: 35
{{/if}}
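The branching in the conditional above can be restated as a small Python sketch. It is only a model of the template logic, not how gh-aw evaluates it, and the function name resolve_max_turns is hypothetical. It shows that any value other than "conservative" or "generous", including a missing assignment, falls through to the standard budget of 35.

```python
# Models the {{#if}} / {{else if}} / {{else}} chain; not gh-aw's own evaluator.
MAX_TURNS = {"conservative": 20, "generous": 55}
STANDARD_MAX_TURNS = 35


def resolve_max_turns(variant):
    """Return the turn budget for a variant, defaulting to the standard budget."""
    return MAX_TURNS.get(variant, STANDARD_MAX_TURNS)


for variant in ("conservative", "standard", "generous", None):
    print(variant, resolve_max_turns(variant))
```

Because the else branch is the default, a run with no variant assigned behaves like standard, which is worth keeping in mind when attributing runs to variants.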
View full frontmatter diff
engine: claude
name: Go Logger Enhancement
timeout-minutes: 15
+ {{#if experiments.max_turns_budget == "conservative" }}
+ max_turns: 20
+ {{else if experiments.max_turns_budget == "generous" }}
+ max_turns: 55
+ {{else}}
+ max_turns: 35
+ {{/if}}
tools:
Success Metrics
| Metric | Type | Target |
| --- | --- | --- |
| files_successfully_logged_per_run | Primary | ≥ 4 of 5 files across all variants |
| token_cost_per_file | Secondary | conservative ≤ 80% of standard cost |
| checklist_compliance_rate | Secondary | generous ≥ 105% of standard |
| run_duration_ms | Secondary | Signal only, no target |
| empty_pr_rate | Guardrail | Must stay < 10% per variant |
| build_failure_rate | Guardrail | Must stay < 5% per variant |
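As a rough illustration of how these targets and guardrails could be checked once results are in, here is a minimal Python sketch. The shape of the results dict and the evaluate function are hypothetical, and the numbers are invented purely to exercise the checks; they are not measurements from any run.

```python
def evaluate(results):
    """Check each target; results maps variant -> aggregated metrics (assumed shape)."""
    std = results["standard"]
    checks = {}
    for name, m in results.items():
        checks[f"{name}:files"] = m["files_successfully_logged_per_run"] >= 4
        checks[f"{name}:empty_pr"] = m["empty_pr_rate"] < 0.10
        checks[f"{name}:build_failure"] = m["build_failure_rate"] < 0.05
    # Relative targets are measured against the standard variant.
    checks["conservative:cost"] = (
        results["conservative"]["token_cost_per_file"]
        <= 0.80 * std["token_cost_per_file"]
    )
    checks["generous:compliance"] = (
        results["generous"]["checklist_compliance_rate"]
        >= 1.05 * std["checklist_compliance_rate"]
    )
    return checks


def metrics(files, cost, compliance, empty_pr, build_failure):
    """Build one variant's metrics dict (helper for the example only)."""
    return {
        "files_successfully_logged_per_run": files,
        "token_cost_per_file": cost,
        "checklist_compliance_rate": compliance,
        "empty_pr_rate": empty_pr,
        "build_failure_rate": build_failure,
    }


# Invented example values, for illustration only.
results = {
    "conservative": metrics(4.2, 780, 0.78, 0.05, 0.02),
    "standard": metrics(4.4, 1000, 0.80, 0.04, 0.03),
    "generous": metrics(4.5, 1300, 0.86, 0.03, 0.07),
}
failed = [name for name, ok in evaluate(results).items() if not ok]
print(failed)
```

In this made-up data, only the generous variant's build failure rate breaches its guardrail.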
Statistical Design
- Variants: conservative, standard, generous
- Assignment: Round-robin via gh-aw experiments runtime (cache-based)
- Minimum runs per variant: 15 (45 total runs across all variants)
- Expected experiment duration: ~45 days (daily schedule, 1 run/day, 3-way split)
- Analysis approach: Mann-Whitney U test (non-parametric; run counts and durations are not normally distributed)
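For reference, the Mann-Whitney U test named above can be sketched in plain Python. This is a minimal version that uses the normal approximation without tie correction, so its p-values are only approximate at 15 samples per variant; it is not the analysis code gh-aw runs. The test compares two samples at a time, so three variants would need pairwise comparisons (for example, each variant against standard), and a multiple-comparison correction may be appropriate.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Ties receive average ranks; the variance is not tie-corrected, so treat
    the p-value as approximate for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Invented files-logged counts per run, for illustration only.
conservative_runs = [3, 4, 4, 5, 3, 4]
standard_runs = [4, 5, 5, 4, 5, 5]
u_stat, p_value = mann_whitney_u(conservative_runs, standard_runs)
print(u_stat, round(p_value, 3))
```

If SciPy is available, scipy.stats.mannwhitneyu is a better choice in practice, since it handles ties and small samples more carefully.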
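Implementation Steps
- Add experiments: section to frontmatter (YAML above)
- Add {{#if experiments.max_turns_budget == "<variant>" }} conditional blocks around max_turns: in the workflow
- Run gh aw compile go-logger to regenerate the lock file
- Verify variant assignment in /tmp/gh-aw/agent/experiments/state.json
- Set max_turns to the winning value and remove the experiment block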
Infrastructure Note
The analysis_type, tags, and notify experiment schema fields are fully implemented end-to-end in both pkg/workflow/compiler_experiments.go and actions/setup/js/pick_experiment.cjs. No infrastructure sub-issue is required.
References
.github/workflows/go-logger.md
Generated by 🧪 Daily A/B Testing Advisor · sonnet46 1.5M