feat: forecast — context-aware downloads, no workflow logs, partial results on timeout, enriched report#37457
Conversation
…loading - Add forecastDownloadUsageArtifact: downloads only the usage artifact without workflow run log downloads (not needed for AIC computation) - Thread context.Context through loadCachedRunAIC so downloads respect the forecast timeout instead of using context.Background() - Produce partial forecast results on timeout or user cancellation instead of discarding all progress — results collected so far are emitted before the timeout error is returned - Add structured logging throughout loadCachedRunAIC: cache hits, cache misses, download start/end, artifact counts, and interruption - Update workflow YAML: on timeout with partial results, emit a warning and continue (instead of always failing with exit 1) Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…rtialForecastResults Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot Check the compiler code and ensure it generates a step in the conclusion job to create the usage artifact. |
|
The compiler already generates usage artifact steps in the conclusion job. In
Both steps run with |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
… to forecast table Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…run samples section, remove horizon tip Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in commits
|
|
@copilot, the forecast should also provide an aggregate of all forecasts so we know the total forecast for the repo. |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR expands the forecast output to include per-run percentile metrics, weekly/monthly projections, and detailed run samples, while optimizing artifact downloads for forecasting and emitting partial results on interruption.
Changes:
- Add per-run AIC samples, P50/P95 per-run metrics, and weekly/monthly projections (incl. Monte Carlo summaries) to forecast results and table output.
- Replace forecast artifact downloading with a usage-artifact-only implementation and improve cancellation/timeout behavior by emitting partial results.
- Update GitHub Action issue template + JS generator/tests to render the new summary columns and a run-samples details section.
Show a summary per file
| File | Description |
|---|---|
| pkg/cli/forecast.go | Adds new forecast result fields (percentiles, weekly/monthly projections, run samples), partial-result emission, and usage-only artifact downloader. |
| pkg/cli/forecast_test.go | Updates mocks and callsites for new loadCachedRunAIC(ctx, ...) signature. |
| actions/setup/md/forecast_issue.md | Switches issue template placeholder from zero-tip to run-samples section. |
| actions/setup/js/create_forecast_issue.cjs | Updates issue body generator to render new columns, totals row, and run-samples details block; keeps legacy fallback. |
| actions/setup/js/create_forecast_issue.test.cjs | Updates existing expectations for new table format and adds tests for run samples + TOTAL row. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 5/5 changed files
- Comments generated: 5
| // Legacy fallback: derive weekly/monthly from the configured-period P50 when new fields are absent. | ||
| const hasNewFields = workflows.some(w => w?.p50_aic_per_run != null || w?.weekly_projected_aic != null); | ||
| const legacyRows = hasNewFields | ||
| ? null | ||
| : workflows.map(workflow => { | ||
| const p50 = workflow?.monte_carlo?.p50_projected_aic ?? workflow?.projected_aic ?? workflow?.monte_carlo?.p50_projected_effective_tokens ?? workflow?.projected_effective_tokens ?? 0; | ||
| return [escapeCell(workflow.workflow_id), workflow.sampled_runs ?? 0, Number(p50)]; | ||
| }); |
| func loadCachedRunAIC(ctx context.Context, runID int64, verbose bool) float64 { | ||
| dir := filepath.Join(defaultLogsOutputDir, fmt.Sprintf("run-%d", runID)) | ||
| summary, ok := loadRunSummary(dir, verbose) | ||
| if ok && summary != nil && summary.TokenUsage != nil && summary.TokenUsage.TotalAIC > 0 { | ||
| forecastRunLog.Printf("AIC cache hit for run %d: aic=%.3f (from run_summary.json)", runID, summary.TokenUsage.TotalAIC) | ||
| return summary.TokenUsage.TotalAIC | ||
| } |
| // Sort partial results by Monte Carlo P50 descending (mirrors the full-results sort). | ||
| sort.Slice(results, func(i, j int) bool { | ||
| pi := results[i].ProjectedAIC | ||
| if mc := results[i].MonteCarlo; mc != nil { | ||
| pi = mc.P50ProjectedAIC | ||
| } | ||
| pj := results[j].ProjectedAIC | ||
| if mc := results[j].MonteCarlo; mc != nil { | ||
| pj = mc.P50ProjectedAIC | ||
| } | ||
| return pi > pj | ||
| }) |
| // Compute P50 and P95 of individual run AIC (per-run percentiles, not period totals). | ||
| sortedAIC := make([]int, len(aicObservations)) | ||
| copy(sortedAIC, aicObservations) | ||
| sort.Ints(sortedAIC) | ||
| result.P50AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 50)) / 1000) | ||
| result.P95AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 95)) / 1000) | ||
|
|
||
| // Compute observed run frequency: runs per calendar day over the history window, | ||
| // scaled to the projection period. | ||
| result.ObservedRunsPerPeriod = float64(n) / float64(config.Days) * float64(periodDays) | ||
| observedRunsPerDay := float64(n) / float64(config.Days) | ||
| result.ObservedRunsPerPeriod = observedRunsPerDay * float64(periodDays) | ||
|
|
||
| // Point estimates for weekly (7-day) and monthly (30-day) projections. | ||
| weeklyRuns := observedRunsPerDay * 7 | ||
| monthlyRuns := observedRunsPerDay * 30 | ||
| result.WeeklyProjectedAIC = roundForecastAIC(weeklyRuns * result.AvgAIC) | ||
| result.MonthlyProjectedAIC = roundForecastAIC(monthlyRuns * result.AvgAIC) |
| for (const wf of workflows) { | ||
| const samples = Array.isArray(wf?.run_samples) ? wf.run_samples : []; | ||
| for (const s of samples) { | ||
| const runID = s?.run_id ?? ""; | ||
| const date = s?.date ?? ""; | ||
| const aic = formatAIC(s?.aic ?? 0); | ||
| lines.push(`| ${escapeCell(wf.workflow_id)} | #${runID} | ${date} | ${aic} |`); | ||
| } | ||
| } |
The forecast command downloaded workflow run logs (unnecessary for AIC computation), ignored the timeout context in artifact downloads, discarded all collected results when a timeout fired, and produced a report with insufficient cost detail.
Changes
Skip workflow log downloads — Introduces
forecastDownloadUsageArtifactas a focused replacement for the general-purposedownloadRunArtifacts. It lists artifacts, downloads only the matchingusageartifact, and returnsErrNoArtifactsimmediately when none is found — no diagnostic log download fallback.Context propagation —
loadCachedRunAICnow acceptscontext.Contextand forwards it through to artifact downloads. Previously usedcontext.Background(), so downloads ran to completion even after the configured--timeoutfired.Partial results on timeout/cancel —
emitPartialForecastResultsemits JSON/table output for all workflows processed before the interrupt, then returns the timeout exit code. Previously all progress was silently discarded.Structured logging —
loadCachedRunAICandforecastDownloadUsageArtifactnow log cache hits, cache misses, download start/completion, artifact counts, and context interruptions viaforecastRunLog.Workflow YAML resilience — On exit 124 (CLI timeout), the step now checks whether
report.jsonhas content. With partial results:::warning::+ continue (cache saves, issue is created with partial data). Without any results:::error::+exit 1as before.Enriched forecast report — The CLI table and GitHub issue now show P50/Run (per-run median AIC), P95/Run (95th-percentile per-run AIC), Weekly (P50) and Monthly (P50) projected totals, and a TOTAL row across all workflows. A detailed run-samples section lists every sampled run with its ID, date, and raw AIC for human review (collapsible
<details>block in the issue body). The "increase the horizon" tip is removed.