Skip to content

feat: forecast — context-aware downloads, no workflow logs, partial results on timeout, enriched report#37457

Merged
pelikhan merged 5 commits into
mainfrom
copilot/improve-forecast-command
Jun 7, 2026
Merged

feat: forecast — context-aware downloads, no workflow logs, partial results on timeout, enriched report#37457
pelikhan merged 5 commits into
mainfrom
copilot/improve-forecast-command

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Jun 7, 2026

The forecast command downloaded workflow run logs (unnecessary for AIC computation), ignored the timeout context in artifact downloads, discarded all collected results when a timeout fired, and produced a report with insufficient cost detail.

Changes

  • Skip workflow log downloads — Introduces forecastDownloadUsageArtifact as a focused replacement for the general-purpose downloadRunArtifacts. It lists artifacts, downloads only the matching usage artifact, and returns ErrNoArtifacts immediately when none is found — no diagnostic log download fallback.

  • Context propagationloadCachedRunAIC now accepts context.Context and forwards it through to artifact downloads. Previously used context.Background(), so downloads ran to completion even after the configured --timeout fired.

  • Partial results on timeout/cancelemitPartialForecastResults emits JSON/table output for all workflows processed before the interrupt, then returns the timeout exit code. Previously all progress was silently discarded.

  • Structured loggingloadCachedRunAIC and forecastDownloadUsageArtifact now log cache hits, cache misses, download start/completion, artifact counts, and context interruptions via forecastRunLog.

  • Workflow YAML resilience — On exit 124 (CLI timeout), the step now checks whether report.json has content. With partial results: ::warning:: + continue (cache saves, issue is created with partial data). Without any results: ::error:: + exit 1 as before.

  • Enriched forecast report — The CLI table and GitHub issue now show P50/Run (per-run median AIC), P95/Run (95th-percentile per-run AIC), Weekly (P50) and Monthly (P50) projected totals, and a TOTAL row across all workflows. A detailed run-samples section lists every sampled run with its ID, date, and raw AIC for human review (collapsible <details> block in the issue body). The "increase the horizon" tip is removed.

Copilot AI and others added 2 commits June 7, 2026 00:58
…loading

- Add forecastDownloadUsageArtifact: downloads only the usage artifact
  without workflow run log downloads (not needed for AIC computation)
- Thread context.Context through loadCachedRunAIC so downloads respect
  the forecast timeout instead of using context.Background()
- Produce partial forecast results on timeout or user cancellation instead
  of discarding all progress — results collected so far are emitted before
  the timeout error is returned
- Add structured logging throughout loadCachedRunAIC: cache hits, cache
  misses, download start/end, artifact counts, and interruption
- Update workflow YAML: on timeout with partial results, emit a warning
  and continue (instead of always failing with exit 1)

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…rtialForecastResults

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title feat: improve forecast command resilience, logging, and artifact downloading feat: forecast — context-aware downloads, no workflow logs, partial results on timeout Jun 7, 2026
Copilot AI requested a review from pelikhan June 7, 2026 01:04
@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented Jun 7, 2026

@copilot Check the compiler code and ensure it generates a step in the conclusion job to create the usage artifact.

@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented Jun 7, 2026

@copilot

  • the forecast report must provide PFC and P95 for AI credit costs per run, as well as weekly and monthly totals
  • provide a detailed section listing all reports used in the computation. It should also include raw AI credit consumption usage numbers for human review.
  • remove the tip about increasing the horizon.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jun 7, 2026

@copilot Check the compiler code and ensure it generates a step in the conclusion job to create the usage artifact.

The compiler already generates usage artifact steps in the conclusion job. In notify_comment.go, buildConclusionJob calls buildUsageArtifactUploadSteps (line 72), which emits two steps:

  1. "Collect usage artifact files" — copies aw_info.json and token_usage.jsonl files (from both agent and detection sandboxes) into /tmp/gh-aw/usage/.
  2. "Upload usage artifact" — uploads the collected files as the usage (or <prefix>usage in workflow_call context) artifact using actions/upload-artifact.

Both steps run with if: always() and continue-on-error: true. The TestConclusionJobIncludesUsageArtifactSteps test (and related prefix tests) verify this behaviour and all pass.

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI and others added 2 commits June 7, 2026 01:44
… to forecast table

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…run samples section, remove horizon tip

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Jun 7, 2026

@copilot

  • the forecast report must provide PFC and P95 for AI credit costs per run, as well as weekly and monthly totals...

Done in commits e4e5aa5 and 1bd0d5c. Changes:

  • P50/Run & P95/Run — per-run median and 95th-percentile AIC computed from sorted sampled-run observations, shown in both the CLI table and the GitHub issue.
  • Weekly & Monthly totals — always computed (regardless of --period) via separate Monte Carlo simulations. CLI table and issue body both show Weekly (P50) and Monthly (P50) columns, with a TOTAL row when multiple workflows are present.
  • Detailed run samples section — CLI prints a per-workflow breakdown table (Run ID, Date, AIC). Issue body renders the same data as a collapsible <details> block.
  • Horizon tip removedzeroProjectedTip and its {zero_projected_tip} template placeholder are gone; the forecast_issue.md template now uses {run_samples_section} in that slot.

Copilot AI changed the title feat: forecast — context-aware downloads, no workflow logs, partial results on timeout feat: forecast — context-aware downloads, no workflow logs, partial results on timeout, enriched report Jun 7, 2026
@pelikhan
Copy link
Copy Markdown
Collaborator

pelikhan commented Jun 7, 2026

@copilot, the forecast should also provide an aggregate of all forecasts so we know the total forecast for the repo.

@pelikhan pelikhan marked this pull request as ready for review June 7, 2026 02:06
Copilot AI review requested due to automatic review settings June 7, 2026 02:06
@pelikhan pelikhan merged commit 881b5db into main Jun 7, 2026
14 of 15 checks passed
@pelikhan pelikhan deleted the copilot/improve-forecast-command branch June 7, 2026 02:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR expands the forecast output to include per-run percentile metrics, weekly/monthly projections, and detailed run samples, while optimizing artifact downloads for forecasting and emitting partial results on interruption.

Changes:

  • Add per-run AIC samples, P50/P95 per-run metrics, and weekly/monthly projections (incl. Monte Carlo summaries) to forecast results and table output.
  • Replace forecast artifact downloading with a usage-artifact-only implementation and improve cancellation/timeout behavior by emitting partial results.
  • Update GitHub Action issue template + JS generator/tests to render the new summary columns and a run-samples details section.
Show a summary per file
File Description
pkg/cli/forecast.go Adds new forecast result fields (percentiles, weekly/monthly projections, run samples), partial-result emission, and usage-only artifact downloader.
pkg/cli/forecast_test.go Updates mocks and callsites for new loadCachedRunAIC(ctx, ...) signature.
actions/setup/md/forecast_issue.md Switches issue template placeholder from zero-tip to run-samples section.
actions/setup/js/create_forecast_issue.cjs Updates issue body generator to render new columns, totals row, and run-samples details block; keeps legacy fallback.
actions/setup/js/create_forecast_issue.test.cjs Updates existing expectations for new table format and adds tests for run samples + TOTAL row.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 5

Comment on lines +50 to +57
// Legacy fallback: derive weekly/monthly from the configured-period P50 when new fields are absent.
const hasNewFields = workflows.some(w => w?.p50_aic_per_run != null || w?.weekly_projected_aic != null);
const legacyRows = hasNewFields
? null
: workflows.map(workflow => {
const p50 = workflow?.monte_carlo?.p50_projected_aic ?? workflow?.projected_aic ?? workflow?.monte_carlo?.p50_projected_effective_tokens ?? workflow?.projected_effective_tokens ?? 0;
return [escapeCell(workflow.workflow_id), workflow.sampled_runs ?? 0, Number(p50)];
});
Comment thread pkg/cli/forecast.go
Comment on lines +820 to 826
func loadCachedRunAIC(ctx context.Context, runID int64, verbose bool) float64 {
dir := filepath.Join(defaultLogsOutputDir, fmt.Sprintf("run-%d", runID))
summary, ok := loadRunSummary(dir, verbose)
if ok && summary != nil && summary.TokenUsage != nil && summary.TokenUsage.TotalAIC > 0 {
forecastRunLog.Printf("AIC cache hit for run %d: aic=%.3f (from run_summary.json)", runID, summary.TokenUsage.TotalAIC)
return summary.TokenUsage.TotalAIC
}
Comment thread pkg/cli/forecast.go
Comment on lines +950 to +961
// Sort partial results by Monte Carlo P50 descending (mirrors the full-results sort).
sort.Slice(results, func(i, j int) bool {
pi := results[i].ProjectedAIC
if mc := results[i].MonteCarlo; mc != nil {
pi = mc.P50ProjectedAIC
}
pj := results[j].ProjectedAIC
if mc := results[j].MonteCarlo; mc != nil {
pj = mc.P50ProjectedAIC
}
return pi > pj
})
Comment thread pkg/cli/forecast.go
Comment on lines +601 to +617
// Compute P50 and P95 of individual run AIC (per-run percentiles, not period totals).
sortedAIC := make([]int, len(aicObservations))
copy(sortedAIC, aicObservations)
sort.Ints(sortedAIC)
result.P50AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 50)) / 1000)
result.P95AIC = roundForecastAIC(float64(percentileInt(sortedAIC, 95)) / 1000)

// Compute observed run frequency: runs per calendar day over the history window,
// scaled to the projection period.
result.ObservedRunsPerPeriod = float64(n) / float64(config.Days) * float64(periodDays)
observedRunsPerDay := float64(n) / float64(config.Days)
result.ObservedRunsPerPeriod = observedRunsPerDay * float64(periodDays)

// Point estimates for weekly (7-day) and monthly (30-day) projections.
weeklyRuns := observedRunsPerDay * 7
monthlyRuns := observedRunsPerDay * 30
result.WeeklyProjectedAIC = roundForecastAIC(weeklyRuns * result.AvgAIC)
result.MonthlyProjectedAIC = roundForecastAIC(monthlyRuns * result.AvgAIC)
Comment on lines +134 to +142
for (const wf of workflows) {
const samples = Array.isArray(wf?.run_samples) ? wf.run_samples : [];
for (const s of samples) {
const runID = s?.run_id ?? "";
const date = s?.date ?? "";
const aic = formatAIC(s?.aic ?? 0);
lines.push(`| ${escapeCell(wf.workflow_id)} | #${runID} | ${date} | ${aic} |`);
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants