From 999ee8c03107e59f670fd3fe26f2476e23b914bb Mon Sep 17 00:00:00 2001
From: Antoine van der Lee <4329185+AvdLee@users.noreply.github.com>
Date: Mon, 23 Mar 2026 08:30:21 +0100
Subject: [PATCH 1/2] Refocus all skills on wall-clock build time as the primary success metric

User feedback showed the skills were over-indexing on cumulative task time
(which Xcode parallelizes) and presenting it as build-time savings. This led
to many source-level fixes that reduced compiler workload without actually
reducing how long the developer waits.

Key changes:
- AGENTS.md: wall-clock first principle inherited by all skills
- Orchestrator: blocking-vs-parallel heuristics, impact language templates, wall-clock-first final report
- Report script/template: timing-table disclaimers, wait-time impact field, plain-language verification
- Compilation analyzer: parallel workload labeling when not on critical path
- Project/SPM analyzers: wall-clock qualifier on prioritization tiers
- Fixer: wall-clock delta leads reporting, honest language when task metrics improve but wait time does not
- Benchmark artifacts: documents wall-clock vs cumulative distinction
- Recommendation format: new wait_time_impact required field
---
 AGENTS.md                                  |  2 +
 references/benchmark-artifacts.md          |  6 +++
 references/recommendation-format.md        | 13 ++++--
 scripts/generate_optimization_report.py    | 14 +++++-
 .../references/spm-analysis-checks.md      |  2 +
 skills/xcode-build-fixer/SKILL.md          | 19 ++++++--
 skills/xcode-build-orchestrator/SKILL.md   | 46 +++++++++++--------
 .../orchestration-report-template.md       | 17 ++++---
 skills/xcode-compilation-analyzer/SKILL.md |  5 +-
 .../references/project-audit-checks.md     |  2 +
 10 files changed, 87 insertions(+), 39 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 6d28348..2035c0d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -21,6 +21,8 @@ This is a multi-skill Xcode build optimization repository.
 
 ## Rules
 
+- Wall-clock build time (how long the developer waits) is the primary success metric. Every recommendation must state its expected impact on wall-clock time. If the impact cannot be predicted, say so.
+- Cumulative task time from the Build Timing Summary is diagnostic evidence, not proof of wall-time impact. Xcode parallelizes aggressively, so reducing parallel task time may produce zero wait-time improvement.
 - Recommend-first by default. Never apply project, source, or package changes without explicit developer approval.
 - Benchmark before optimizing. Use `.build-benchmark/` artifacts as evidence.
 - Treat clean and incremental builds as separate metrics.
diff --git a/references/benchmark-artifacts.md b/references/benchmark-artifacts.md
index d460acb..53ad45b 100644
--- a/references/benchmark-artifacts.md
+++ b/references/benchmark-artifacts.md
@@ -8,6 +8,12 @@ All skills in this repository should treat `.build-benchmark/` as the canonical
 - Make clean and incremental build data easy to compare.
 - Preserve enough context for later specialist analysis without rerunning the benchmark.
 
+## Wall-Clock vs Cumulative Task Time
+
+The `duration_seconds` field on each run and the `median_seconds` in the summary represent **wall-clock time** -- how long the developer actually waits. This is the primary success metric.
+
+The `timing_summary_categories` are **aggregated task times** parsed from Xcode's Build Timing Summary. Because Xcode runs many tasks in parallel across CPU cores, these totals typically exceed the wall-clock duration. A large cumulative `SwiftCompile` value is diagnostic evidence of compiler workload, not proof that compilation is blocking the build. Always compare category totals against the wall-clock median before concluding that a category is a bottleneck.
+
 ## File Layout
 
 Recommended outputs:
diff --git a/references/recommendation-format.md b/references/recommendation-format.md
index afc0c21..98dacc6 100644
--- a/references/recommendation-format.md
+++ b/references/recommendation-format.md
@@ -7,6 +7,7 @@ All optimization skills should report recommendations in a shared structure so t
 Each recommendation should include:
 
 - `title`
+- `wait_time_impact` -- plain-language statement of expected wall-clock impact, e.g. "Expected to reduce your clean build by ~3s", "Reduces parallel compile work but unlikely to reduce build wait time", or "Impact on wait time is uncertain -- re-benchmark to confirm"
 - `category`
 - `observed_evidence`
 - `estimated_impact`
@@ -30,6 +31,7 @@ Each recommendation should include:
   "recommendations": [
     {
       "title": "Guard a release-only symbol upload script",
+      "wait_time_impact": "Expected to reduce your incremental build by approximately 6 seconds.",
       "category": "project",
       "observed_evidence": [
         "Incremental builds spend 6.3 seconds in a run script phase.",
@@ -51,11 +53,12 @@ Each recommendation should include:
 When rendering for human review, preserve the same field order:
 
 1. title
-2. observed evidence
-3. estimated impact
-4. confidence
-5. approval required
-6. benchmark verification status
+2. wait-time impact
+3. observed evidence
+4. estimated impact
+5. confidence
+6. approval required
+7. benchmark verification status
 
 That makes it easier for the developer to approve or reject specific items quickly.
diff --git a/scripts/generate_optimization_report.py b/scripts/generate_optimization_report.py
index 3dcc243..cf57ed5 100644
--- a/scripts/generate_optimization_report.py
+++ b/scripts/generate_optimization_report.py
@@ -293,6 +293,12 @@ def _section_baseline(benchmark: Dict[str, Any]) -> str:
         count = len(runs) or 1
         ranked = sorted(all_cats.items(), key=lambda x: x[1]["seconds"], reverse=True)
         lines.append(f"\n### {build_type.title()} Build Timing Summary\n")
+        lines.append(
+            "> **Note:** These are aggregated task times across all CPU cores. "
+            "Because Xcode runs many tasks in parallel, these totals typically exceed "
+            "the actual build wait time shown above. A large number here does not mean "
+            "it is blocking your build.\n"
+        )
         lines.append("| Category | Tasks | Seconds |")
         lines.append("|----------|------:|--------:|")
         for name, data in ranked:
@@ -365,6 +371,7 @@ def _section_recommendations(recommendations: Optional[Dict[str, Any]]) -> str:
         title = item.get("title", "Untitled")
         lines.append(f"### {i}. {title}\n")
         for field, label in [
+            ("wait_time_impact", "Wait-Time Impact"),
             ("category", "Category"),
             ("observed_evidence", "Evidence"),
             ("estimated_impact", "Impact"),
@@ -394,9 +401,11 @@ def _section_approval(recommendations: Optional[Dict[str, Any]]) -> str:
     lines = ["## Approval Checklist\n"]
     for i, item in enumerate(items, 1):
         title = item.get("title", "Untitled")
+        wait_impact = item.get("wait_time_impact", "")
         impact = item.get("estimated_impact", "")
         risk = item.get("risk_level", "")
-        lines.append(f"- [ ] **{i}. {title}** -- Impact: {impact} | Risk: {risk}")
+        impact_str = wait_impact if wait_impact else impact
+        lines.append(f"- [ ] **{i}. {title}** -- Impact: {impact_str} | Risk: {risk}")
     return "\n".join(lines)
 
 
@@ -421,7 +430,8 @@ def _section_next_steps(benchmark: Dict[str, Any]) -> str:
     lines.append(f' --destination "{build["destination"]}" \\')
     lines.append(" --output-dir .build-benchmark")
     lines.append("```\n")
-    lines.append("Compare the new medians against the baseline to verify improvements.")
+    lines.append("Compare the new wall-clock medians against the baseline. Report results as:")
+    lines.append('"Your [clean/incremental] build now takes X.Xs (was Y.Ys) -- Z.Zs faster/slower."')
     return "\n".join(lines)
 
 
diff --git a/skills/spm-build-analysis/references/spm-analysis-checks.md b/skills/spm-build-analysis/references/spm-analysis-checks.md
index 6117fc1..d15cd39 100644
--- a/skills/spm-build-analysis/references/spm-analysis-checks.md
+++ b/skills/spm-build-analysis/references/spm-analysis-checks.md
@@ -98,6 +98,8 @@ Use this reference when package dependencies or package plugins are suspected bu
 
 ## Recommendation Prioritization
 
+Qualify every estimated impact with wall-clock framing. High-priority items should be those likely to reduce the developer's actual wait time, not just cumulative task totals. If the impact on wait time is uncertain, say so.
+
 - High: package plugins or graph structure repeatedly inflating incremental builds, circular dependencies, umbrella re-exports causing cascading rebuilds, Swift macro cascading that causes near-full rebuilds from trivial changes.
 - Medium: configuration drift that causes duplicate module variants, oversized modules, missing interface/implementation separation, multi-platform build multiplication, `swift-syntax` building universally without prebuilt binary.
 - Low: clean-environment checkout costs that barely affect local iteration, minor transitive dependency cleanup.
diff --git a/skills/xcode-build-fixer/SKILL.md b/skills/xcode-build-fixer/SKILL.md
index 2a4d2ac..4779234 100644
--- a/skills/xcode-build-fixer/SKILL.md
+++ b/skills/xcode-build-fixer/SKILL.md
@@ -102,16 +102,25 @@ Before applying version pin changes:
 
 ## Reporting
 
-After applying changes, update the optimization plan with:
+Lead with the wall-clock result in plain language:
 
-- Post-change clean build median
-- Post-change incremental build median
-- Absolute and percentage deltas for both
+> "Your clean build now takes X.Xs (was Y.Ys) -- Z.Zs faster."
+> "Your incremental build now takes X.Xs (was Y.Ys) -- Z.Zs faster."
+
+Then include:
+
+- Post-change clean build wall-clock median
+- Post-change incremental build wall-clock median
+- Absolute and percentage wall-clock deltas for both
 - Confidence notes if benchmark noise is high
 - List of files modified per fix
 - Any deviations from the original recommendation
 
-If a fix produced no measurable improvement, note `No measurable improvement` and suggest whether to keep or revert.
+If cumulative task metrics improved but wall-clock did not, say plainly: "Compiler workload decreased but build wait time did not improve. This is expected when Xcode runs these tasks in parallel with other equally long work."
+
+If a fix produced no measurable wall-time improvement, note `No measurable wall-time improvement` and suggest whether to keep (e.g. for code quality) or revert.
+
+For changes valuable for non-benchmark reasons (deterministic package resolution, branch-switch caching), label them: "No wait-time improvement expected from this change. The benefit is [deterministic builds / faster branch switching / reduced CI cost]."
 
 Note: `COMPILATION_CACHING` improvements cannot be captured by the standard clean-build benchmark because `xcodebuild clean` invalidates the cache between runs. When reporting on this setting, note that the benefit is real but requires a different measurement approach (e.g., branch-switch benchmarks or repeat builds without cleaning). Recommend keeping the setting enabled based on documented benefit rather than requiring a delta from the benchmark.
 
diff --git a/skills/xcode-build-orchestrator/SKILL.md b/skills/xcode-build-orchestrator/SKILL.md
index 1e1c83b..e0a3710 100644
--- a/skills/xcode-build-orchestrator/SKILL.md
+++ b/skills/xcode-build-orchestrator/SKILL.md
@@ -9,11 +9,12 @@ Use this skill as the recommend-first entrypoint for end-to-end Xcode build opti
 
 ## Non-Negotiable Rules
 
+- Wall-clock build time (how long the developer waits) is the primary success metric. Every recommendation must state its expected impact on the developer's actual wait time.
 - Start in recommendation mode.
 - Benchmark before making changes.
 - Do not modify project files, source files, packages, or scripts without explicit developer approval.
 - Preserve the evidence trail for every recommendation.
-- Re-benchmark after approved changes and report the delta.
+- Re-benchmark after approved changes and report the wall-clock delta.
 
 ## Two-Phase Workflow
 
@@ -27,7 +28,7 @@ Run this phase in agent mode because the agent needs to execute builds, run benc
 2. Run `xcode-build-benchmark` to establish a baseline if no fresh benchmark exists. If the build fails to compile, check `git log` for a recent buildable commit. When working in a worktree, cherry-picking a targeted build fix from a feature branch is acceptable to reach a buildable state. If SPM packages reference gitignored directories in their `exclude:` paths (e.g., `__Snapshots__`), create those directories before building -- worktrees do not contain gitignored content and `xcodebuild -resolvePackageDependencies` will crash otherwise.
 3. Verify the benchmark artifact has non-empty `timing_summary_categories`. If empty, the timing summary parser may have failed -- re-parse the raw logs or inspect them manually.
 4. If incremental builds are the primary pain point and Xcode 16.4+ is available, recommend the developer enable **Task Backtraces** (Scheme Editor > Build tab > Build Debugging > "Task Backtraces"). This reveals why each task re-ran, which is critical for diagnosing unexpected replanning or input invalidation. Include any Task Backtrace evidence in the analysis.
-5. If `SwiftCompile`, `CompileC`, `SwiftEmitModule`, or `Planning Swift module` dominate the timing summary, run `diagnose_compilation.py` with the same project inputs to capture type-checking hotspots.
+5. Determine whether compile tasks are likely blocking wall-clock progress or just consuming parallel CPU time. Compare the sum of all timing-summary category seconds against the wall-clock median: if the sum is 2x+ the median, most work is parallelized and compile hotspot fixes are unlikely to reduce wait time. If `SwiftCompile`, `CompileC`, `SwiftEmitModule`, or `Planning Swift module` dominate the timing summary **and** appear likely to be on the critical path, run `diagnose_compilation.py` to capture type-checking hotspots. If they are parallelized, still run diagnostics but label findings as "parallel efficiency improvements" rather than "build time improvements."
 6. Run the specialist analyses that fit the evidence by reading each skill's SKILL.md and applying its workflow:
    - [`xcode-compilation-analyzer`](../xcode-compilation-analyzer/SKILL.md)
    - [`xcode-project-analyzer`](../xcode-project-analyzer/SKILL.md)
@@ -49,28 +50,36 @@ Run this phase in agent mode after the developer has reviewed and approved recom
 
 ## Prioritization Rules
 
-Rank items using:
+The goal is to reduce how long the developer waits for builds to finish.
 
-- measured evidence strength
-- expected impact on incremental builds
-- expected impact on clean builds
-- implementation risk
-- confidence
+1. Identify the developer's primary pain (clean build, incremental build, or both) and the measured wall-clock median.
+2. Determine what is likely **blocking** wall-clock progress:
+   - If the sum of all timing-summary category seconds is 2x+ the wall-clock median, most work is parallelized. Compile hotspot fixes are unlikely to reduce wait time.
+   - If a single serial category (e.g. `PhaseScriptExecution`, `CompileAssetCatalog`, `CodeSign`) accounts for a large fraction of wall-clock, that is the real bottleneck.
+   - If `Planning Swift module` or `SwiftEmitModule` dominates incremental builds, the cause is likely invalidation or module size, not individual file compile speed.
+3. Rank recommendations by likely wall-time savings, not cumulative task reduction.
+4. Source-level compile fixes should not outrank project/graph/configuration fixes unless evidence suggests they are on the critical path.
 
-Prefer changes that are:
+Prefer changes that are measurable, reversible, and low-risk.
 
-- measurable
-- reversible
-- low-risk
-- likely to improve the most common developer loop first
+## Recommendation Impact Language
+
+Every recommendation presented to the developer must include one of these impact statements:
+
+- "Expected to reduce your [clean/incremental] build by approximately X seconds."
+- "Reduces parallel compile work but is unlikely to reduce your build wait time because other tasks take equally long."
+- "Impact on wait time is uncertain -- re-benchmark after applying to confirm."
+- "No wait-time improvement expected. The benefit is [deterministic builds / faster branch switching / reduced CI cost]."
+
+Never quote cumulative task-time savings as the headline impact. If a change reduces 5 seconds of parallel compile work but another equally long task still runs, the developer's wait time does not change.
 
 ## Approval Gate
 
 Before implementing anything, present a short approval list that includes:
 
 - recommendation name
+- expected wait-time impact (using the impact language above)
 - evidence summary
-- estimated impact
 - affected files or settings
 - whether the change is low, medium, or high risk
 
@@ -87,14 +96,15 @@ After approval, delegate to `xcode-build-fixer`:
 
 ## Final Report
 
-The final report must include:
+Lead with the wall-clock result in plain language, e.g.: "Your clean build now takes 82s (was 86s) -- 4s faster." Then include:
 
-- baseline clean and incremental medians
-- post-change clean and incremental medians
-- absolute and percentage deltas
+- baseline clean and incremental wall-clock medians
+- post-change clean and incremental wall-clock medians
+- absolute and percentage wall-clock deltas
 - what changed
 - what was intentionally left unchanged
 - confidence notes if noise prevents a strong conclusion
+- if cumulative task metrics improved but wall-clock did not, say plainly: "Compiler workload decreased but build wait time did not improve. This is expected when Xcode runs these tasks in parallel with other equally long work."
 - a ready-to-paste community results row and a link to open a PR (see the report template)
 
 ## Preferred Command Paths
diff --git a/skills/xcode-build-orchestrator/references/orchestration-report-template.md b/skills/xcode-build-orchestrator/references/orchestration-report-template.md
index 3f35591..ff30f6e 100644
--- a/skills/xcode-build-orchestrator/references/orchestration-report-template.md
+++ b/skills/xcode-build-orchestrator/references/orchestration-report-template.md
@@ -27,6 +27,8 @@ Use this structure when the orchestrator consolidates benchmark evidence and spe
 
 ### Clean Build Timing Summary
 
+> **Note:** These are aggregated task times across all CPU cores. Because Xcode runs many tasks in parallel, these totals typically exceed the actual build wait time shown above. A large number here does not mean it is blocking your build.
+
 | Category | Tasks | Seconds |
 |----------|------:|--------:|
 | SwiftCompile | 325 | 271.245s |
@@ -68,6 +70,7 @@ Use this structure when the orchestrator consolidates benchmark evidence and spe
 ## Prioritized Recommendations
 
 ### 1. Recommendation title
+**Wait-Time Impact:** Expected to reduce your clean build by approximately 3 seconds.
 **Category:** project
 **Evidence:** ...
 **Impact:** High
@@ -75,8 +78,8 @@ Use this structure when the orchestrator consolidates benchmark evidence and spe
 **Risk:** Low
 
 ## Approval Checklist
-- [ ] **1. Recommendation title** -- Impact: High | Risk: Low
-- [ ] **2. Another recommendation** -- Impact: Medium | Risk: Low
+- [ ] **1. Recommendation title** -- Wait-Time Impact: ~3s clean build reduction | Risk: Low
+- [ ] **2. Another recommendation** -- Wait-Time Impact: Uncertain, re-benchmark to confirm | Risk: Low
 
 ## Next Steps
 
@@ -84,14 +87,14 @@ After implementing approved changes, re-benchmark with the same inputs:
 
 ...
 
-Compare the new medians against the baseline to verify improvements.
+Compare the new wall-clock medians against the baseline. Report results as:
+"Your [clean/incremental] build now takes X.Xs (was Y.Ys) -- Z.Zs faster/slower."
 
 ## Verification (post-approval)
 
-- Post-change clean median:
-- Post-change zero-change median:
-- Clean delta:
-- Zero-change delta:
+- Post-change clean build: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- Post-change incremental build: X.Xs (was Y.Ys) -- Z.Zs faster/slower
+- If cumulative task metrics improved but wall-clock did not: "Compiler workload decreased but build wait time did not improve. This is expected when Xcode runs these tasks in parallel with other equally long work."
 
 ## Remaining follow-up ideas
 - Item:
diff --git a/skills/xcode-compilation-analyzer/SKILL.md b/skills/xcode-compilation-analyzer/SKILL.md
index 56a8269..f82b550 100644
--- a/skills/xcode-compilation-analyzer/SKILL.md
+++ b/skills/xcode-compilation-analyzer/SKILL.md
@@ -11,7 +11,8 @@ Use this skill when compile time, not just general project configuration, looks
 
 - Start from evidence, ideally a recent `.build-benchmark/` artifact or raw timing-summary output.
 - Prefer analysis-only compiler flags over persistent project edits during investigation.
-- Rank findings by expected compile-time impact, not by how easy they are to describe.
+- Rank findings by expected **wall-clock** impact, not cumulative compile-time impact. When compile tasks are heavily parallelized (sum of compile categories >> wall-clock median), note that fixing individual hotspots may improve parallel efficiency without reducing build wait time.
+- When the evidence points to parallelized work rather than serial bottlenecks, label recommendations as "Reduces compiler workload (parallel)" rather than "Reduces build time."
 - Do not edit source or build settings without explicit developer approval.
 
 ## What To Inspect
@@ -68,7 +69,7 @@ For each recommendation, include:
 
 - observed evidence
 - likely affected file or module
-- estimated impact
+- expected wait-time impact (e.g. "Expected to reduce your clean build by ~2s" or "Reduces parallel compile work but unlikely to reduce build wait time")
 - confidence
 - whether approval is required before applying it
 
diff --git a/skills/xcode-project-analyzer/references/project-audit-checks.md b/skills/xcode-project-analyzer/references/project-audit-checks.md
index f527d8c..e84a288 100644
--- a/skills/xcode-project-analyzer/references/project-audit-checks.md
+++ b/skills/xcode-project-analyzer/references/project-audit-checks.md
@@ -81,6 +81,8 @@ Do not flag language-migration settings (`SWIFT_STRICT_CONCURRENCY`, `SWIFT_UPCO
 
 ## Recommendation Prioritization
 
+Qualify every estimated impact with wall-clock framing. High-priority items should be those likely to reduce the developer's actual wait time, not just cumulative task totals. If the impact on wait time is uncertain, say so.
+
 - High: serial script bottlenecks, missing dependency metadata, configuration drift causing redundant module builds, excessive "Planning Swift module" time, or scripts silently invalidating build inputs.
 - Medium: stale target structure, noncritical scripts running too often, slow asset catalog compilation blocking the critical path, unnecessary codesigning on unchanged output, or significant `ExtractAppIntentsMetadata` time in projects without App Intents.
 - Low: settings cleanup without strong evidence of current impact.
From ac8632f78dd310d2d6718cee81ad95dbe9620fb8 Mon Sep 17 00:00:00 2001
From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com>
Date: Mon, 23 Mar 2026 07:31:09 +0000
Subject: [PATCH 2/2] chore: sync README structure [skip ci]

---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index c232fd3..a72771b 100644
--- a/README.md
+++ b/README.md
@@ -154,7 +154,6 @@ xcode-build-optimization-agent-skill/
     build-benchmark.schema.json
   scripts/
     benchmark_builds.py
-    check_spm_pins.py
     diagnose_compilation.py
     generate_optimization_report.py
     render_recommendations.py
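
Reviewer sketch (not part of either patch): PATCH 1/2 describes the orchestrator's blocking-vs-parallel heuristic only in prose, namely comparing the sum of all timing-summary category seconds against the wall-clock median and treating a ratio of 2x or more as mostly parallelized work. A minimal Python illustration of that comparison might look like the block below. The function name `classify_parallelism`, its return labels, and the sample numbers other than the template's `SwiftCompile` figure are assumptions made for illustration; nothing here is taken from the repository's scripts.

```python
from typing import Dict


def classify_parallelism(
    category_seconds: Dict[str, float],
    wall_clock_median: float,
    threshold: float = 2.0,
) -> str:
    """Hypothetical helper: apply the 2x+ heuristic from the orchestrator skill.

    Returns "mostly-parallel" when cumulative task time is at least
    `threshold` times the wall-clock median, else "possibly-blocking".
    """
    if wall_clock_median <= 0:
        raise ValueError("wall_clock_median must be positive")
    # Cumulative task time across all categories (summed over CPU cores).
    total = sum(category_seconds.values())
    ratio = total / wall_clock_median
    return "mostly-parallel" if ratio >= threshold else "possibly-blocking"


# Illustrative inputs: 271.245s SwiftCompile comes from the report template,
# the 86s median from the final-report example; the pairing is assumed.
print(classify_parallelism({"SwiftCompile": 271.245}, 86.0))
# A serial-looking build: category totals roughly equal the wall-clock median.
print(classify_parallelism({"PhaseScriptExecution": 6.3, "CodeSign": 2.0}, 8.0))
```

A label of "possibly-blocking" is only a prompt to look for a serial category on the critical path. As the skill text says, the final verdict still comes from re-benchmarking the wall-clock median.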