feat(sre-agent): add FinOps recipe content by MSBrett · Pull Request #2168 · microsoft/finops-toolkit

MSBrett · 2026-06-03T15:01:41Z

Replaces part of #2111.

Scope:

Adds the FinOps Hub SRE Agent recipe content, expected config, scheduled tasks, tools, skills, knowledge, subagents, and azcapman submodule reference.
Includes recipe verification/build support and the SRE Agent deploy unit test.

Review notes:

Base: features/sre-kpi-query-catalog.
Depends on the KPI query catalog PR because the recipe consumes that catalog.
Split plan: memory://projects/finops-toolkit/pr-2111-split-plan.

Verification:

git diff --check features/sre-kpi-query-catalog..features/sre-agent-recipe
Split coverage union has 247 files, matching Azure SRE Agent template, tools, schedules, and docs #2111 with no missing or extra paths.

Copilot

Pull request overview

Adds the FinOps Hub Azure SRE Agent “recipe” content under src/templates/sre-agent/, including subagent definitions, tool manifests (Kusto + Python), skills/reference material, knowledge files, scheduled tasks, and verification/config artifacts intended to support deployment and ongoing ops workflows.

Changes:

Introduces the recipes/finops-hub/ recipe (agents, tools, skills, scheduled tasks, connectors, knowledge) for the FinOps Toolkit SRE Agent.
Adds verification/config artifacts (expected-config.json, roles.yaml, agent.json, built-in tool overrides) to validate deployments and expected inventory.
Adds template guardrails and repository wiring (AGENTS guidance, upstream pin, git attributes/ignore, submodule reference).

Reviewed changes

Copilot reviewed 91 out of 93 changed files in this pull request and generated 4 comments.


File	Description
src/templates/sre-agent/recipes/finops-hub/roles.yaml	Declares managed identity RBAC requirements for subscription/target RGs and ADX.
src/templates/sre-agent/recipes/finops-hub/README.md	Recipe overview and deploy usage.
src/templates/sre-agent/recipes/finops-hub/knowledge/teams-notification-guide.md	Delivery guidance for Teams/Outlook connector tools.
src/templates/sre-agent/recipes/finops-hub/knowledge/onboarding-recommendations.md	Onboarding and connector setup recommendations.
src/templates/sre-agent/recipes/finops-hub/knowledge/document-index.md	Knowledge source inventory and verification sentinel.
src/templates/sre-agent/recipes/finops-hub/knowledge/chart-artifact-verification.md	Non-visual chart artifact verification guidance.
src/templates/sre-agent/recipes/finops-hub/expected-config.json	Expected inventory/config used for verification comparisons.
src/templates/sre-agent/recipes/finops-hub/connectors.json	Defines the FinOps Hub Kusto connector for the recipe.
src/templates/sre-agent/recipes/finops-hub/config/tools/vm-quota-usage.yaml	Python tool: VM quota usage reporting via ARM REST.
src/templates/sre-agent/recipes/finops-hub/config/tools/suppress-advisor-recommendations.yaml	Python tool: bulk Advisor suppression creation via Resource Graph + ARM PUT.
src/templates/sre-agent/recipes/finops-hub/config/tools/resource-graph-query.yaml	Python tool: run Resource Graph queries across subscriptions.
src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-bulk-anomaly-alerts.yaml	Python tool: deploy Cost Mgmt anomaly alerts across MG subscriptions.
src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-budget.yaml	Python tool: create/update subscription budgets via ARM REST.
src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-anomaly-alert.yaml	Python tool: create/update anomaly scheduled actions per subscription.
src/templates/sre-agent/recipes/finops-hub/config/tools/capacity-reservation-groups.yaml	Python tool: CRG inventory and utilization reporting.
src/templates/sre-agent/recipes/finops-hub/config/tools/benefit-recommendations.yaml	Python tool: Cost Mgmt benefit recommendations at billing scope.
src/templates/sre-agent/recipes/finops-hub/config/subagents/ftk-hubs-agent.yaml	Subagent definition for hub deployment/ops workflows plus tool/hook config.
src/templates/sre-agent/recipes/finops-hub/config/subagents/ftk-database-query.yaml	Subagent definition for Kusto/catalog-based analytics plus tool/hook config.
src/templates/sre-agent/recipes/finops-hub/config/subagents/finops-practitioner.yaml	Orchestrator subagent definition and delegation rules.
src/templates/sre-agent/recipes/finops-hub/config/subagents/chief-financial-officer.yaml	CFO subagent definition (KB-only) for exec framing.
src/templates/sre-agent/recipes/finops-hub/config/subagents/azure-capacity-manager.yaml	Capacity subagent definition plus tool/hook config.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/workflows/ftk-hubs-healthCheck.md	Skill reference workflow: hub health checks.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/workflows/ftk-hubs-connect.md	Skill reference workflow: connect/discover hub and persist env.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/understand-finops-hub-context.md	Foundational grounding step for hub context before analysis.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/top-cost-drivers.md	Skill reference for ranking/driver analysis patterns.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/settings-format.md	Skill reference for `.ftk/environments.local.md` schema.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/finops-hubs.md	Skill knowledge: query catalog usage, constraints, best practices.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/custom-dimension-analysis.md	Skill reference for allocation/tag dimension analysis.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-trend-analysis.md	Skill reference for trend analysis patterns.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-spike-investigation.md	Skill reference for spike root-cause patterns.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-anomaly-detection.md	Skill reference for anomaly detection decomposition.
src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/README.md	Skill README describing activation and query catalog usage.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/Get-BenefitRecommendations.ps1	Reference script for benefit recommendations via Az REST.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-macc.md	Cost Mgmt knowledge: MACC tracking and workflows.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-credits.md	Cost Mgmt knowledge: Azure credits/prepayment workflows.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-cost-exports.md	Cost Mgmt knowledge: exports config/backfill guidance.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-budgets.md	Cost Mgmt knowledge: budget creation/notifications/action groups.
src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/README.md	Azure cost management skill README.
src/templates/sre-agent/recipes/finops-hub/config/built-in-tools.json	Enables/overrides built-in visualization and log query tools.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/storage-paas-growth-forecast.yaml	Scheduled task: storage/PaaS growth forecast.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/sku-availability-audit.yaml	Scheduled task: SKU availability audit.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/non-compute-quota-audit.yaml	Scheduled task: non-compute quota audit.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/monitoring-scope-validation.yaml	Scheduled task: monitoring scope validation.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/db-quota-audit.yaml	Scheduled task: DB quota audit.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/compute-utilization-trend.yaml	Scheduled task: compute utilization trend.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/capacity-daily-monitor.yaml	Scheduled task: daily capacity monitor.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/budget-coverage-audit.yaml	Scheduled task: budget coverage audit.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/benefit-recommendation-review.yaml	Scheduled task: benefit recommendation executive review.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/alert-coverage-audit.yaml	Scheduled task: anomaly alert coverage audit.
src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/advisor-suppression-review.yaml	Scheduled task: Advisor suppression review.
src/templates/sre-agent/recipes/finops-hub/agent.json	Default agent settings (access/action mode, provider, toggles).
src/templates/sre-agent/AGENTS.md	Guardrails and template inventory for contributors/agents.
src/templates/sre-agent/.upstream-pin	Upstream pin metadata for the starter-lab template.
src/templates/sre-agent/.gitignore	Ignores build gate artifacts and un-ignores `bin/`.
src/templates/sre-agent/.gitattributes	Forces LF EOL for shell scripts.
.gitmodules	Adds azcapman submodule reference under the SRE agent template.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RolandKrummenacher

Reviewed the SRE Agent recipe across three areas — deployable IaC/RBAC, shell/Python/build tooling, and repo structure/packaging/content. The recipe is well-engineered overall (read-only Kusto AllDatabasesViewer scoped to one cluster, system-assigned identity, secure App Insights via reference(), agent-scoped admin role, hermetic Pester tests, yaml.safe_load, parameterized JMESPath/jq, accurate 37-query count). But there are a few merge-blockers, mostly around insecure-by-default deployment and a fragile submodule. Details inline.

High / blocking

1. Portal one-click deploy is insecure by default: accessLevel='High' (Contributor on the agent RG + every targetResourceGroups) + actionMode='autonomous' + all 19 shipped scheduled tasks agent_mode: autonomous and enabled → an unattended, write-capable agent with a fleet of self-triggering cron jobs on first click. The CLI path correctly defaults to Low/review; the portal path deliberately escalates both.
2. The deployment script downloads the recipe package and executes its contents (connectors, Python tools, skills, subagents, autonomous scheduled tasks) with no integrity check, from an overridable recipePackageUri.
3. Git submodule azcapman is uninitialized, unpinned (no branch), and the capacity skill's files are symlinks into it: the build fails / silently drops the capacity skill without git submodule update --init, and every toolkit consumer/clone/source-zip now inherits the submodule.

Medium
4. telemetry.sh hardcodes an App Insights ikey but is dead code (never sourced), while the README's documented --no-telemetry / SRE_AGENT_NO_TELEMETRY opt-out is a silent no-op.
5. yoy-report.yaml hardcodes a July–June fiscal year (same class as the sibling plugin PR #2167).

Also (no inline anchor — binary): docs/deploy/sre-agent/{14.0,latest}/sre-agent-recipe.zip commits a 346 KB build artifact twice (byte-identical). No other docs/deploy path on dev commits a binary zip; binaries don't diff/review and the two copies will drift. Prefer generating it at release, or add a CI check that both copies match the source recipe + mark it binary in .gitattributes.

(For reference, the sibling PR #2167 carries the plugin versions of much of this skill/agent content; the fiscal-year item there and here should be resolved consistently.)

RolandKrummenacher · 2026-06-17T07:04:42Z

+
+@description('Agent access level.')
+@allowed(['Low', 'High'])
+param accessLevel string = 'High'


Insecure-by-default portal deployment. The portal entry point defaults accessLevel='High' (line 30) and actionMode='autonomous' (line 34). High grants the agent's identity Contributor (b24988ac) on the agent RG and every targetResourceGroups (infra/modules/resource-group-rbac.bicep), and all 19 shipped scheduled tasks are agent_mode: autonomous and enabled. So a one-click portal deploy yields an unattended, write-capable agent running a fleet of self-triggering cron jobs immediately.

The CLI path (infra/main.bicep:24,28) correctly defaults to Low + review; only this portal wrapper escalates. Recommend defaulting the portal to Low/review too, and/or gating High+autonomous behind an explicit acknowledgement in createUiDefinition.json, and shipping the scheduled tasks disabled (or in review mode) so the operator opts into autonomy.
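
To make the recommendation concrete, here is a minimal Python sketch of an explicit-acknowledgement gate. It is not code from the PR: the function names, the `acknowledged` flag, and the assumption that each parsed scheduled task is a dict with `name`, `agent_mode`, and `enabled` keys are all hypothetical. Only the `Low`/`High` and `review`/`autonomous` values come from the review above.

```python
# Hypothetical pre-deploy guard; not part of the template under review.
ACCESS_LEVELS = ("Low", "High")
ACTION_MODES = ("review", "autonomous")


def check_deploy_settings(access_level, action_mode, acknowledged=False):
    """Return True when the settings are escalated (High + autonomous).

    Raises PermissionError if the escalated combination is requested
    without an explicit acknowledgement from the operator.
    """
    if access_level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {access_level!r}")
    if action_mode not in ACTION_MODES:
        raise ValueError(f"unknown action mode: {action_mode!r}")
    escalated = access_level == "High" and action_mode == "autonomous"
    if escalated and not acknowledged:
        raise PermissionError(
            "High access with autonomous mode requires explicit acknowledgement"
        )
    return escalated


def autonomous_enabled_tasks(tasks):
    """List the names of tasks that would run unattended on first deploy.

    Assumes each task is a dict; a missing 'enabled' key is treated as
    enabled, which is an assumption rather than documented behavior.
    """
    return [
        task.get("name", "<unnamed>")
        for task in tasks
        if task.get("agent_mode") == "autonomous" and task.get("enabled", True)
    ]
```

A deploy wrapper could call `check_deploy_settings` before submitting the template and fail the build if `autonomous_enabled_tasks` returns anything for a recipe that is supposed to ship in review mode.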

RolandKrummenacher · 2026-06-17T07:04:42Z

+
+Write-Output "Downloading SRE Agent recipe package: $recipePackageUri"
+Invoke-WithRetry -Label 'download recipe package' -Action {
+    Invoke-WebRequest -Uri $recipePackageUri -OutFile $zipPath


Recipe package is downloaded and its contents executed with no integrity verification. recipePackageUri (line 122) comes from an env var, is fetched via Invoke-WebRequest (line 139), Expand-Archived (line 141), and its extras.json is pushed to the agent — connectors, Python tools, skills, subagents, and autonomous scheduled tasks — which then run with the agent's (Contributor, when High) identity. The default is templateLink-relative (trustworthy from the official template), but the parameter is fully overridable and there's no SHA pin/signature. Anyone who can set recipePackageUri (or MITM a non-pinned host) can inject tools/subagents/cron tasks. Recommend pinning an expected SHA-256 and verifying after download, and restricting the URI to https:// on a known-host allowlist. (recipePackageUri is correctly not exposed in createUiDefinition.json, which limits portal tampering — good — but ARM/CLI callers can still override it.)
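
The two recommended checks (https on a known-host allowlist, then a pinned SHA-256 verified after download) can be sketched as follows. This is an illustration in Python rather than the PowerShell deployment script itself; the function names and the allowlist contents are assumptions, since the review does not say which hosts should be trusted.

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical allowlist; the real set of trusted hosts is not given in the review.
ALLOWED_HOSTS = frozenset({"github.com", "raw.githubusercontent.com"})


def check_package_uri(uri, allowed_hosts=ALLOWED_HOSTS):
    """Reject any URI that is not https:// on an allowlisted host."""
    parsed = urlparse(uri)
    if parsed.scheme != "https":
        raise ValueError(f"recipe package URI must use https: {uri}")
    host = (parsed.hostname or "").lower()
    if host not in allowed_hosts:
        raise ValueError(f"recipe package host is not allowlisted: {host}")


def verify_sha256(path, expected_sha256):
    """Fail closed: raise unless the file's SHA-256 equals the pinned value."""
    if not expected_sha256:
        raise ValueError("no expected SHA-256 was provided")
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large packages are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256.strip().lower():
        raise ValueError("recipe package SHA-256 mismatch; refusing to extract")
```

The ordering matters: the URI check runs before the download and the hash check runs before extraction, so a missing or mismatched hash stops the script before any package content is unpacked or pushed to the agent.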

RolandKrummenacher · 2026-06-17T07:04:42Z

@@ -0,0 +1,3 @@
+[submodule "src/templates/sre-agent/submodules/azcapman"]


Submodule makes the recipe build non-hermetic and is currently broken. azcapman is pinned only by gitlink SHA (no branch), is uninitialized, and three files under recipes/finops-hub/config/skills/azure-capacity-management/ (SKILL.md, references/docs, references/scripts) are symlinks into it. build-extras.py raises Expected skill directory missing SKILL.md or has a broken symlink: azure-capacity-management, so Build-SreAgentTemplate.ps1 fails unless git submodule update --init was run first — and no build doc/README states that prerequisite. Adding a submodule also burdens every toolkit clone and any git archive/source zip (empty submodule → broken symlinks). Recommend vendoring the azcapman skill/docs/scripts directly into the template and dropping the submodule + symlinks; if the submodule must stay, pin a branch and add an enforced, documented submodule update --init step in the build.
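
The failure mode described here (symlinks whose targets vanish when the submodule is not initialized) is easy to detect up front. The following is a minimal Python sketch of such a check; it is not the logic in build-extras.py, and the function name is hypothetical.

```python
import os


def find_broken_symlinks(root):
    """Return sorted paths (relative to root) of symlinks whose target is missing."""
    broken = []
    # followlinks=False keeps the walk from descending into linked directories.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(os.path.relpath(path, root))
    return sorted(broken)
```

Run against the skills directory before packaging, a non-empty result would indicate the uninitialized-submodule state and could fail the build with a message that names the `git submodule update --init` prerequisite.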

RolandKrummenacher · 2026-06-17T07:04:42Z

+#   source "$(dirname "$0")/telemetry.sh"
+#   send_telemetry "deploy" "finops-hub" "westus3" "true" "false" "true" "false" "deploy"
+
+_TELEMETRY_IKEY="f10eff7f-b995-4c41-8347-90f0f55d5969"


Two issues. (1) This sender is dead code — telemetry.sh is never sourced/executed by any script, yet it's shipped with a hardcoded App Insights ikey and a raw POST to the ingestion endpoint. Meanwhile the README (line 219) tells users to export SRE_AGENT_NO_TELEMETRY=1 and deploy.sh accepts --no-telemetry, but both are no-ops (deploy.sh just shifts the flag "for compatibility"), so the documented opt-out controls nothing. (2) It diverges from the toolkit's telemetry convention — every other template uses the Bicep enableDefaultTelemetry deployment (ARM-visible, parameter opt-out). Recommend either deleting telemetry.sh (and the misleading README/flag), or wiring it through the standard Bicep telemetry mechanism and actually honoring the opt-out. (Hardcoding an App Insights ingestion key in a public repo is itself acceptable — they're write-only — the problems are the dead code + misleading docs + divergence.)

RolandKrummenacher · 2026-06-17T07:04:42Z

+    Load your finops-toolkit and azure-cost-management skills. Lead this as the FinOps practitioner. Delegate all FinOps Hub Kusto evidence collection to `ftk-database-query`, delegate capacity-risk evidence to `azure-capacity-manager`, and consult `chief-financial-officer` for fiscal planning, executive framing, and commitment-risk recommendations.
+
+
+    Our fiscal year runs July through June. This task runs on January 5 and July 5. On January 5, compare the completed July-December first-half period against the same period in the prior fiscal year and forecast through June 30. On July 5, compare the just-completed July-June fiscal year against the previous fiscal year and prepare the next fiscal year planning view.


Hardcoded July–June fiscal year baked into the task prompt ("Our fiscal year runs July through June... forecast through June 30"; also lines 87, 92, 145, 165). That's Microsoft's FY, not the customer's — for a calendar-FY org this produces the wrong comparison windows and forecast dates. Same class of issue flagged on the sibling plugin PR #2167's ftk-ytd-report; recommend resolving both consistently (parameterize fiscal-year start/end or clearly mark it as a required customization point).
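
One way to parameterize the fiscal year is to derive every comparison window from a single start-month setting. The Python sketch below illustrates that idea; the function names and the `start_month` parameter are hypothetical and do not appear in the task definition.

```python
from datetime import date, timedelta


def fiscal_year_window(today, start_month=7):
    """Return (start, end) of the fiscal year containing `today`; end is inclusive.

    start_month=7 reproduces the July-June year in the task prompt;
    start_month=1 gives a calendar fiscal year.
    """
    if not 1 <= start_month <= 12:
        raise ValueError("start_month must be between 1 and 12")
    start_year = today.year if today.month >= start_month else today.year - 1
    start = date(start_year, start_month, 1)
    end = date(start_year + 1, start_month, 1) - timedelta(days=1)
    return start, end


def prior_fiscal_year_window(today, start_month=7):
    """Return the fiscal year immediately before the one containing `today`."""
    start, _ = fiscal_year_window(today, start_month)
    return fiscal_year_window(start - timedelta(days=1), start_month)
```

With the default start month, a January 5 run falls in the year that ends June 30, matching the prompt's forecast date, while the same run with `start_month=1` yields a year ending December 31. That difference is the wrong-window problem described above for calendar-FY organizations.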

# Conflicts:
# docs-mslearn/toolkit/changelog.md
# src/queries/INDEX.md
# src/queries/KPI.md
# src/queries/catalog/ai-cost-by-application.kql
# src/queries/catalog/ai-daily-trend.kql
# src/queries/catalog/ai-model-cost-comparison.kql
# src/queries/catalog/ai-token-usage-breakdown.kql
# src/queries/catalog/allocation-accuracy-index.kql
# src/queries/catalog/anomaly-detection-rate.kql
# src/queries/catalog/anomaly-variance-total.kql
# src/queries/catalog/commitment-discount-waste.kql
# src/queries/catalog/commitment-utilization-score.kql
# src/queries/catalog/compute-cost-per-core.kql
# src/queries/catalog/compute-spend-commitment-coverage.kql
# src/queries/catalog/cost-optimization-index.kql
# src/queries/catalog/cost-per-gb-stored.kql
# src/queries/catalog/cost-visibility-delay.kql
# src/queries/catalog/data-update-frequency.kql
# src/queries/catalog/macc-consumption-vs-commitment.kql
# src/queries/catalog/percentage-unallocated-costs.kql
# src/queries/catalog/percentage-untagged-costs.kql
# src/queries/catalog/storage-tier-distribution.kql
# src/queries/catalog/tagging-policy-compliance.kql
# src/queries/finops-hub-database-guide.md

…omizable fiscal year

- Vendor azure-capacity-management skill as real files and drop the azcapman git submodule + symlinks so a clean clone builds all 3 skills (Roland H3).
- Default the recipe agent to Low (read-only) access and autonomous reporting, and stop granting subscription-wide Reader by default (Roland H1); reports run unattended on a least-privilege identity.
- Mark the yoy-report July-June fiscal year as a documented, customizable example instead of a silent assumption (Roland M5).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…artifacts

Collapse the recipe and deploy slices into one coherent template PR (the recipe is a subdirectory of the template and is packaged into the deploy artifacts, so they cannot be split into independently-buildable PRs). Brings the deploy-side security fixes onto the unified template and regenerates the committed artifacts:

- Secure-by-default: portal/CLI accessLevel=Low (read-only) + autonomous reporting; subscription Reader off by default (Roland H1).
- Recipe package integrity: SHA-256 (fail-closed) + https host allowlist before Expand-Archive; hash now injected into azuredeploy.json at package time (Roland H2).
- Removed dead bin/telemetry.sh + hardcoded App Insights key + no-op opt-out (Roland M4).
- Parameterized deployer principalType for CI/CD OIDC service-principal deploys.
- Regenerated docs/deploy/sre-agent/{14.0,latest} azuredeploy.json + createUiDefinition + recipe zip from scrubbed queries and vendored capacity skill: zips are leak-free and the integrity hash matches.

All 39 SRE deploy tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove broken yaml-to-deck GitHub links (lines 82, 166)
- Removed PowerPoint template reference (non-existent path)
- Removed lint.py reference (non-existent path)
- Kept Power BI theme reference (actual file exists)
- Update ms.date to 06/17/2026 in 31 docs-mslearn files (was 06/05/2026, stale by 12 days)
- Genericize brand.md: replace SRE Agent examples with product-neutral patterns
  - Line 113: Changed 'FinOps toolkit SRE Agent' to generic placeholder
  - Line 135: Replaced SRE-specific examples with <Product> placeholders
  - Line 159: Changed anaphora example from 'Azure SRE Agent' to generic 'Azure Data Explorer'
  - Line 184-192: Replaced all SRE Agent examples in page title table with Azure Data Explorer
  - Line 200-204: Updated anaphora short form table (removed SRE Agent, added Power BI)
  - Line 206: Generic 'product vs component' instead of 'agent vs subagent'
  - Line 210: Replaced SRE Agent in pattern list with generic products
  - Line 216: Changed 'subagents' to 'agents' (generic term)
  - Line 124: Changed URL example from SRE Agent to Azure Data Explorer
  - Line 240: Generic placeholder in external contributor guidance

Brand guidance now provides product-neutral patterns reusable across all FinOps toolkit integrations, not coupled to SRE Agent work (PR #2168).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…enerated zips

Set ms.date to today for changed docs, change portal default actionMode to review for safer defaults, and remove build-generated recipe zips with a gitignore entry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Align Pester assertions with the safer portal default (actionMode = review) while keeping CLI, recipe, and expected-config defaults at autonomous. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RolandKrummenacher

Approving on content — all review findings are resolved:

✅ Portal deploy defaults now accessLevel=Low + actionMode=review (secure by default; test added)
✅ azcapman capacity skill vendored as real files; submodule + .gitmodules removed
✅ Recipe package integrity: SHA256 verification added and properly wired (Package-Toolkit.ps1 injects the real hash at packaging time, fail-closed)
✅ telemetry.sh removed
✅ Generated zips dropped from docs/deploy
✅ Fiscal year documented as a configurable worked-example

Two merge-logistics items to handle before merging (not code issues):

1. Base is features/sre-kpi-query-catalog, which was squash-merged via #2166. Retarget the base to dev and rebase (use rebase --onto origin/dev origin/features/sre-kpi-query-catalog … to avoid src/queries conflicts from the squash).
2. This branch still contains the full deploy slice that #2169 also carries; decide whether #2169 is consolidated here or rebased on top of this once landed (it's currently stale/broken vs this branch).

Approving the content; please sort the base/stack before merge.

flanakin

I gave up reviewing this. There's too much. There are a few big blockers, more questions, etc.

flanakin · 2026-06-18T09:10:45Z

 release/scloud-occurrence-report.md
+
+# Generated SRE Agent recipe packages
+docs/deploy/sre-agent/*/sre-agent-recipe.zip


nit: Why are these in the deploy folder if they're not getting committed? Shouldn't we clean them up so they don't even land there?

flanakin · 2026-06-18T09:11:42Z

+    {
+      "source_path_from_root": "/finops/finops/toolkit/hubs/configure-sre.md",
+      "redirect_url": "/cloud-computing/finops/toolkit/sre-agent/overview",
+      "redirect_document_id": false
+    },


We don't need a redirect for something that never existed before

Suggested change

{

"source_path_from_root": "/finops/finops/toolkit/hubs/configure-sre.md",

"redirect_url": "/cloud-computing/finops/toolkit/sre-agent/overview",

"redirect_document_id": false

},

flanakin · 2026-06-18T09:14:33Z

          href: toolkit/hubs/upgrade.md
        - name: Compatibility guide
          href: toolkit/hubs/compatibility.md
+    - name: FinOps toolkit SRE Agent


nit: I would put this after Workbooks, based on popularity. I know it's new, but that's the general practice we've always had. I could see it higher than Alerts and AOE, but probably not Workbooks. We can always move it based on usage.

flanakin · 2026-06-18T09:15:45Z

          href: toolkit/hubs/upgrade.md
        - name: Compatibility guide
          href: toolkit/hubs/compatibility.md
+    - name: FinOps toolkit SRE Agent


Following the naming convention: We don't put "FinOps toolkit" as a prefix on everything. Adding "Azure" to hopefully add context to what "SRE" means. Lowercasing "agent" per Microsoft style.

Suggested change

- name: FinOps toolkit SRE Agent

- name: Azure SRE agent

flanakin · 2026-06-18T09:17:38Z

+          href: toolkit/sre-agent/security.md
+        - name: Troubleshooting
+          href: toolkit/sre-agent/troubleshooting.md
+        - name: Template reference


Is this a deployment template? If so, I'd be consistent with hubs:

Suggested change

- name: Template reference

- name: Deployment template

flanakin · 2026-06-18T10:17:44Z

+- [Deploy Azure SRE Agent with the FinOps toolkit](deploy.md)
+- [Azure SRE Agent in the FinOps toolkit](overview.md)
+- [FinOps hubs](../hubs/finops-hubs-overview.md)


Suggested change

- [Deploy Azure SRE Agent with the FinOps toolkit](deploy.md)

- [Azure SRE Agent in the FinOps toolkit](overview.md)

- [FinOps hubs](../hubs/finops-hubs-overview.md)

- [FinOps hubs](../hubs/finops-hubs-overview.md)

flanakin · 2026-06-18T10:18:28Z

Remove everything added in the docs/deploy folder. Those shouldn't be added now. They aren't part of v14.

flanakin · 2026-06-18T10:23:38Z

-    Get-ChildItem $destDir -Force -Recurse -Filter ".DS_Store" | Remove-Item -Force
-


Why is this removed? We don't want to package it.

Fwiw, we could add this to the -notin list on line 201.

flanakin · 2026-06-18T10:25:32Z

+
+                # Inject the recipe package SHA-256 into azuredeploy.json so the deployment
+                # script can verify package integrity before extraction. Runs only when a
+                # template ships both the compiled template and a recipe zip; no-op otherwise.
+                $deployJson = "$targetDir/azuredeploy.json"
+                $recipeZip = "$targetDir/sre-agent-recipe.zip"
+                if ((Test-Path $deployJson) -and (Test-Path $recipeZip))
+                {
+                    $deployContent = Get-Content $deployJson -Raw
+                    if ($deployContent -match 'PLACEHOLDER_RECIPE_PACKAGE_SHA256')
+                    {
+                        $recipeSha = (Get-FileHash -Algorithm SHA256 -Path $recipeZip).Hash.ToLower()
+                        $deployContent = $deployContent -replace 'PLACEHOLDER_RECIPE_PACKAGE_SHA256', $recipeSha
+                        Set-Content -Path $deployJson -Value $deployContent -NoNewline
+                        Write-Verbose "    Injected recipe package SHA-256 ($recipeSha) into $deployJson"
+                    }
+                }


Do not add SRE-specific code here. This is tech debt. Let's make sure this script can stay generic and we have conventions that help accomplish what's needed. Happy to discuss and brainstorm ideas. This PR is too big for me to see the forest thru the trees right now 😕

flanakin · 2026-06-18T10:27:21Z

+- `bin/deploy.sh` — Canonical deployment entry point copied from the Microsoft starter-lab setup flow and updated for no-azd FinOps deployment
+- `infra/` — Copied-and-updated Microsoft starter-lab Bicep baseline
+- `recipes/finops-hub/` — Recipe content
+- `../claude-plugin/output-styles/ftk-output-style.md` — Uploaded as SRE Agent knowledge and referenced by every scheduled task for report formatting


How is this file used? I'm assuming we can't link to other folders.

flanakin

🤖 [AI][Claude Code] PR Review

Summary: Strong, well-tested PR. All of @RolandKrummenacher's prior blockers are verified resolved in the tree (secure-by-default Low/review, SHA256 fail-closed integrity verification with https + host allowlist, vendored capacity skill with no submodule/symlinks, telemetry.sh removed, no committed zips, fiscal year documented as a worked example, RBAC includes Monitoring Reader). The 39-test Pester suite locks the secure defaults and integrity checks in place, and the "50 tools" claim (37 Kusto + 13 Python) checks out. Only 2 minor suggestions below — no blockers or should-fix items.

💡 Suggestions (2)

Changelog "Changed" entry states the action mode was set to Autonomous, which contradicts the now-default review mode in the deployment templates.
createUiDefinition.json lets a user escalate to High + autonomous with only a tooltip — an InfoBox warning would reinforce the opt-in.

flanakin · 2026-06-18T10:43:48Z

+  - Added 6 knowledge documents for agent onboarding, artifact verification, Teams notification patterns, known issues, document index guidance, and FinOps Toolkit output style.
+  - Added FinOps toolkit SRE Agent documentation pages for Microsoft Learn.
+- **Changed**
+  - Set agent action mode to Autonomous so scheduled tasks can deliver reports without human approval.


🤖 [AI][Claude Code] 💡 Suggestion

This line says the agent action mode was "set to Autonomous," but the secure-by-default resolution changed the shipped defaults: main.bicep (portal) defaults actionMode to review, and createUiDefinition.json defaults to review as well. A reader scanning the changelog would conclude autonomous is the out-of-box behavior, which it no longer is for the portal path.

Consider rewording to reflect that the agent defaults to review mode, with autonomous available as an opt-in for scheduled task delivery — e.g.:

Suggested change

- Set agent action mode to Autonomous so scheduled tasks can deliver reports without human approval.

- Defaulted the deployment to review mode (read-only `Low` access); autonomous mode is an opt-in for unattended scheduled task delivery.

flanakin · 2026-06-18T10:43:48Z

+            "visible": true
+          },
+          {
+            "name": "actionMode",


🤖 [AI][Claude Code] 💡 Suggestion

The defaults here are now safe (accessLevel=Low, actionMode=review), which resolves the earlier blocker. As a follow-on hardening, selecting High (Contributor on target resource groups) and/or autonomous is currently conveyed only via the tooltip. Adding a Microsoft.Common.InfoBox (style Warning) that appears when the operator escalates would make the security trade-off explicit at the point of choice and reinforce the opt-in, matching @RolandKrummenacher's original suggestion to gate High+autonomous behind an explicit acknowledgement. Optional — current defaults already protect the one-click path.

MSBrett · 2026-06-21T21:01:20Z

Closing this PR. The SRE agent content here grew too large and is chasing a moving target. We'll revisit with a smaller, content-only approach aligned to the Azure SRE Agent plugin model.

msbrett added 7 commits June 3, 2026 07:45

feat(queries): add FinOps KPI query catalog

8948b1a

feat(sre-agent): add FinOps recipe content

87443aa

feat(sre-agent): add deployable Azure template

6a01421

fix(queries): remove markdown trailing whitespace

a1f7008

merge: update KPI query catalog base

7e609aa

fix(sre-agent): remove recipe whitespace

a019530

merge: update SRE Agent recipe base

42cc7f4

MSBrett mentioned this pull request Jun 3, 2026

Azure SRE Agent template, tools, schedules, and docs #2111

Closed

microsoft-github-policy-service Bot added the Needs: Review 👀 PR that is ready to be reviewed label Jun 3, 2026

MSBrett marked this pull request as ready for review June 3, 2026 15:54

MSBrett requested a review from flanakin as a code owner June 3, 2026 15:54

msbrett added 9 commits June 3, 2026 09:15

test(sre-agent): keep deployment tests with deploy slice

c044e75

merge: update SRE Agent recipe base

258ee3c

fix(sre-agent): keep deploy checks in deploy slice

05601e6

fix(sre-agent): address deploy CI failures

b332089

test(sre-agent): normalize bash stub path on Windows

3e83081

test(sre-agent): harden bash stub permissions

60d7857

test(sre-agent): stub extras builder in deploy tests

e46aa39

test(sre-agent): fix Windows Python extras stub

de97720

test(sre-agent): preserve Azure resource IDs on Windows

4ed9fb9

MSBrett requested a review from Copilot June 4, 2026 13:02

Copilot started reviewing on behalf of MSBrett June 4, 2026 13:02

Copilot AI reviewed Jun 4, 2026

MSBrett enabled auto-merge (squash) June 4, 2026 13:19

fix(sre-agent): address recipe review feedback

a6d68e6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

MSBrett requested a review from RolandKrummenacher as a code owner June 4, 2026 13:21

chore: update mslearn dates

fe4aa54

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

RolandKrummenacher requested changes Jun 17, 2026

microsoft-github-policy-service Bot added Needs: Attention 👋 and removed Needs: Review 👀 labels Jun 17, 2026

microsoft-github-policy-service Bot assigned MSBrett Jun 17, 2026

RolandKrummenacher mentioned this pull request Jun 17, 2026

feat(sre-agent): add deployable Azure template #2169

Closed

msbrett and others added 2 commits June 17, 2026 09:28

microsoft-github-policy-service Bot added Needs: Review 👀 and removed Needs: Attention 👋 labels Jun 17, 2026

MSBrett changed the base branch from features/sre-kpi-query-catalog to dev June 17, 2026 17:26

MSBrett marked this pull request as draft June 17, 2026 17:41

auto-merge was automatically disabled June 17, 2026 17:41
Pull request was converted to draft

MSBrett marked this pull request as ready for review June 17, 2026 18:21

MSBrett and others added 2 commits June 17, 2026 13:09

test(sre-agent): expect review-mode default for portal actionMode

4f3eee1

Align Pester assertions with the safer portal default (actionMode = review) while keeping CLI, recipe, and expected-config defaults at autonomous.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

MSBrett requested a review from RolandKrummenacher June 17, 2026 20:38

RolandKrummenacher approved these changes Jun 18, 2026

flanakin requested changes Jun 18, 2026

microsoft-github-policy-service Bot added Needs: Attention 👋 and removed Needs: Review 👀 labels Jun 18, 2026

flanakin reviewed Jun 18, 2026

MSBrett closed this Jun 21, 2026

microsoft-github-policy-service Bot added Needs: Review 👀 and removed Needs: Attention 👋 labels Jun 21, 2026

		@@ -0,0 +1,3 @@
		[submodule "src/templates/sre-agent/submodules/azcapman"]

		Load your finops-toolkit and azure-cost-management skills. Lead this as the FinOps practitioner. Delegate all FinOps Hub Kusto evidence collection to `ftk-database-query`, delegate capacity-risk evidence to `azure-capacity-manager`, and consult `chief-financial-officer` for fiscal planning, executive framing, and commitment-risk recommendations.


		Our fiscal year runs July through June. This task runs on January 5 and July 5. On January 5, compare the completed July-December first-half period against the same period in the prior fiscal year and forecast through June 30. On July 5, compare the just-completed July-June fiscal year against the previous fiscal year and prepare the next fiscal year planning view.

		Get-ChildItem $destDir -Force -Recurse -Filter ".DS_Store" | Remove-Item -Force
