feat(sre-agent): add FinOps recipe content #2168
Pull request overview
Adds the FinOps Hub Azure SRE Agent “recipe” content under src/templates/sre-agent/, including subagent definitions, tool manifests (Kusto + Python), skills/reference material, knowledge files, scheduled tasks, and verification/config artifacts intended to support deployment and ongoing ops workflows.
Changes:
- Introduces the `recipes/finops-hub/` recipe (agents, tools, skills, scheduled tasks, connectors, knowledge) for the FinOps Toolkit SRE Agent.
- Adds verification/config artifacts (`expected-config.json`, `roles.yaml`, `agent.json`, built-in tool overrides) to validate deployments and expected inventory.
- Adds template guardrails and repository wiring (AGENTS guidance, upstream pin, git attributes/ignore, submodule reference).
Reviewed changes
Copilot reviewed 91 out of 93 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/templates/sre-agent/recipes/finops-hub/roles.yaml | Declares managed identity RBAC requirements for subscription/target RGs and ADX. |
| src/templates/sre-agent/recipes/finops-hub/README.md | Recipe overview and deploy usage. |
| src/templates/sre-agent/recipes/finops-hub/knowledge/teams-notification-guide.md | Delivery guidance for Teams/Outlook connector tools. |
| src/templates/sre-agent/recipes/finops-hub/knowledge/onboarding-recommendations.md | Onboarding and connector setup recommendations. |
| src/templates/sre-agent/recipes/finops-hub/knowledge/document-index.md | Knowledge source inventory and verification sentinel. |
| src/templates/sre-agent/recipes/finops-hub/knowledge/chart-artifact-verification.md | Non-visual chart artifact verification guidance. |
| src/templates/sre-agent/recipes/finops-hub/expected-config.json | Expected inventory/config used for verification comparisons. |
| src/templates/sre-agent/recipes/finops-hub/connectors.json | Defines the FinOps Hub Kusto connector for the recipe. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/vm-quota-usage.yaml | Python tool: VM quota usage reporting via ARM REST. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/suppress-advisor-recommendations.yaml | Python tool: bulk Advisor suppression creation via Resource Graph + ARM PUT. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/resource-graph-query.yaml | Python tool: run Resource Graph queries across subscriptions. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-bulk-anomaly-alerts.yaml | Python tool: deploy Cost Mgmt anomaly alerts across MG subscriptions. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-budget.yaml | Python tool: create/update subscription budgets via ARM REST. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/deploy-anomaly-alert.yaml | Python tool: create/update anomaly scheduled actions per subscription. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/capacity-reservation-groups.yaml | Python tool: CRG inventory and utilization reporting. |
| src/templates/sre-agent/recipes/finops-hub/config/tools/benefit-recommendations.yaml | Python tool: Cost Mgmt benefit recommendations at billing scope. |
| src/templates/sre-agent/recipes/finops-hub/config/subagents/ftk-hubs-agent.yaml | Subagent definition for hub deployment/ops workflows plus tool/hook config. |
| src/templates/sre-agent/recipes/finops-hub/config/subagents/ftk-database-query.yaml | Subagent definition for Kusto/catalog-based analytics plus tool/hook config. |
| src/templates/sre-agent/recipes/finops-hub/config/subagents/finops-practitioner.yaml | Orchestrator subagent definition and delegation rules. |
| src/templates/sre-agent/recipes/finops-hub/config/subagents/chief-financial-officer.yaml | CFO subagent definition (KB-only) for exec framing. |
| src/templates/sre-agent/recipes/finops-hub/config/subagents/azure-capacity-manager.yaml | Capacity subagent definition plus tool/hook config. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/workflows/ftk-hubs-healthCheck.md | Skill reference workflow: hub health checks. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/workflows/ftk-hubs-connect.md | Skill reference workflow: connect/discover hub and persist env. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/understand-finops-hub-context.md | Foundational grounding step for hub context before analysis. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/top-cost-drivers.md | Skill reference for ranking/driver analysis patterns. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/settings-format.md | Skill reference for .ftk/environments.local.md schema. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/finops-hubs.md | Skill knowledge: query catalog usage, constraints, best practices. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/custom-dimension-analysis.md | Skill reference for allocation/tag dimension analysis. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-trend-analysis.md | Skill reference for trend analysis patterns. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-spike-investigation.md | Skill reference for spike root-cause patterns. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/references/cost-anomaly-detection.md | Skill reference for anomaly detection decomposition. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/finops-toolkit/README.md | Skill README describing activation and query catalog usage. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/Get-BenefitRecommendations.ps1 | Reference script for benefit recommendations via Az REST. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-macc.md | Cost Mgmt knowledge: MACC tracking and workflows. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-credits.md | Cost Mgmt knowledge: Azure credits/prepayment workflows. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-cost-exports.md | Cost Mgmt knowledge: exports config/backfill guidance. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/references/azure-budgets.md | Cost Mgmt knowledge: budget creation/notifications/action groups. |
| src/templates/sre-agent/recipes/finops-hub/config/skills/azure-cost-management/README.md | Azure cost management skill README. |
| src/templates/sre-agent/recipes/finops-hub/config/built-in-tools.json | Enables/overrides built-in visualization and log query tools. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/storage-paas-growth-forecast.yaml | Scheduled task: storage/PaaS growth forecast. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/sku-availability-audit.yaml | Scheduled task: SKU availability audit. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/non-compute-quota-audit.yaml | Scheduled task: non-compute quota audit. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/monitoring-scope-validation.yaml | Scheduled task: monitoring scope validation. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/db-quota-audit.yaml | Scheduled task: DB quota audit. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/compute-utilization-trend.yaml | Scheduled task: compute utilization trend. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/capacity-daily-monitor.yaml | Scheduled task: daily capacity monitor. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/budget-coverage-audit.yaml | Scheduled task: budget coverage audit. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/benefit-recommendation-review.yaml | Scheduled task: benefit recommendation executive review. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/alert-coverage-audit.yaml | Scheduled task: anomaly alert coverage audit. |
| src/templates/sre-agent/recipes/finops-hub/automations/scheduled-tasks/advisor-suppression-review.yaml | Scheduled task: Advisor suppression review. |
| src/templates/sre-agent/recipes/finops-hub/agent.json | Default agent settings (access/action mode, provider, toggles). |
| src/templates/sre-agent/AGENTS.md | Guardrails and template inventory for contributors/agents. |
| src/templates/sre-agent/.upstream-pin | Upstream pin metadata for the starter-lab template. |
| src/templates/sre-agent/.gitignore | Ignores build gate artifacts and un-ignores bin/. |
| src/templates/sre-agent/.gitattributes | Forces LF EOL for shell scripts. |
| .gitmodules | Adds azcapman submodule reference under the SRE agent template. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RolandKrummenacher left a comment
Reviewed the SRE Agent recipe across three areas — deployable IaC/RBAC, shell/Python/build tooling, and repo structure/packaging/content. The recipe is well-engineered overall (read-only Kusto AllDatabasesViewer scoped to one cluster, system-assigned identity, secure App Insights via reference(), agent-scoped admin role, hermetic Pester tests, yaml.safe_load, parameterized JMESPath/jq, accurate 37-query count). But there are a few merge-blockers, mostly around insecure-by-default deployment and a fragile submodule. Details inline.
High / blocking
1. Portal one-click deploy is insecure by default: `accessLevel='High'` (Contributor on the agent RG + every `targetResourceGroups`) + `actionMode='autonomous'` + all 19 shipped scheduled tasks `agent_mode: autonomous` and enabled → an unattended, write-capable agent with a fleet of self-triggering cron jobs on first click. The CLI path correctly defaults to `Low`/`review`; the portal path deliberately escalates both.
2. The deployment script downloads the recipe package and executes its contents (connectors, Python tools, skills, subagents, autonomous scheduled tasks) with no integrity check, from an overridable `recipePackageUri`.
3. Git submodule `azcapman` is uninitialized, unpinned (no branch), and the capacity skill's files are symlinks into it, so the build fails / silently drops the capacity skill without `git submodule update --init`, and every toolkit consumer/clone/source-zip now inherits the submodule.
Medium
4. telemetry.sh hardcodes an App Insights ikey but is dead code (never sourced), while the README's documented --no-telemetry / SRE_AGENT_NO_TELEMETRY opt-out is a silent no-op.
5. yoy-report.yaml hardcodes a July–June fiscal year (same class as the sibling plugin PR #2167).
Also (no inline anchor — binary): docs/deploy/sre-agent/{14.0,latest}/sre-agent-recipe.zip commits a 346 KB build artifact twice (byte-identical). No other docs/deploy path on dev commits a binary zip; binaries don't diff/review and the two copies will drift. Prefer generating it at release, or add a CI check that both copies match the source recipe + mark it binary in .gitattributes.
(For reference, the sibling PR #2167 carries the plugin versions of much of this skill/agent content; the fiscal-year item there and here should be resolved consistently.)
    @description('Agent access level.')
    @allowed(['Low', 'High'])
    param accessLevel string = 'High'
Insecure-by-default portal deployment. The portal entry point defaults accessLevel='High' (line 30) and actionMode='autonomous' (line 34). High grants the agent's identity Contributor (b24988ac) on the agent RG and every targetResourceGroups (infra/modules/resource-group-rbac.bicep), and all 19 shipped scheduled tasks are agent_mode: autonomous and enabled. So a one-click portal deploy yields an unattended, write-capable agent running a fleet of self-triggering cron jobs immediately.
The CLI path (infra/main.bicep:24,28) correctly defaults to Low + review; only this portal wrapper escalates. Recommend defaulting the portal to Low/review too, and/or gating High+autonomous behind an explicit acknowledgement in createUiDefinition.json, and shipping the scheduled tasks disabled (or in review mode) so the operator opts into autonomy.
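The risky combination described in this comment can be expressed as a small pre-deployment check. This is a minimal sketch, assuming the settings have already been loaded into plain Python values; the function name and the `enabled`/`name` keys are hypothetical, while `accessLevel`, `actionMode`, and `agent_mode` are the names quoted in the review.

```python
def unattended_write_risks(access_level, action_mode, tasks):
    """Return findings for an unattended, write-capable configuration.

    `tasks` is a list of dicts; the `enabled` and `name` keys are assumed
    for illustration and may not match the real scheduled-task schema.
    """
    findings = []
    if access_level != "High":
        # Low is read-only, so autonomous runs cannot modify resources.
        return findings
    if action_mode == "autonomous":
        findings.append("accessLevel=High combined with actionMode=autonomous")
    for task in tasks:
        if task.get("agent_mode") == "autonomous" and task.get("enabled", False):
            findings.append(
                f"scheduled task '{task.get('name', '?')}' runs autonomously with write access"
            )
    return findings
```

A check like this could fail a build or a test when the shipped defaults produce any findings, which is one way to keep the secure defaults from regressing.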
    Write-Output "Downloading SRE Agent recipe package: $recipePackageUri"
    Invoke-WithRetry -Label 'download recipe package' -Action {
        Invoke-WebRequest -Uri $recipePackageUri -OutFile $zipPath
Recipe package is downloaded and its contents executed with no integrity verification. recipePackageUri (line 122) comes from an env var, is fetched via Invoke-WebRequest (line 139), Expand-Archived (line 141), and its extras.json is pushed to the agent — connectors, Python tools, skills, subagents, and autonomous scheduled tasks — which then run with the agent's (Contributor, when High) identity. The default is templateLink-relative (trustworthy from the official template), but the parameter is fully overridable and there's no SHA pin/signature. Anyone who can set recipePackageUri (or MITM a non-pinned host) can inject tools/subagents/cron tasks. Recommend pinning an expected SHA-256 and verifying after download, and restricting the URI to https:// on a known-host allowlist. (recipePackageUri is correctly not exposed in createUiDefinition.json, which limits portal tampering — good — but ARM/CLI callers can still override it.)
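The recommendation in this comment (restrict the URI to https on a known-host allowlist, then verify a pinned SHA-256 before extraction) can be sketched as follows. The deployment script itself is PowerShell; this Python version only illustrates the order of the checks. The host in `ALLOWED_HOSTS` and both function names are hypothetical.

```python
import hashlib
import hmac
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would list its own trusted hosts.
ALLOWED_HOSTS = {"example.blob.core.windows.net"}


def is_allowed_uri(uri: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Accept only https URIs whose host is on the allowlist."""
    parsed = urlparse(uri)
    return parsed.scheme == "https" and parsed.hostname in allowed_hosts


def verify_package(data: bytes, expected_sha256: str) -> None:
    """Fail closed: raise unless the package bytes match the pinned hash."""
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError("recipe package SHA-256 mismatch; refusing to extract")
```

The URI check runs before any download and the hash check runs before any extraction, so a tampered package is rejected without its contents ever being read.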
    @@ -0,0 +1,3 @@
    [submodule "src/templates/sre-agent/submodules/azcapman"]
Submodule makes the recipe build non-hermetic and is currently broken. azcapman is pinned only by gitlink SHA (no branch), is uninitialized, and three files under recipes/finops-hub/config/skills/azure-capacity-management/ (SKILL.md, references/docs, references/scripts) are symlinks into it. build-extras.py raises Expected skill directory missing SKILL.md or has a broken symlink: azure-capacity-management, so Build-SreAgentTemplate.ps1 fails unless git submodule update --init was run first — and no build doc/README states that prerequisite. Adding a submodule also burdens every toolkit clone and any git archive/source zip (empty submodule → broken symlinks). Recommend vendoring the azcapman skill/docs/scripts directly into the template and dropping the submodule + symlinks; if the submodule must stay, pin a branch and add an enforced, documented submodule update --init step in the build.
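The failure mode described here (symlinks pointing into an uninitialized submodule) can be detected before the build runs. This is a minimal sketch, not the logic in `build-extras.py`; the function name is hypothetical and the check relies only on the standard library.

```python
import os


def broken_symlinks(root):
    """Return sorted paths under `root` that are symlinks with a missing target."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Symlinks to directories appear in dirnames; broken ones in filenames.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

Running a check like this over the recipe's skills directory in a clean clone would surface the missing `git submodule update --init` prerequisite as an explicit error rather than a silently dropped skill.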
    # source "$(dirname "$0")/telemetry.sh"
    # send_telemetry "deploy" "finops-hub" "westus3" "true" "false" "true" "false" "deploy"
    _TELEMETRY_IKEY="f10eff7f-b995-4c41-8347-90f0f55d5969"
Two issues. (1) This sender is dead code — telemetry.sh is never sourced/executed by any script, yet it's shipped with a hardcoded App Insights ikey and a raw POST to the ingestion endpoint. Meanwhile the README (line 219) tells users to export SRE_AGENT_NO_TELEMETRY=1 and deploy.sh accepts --no-telemetry, but both are no-ops (deploy.sh just shifts the flag "for compatibility"), so the documented opt-out controls nothing. (2) It diverges from the toolkit's telemetry convention — every other template uses the Bicep enableDefaultTelemetry deployment (ARM-visible, parameter opt-out). Recommend either deleting telemetry.sh (and the misleading README/flag), or wiring it through the standard Bicep telemetry mechanism and actually honoring the opt-out. (Hardcoding an App Insights ingestion key in a public repo is itself acceptable — they're write-only — the problems are the dead code + misleading docs + divergence.)
    Load your finops-toolkit and azure-cost-management skills. Lead this as the FinOps practitioner. Delegate all FinOps Hub Kusto evidence collection to `ftk-database-query`, delegate capacity-risk evidence to `azure-capacity-manager`, and consult `chief-financial-officer` for fiscal planning, executive framing, and commitment-risk recommendations.
    Our fiscal year runs July through June. This task runs on January 5 and July 5. On January 5, compare the completed July-December first-half period against the same period in the prior fiscal year and forecast through June 30. On July 5, compare the just-completed July-June fiscal year against the previous fiscal year and prepare the next fiscal year planning view.
Hardcoded July–June fiscal year baked into the task prompt ("Our fiscal year runs July through June... forecast through June 30"; also lines 87, 92, 145, 165). That's Microsoft's FY, not the customer's — for a calendar-FY org this produces the wrong comparison windows and forecast dates. Same class of issue flagged on the sibling plugin PR #2167's ftk-ytd-report; recommend resolving both consistently (parameterize fiscal-year start/end or clearly mark it as a required customization point).
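Parameterizing the fiscal-year start, as recommended here, could look like the following. This is a minimal sketch under the assumption that a fiscal year always begins on the first day of a month; the function names and the `start_month` parameter are hypothetical and do not come from the recipe.

```python
from datetime import date, timedelta


def fiscal_year_bounds(today: date, start_month: int = 7):
    """Return (first_day, last_day) of the fiscal year that contains `today`."""
    start_year = today.year if today.month >= start_month else today.year - 1
    start = date(start_year, start_month, 1)
    # The fiscal year ends the day before the next one starts.
    end = date(start_year + 1, start_month, 1) - timedelta(days=1)
    return start, end


def previous_fiscal_year(today: date, start_month: int = 7):
    """Return the bounds of the fiscal year before the one containing `today`."""
    current_start, _ = fiscal_year_bounds(today, start_month)
    return fiscal_year_bounds(current_start - timedelta(days=1), start_month)
```

With `start_month=7` this reproduces the July through June windows in the prompt; a calendar-year organization would pass `start_month=1` and get January through December windows from the same code.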
# Conflicts:
# docs-mslearn/toolkit/changelog.md
# src/queries/INDEX.md
# src/queries/KPI.md
# src/queries/catalog/ai-cost-by-application.kql
# src/queries/catalog/ai-daily-trend.kql
# src/queries/catalog/ai-model-cost-comparison.kql
# src/queries/catalog/ai-token-usage-breakdown.kql
# src/queries/catalog/allocation-accuracy-index.kql
# src/queries/catalog/anomaly-detection-rate.kql
# src/queries/catalog/anomaly-variance-total.kql
# src/queries/catalog/commitment-discount-waste.kql
# src/queries/catalog/commitment-utilization-score.kql
# src/queries/catalog/compute-cost-per-core.kql
# src/queries/catalog/compute-spend-commitment-coverage.kql
# src/queries/catalog/cost-optimization-index.kql
# src/queries/catalog/cost-per-gb-stored.kql
# src/queries/catalog/cost-visibility-delay.kql
# src/queries/catalog/data-update-frequency.kql
# src/queries/catalog/macc-consumption-vs-commitment.kql
# src/queries/catalog/percentage-unallocated-costs.kql
# src/queries/catalog/percentage-untagged-costs.kql
# src/queries/catalog/storage-tier-distribution.kql
# src/queries/catalog/tagging-policy-compliance.kql
# src/queries/finops-hub-database-guide.md
…omizable fiscal year
- Vendor azure-capacity-management skill as real files and drop the azcapman git submodule + symlinks so a clean clone builds all 3 skills (Roland H3).
- Default the recipe agent to Low (read-only) access and autonomous reporting, and stop granting subscription-wide Reader by default (Roland H1); reports run unattended on a least-privilege identity.
- Mark the yoy-report July-June fiscal year as a documented, customizable example instead of a silent assumption (Roland M5).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request was converted to draft
…artifacts
Collapse the recipe and deploy slices into one coherent template PR (the recipe
is a subdirectory of the template and is packaged into the deploy artifacts, so
they cannot be split into independently-buildable PRs). Brings the deploy-side
security fixes onto the unified template and regenerates the committed artifacts:
- Secure-by-default: portal/CLI accessLevel=Low (read-only) + autonomous reporting;
subscription Reader off by default (Roland H1).
- Recipe package integrity: SHA-256 (fail-closed) + https host allowlist before
Expand-Archive; hash now injected into azuredeploy.json at package time (Roland H2).
- Removed dead bin/telemetry.sh + hardcoded App Insights key + no-op opt-out (Roland M4).
- Parameterized deployer principalType for CI/CD OIDC service-principal deploys.
- Regenerated docs/deploy/sre-agent/{14.0,latest} azuredeploy.json + createUiDefinition
+ recipe zip from scrubbed queries and vendored capacity skill: zips are leak-free
and the integrity hash matches. All 39 SRE deploy tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove broken yaml-to-deck GitHub links (lines 82, 166)
  - Removed PowerPoint template reference (non-existent path)
  - Removed lint.py reference (non-existent path)
  - Kept Power BI theme reference (actual file exists)
- Update ms.date to 06/17/2026 in 31 docs-mslearn files (was 06/05/2026, stale by 12 days)
- Genericize brand.md: replace SRE Agent examples with product-neutral patterns
  - Line 113: Changed 'FinOps toolkit SRE Agent' to generic placeholder
  - Line 135: Replaced SRE-specific examples with <Product> placeholders
  - Line 159: Changed anaphora example from 'Azure SRE Agent' to generic 'Azure Data Explorer'
  - Line 184-192: Replaced all SRE Agent examples in page title table with Azure Data Explorer
  - Line 200-204: Updated anaphora short form table (removed SRE Agent, added Power BI)
  - Line 206: Generic 'product vs component' instead of 'agent vs subagent'
  - Line 210: Replaced SRE Agent in pattern list with generic products
  - Line 216: Changed 'subagents' to 'agents' (generic term)
  - Line 124: Changed URL example from SRE Agent to Azure Data Explorer
  - Line 240: Generic placeholder in external contributor guidance
Brand guidance now provides product-neutral patterns reusable across all FinOps toolkit integrations, not coupled to SRE Agent work (PR #2168).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…enerated zips
Set ms.date to today for changed docs, change portal default actionMode to review for safer defaults, and remove build-generated recipe zips with a gitignore entry.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Align Pester assertions with the safer portal default (actionMode = review) while keeping CLI, recipe, and expected-config defaults at autonomous.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
RolandKrummenacher left a comment
Approving on content — all review findings are resolved:
- ✅ Portal deploy defaults now `accessLevel=Low` + `actionMode=review` (secure by default; test added)
- ✅ `azcapman` capacity skill vendored as real files; submodule + `.gitmodules` removed
- ✅ Recipe package integrity: SHA256 verification added and properly wired (Package-Toolkit.ps1 injects the real hash at packaging time, fail-closed)
- ✅ `telemetry.sh` removed
- ✅ Generated zips dropped from `docs/deploy`
- ✅ Fiscal year documented as a configurable worked-example
Two merge-logistics items to handle before merging (not code issues):
- Base is `features/sre-kpi-query-catalog`, which was squash-merged via #2166; retarget the base to `dev` and rebase (use `rebase --onto origin/dev origin/features/sre-kpi-query-catalog …` to avoid `src/queries` conflicts from the squash).
- This branch still contains the full deploy slice that #2169 also carries; decide whether #2169 is consolidated here or rebased on top of this once landed (it's currently stale/broken vs this branch).
Approving the content; please sort the base/stack before merge.
flanakin left a comment
I gave up reviewing this. There's too much. There are a few big blockers, more questions, etc.
    release/scloud-occurrence-report.md
    # Generated SRE Agent recipe packages
    docs/deploy/sre-agent/*/sre-agent-recipe.zip
nit: Why are these in the deploy folder if they're not getting committed? Shouldn't we clean them up so they don't even land there?
    {
      "source_path_from_root": "/finops/finops/toolkit/hubs/configure-sre.md",
      "redirect_url": "/cloud-computing/finops/toolkit/sre-agent/overview",
      "redirect_document_id": false
    },
We don't need a redirect for something that never existed before
Suggested change:
    - {
    -   "source_path_from_root": "/finops/finops/toolkit/hubs/configure-sre.md",
    -   "redirect_url": "/cloud-computing/finops/toolkit/sre-agent/overview",
    -   "redirect_document_id": false
    - },
      href: toolkit/hubs/upgrade.md
    - name: Compatibility guide
      href: toolkit/hubs/compatibility.md
    - name: FinOps toolkit SRE Agent
nit: I would put this after Workbooks, based on popularity. I know it's new, but that's the general practice we've always had. I could see higher than Alerts and AOE, but probably not Workbooks. We can always move it based on usage.
      href: toolkit/hubs/upgrade.md
    - name: Compatibility guide
      href: toolkit/hubs/compatibility.md
    - name: FinOps toolkit SRE Agent
Following the naming convention: We don't put "FinOps toolkit" as a prefix on everything. Adding "Azure" to hopefully add context to what "SRE" means. Lowercasing "agent" per Microsoft style.
Suggested change:
    - - name: FinOps toolkit SRE Agent
    + - name: Azure SRE agent
      href: toolkit/sre-agent/security.md
    - name: Troubleshooting
      href: toolkit/sre-agent/troubleshooting.md
    - name: Template reference
Is this a deployment template? If so, I'd be consistent with hubs:
Suggested change:
    - - name: Template reference
    + - name: Deployment template
    - [Deploy Azure SRE Agent with the FinOps toolkit](deploy.md)
    - [Azure SRE Agent in the FinOps toolkit](overview.md)
    - [FinOps hubs](../hubs/finops-hubs-overview.md)
Suggested change:
    - - [Deploy Azure SRE Agent with the FinOps toolkit](deploy.md)
    - - [Azure SRE Agent in the FinOps toolkit](overview.md)
    - - [FinOps hubs](../hubs/finops-hubs-overview.md)
    + - [FinOps hubs](../hubs/finops-hubs-overview.md)
Remove everything added in the docs/deploy folder. Those shouldn't be added now. They aren't part of v14.
    Get-ChildItem $destDir -Force -Recurse -Filter ".DS_Store" | Remove-Item -Force
Why is this removed? We don't want to package it.
Fwiw, we could add this to the -notin list on line 201.
    # Inject the recipe package SHA-256 into azuredeploy.json so the deployment
    # script can verify package integrity before extraction. Runs only when a
    # template ships both the compiled template and a recipe zip; no-op otherwise.
    $deployJson = "$targetDir/azuredeploy.json"
    $recipeZip = "$targetDir/sre-agent-recipe.zip"
    if ((Test-Path $deployJson) -and (Test-Path $recipeZip))
    {
        $deployContent = Get-Content $deployJson -Raw
        if ($deployContent -match 'PLACEHOLDER_RECIPE_PACKAGE_SHA256')
        {
            $recipeSha = (Get-FileHash -Algorithm SHA256 -Path $recipeZip).Hash.ToLower()
            $deployContent = $deployContent -replace 'PLACEHOLDER_RECIPE_PACKAGE_SHA256', $recipeSha
            Set-Content -Path $deployJson -Value $deployContent -NoNewline
            Write-Verbose "  Injected recipe package SHA-256 ($recipeSha) into $deployJson"
        }
    }
Do not add SRE-specific code here. This is tech debt. Let's make sure this script can stay generic and we have conventions that help accomplish what's needed. Happy to discuss and brainstorm ideas. This PR is too big for me to see the forest thru the trees right now 😕
| - `bin/deploy.sh` — Canonical deployment entry point copied from the Microsoft starter-lab setup flow and updated for no-azd FinOps deployment | ||
| - `infra/` — Copied-and-updated Microsoft starter-lab Bicep baseline | ||
| - `recipes/finops-hub/` — Recipe content | ||
| - `../claude-plugin/output-styles/ftk-output-style.md` — Uploaded as SRE Agent knowledge and referenced by every scheduled task for report formatting |
How is this file used? I'm assuming we can't link to other folders.
flanakin left a comment
🤖 [AI][Claude Code] PR Review
Summary: Strong, well-tested PR. All of @RolandKrummenacher's prior blockers are verified resolved in the tree (secure-by-default Low/review, SHA256 fail-closed integrity verification with https + host allowlist, vendored capacity skill with no submodule/symlinks, telemetry.sh removed, no committed zips, fiscal year documented as a worked example, RBAC includes Monitoring Reader). The 39-test Pester suite locks the secure defaults and integrity checks in place, and the "50 tools" claim (37 Kusto + 13 Python) checks out. Only 2 minor suggestions below — no blockers or should-fix items.
💡 Suggestions (2)
- Changelog "Changed" entry states the action mode was set to Autonomous, which contradicts the now-default `review` mode in the deployment templates.
- `createUiDefinition.json` lets a user escalate to High + autonomous with only a tooltip; an InfoBox warning would reinforce the opt-in.
    - Added 6 knowledge documents for agent onboarding, artifact verification, Teams notification patterns, known issues, document index guidance, and FinOps Toolkit output style.
    - Added FinOps toolkit SRE Agent documentation pages for Microsoft Learn.
    - **Changed**
    - Set agent action mode to Autonomous so scheduled tasks can deliver reports without human approval.
🤖 [AI][Claude Code] 💡 Suggestion
This line says the agent action mode was "set to Autonomous," but the secure-by-default resolution changed the shipped defaults: main.bicep (portal) defaults actionMode to review, and createUiDefinition.json defaults to review as well. A reader scanning the changelog would conclude autonomous is the out-of-box behavior, which it no longer is for the portal path.
Consider rewording to reflect that the agent defaults to review mode, with autonomous available as an opt-in for scheduled task delivery — e.g.:
Suggested change:
    - - Set agent action mode to Autonomous so scheduled tasks can deliver reports without human approval.
    + - Defaulted the deployment to review mode (read-only `Low` access); autonomous mode is an opt-in for unattended scheduled task delivery.
      "visible": true
    },
    {
      "name": "actionMode",
🤖 [AI][Claude Code] 💡 Suggestion
The defaults here are now safe (accessLevel=Low, actionMode=review), which resolves the earlier blocker. As a follow-on hardening, selecting High (Contributor on target resource groups) and/or autonomous is currently conveyed only via the tooltip. Adding a Microsoft.Common.InfoBox (style Warning) that appears when the operator escalates would make the security trade-off explicit at the point of choice and reinforce the opt-in, matching @RolandKrummenacher's original suggestion to gate High+autonomous behind an explicit acknowledgement. Optional — current defaults already protect the one-click path.
Closing this PR. The SRE agent content here grew too large and is chasing a moving target. We'll revisit with a smaller, content-only approach aligned to the Azure SRE Agent plugin model.
Replaces part of #2111.
Scope:
Review notes:
features/sre-kpi-query-catalog.
memory://projects/finops-toolkit/pr-2111-split-plan.
Verification:
git diff --check features/sre-kpi-query-catalog..features/sre-agent-recipe