feat(sre-agent): add deployable Azure template #2169
Pull request overview
Adds a new deployable FinOps Toolkit SRE Agent template under src/templates/sre-agent/, including subscription-scoped Bicep entry points, supporting infra modules, post-provision “apply extras” automation, and Microsoft Learn documentation + deploy-to-Azure artifacts. This PR also updates the toolkit build pipeline to package the new template and generate the portal recipe zip.
Changes:
- Introduces subscription-scoped Bicep entry points and infra modules (monitoring, agent resource, RBAC, optional ADX principal assignment, and deployment-script-based “apply extras”).
- Adds deployment automation (CLI wrapper scripts, prerequisites check, GitHub Actions example) and packaging/build integration for the template.
- Adds Learn docs + deploy artifacts (Create UI Definition copies, TOC updates, redirection, and changelog entry).
Reviewed changes
Copilot reviewed 36 out of 43 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| src/templates/sre-agent/package-manifest.json | Declares which generated artifacts ship for portal deployment packaging. |
| src/templates/sre-agent/main.bicep | Portal-focused subscription-scoped entry point wiring infra + extras. |
| src/templates/sre-agent/infra/scripts/Apply-SreAgentExtras.ps1 | Deployment-script logic to apply connectors/tools/skills/agents/scheduled tasks from the packaged recipe. |
| src/templates/sre-agent/infra/resources.bicep | Resource-group scoped orchestration module for monitoring + agent deployment. |
| src/templates/sre-agent/infra/modules/sre-agent.bicep | Defines Microsoft.App/agents resource and deployer RBAC on the agent. |
| src/templates/sre-agent/infra/modules/resource-group-rbac.bicep | Assigns RG-scoped roles to the agent managed identity based on access level. |
| src/templates/sre-agent/infra/modules/monitoring.bicep | Deploys Log Analytics and workspace-based Application Insights. |
| src/templates/sre-agent/infra/modules/kusto-all-databases-viewer-rbac.bicep | Optional ADX AllDatabasesViewer principal assignment. |
| src/templates/sre-agent/infra/modules/apply-extras.bicep | Creates UAMI + deploymentScripts resource to run Apply-SreAgentExtras.ps1. |
| src/templates/sre-agent/infra/main.bicep | CLI-focused subscription-scoped infra entry point (RG + resources + RBAC + optional ADX RBAC). |
| src/templates/sre-agent/examples/ci-cd/github-actions-deploy.yml | Example GitHub Actions workflow to run the deployment wrapper. |
| src/templates/sre-agent/createUiDefinition.json | Azure portal UI definition for Deploy-to-Azure experience. |
| src/templates/sre-agent/bin/telemetry.sh | Adds a helper for best-effort usage telemetry (currently standalone). |
| src/templates/sre-agent/bin/post-provision.sh | Compatibility wrapper forwarding to the new apply-extras flow. |
| src/templates/sre-agent/bin/deploy.sh | CLI deployment wrapper (parameter validation, deterministic deploy naming, infra deploy, then apply extras). |
| src/templates/sre-agent/bin/check-prerequisites.sh | Preflight checks for required local tools (az/jq/python/PyYAML/etc.). |
| src/templates/sre-agent/.build.config | Registers the template’s custom build step for packaging. |
| src/scripts/Build-Toolkit.ps1 | Adjusts template copy behavior during build (affects which files ship). |
| src/scripts/Build-SreAgentTemplate.ps1 | New build step to generate and zip the portal recipe extras package. |
| docs/deploy/sre-agent/latest/createUiDefinition.json | Published deploy artifact copy for “latest” portal flow. |
| docs/deploy/sre-agent/14.0/createUiDefinition.json | Published deploy artifact copy for versioned portal flow. |
| docs-mslearn/toolkit/sre-agent/troubleshooting.md | Learn troubleshooting guide for SRE Agent deployments and runtime issues. |
| docs-mslearn/toolkit/sre-agent/tools.md | Learn reference catalog of shipped tools (Kusto + Python). |
| docs-mslearn/toolkit/sre-agent/template.md | Learn template reference (parameters/outputs/scripts/modules). |
| docs-mslearn/toolkit/sre-agent/security.md | Learn security/permissions guidance for the deployment. |
| docs-mslearn/toolkit/sre-agent/scheduled-tasks.md | Learn overview of scheduled tasks and their behavior. |
| docs-mslearn/toolkit/sre-agent/python-tools.md | Learn reference for the shipped Python tools. |
| docs-mslearn/toolkit/sre-agent/overview.md | Learn overview page for the SRE Agent template and architecture. |
| docs-mslearn/toolkit/sre-agent/knowledge.md | Learn guidance for knowledge/memory usage with the agent. |
| docs-mslearn/toolkit/sre-agent/get-started.md | Learn quickstart for what to do post-deployment. |
| docs-mslearn/toolkit/sre-agent/deploy.md | Learn deployment tutorial (portal + CLI) and validation steps. |
| docs-mslearn/toolkit/sre-agent/agents.md | Learn explanation of specialist agents/skills and handoff model. |
| docs-mslearn/toolkit/hubs/configure-sre.md | Learn doc integrating SRE Agent deployment guidance into hubs documentation. |
| docs-mslearn/toolkit/changelog.md | Changelog updates to include SRE Agent documentation/template additions. |
| docs-mslearn/TOC.yml | Adds SRE Agent section to the Learn TOC; renames “Workload” → “Usage” optimization entries. |
| docs-mslearn/.openpublishing.redirection.finops.json | Adds a redirection entry related to the SRE Agent docs pathing. |
- infra/main.bicep: union agent RG with targetResourceGroups so the agent RG is never omitted from targetRgIds (matches the portal entry point).
- Build-Toolkit.ps1: restore Get-ChildItem -Force so dotfiles such as .upstream-pin, .gitignore, and .gitattributes are copied into the release output; restore the .DS_Store cleanup pass.
- scheduled-tasks.md: change the agent owner column to finops-practitioner for all 11 tasks that delegate to specialists in the prompt, matching the shipped YAML manifests; add a clarifying note about delegation.
- tools.md, kusto-tools.md: replace ../../../src/... relative links with https://github.com/microsoft/finops-toolkit/tree/main/src/... so the links resolve on learn.microsoft.com.
- docs-mslearn (14 files): bump ms.date to 06/04/2026 per AGENTS.md guidance for the update-mslearn-dates CI workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
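The infra/main.bicep change makes the agent resource group part of targetRgIds regardless of what the caller passes in targetResourceGroups. A minimal Python sketch of the intended result, assuming plain resource-group ID strings. The function name and IDs are hypothetical, and Bicep's own union() ordering and comparison rules are not reproduced exactly:

```python
def build_target_rg_ids(agent_rg_id: str, target_rg_ids: list[str]) -> list[str]:
    """Hypothetical helper: target RG IDs with the agent RG always included.

    Duplicates are dropped by exact string match; the first occurrence wins,
    so the agent RG is listed first (an arbitrary choice for this sketch).
    """
    # dict.fromkeys preserves insertion order and discards repeated keys.
    return list(dict.fromkeys([agent_rg_id, *target_rg_ids]))


# Placeholder IDs for illustration only.
agent_rg = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-sre-agent"
app_rg = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/rg-app"

print(build_target_rg_ids(agent_rg, []))  # agent RG present even with no targets
print(build_target_rg_ids(agent_rg, [app_rg, agent_rg]))  # agent RG listed once
```

The point of the fix is membership: whatever the caller supplies, the agent's own resource group is always in the list exactly once.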
RolandKrummenacher left a comment
Pointer review — the substantive findings for this deploy slice are on the recipe PR, to avoid duplicating threads.
This PR overlaps almost entirely with #2168. All 43 files changed here are also changed in #2168 (recipe) — the deployable-template content (infra/, bin/, main.bicep/createUiDefinition.json, docs/deploy/sre-agent/**, the deploy tests, telemetry.sh, the committed sre-agent-recipe.zip) lives in both branches, and the security-critical files (main.bicep, infra/scripts/Apply-SreAgentExtras.ps1, bin/telemetry.sh) are byte-identical between the two. So the deploy-slice findings I left on #2168 apply here unchanged:
- Portal one-click defaults to accessLevel=High (Contributor on the agent RG + target RGs) + actionMode=autonomous, with all 19 scheduled tasks set to agent_mode: autonomous and enabled → an unattended, write-capable agent on first deploy (the CLI path correctly defaults to Low/review).
- Apply-SreAgentExtras.ps1 downloads the recipe package and executes its tools/subagents/cron with no integrity check, from an overridable recipePackageUri.
- telemetry.sh is dead code shipping a hardcoded ikey, while the README --no-telemetry opt-out is a no-op.
- Committed binary sre-agent-recipe.zip (duplicated in 14.0/ + latest/), the first binary under docs/deploy.
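The first finding turns on combining a write-capable role with autonomous action mode. The sketch below makes that combination explicit. Only the High → Contributor mapping and the accessLevel/actionMode values come from the review. Mapping Low to the built-in Reader role, and the helper names, are assumptions for illustration, not the template's actual logic:

```python
# Hypothetical mapping from the template's accessLevel values to Azure roles.
# High -> Contributor is stated in the review; Low -> Reader is an assumption
# based on Low being described as read-only.
ROLE_BY_ACCESS_LEVEL = {
    "High": "Contributor",
    "Low": "Reader",
}
WRITE_CAPABLE_ROLES = {"Contributor"}


def planned_role_assignments(access_level: str, rg_ids: list[str]) -> list[tuple[str, str]]:
    """Return (resource group ID, role name) pairs the agent identity would get."""
    if access_level not in ROLE_BY_ACCESS_LEVEL:
        raise ValueError(f"unknown access level: {access_level!r}")
    role = ROLE_BY_ACCESS_LEVEL[access_level]
    return [(rg_id, role) for rg_id in rg_ids]


def is_unattended_write_capable(access_level: str, action_mode: str) -> bool:
    """True when the agent can both act without review and modify resources."""
    role = ROLE_BY_ACCESS_LEVEL.get(access_level)
    return role in WRITE_CAPABLE_ROLES and action_mode == "autonomous"


# The original portal default versus the CLI default described in the review.
print(is_unattended_write_capable("High", "autonomous"))  # True
print(is_unattended_write_capable("Low", "review"))  # False
```

Either lowering the role or requiring review breaks the combination; the later fix lowers the role while keeping autonomous reporting.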
See the full inline detail in the #2168 review.
Structural note: because #2169 is a strict subset of #2168 (and the two branches have drifted by ~10 files — build-extras.py, expected-config.json, roles.yaml, infra/main.bicep, a few docs), the recipe/deploy split isn't actually realized in the branches and they can't merge independently without conflict/duplication. Worth deciding the division deliberately — pull the deploy-template files out of #2168 so this PR owns them, or collapse the two — and rebasing the stack onto current dev (the base here, features/sre-agent-recipe, in turn sits on the squash-merged #2166). Marking this COMMENT rather than Changes-Requested since the blocking verdict and detail live on #2168; happy to mirror the inline comments here instead if you'd prefer them on this PR.
…etry
- Default portal/CLI to Low (read-only) access; keep autonomous reporting so the 19 scheduled report tasks run unattended on a least-privilege identity. Stop defaulting subscription-wide Reader. (Roland H1)
- Verify the downloaded recipe package: require an expected SHA-256 (fail-closed) and an https host allowlist before Expand-Archive. (Roland H2)
- Remove dead bin/telemetry.sh, its hardcoded App Insights key, and the no-op --no-telemetry / SRE_AGENT_NO_TELEMETRY opt-out. (Roland M4)
- Parameterize deployer principalType so CI/CD OIDC service-principal deploys assign RBAC correctly. (Roland #2167-adjacent CI finding)

Compiled azuredeploy.json + recipe zip are regenerated in a later build pass (two deferred tests remain red until then).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
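The H2 fix checks the recipe package before Expand-Archive. The shipped script is PowerShell; this Python sketch only shows the same fail-closed order of checks (scheme and host before download, hash before extraction). The allowlisted host, the placeholder URI, and the function names are hypothetical; the real allowlist is not given in this thread:

```python
import hashlib
from urllib.parse import urlparse

# Hypothetical allowlist; the template's real host list is not shown here.
ALLOWED_HOSTS = frozenset({"github.com"})


def check_package_uri(uri: str, allowed_hosts: frozenset[str] = ALLOWED_HOSTS) -> None:
    """Reject anything that is not https on an allowlisted host."""
    parsed = urlparse(uri)
    if parsed.scheme != "https":
        raise ValueError(f"refusing non-https package URI: {uri}")
    # urlparse lowercases .hostname; it is None when the URI has no host.
    if parsed.hostname not in allowed_hosts:
        raise ValueError(f"host not in allowlist: {parsed.hostname}")


def verify_package_bytes(data: bytes, expected_sha256: str) -> None:
    """Fail closed: a missing or mismatched hash blocks extraction."""
    expected = (expected_sha256 or "").strip().lower()
    if not expected:
        raise ValueError("an expected SHA-256 is required; refusing to extract")
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected:
        raise ValueError(f"SHA-256 mismatch: expected {expected}, got {actual}")


package = b"example recipe package bytes"
check_package_uri("https://github.com/example/recipe.zip")  # placeholder URI
verify_package_bytes(package, hashlib.sha256(package).hexdigest())
print("package verified; safe to extract")
```

Either check raising stops the flow before anything is extracted, which is what fail-closed means here: an empty expected hash is treated as a failure, not as "skip verification".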
Pull request was closed
Superseded by #2168. The SRE Agent recipe and deployment template are one indivisible build unit. All of this PR's security fixes were brought onto the unified #2168: Low read-only defaults + autonomous reporting, subscription Reader off by default, recipe-package SHA-256 integrity (fail-closed) + https allowlist, removal of the dead telemetry script and hardcoded key, and a parameterized deployer principalType for CI/CD OIDC. #2168 also carries the clean recipe content and regenerated leak-free deploy artifacts.
Replaces part of #2111.
Scope:
Review notes:
features/sre-agent-recipe.
memory://projects/finops-toolkit/pr-2111-split-plan.
Verification:
git diff --check features/sre-agent-recipe..features/sre-agent-deploy