feat(sre-agent): add deployable Azure template #2169

Closed
MSBrett wants to merge 12 commits into features/sre-agent-recipe from features/sre-agent-deploy

@MSBrett MSBrett commented Jun 3, 2026

Replaces part of #2111.

Scope:

  • Adds the SRE Agent deployable Azure template, Bicep modules, deployment scripts, portal UI definition, package manifest, generated deploy artifacts, and SRE Agent docs.
  • Keeps deployment/build/docs review separate from recipe content review.

Review notes:

  • Base: features/sre-agent-recipe.
  • Depends on the SRE Agent recipe PR.
  • Split plan: memory://projects/finops-toolkit/pr-2111-split-plan.

Verification:

@microsoft-github-policy-service microsoft-github-policy-service Bot added the "Needs: Review 👀" label (PR that is ready to be reviewed) Jun 3, 2026
@MSBrett MSBrett marked this pull request as ready for review June 3, 2026 15:54

Copilot AI left a comment

Pull request overview

Adds a new deployable FinOps Toolkit SRE Agent template under src/templates/sre-agent/, including subscription-scoped Bicep entry points, supporting infra modules, post-provision “apply extras” automation, and Microsoft Learn documentation + deploy-to-Azure artifacts. This PR also updates the toolkit build pipeline to package the new template and generate the portal recipe zip.

Changes:

  • Introduces subscription-scoped Bicep entry points and infra modules (monitoring, agent resource, RBAC, optional ADX principal assignment, and deployment-script-based “apply extras”).
  • Adds deployment automation (CLI wrapper scripts, prerequisites check, GitHub Actions example) and packaging/build integration for the template.
  • Adds Learn docs + deploy artifacts (Create UI Definition copies, TOC updates, redirection, and changelog entry).

Reviewed changes

Copilot reviewed 36 out of 43 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| src/templates/sre-agent/package-manifest.json | Declares which generated artifacts ship for portal deployment packaging. |
| src/templates/sre-agent/main.bicep | Portal-focused subscription-scoped entry point wiring infra + extras. |
| src/templates/sre-agent/infra/scripts/Apply-SreAgentExtras.ps1 | Deployment-script logic to apply connectors/tools/skills/agents/scheduled tasks from the packaged recipe. |
| src/templates/sre-agent/infra/resources.bicep | Resource-group scoped orchestration module for monitoring + agent deployment. |
| src/templates/sre-agent/infra/modules/sre-agent.bicep | Defines Microsoft.App/agents resource and deployer RBAC on the agent. |
| src/templates/sre-agent/infra/modules/resource-group-rbac.bicep | Assigns RG-scoped roles to the agent managed identity based on access level. |
| src/templates/sre-agent/infra/modules/monitoring.bicep | Deploys Log Analytics and workspace-based Application Insights. |
| src/templates/sre-agent/infra/modules/kusto-all-databases-viewer-rbac.bicep | Optional ADX AllDatabasesViewer principal assignment. |
| src/templates/sre-agent/infra/modules/apply-extras.bicep | Creates UAMI + deploymentScripts resource to run Apply-SreAgentExtras.ps1. |
| src/templates/sre-agent/infra/main.bicep | CLI-focused subscription-scoped infra entry point (RG + resources + RBAC + optional ADX RBAC). |
| src/templates/sre-agent/examples/ci-cd/github-actions-deploy.yml | Example GitHub Actions workflow to run the deployment wrapper. |
| src/templates/sre-agent/createUiDefinition.json | Azure portal UI definition for Deploy-to-Azure experience. |
| src/templates/sre-agent/bin/telemetry.sh | Adds a helper for best-effort usage telemetry (currently standalone). |
| src/templates/sre-agent/bin/post-provision.sh | Compatibility wrapper forwarding to the new apply-extras flow. |
| src/templates/sre-agent/bin/deploy.sh | CLI deployment wrapper (parameter validation, deterministic deploy naming, infra deploy, then apply extras). |
| src/templates/sre-agent/bin/check-prerequisites.sh | Preflight checks for required local tools (az/jq/python/PyYAML/etc.). |
| src/templates/sre-agent/.build.config | Registers the template’s custom build step for packaging. |
| src/scripts/Build-Toolkit.ps1 | Adjusts template copy behavior during build (affects which files ship). |
| src/scripts/Build-SreAgentTemplate.ps1 | New build step to generate and zip the portal recipe extras package. |
| docs/deploy/sre-agent/latest/createUiDefinition.json | Published deploy artifact copy for “latest” portal flow. |
| docs/deploy/sre-agent/14.0/createUiDefinition.json | Published deploy artifact copy for versioned portal flow. |
| docs-mslearn/toolkit/sre-agent/troubleshooting.md | Learn troubleshooting guide for SRE Agent deployments and runtime issues. |
| docs-mslearn/toolkit/sre-agent/tools.md | Learn reference catalog of shipped tools (Kusto + Python). |
| docs-mslearn/toolkit/sre-agent/template.md | Learn template reference (parameters/outputs/scripts/modules). |
| docs-mslearn/toolkit/sre-agent/security.md | Learn security/permissions guidance for the deployment. |
| docs-mslearn/toolkit/sre-agent/scheduled-tasks.md | Learn overview of scheduled tasks and their behavior. |
| docs-mslearn/toolkit/sre-agent/python-tools.md | Learn reference for the shipped Python tools. |
| docs-mslearn/toolkit/sre-agent/overview.md | Learn overview page for the SRE Agent template and architecture. |
| docs-mslearn/toolkit/sre-agent/knowledge.md | Learn guidance for knowledge/memory usage with the agent. |
| docs-mslearn/toolkit/sre-agent/get-started.md | Learn quickstart for what to do post-deployment. |
| docs-mslearn/toolkit/sre-agent/deploy.md | Learn deployment tutorial (portal + CLI) and validation steps. |
| docs-mslearn/toolkit/sre-agent/agents.md | Learn explanation of specialist agents/skills and handoff model. |
| docs-mslearn/toolkit/hubs/configure-sre.md | Learn doc integrating SRE Agent deployment guidance into hubs documentation. |
| docs-mslearn/toolkit/changelog.md | Changelog updates to include SRE Agent documentation/template additions. |
| docs-mslearn/TOC.yml | Adds SRE Agent section to the Learn TOC; renames “Workload” → “Usage” optimization entries. |
| docs-mslearn/.openpublishing.redirection.finops.json | Adds a redirection entry related to the SRE Agent docs pathing. |

@MSBrett MSBrett enabled auto-merge (squash) June 4, 2026 13:16
@MSBrett MSBrett disabled auto-merge June 4, 2026 13:16
@MSBrett MSBrett enabled auto-merge (squash) June 4, 2026 13:18
- infra/main.bicep: union agent RG with targetResourceGroups so the agent
  RG is never omitted from targetRgIds (matches the portal entry point).
- Build-Toolkit.ps1: restore Get-ChildItem -Force so dotfiles such as
  .upstream-pin, .gitignore, and .gitattributes are copied into the
  release output; restore the .DS_Store cleanup pass.
- scheduled-tasks.md: change agent owner column to finops-practitioner
  for all 11 tasks that delegate to specialists in the prompt, matching
  the shipped YAML manifests; add a clarifying note about delegation.
- tools.md, kusto-tools.md: replace ../../../src/... relative links with
  https://github.com/microsoft/finops-toolkit/tree/main/src/... so the
  links resolve on learn.microsoft.com.
- docs-mslearn (14 files): bump ms.date to 06/04/2026 per AGENTS.md
  guidance for the update-mslearn-dates CI workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
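
The first fix in the commit above (union the agent RG with targetResourceGroups so it is never omitted from targetRgIds) can be illustrated with a small, language-neutral sketch. This is not the Bicep from the PR; it is written in Python only so it can be run locally. The function names `union_preserving_order` and `target_rg_ids` are hypothetical, and the case-insensitive comparison is an assumption of this sketch rather than something the PR states.

```python
def union_preserving_order(*groups):
    """Merge lists of resource group names, keeping the first occurrence of each.

    Assumption of this sketch: names are compared case-insensitively.
    """
    seen = set()
    merged = []
    for group in groups:
        for name in group:
            key = name.lower()
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


def target_rg_ids(subscription_id, agent_rg, target_resource_groups):
    """Build resource group IDs, always including the agent's own RG exactly once."""
    names = union_preserving_order([agent_rg], target_resource_groups)
    return [
        f"/subscriptions/{subscription_id}/resourceGroups/{name}"
        for name in names
    ]


if __name__ == "__main__":
    # The agent RG appears once even when the caller lists it again or omits it.
    for rg_id in target_rg_ids("0000", "rg-agent", ["rg-app", "rg-agent"]):
        print(rg_id)
```

Putting the agent RG first in the union means an empty targetResourceGroups list still yields one scope, which is the behavior the commit message describes for the portal entry point.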

@RolandKrummenacher RolandKrummenacher left a comment

Pointer review — the substantive findings for this deploy slice are on the recipe PR, to avoid duplicating threads.

This PR overlaps almost entirely with #2168. All 43 files changed here are also changed in #2168 (recipe) — the deployable-template content (infra/, bin/, main.bicep/createUiDefinition.json, docs/deploy/sre-agent/**, the deploy tests, telemetry.sh, the committed sre-agent-recipe.zip) lives in both branches, and the security-critical files (main.bicep, infra/scripts/Apply-SreAgentExtras.ps1, bin/telemetry.sh) are byte-identical between the two. So the deploy-slice findings I left on #2168 apply here unchanged:

  • Portal one-click defaults to accessLevel=High (Contributor on the agent RG + target RGs) + actionMode=autonomous, with all 19 scheduled tasks agent_mode: autonomous + enabled → unattended write-capable agent on first deploy (CLI path correctly defaults to Low/review).
  • Apply-SreAgentExtras.ps1 downloads the recipe package and executes its tools/subagents/cron with no integrity check, from an overridable recipePackageUri.
  • telemetry.sh is dead code shipping a hardcoded ikey while the README --no-telemetry opt-out is a no-op.
  • Committed binary sre-agent-recipe.zip (dup'd in 14.0/ + latest/), first binary under docs/deploy.
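
The first finding above describes a combination of settings rather than a single bad value. A rough sketch of a preflight check for that combination is shown below, in Python for easy local testing. The setting names and values (accessLevel High, actionMode autonomous, agent_mode: autonomous, enabled) come from the review; the `ScheduledTask` class, the `unattended_write_risk` function, and the task names are hypothetical and do not exist in the PR.

```python
from dataclasses import dataclass


@dataclass
class ScheduledTask:
    # Hypothetical stand-in for one scheduled task manifest entry.
    name: str
    agent_mode: str
    enabled: bool


def unattended_write_risk(access_level, action_mode, tasks):
    """Return names of tasks that would run unattended with write access.

    Flags only the combination named in the review: High access, autonomous
    action mode, and tasks that are both enabled and autonomous.
    """
    if access_level.lower() != "high" or action_mode.lower() != "autonomous":
        return []
    return [
        task.name
        for task in tasks
        if task.enabled and task.agent_mode.lower() == "autonomous"
    ]


if __name__ == "__main__":
    tasks = [
        ScheduledTask("example-report", "autonomous", True),   # hypothetical name
        ScheduledTask("example-cleanup", "review", True),      # hypothetical name
    ]
    print(unattended_write_risk("High", "autonomous", tasks))
    print(unattended_write_risk("Low", "autonomous", tasks))
```

With Low access the check returns nothing even when tasks stay autonomous, which mirrors the distinction the review draws between the portal and CLI defaults.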

See the full inline detail in the #2168 review.

Structural note: because #2169 is a strict subset of #2168 (and the two branches have drifted by ~10 files — build-extras.py, expected-config.json, roles.yaml, infra/main.bicep, a few docs), the recipe/deploy split isn't actually realized in the branches and they can't merge independently without conflict/duplication. Worth deciding the division deliberately — pull the deploy-template files out of #2168 so this PR owns them, or collapse the two — and rebasing the stack onto current dev (the base here, features/sre-agent-recipe, in turn sits on the squash-merged #2166). Marking this COMMENT rather than Changes-Requested since the blocking verdict and detail live on #2168; happy to mirror the inline comments here instead if you'd prefer them on this PR.

…etry

- Default portal/CLI to Low (read-only) access; keep autonomous reporting so the
  19 scheduled report tasks run unattended on a least-privilege identity. Stop
  defaulting subscription-wide Reader. (Roland H1)
- Verify the downloaded recipe package: require an expected SHA-256 (fail-closed)
  and an https host allowlist before Expand-Archive. (Roland H2)
- Remove dead bin/telemetry.sh, its hardcoded App Insights key, and the no-op
  --no-telemetry / SRE_AGENT_NO_TELEMETRY opt-out. (Roland M4)
- Parameterize deployer principalType so CI/CD OIDC service-principal deploys
  assign RBAC correctly. (Roland #2167-adjacent CI finding)

Compiled azuredeploy.json + recipe zip are regenerated in a later build pass
(two deferred tests remain red until then).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
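
The package verification described in the commit above (require an expected SHA-256, fail closed, and allow only https hosts on an allowlist before extraction) follows a common pattern. A minimal sketch of that pattern is below, in Python rather than the PowerShell used by Apply-SreAgentExtras.ps1, so it should not be read as the script's actual code. The names `ALLOWED_HOSTS`, `check_package_uri`, and `verify_package`, and the hosts in the allowlist, are assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import urlsplit

# Hypothetical allowlist; the hosts the PR actually permits are not shown in this thread.
ALLOWED_HOSTS = frozenset({"github.com", "raw.githubusercontent.com"})


def check_package_uri(uri, allowed_hosts=ALLOWED_HOSTS):
    """Reject any URI that is not https or whose host is not on the allowlist."""
    parts = urlsplit(uri)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-https package URI: {uri}")
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        raise ValueError(f"host not on allowlist: {host!r}")


def verify_package(data, expected_sha256):
    """Fail closed: a missing expected hash is an error, not a skipped check."""
    if not expected_sha256:
        raise ValueError("expected SHA-256 is required")
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, expected_sha256.strip().lower()):
        raise ValueError("package SHA-256 mismatch")


if __name__ == "__main__":
    payload = b"example package bytes"
    digest = hashlib.sha256(payload).hexdigest()
    check_package_uri("https://github.com/example/recipe.zip")
    verify_package(payload, digest)  # passes silently; only then would extraction run
    print("package accepted")
```

Both checks raise before any archive is opened, so a caller that forgets to supply the hash gets an error instead of an unverified extraction.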
@MSBrett MSBrett closed this Jun 17, 2026
auto-merge was automatically disabled June 17, 2026 18:21

Pull request was closed

MSBrett commented Jun 17, 2026

Superseded by #2168. The SRE Agent recipe and deployment template are one indivisible build unit — recipes/ is a subdirectory of src/templates/sre-agent/ that Build-SreAgentTemplate.ps1 packages into the deploy artifacts (zip + azuredeploy.json), so they can't be reviewed or merged as independent PRs.

All of this PR's security fixes were brought onto the unified #2168: Low read-only defaults + autonomous reporting, subscription Reader off by default, recipe-package SHA-256 integrity (fail-closed) + https allowlist, removal of the dead telemetry script and hardcoded key, and a parameterized deployer principalType for CI/CD OIDC. #2168 also carries the clean recipe content and regenerated leak-free deploy artifacts.
