Add opt-in RBAC management UI via rbacManagement flag#4865
Add opt-in RBAC management UI via rbacManagement flag#4865dimitri-nicolo wants to merge 30 commits into
Conversation
504568c to
8239694
Compare
| // that need a tenant-scoped Manager (the manager controller itself) must use | ||
| // GetManager from pkg/controller/manager. Non-manager controllers reading | ||
| // zero-tenant-only flags (e.g. spec.rbacManagement) belong here. | ||
| func GetZeroTenantManagerOrNil(ctx context.Context, c client.Client) (*operatorv1.Manager, error) { |
There was a problem hiding this comment.
Can't we move the GetManager func from manager_controller to the utils package and re-use it? There is no reason that apiserver should throw an error if manager is not present. We should however only fetch it when enterprise CRDs exist and we are not running in cloud.
There was a problem hiding this comment.
I see that you moved the func to here, but I also meant to delete this func entirely as well.
There was a problem hiding this comment.
I think the code is more clear if we delete this method, remove any notion of "zero tenant" from the code including comments (it's not yargon that is used today) and use the multitenant booleans explicitly at the caller. I think that will be more understandable to the average reader.
| // +optional | ||
| ManagerDeployment *ManagerDeployment `json:"managerDeployment,omitempty"` | ||
|
|
||
| // RBAC configures the RBAC management UI feature. Only honored in |
There was a problem hiding this comment.
I don't think we should leak cloud (internal) implications into the API comments. Our users don't need to be aware/think about those.
| // Disabled. | ||
| // +optional | ||
| // +kubebuilder:validation:Enum=Enabled;Disabled | ||
| Mode RBACMode `json:"mode,omitempty"` |
There was a problem hiding this comment.
What do you think of the following field names?
- Management: Enabled|Disabled
- UI: Enabled|Disabled
or values: Visible | Hidden
b9493a6 to
94159c6
Compare
|
|
||
| // RBAC management UI: when enabled on the Manager CR, the rbacsync | ||
| // controller runs inside calico-kube-controllers to reconcile the | ||
| // catalog of managed ClusterRoles (per-tier + 32 fine-grained + 6 |
There was a problem hiding this comment.
nit: This number is not likely to hold up over time, best to keep the comment succinct.
|
General comment: the AI generated comments in this code are a bit excessive. An extreme example may be the permissions. The func explains in comments twice which permissions are added + the code to add them. They are making the code less readable. And now we need to also keep the comments up to date when we add more. The operator code is not super complicated, I'd much rather have comments be only used if it adds critical information you couldn't get from simply reading the code (or have your ai read it for you). |
| }, | ||
| // Escalation coverage for webhook-mod resource roles (create/patch on | ||
| // Secrets to wire up webhook auth). | ||
| rbacv1.PolicyRule{ |
There was a problem hiding this comment.
Could we be more specific with secrets and configmaps and bind them to the namespaces we actually need?
There was a problem hiding this comment.
The named-resource access moved to a new Role + RoleBinding in calico-system, where tigera-idp-groups and tigera-idp-ldap-config actually live. The broad create is bound to that one namespace now.
Dropped the tigera-known-oidc-users rule (it lives in tigera-elasticsearch and is already granted by logstorage.go), and tightened the activation gate to !Tenant.MultiTenant() since the IdP resources are pinned to calico-system and the RBAC UI is zero-tenant-only anyway.
Is that what you meant by "bind them to the namespaces we actually need" — or were you pointing at something narrower?
8833288 to
2b749da
Compare
| // management UI's directory cache and the cascading cleanup of managed | ||
| // bindings when a group is removed). | ||
| { | ||
| APIGroups: []string{""}, |
There was a problem hiding this comment.
This one should be in a role.
2b749da to
6af6ea3
Compare
| return reconcile.Result{}, err | ||
| } | ||
|
|
||
| if instance.Spec.Variant.IsEnterprise() { |
There was a problem hiding this comment.
Wondering if you found the variant check to be necessary here?
| rbacv1.PolicyRule{ | ||
| APIGroups: []string{"rbac.authorization.k8s.io"}, | ||
| Resources: []string{"clusterroles", "clusterrolebindings"}, | ||
| Verbs: []string{"escalate", "bind"}, |
There was a problem hiding this comment.
Is it possible to avoid giving escalate and bind privileges? It essentially means that this controller can become a superuser if it chooses to be (i.e. is compromised)
| rules = append(rules, rbacv1.PolicyRule{ | ||
| APIGroups: []string{"rbac.authorization.k8s.io"}, | ||
| Resources: []string{"clusterroles", "roles", "clusterrolebindings", "rolebindings"}, | ||
| Verbs: []string{"get", "list", "watch", "create", "update", "patch", "delete", "bind", "escalate"}, |
There was a problem hiding this comment.
Similar to previous comment - can we avoid bind and escalate? This one seems a bit more serious since it is given to users
| rbacv1.PolicyRule{ | ||
| APIGroups: []string{""}, | ||
| Resources: []string{"configmaps"}, | ||
| ResourceNames: []string{"tigera-known-oidc-users"}, |
There was a problem hiding this comment.
Does this controller access this config map? I thought only the ui-apis endpoint would consult this config map when listing users
| // UI adds to calico-manager-role. Named-resource access is scoped separately | ||
| // on rbacManagementUINamespacedRole. | ||
| func rbacManagementUIRules() []rbacv1.PolicyRule { | ||
| rules := RBACManagementEscalationRules() |
There was a problem hiding this comment.
Does the manager use its own SA when creating RBAC via the UI? Or is it impersonating the user?
| }) | ||
| } | ||
|
|
||
| if c.cfg.Manager.RBACManagementEnabled() && c.cfg.Authentication != nil && c.cfg.Authentication.Spec.OIDC != nil { |
There was a problem hiding this comment.
Why do we gate on OIDC being enabled? I'm wondering what that has do to with the user deciding they want to do an LDAP sync for RBAC UI.
To me I would have expected that this block conditionally renders depending on whether the LDAP secret is placed by the user, and that we could parse the LDAP URL to scope the network policy (we do a similar thing for guardian policy)
…ileges (PR tigera#4865 review) Addresses Pasan's review comment on the tigera-network-admin escalate/bind grant. The RBAC management UI must never let an admin grant more than they already hold, so this grants NO escalate and NO bind. ui-apis performs the /team/* RBAC writes by impersonating the calling user, so the kube-apiserver runs its privilege-escalation check against the admin's own permissions. Without escalate/bind that means: an admin can assign a calico-ui- role to a group only if they personally hold every rule in it, and can edit a role only if they already hold its rules — otherwise the apiserver returns 403. The UI is a convenience layer over the user's own privileges, not an amplifier; deciding which admins may delegate which capabilities is then a matter of what those admins are granted, under the operator's control. The grant is also narrowed to what the handlers actually do (verified against the /team/* implementation): roles are read-only (the UI lists/inspects the catalog, never authors roles); clusterrolebindings get create/update/delete for group and member writes; rolebindings are never created (only cluster-scoped groups are), so they get update/delete only. Gated behind the RBAC-UI feature flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…issions instead (PR tigera#4865 review) Addresses Pasan's review comment that the rbacsync controller can become a superuser if compromised, via cluster-wide escalate/bind on RBAC objects. The verbs aren't actually needed: rbacSyncControllerRules already grants the SA the full permission set the calico-ui- roles contain (RBACManagementEscalationRules), so it satisfies Kubernetes' privilege-escalation check by *holding* the rules it writes, rather than by escalate/bind. Removing the verbs means a compromised SA can only assign permissions it already has — it cannot mint an arbitrary cluster-admin role. This relies on RBACManagementEscalationRules staying a superset of the managed roles' contents (which live in calico/kube-controllers). That drift risk is accepted here and is better caught by E2Es than guarded by a standing escalate grant. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…review) Reviewer noted that a field named `rbac` reads as configuration for the Manager's K8s RBAC rather than the RBAC management UI feature. Rename the top-level field and its type from `RBAC`/`spec.rbac` to `RBACUI`/`spec.rbacUI`, and the inner toggle from `ui` to `enabled` (a named `Enabled`/`Disabled` enum), so the API reads as `spec.rbacUI.enabled: Enabled|Disabled`. Keeping the wrapper struct leaves room for future RBAC-UI sub-configuration. Updates the RBACManagementEnabled() helper, the render unit tests, and the regenerated deepcopy and Manager CRD. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… as an error (PR tigera#4865 review) Two related review points on how callers read the Manager CR: - GetManager now swallows IsNotFound and returns (nil, nil), matching the sibling GetManagementCluster helper, so callers branch on a nil instance instead of inspecting the error. - A NoMatchError (the Manager CRD is not registered yet) is propagated rather than treated as not-found. In enterprise the CRD is expected; hitting NoMatchError means it has not loaded yet, and we cannot infer the user's intent, so the installation and apiserver controllers now degrade and requeue instead of assuming the RBAC management UI is off. Callers simplified accordingly; the manager controller branches on `instance == nil` for the not-found path, and its unit test now asserts the (nil, nil) contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nt its rules (PR tigera#4865 review) Addresses two review points on the rbacsync controller RBAC: - Scope down the secrets rule: the controller's cluster-scoped `secrets: get/create/patch` is replaced by named/namespaced access in the existing rbacSyncIDPGroupsRole (calico-system) -- named get/patch on the tigera-idp-ldap-config secret, with create kept broad within the namespace since create cannot be name-scoped. - Comment the remaining cluster-scoped rules (compliances read, pods list for leader election) and document why the escalation rules belong on this SA: rbacsync acts as its own identity, unlike the UI which impersonates the user, so it must hold every permission it grants. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ments (PR tigera#4865 review) The RBAC management UI is rendered on every management cluster except multi-tenant ones (the gate is `!Tenant.MultiTenant()`, which covers both the zero-tenant and single-tenant cases). Word the manager cluster-role gate comment and the manager controller LDAP egress comment to match that gate rather than naming a single tenancy mode. Leaves the pre-existing generic tenancy-mode descriptions untouched. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urces (PR tigera#4865 review) The calico-system Role for the RBAC management UI carried a broad `secrets: patch` rule. Reviewer noted patch can take a resource name, so fold patch into the two named-resource rules (tigera-idp-ldap-config and tigera-idp-groups) instead. The create rule stays broad within the namespace, since create cannot be scoped to a not-yet-existing object. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…mpliances:get (PR tigera#4865 review) rbacManagementUIRules added the shared RBACManagementEscalationRules to the calico-manager cluster role. Reviewer pointed out the RBAC management UI writes managed RBAC objects via user impersonation (Voltron impersonates the caller), so the privilege-escalation check runs against the impersonated user, not the manager SA -- making those rules dead weight on this SA. The escalation rules remain on the rbacsync controller SA, which acts as its own identity. The manager SA keeps only the compliances:get rule that ui-apis itself consumes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…igera#4865 review) This case asserted that OIDC being configured does not by itself open LDAP egress. Now that the egress gate keys solely on the LDAP config secret (not on Authentication.Spec.OIDC), the oidc input is irrelevant and the case duplicates the "secret absent" test directly above it. Reviewer noted it is not really a regression case; remove it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tigera#4865 review) The ui-apis RBAC_UI_ENABLED env was set from Manager.RBACManagementEnabled() alone, but every resource backing the feature is gated on `enabled && !MultiTenant()`. On a multi-tenant management cluster the container would advertise the feature with no supporting RBAC behind it. Gate the env value the same way so the backing resources exist whenever ui-apis reports the feature on. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sync the bundled CRDs with the current projectcalico/calico and tigera/calico-private master sources via `make gen-versions`. Upstream dropped the deprecated `preserveUnknownFields: false` field from the CRDs and updated felixconfigurations/ipamblocks schemas; this regenerates the bundle to match so `make validate-gen-versions` / dirty-check passes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| // Disabled. | ||
| // +optional | ||
| // +kubebuilder:validation:Enum=Enabled;Disabled | ||
| Enabled RBACUIStatus `json:"enabled,omitempty"` |
There was a problem hiding this comment.
| Enabled RBACUIStatus `json:"enabled,omitempty"` | |
| Type *RBACUIType `json:"rbacUI,omitempty"` |
There was a problem hiding this comment.
What about the following options:
// RBACUI configures the RBAC management UI.
type RBACUI struct {
// Type selects the RBAC management UI mode. Unmanaged leaves Calico
// Enterprise RBAC to be configured manually; ManagedCalicoEnterpriseRBAC
// enables the RBAC management UI for Calico Enterprise RBAC. Defaults to
// Unmanaged.
// +optional
// +kubebuilder:validation:Enum=Unmanaged;ManagedCalicoEnterpriseRBAC
Type *RBACUIType `json:"type,omitempty"`
}
// RBACUIType selects the RBAC management UI mode.
type RBACUIType string
const (
// RBACUIUnmanaged leaves RBAC to be configured manually.
RBACUIUnmanaged RBACUIType = "Unmanaged"
// RBACUIManagedCalicoEnterpriseRBAC enables the RBAC management UI for
// Calico Enterprise RBAC.
RBACUIManagedCalicoEnterpriseRBAC RBACUIType = "ManagedCalicoEnterpriseRBAC"
)
| return reconcile.Result{}, err | ||
| } | ||
|
|
||
| // A nil managerCR (no Manager created) means callers fall back to their |
There was a problem hiding this comment.
nit: this comment serves as storage of rationale/history that is useful for the author, rather than context that helps the reader - would just remove (personally)
| // On Calico/OSS the Manager CRD is absent, so the read returns NoMatchError. | ||
| // We are in enterprise, so the Manager CRD is expected. A nil managerCR | ||
| // (no Manager created) means callers fall back to their defaults. A | ||
| // NoMatchError (Manager CRD not registered yet) is treated as an error: |
There was a problem hiding this comment.
nit: this comment serves as storage of rationale/history that is useful for the author, rather than context that helps the reader - would just remove (personally)
| // needs on top of render.RBACManagementEscalationRules. | ||
| func rbacSyncControllerRules() []rbacv1.PolicyRule { | ||
| rules := render.RBACManagementEscalationRules() | ||
| return append(rules, |
There was a problem hiding this comment.
What I meant was that it would be nice to have comments that store the context behind why we need these specific rules - my understanding is that they map to a specific permission needed from a specific managed role - can we reference the name of that managed role here?
| // and calico-kube-controllers roles for the RBAC management UI. The SA writing | ||
| // a managed role must already hold every permission it grants, else K8s rejects | ||
| // the write as privilege escalation; keep this aligned with the managed roles. | ||
| // RBACManagementEscalationRules returns the permission set the rbacsync |
There was a problem hiding this comment.
Sorry for catching this just now and not as part of my previous feedback, but if we have realized now that escalation rules only apply to the rbacsync controller and not ui-apis, I guess this no longer needs to be shared code? Everything should just be defined within rbacSyncControllerRules and everything there is an escalation rule? (and as per my other comment, each rule can have a comment explaining which managed role it covers)
| {Name: "ELASTIC_KIBANA_DISABLED", Value: strconv.FormatBool(c.cfg.Tenant.MultiTenant())}, | ||
| {Name: "VOLTRON_URL", Value: ManagerService(c.cfg.Tenant)}, | ||
| {Name: "RBAC_UI_ENABLED", Value: strconv.FormatBool(c.cfg.Manager.RBACManagementEnabled())}, | ||
| // The RBAC management UI is not supported on multi-tenant management |
There was a problem hiding this comment.
nit: I would remove this comment, feels like a comment written for the person steering Claude rather than the eventual reader of the code
| }) | ||
| } | ||
|
|
||
| if c.cfg.Manager.RBACManagementEnabled() && c.cfg.RBACManagementLDAPConfigured { |
There was a problem hiding this comment.
Should this also gate on multi-tenancy?
… review) Rename the RBAC management UI toggle field from Enabled (RBACUIStatus) to Type (*RBACUIType) per review. Keep the Enabled/Disabled enum idiom; the pointer distinguishes unset from the zero value.
…#4865 review) The comment stored authoring rationale rather than reader-useful context; remove it per review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…era#4865 review) The comment stored authoring rationale rather than reader-useful context; remove it per review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ers (PR tigera#4865 review) Explain in the appended rbacSyncControllerRules comments which managed role's grant each escalation rule mirrors: the tigera-network-admin ClusterRole for compliances, and the per-cluster log-access ClusterRoles for pods:list. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… (PR tigera#4865 review) The escalation rules only apply to the rbacsync controller (not ui-apis), so the shared render.RBACManagementEscalationRules is no longer shared. Fold every rule into rbacSyncControllerRules — where they all belong as escalation coverage — each annotated with the managed role whose grant it mirrors, and delete the now-dead rbac_management.go. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…4865 review) The comment read as authoring context rather than something useful to the eventual reader; remove it per review. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s (PR tigera#4865 review) The RBAC management UI is unsupported on multi-tenant clusters, so its LDAP egress rule now carries the same !MultiTenant() gate as the rest of the feature. Adds a test that the egress is omitted in multi-tenant mode even with the UI enabled and the LDAP config Secret present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…for clarity on role mirroring
pasanw
left a comment
There was a problem hiding this comment.
LGTM 🎉 As discussed, let's see what the other maintainers think about the API changes before we merge
| type RBACUI struct { | ||
| // Enabled turns the RBAC management UI on or off. Defaults to false. | ||
| // +optional | ||
| Enabled *bool `json:"enabled,omitempty"` |
There was a problem hiding this comment.
We have an updated api spec in the docs that specifies these toggles,
Use a named string type with Enabled/Disabled constants and an Enum marker — not a *bool. This is the established repo idiom:
type NotificationMode string
const (
Disabled NotificationMode = "Disabled"
Enabled NotificationMode = "Enabled"
)
// +kubebuilder:validation:Enum=Enabled;Disabled
There was a problem hiding this comment.
@Brian-McM Thanks, I had considered raising the boolean vs. enum question with the maintainers before merging anything, but I'm happy to follow the idiom. Done in the latest commit: replaced the *bool with a named RBACUIStatusType enum (Enabled/Disabled + Enum marker), exposed as a State field.
cc: @pasanw
| // cert above. | ||
| WAFWebhookCABundle: certificateManager.KeyPair().GetCertificatePEM(), | ||
| WAFWebhookCABundle: certificateManager.KeyPair().GetCertificatePEM(), | ||
| RBACManagementEnabled: managerCR.RBACManagementEnabled(), |
There was a problem hiding this comment.
Since utils.GetManager returns nil, nil on not found won't this now panic?
There was a problem hiding this comment.
@Brian-McM It shouldn't panic, RBACManagementEnabled() guards m == nil before touching the pointer. The latest commit adds a table test (api/v1/manager_types_test.go) covering the nil-managerCR, nil-RBACUI, and nil-State cases.
…based on config Secret
…f Enabled and update related tests
rene-dekker
left a comment
There was a problem hiding this comment.
The host should be taken from Authentication.spec.ldap.host.
…CR and remove deprecated code
|
I reviewed only the ldap changes, which look good now. |
Description
Type: New feature (Calico Enterprise only). Addresses PMREQ-824.
Adds an opt-in RBAC management UI feature, toggled by a new
Manager.spec.rbac.uifield (Enabled|Disabled, defaults toDisabled). When enabled, the operator renders the additional RBAC and network access the UI needs to let an administrator manage role/group assignments from the Manager: maintaining a catalog of managedClusterRoles, binding IdP groups to roles via therbacsynccontroller, and resolving groups from an external LDAP/AD directory.This should be merged because it is the operator-side enablement for the RBAC management UI; without it the UI's backend has no RBAC or egress to function. Gating on
spec.rbac.uikeeps the feature — and its escalation-capable permissions — entirely dormant on clusters that don't opt in.What the flag turns on, by layer
api/v1/manager_types.go) — newRBACspec struct with aUI RBACUIenum field and a nil-safeManager.RBACManagementEnabled()helper (returnsfalsefor a nil Manager or any value other thanEnabled). Regenerated deepcopy and the Manager CRD.calico-managerrole (pkg/render/manager.go): adds the cluster-scoped RBAC management UI rules (full CRUD on managed RBAC objects + escalation-prevention coverage for tier/resource roles), plus a namespacedRole+RoleBindingincalico-systemthat scopes thecreate/named-resource access to the IdPConfigMap/Secrets (tigera-idp-groups,tigera-idp-ldap-config) instead of granting it cluster-wide. SetsRBAC_UI_ENABLEDon theui-apiscontainer and opens LDAP/AD egress (389/636) when OIDC authentication is configured.calico-kube-controllers(pkg/render/kubecontrollers/kube-controllers.go): enables therbacsynccontroller and its RBAC only when the flag is set (dormant otherwise), and adds a namespacedRole+RoleBindinggrantingrbacsyncread on thetigera-idp-groupsConfigMapincalico-system.tigera-network-admin(pkg/render/apiserver.go): grants thebind/escalate-capable rule a network-admin needs to assign managed roles via the UI, also gated on the flag.pkg/render/rbac_management.goso the manager and kube-controllers roles stay aligned (the SA writing a managed role must already hold every permission it grants, or K8s rejects the write as privilege escalation).Why the installation and apiserver controllers read the Manager CR
The flag conceptually belongs to the Manager, but two of the resources it gates are not rendered by the manager controller:
rbacsynccontroller runs insidecalico-kube-controllers, rendered by the installation controller.tigera-network-adminClusterRole is rendered by the apiserver controller.So each of those controllers reads the Manager CR itself and passes a
RBACManagementEnabledbool into its render config. To support this,GetManagerwas moved out of the manager controller intopkg/controller/utilsso the apiserver and installation controllers can reuse it without importing the manager controller package. Both callers treat a missing Manager as not-found (feature off) rather than an error, and the installation controller only reads it for the enterprise variant.Both controllers also add a watch on the
ManagerCR. Without it, togglingspec.rbac.uiwould not re-trigger an installation/apiserver reconcile, so therbacsynccontroller and the network-admin rule wouldn't appear or disappear until some unrelated event re-ran those controllers.Per-cluster behavior
The feature is keyed off each cluster's own
Manager.spec.rbac.ui, so a management cluster and a managed cluster opt in independently. Calico Enterprise evaluates a user's managed-cluster actions against that managed cluster's own RBAC (Voltron proxies the request to the managed cluster's apiserver), so the network-adminbind/escalategrant and the manager-SA rules are intentionally rendered on managed clusters too — that is where a user managing per-cluster RBAC needs them. In multi-tenant management clusters the manager-side rules and namespaced Role are not rendered (gated on!MultiTenant()).Testing: unit tests added for all three render surfaces (manager, apiserver, kube-controllers) covering the enabled/disabled gate and the multi-tenant exclusion;
make gen-filesproduces no drift;go build ./...and the affectedpkg/render*/pkg/controller/apiserversuites pass.Components affected: apiserver (
tigera-network-admin), manager (calico-managerrole + namespaced Role + network policy), kube-controllers (rbacsync), Manager CRD, installation & apiserver controllers,pkg/controller/utils.Release Note
For PR author
make gen-filesmake gen-versionsFor PR reviewers
A note for code reviewers - all pull requests must have the following:
kind/bugif this is a bugfix.kind/enhancementif this is a a new feature.enterpriseif this PR applies to Calico Enterprise only.