@zanderfriz

What this PR does: Introduces a proposal for a Crossplane provider for the Cortex project to declaratively manage Cortex Alertmanager and Ruler configurations through Kubernetes Custom Resources.

Which issue(s) this PR fixes: N/A
Checklist

  • [N/A] Tests updated
  • Documentation added
  • [N/A] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@dosubot dosubot bot added component/alertmanager component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. labels Nov 3, 2025
@friedrichg
Member

Thanks! Please follow https://github.com/cortexproject/cortex/pull/7085/checks?check_run_id=54406852290 to fix DCO.

@zanderfriz zanderfriz force-pushed the proposal-crossplane-provider branch from 3f0d67e to 2f35332 Compare November 4, 2025 19:12
@friedrichg
Member

@zanderfriz please rebase to have CI pass the PR. We made some changes in GitHub Actions

@friedrichg
Member

I am in support of this proposal

I have 2 requests to merge this as accepted:

  • Let's put this in a separate repo inside cortexproject, where the selected maintainers will be able to keep this component updated.
  • We need 2 maintainers for this. (I can't be a maintainer, sorry.) I am expecting you will be one of the maintainers. Can you find 1 person to help you with this?

@alolita

alolita commented Nov 18, 2025

+1 on making sure there are at least 2 maintainers for this provider component.

Support a separate repo within the project.

@SungJin1212
Member

+1

Signed-off-by: afrisvold <afrisvold@apple.com>
@zanderfriz zanderfriz force-pushed the proposal-crossplane-provider branch from 2f35332 to 40fdd1f Compare November 20, 2025 19:26
@zanderfriz
Author

After discussing with @devopsjedi, he said he would be happy to be a maintainer on this project

@devopsjedi

> After discussing with @devopsjedi, he said he would be happy to be a maintainer on this project

Agreed, excited to support this effort!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 21, 2025

#### TenantConfig

The TenantConfig CRD manages connection details and authentication for a specific Cortex tenant:
Contributor


Why connection and auth only? And how will the tenant config be consumed by Cortex?

Author


TenantConfig is the configuration the Crossplane provider uses to connect to the Cortex instance as a tenant. It is not for configuring a tenant on Cortex as the Cortex administrator.
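
To illustrate what "connecting as a tenant" could look like, here is a minimal Go sketch. Cortex identifies the tenant of a request through the `X-Scope-OrgID` header. Everything else is an assumption for illustration: the helper name `newTenantRequest`, the example address, the tenant ID, and the choice of the ruler listing path. In many deployments a gateway in front of Cortex sets this header after authentication, so whether the provider sends it directly depends on the setup.

```go
package main

import (
	"fmt"
	"net/http"
)

// newTenantRequest builds an HTTP request scoped to a single Cortex tenant.
// The base URL and tenant ID stand in for values that a TenantConfig-like
// object would supply (hypothetical; the proposal's actual fields may differ).
func newTenantRequest(method, baseURL, path, tenantID string) (*http.Request, error) {
	req, err := http.NewRequest(method, baseURL+path, nil)
	if err != nil {
		return nil, err
	}
	// Cortex reads the tenant ID from the X-Scope-OrgID header.
	req.Header.Set("X-Scope-OrgID", tenantID)
	return req, nil
}

func main() {
	// Example values only; no request is actually sent.
	req, err := newTenantRequest(http.MethodGet, "http://cortex.example.local", "/api/v1/rules", "production-tenant")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("X-Scope-OrgID"))
}
```

The point of the sketch is that the provider acts with the same permissions as any other client of that tenant; nothing here creates or administers tenants.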


@forestsword forestsword left a comment


I had written a long-winded version of this comment but realized it was all just a matter of my own organization's internal structure. In short, we can't use this version of the operator because we can't run Crossplane. We're an observability team and do not have the responsibility to run something that can also provision S3 buckets.

Also, neither the Prometheus nor the OpenTelemetry operator, two workhorses of our observability infrastructure, requires that we run Crossplane. Why should Cortex?

Don't get me wrong, I don't want to trash the idea of crossplane. It's better for the cortex community to have a crossplane provider than nothing. But we won't be able to use it where I work and that makes me sad.

Comment on lines +134 to +135
providerConfigRef:
  name: cortex-config


What is this referring to? Is it crossplane specific?

Comment on lines +167 to +168
tenantConfigRef:
  name: production-tenant


We run multiple clusters and it would be helpful to be able to specify multiple clusters where rules should be deployed to. Otherwise we'd need a RuleGroup per cortex cluster.

The RuleGroup CRD manages Prometheus alerting and recording rules within a Cortex namespace:

```yaml
apiVersion: config.cortexmetrics.io/v1alpha1
```


I've heard some talk of people wanting a Prometheus Operator-compatible API for Cortex CRDs. Would that be a goal here?

- Applies necessary changes via HTTP API calls
- Updates resource status with current state and any errors

2. **External Resource Identification**: Resources are identified using:


I think it's possible (obviously not ideal, but I've seen a lot of mistakes in my life) that the same alerts could be defined on two clusters under the exact same namespace name. Without further identifying attributes they would conflict with each other, and each operator would try to take control. I think it might be necessary to provide additional identifying attributes to prevent conflicts like this. For instance, each operator could be passed k8s.cluster.name at startup as an identifying attribute, and resources would be saved in Cortex as k8s.cluster.name/k8s.namespace.name/resource. Wdyt?
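
A minimal Go sketch of the prefixing idea described in this comment. The function name `externalName`, the validation rule, and the example cluster and rule names are hypothetical; the `/` separator follows the format suggested above, and whether Cortex rule namespaces tolerate that separator without encoding is not addressed here.

```go
package main

import (
	"fmt"
	"strings"
)

// externalName builds an identifier of the form
// <cluster>/<namespace>/<resource>, so that two clusters using the same
// Kubernetes namespace and resource name map to different names in Cortex.
// Segments that are empty or contain the separator are rejected, since they
// would make the joined name ambiguous.
func externalName(cluster, namespace, resource string) (string, error) {
	parts := []string{cluster, namespace, resource}
	for _, p := range parts {
		if p == "" || strings.Contains(p, "/") {
			return "", fmt.Errorf("invalid identifier segment %q", p)
		}
	}
	return strings.Join(parts, "/"), nil
}

func main() {
	// Same namespace and resource name on two different clusters.
	a, errA := externalName("cluster-a", "monitoring", "cpu-alerts")
	b, errB := externalName("cluster-b", "monitoring", "cpu-alerts")
	if errA != nil || errB != nil {
		panic("unexpected validation error")
	}
	fmt.Println(a)
	fmt.Println(b)
}
```

With the cluster name as the leading segment, each operator only ever touches names under its own prefix, which is what prevents the two from fighting over the same resource.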

**Comparison**:
- **Pros**: Direct control over implementation, no external dependencies
- **Cons**:
  - Requires building and maintaining complex controller infrastructure


I'm not sure what complex infrastructure would be required for a classic k8s operator other than running the operator and setting it up with the api server. Running crossplane is more complex from my perspective especially because its feature set extends way beyond just cortex. Could you provide an example?

- **Pros**: Direct control over implementation, no external dependencies
- **Cons**:
  - Requires building and maintaining complex controller infrastructure
  - No composition or configuration management capabilities


I don't understand this. I don't see this as the responsibility of an operator. It's the 'deployment delivery' tech, like Helm or Tanka, that does this. Could you provide an example of how the provider would do this?

- **Cons**:
  - Requires building and maintaining complex controller infrastructure
  - No composition or configuration management capabilities
  - Limited reusability across different Kubernetes clusters


I disagree. Not everyone can or will use Crossplane, but everyone can run a classic operator, IMO.

- Requires building and maintaining complex controller infrastructure
- No composition or configuration management capabilities
- Limited reusability across different Kubernetes clusters
- Missing advanced features like external secret management


Could you provide an example? We'd be delivering secrets via the external secrets operator from vault. We would only need to reference the secret like described in the CRDs above.

- No composition or configuration management capabilities
- Limited reusability across different Kubernetes clusters
- Missing advanced features like external secret management
- Significant development and maintenance overhead


This is subjective. There are years of experience out there running and writing k8s operators, with OpenTelemetry and Prometheus as examples. Crossplane is much younger and not a given. Kubebuilder, for all its limitations, does provide relief from much of the plumbing.
