@jlore-decathlon jlore-decathlon commented Nov 20, 2025

Proposed addition

The current Datadog metric provider relies on Datadog's Metric API.
However, this API has fairly low rate limits, and users with moderately sized infrastructure tend to hit them quite easily when scaling their usage of Flagger or Datadog-based autoscaling (like KEDA).

Datadog offers a more scalable alternative: its Cluster Agent batches requests in groups of 35 (see Cluster Agent Autoscaling Metrics). It then makes these metrics available within the cluster by exposing an endpoint that follows the Kubernetes External Metrics API.

Note

This endpoint is not documented by Datadog, as they expect people to have the agent register with the control plane as the cluster's external metrics provider and then make these metrics available through the k8s API server, removing the need to query the endpoint directly.
However, because it implements a Kubernetes API, its behavior is predictable and stable enough to be used directly.

We relied on the way KEDA implemented a similar feature during design and implementation. However, Flagger is not an autoscaling solution, so we are not going to mimic the metric proxy KEDA operates. We simply propose to query the external metrics server directly. In doing so, we also chose to make the provider generic and compatible with any external metrics server. The downside is that we cannot abstract the way Datadog names its metrics, which isn't trivial.
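To make "query the external metrics server directly" concrete, here is a minimal sketch of what such a query involves. It is not the code from this PR: the helper names (`metricURL`, `firstValue`, `parseValue`), the host, the namespace, and the metric name are all hypothetical. It assumes the server exposes the standard `external.metrics.k8s.io/v1beta1` path and returns an `ExternalMetricValueList`. The quantity parsing is deliberately simplified to plain decimals and the milli suffix; real code would more likely rely on `resource.Quantity` from `k8s.io/apimachinery`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// externalMetricValueList mirrors only the fields of an
// external.metrics.k8s.io/v1beta1 ExternalMetricValueList that this sketch reads.
type externalMetricValueList struct {
	Items []struct {
		MetricName string `json:"metricName"`
		Value      string `json:"value"` // serialized Kubernetes quantity, e.g. "250m"
	} `json:"items"`
}

// metricURL builds the External Metrics API URL for a single metric.
// base is the address of any server implementing the API (hypothetical here).
func metricURL(base, namespace, metric, labelSelector string) string {
	u := strings.TrimRight(base, "/") +
		"/apis/external.metrics.k8s.io/v1beta1/namespaces/" +
		url.PathEscape(namespace) + "/" + url.PathEscape(metric)
	if labelSelector != "" {
		u += "?labelSelector=" + url.QueryEscape(labelSelector)
	}
	return u
}

// parseValue is a simplified quantity parser (an assumption of this sketch):
// it only understands plain decimals ("3", "1.5") and the milli suffix ("250m").
func parseValue(q string) (float64, error) {
	if strings.HasSuffix(q, "m") {
		v, err := strconv.ParseFloat(strings.TrimSuffix(q, "m"), 64)
		return v / 1000, err
	}
	return strconv.ParseFloat(q, 64)
}

// firstValue decodes a response body and returns the first item's value.
func firstValue(body []byte) (float64, error) {
	var list externalMetricValueList
	if err := json.Unmarshal(body, &list); err != nil {
		return 0, fmt.Errorf("decoding ExternalMetricValueList: %w", err)
	}
	if len(list.Items) == 0 {
		return 0, errors.New("no metric values returned")
	}
	return parseValue(list.Items[0].Value)
}

func main() {
	// Hypothetical endpoint, namespace, metric name, and selector, for illustration only.
	fmt.Println(metricURL("https://external-metrics.example:8443", "default", "latency", "app=podinfo"))

	// A hand-written response body standing in for a real server reply.
	body := []byte(`{"items":[{"metricName":"latency","value":"250m"}]}`)
	v, err := firstValue(body)
	fmt.Println(v, err)
}
```

Because the request path and response shape are defined by the Kubernetes API rather than by Datadog, the same two steps should apply to any server that implements the External Metrics API.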

fix: #1235

Any alternatives you've considered?

We considered modifying the Datadog metric provider instead of adding an external metrics provider, but a separate provider has the benefit of making other external metrics servers compatible and keeps the code Datadog-agnostic.

We could theoretically make it even more generic and support any Kubernetes metrics API (standard, Custom or External), but I think Flagger already offers this.

Disclaimer

  • This PR was pair programmed with my colleague @mveroone.
  • We're not Go developers, yet we did our best to follow the project's coding conventions and guidelines. Any feedback is welcome, and we'll be happy to rework any part of this contribution that you think needs it.
  • AI disclosure: AIL 1 (see https://danielmiessler.com/blog/ai-influence-level-ail). Minor code autocomplete, but mostly manual coding and writing.
  • We've built the Docker image and tested it end-to-end on one of our GKE clusters against the Datadog Cluster Agent endpoint. It seems to work as expected, but the lack of feedback from Flagger leaves room for some doubt, and we were unsure whether adding some logging was a good idea.

@jlore-decathlon jlore-decathlon marked this pull request as draft November 20, 2025 16:00
@jlore-decathlon jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 85c595d to 139a34a Compare November 24, 2025 16:33
@jlore-decathlon jlore-decathlon changed the title feat(externalmetrics): implement ExternalMetricsProvider for querying… feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics Dec 1, 2025
@jlore-decathlon jlore-decathlon marked this pull request as ready for review December 1, 2025 12:39
@jlore-decathlon jlore-decathlon force-pushed the feat/externalmetrics branch 2 times, most recently from 3757b5a to 72ad54a Compare December 1, 2025 12:48
@jlore-decathlon jlore-decathlon changed the title feat(externalmetrics): implement ExternalMetricsProvider for querying external metrics feat(provider): add External Metrics provider Dec 1, 2025
The Datadog provider often hits API rate limits on larger
installations. The Datadog Cluster Agent can batch metric queries
and expose them through an endpoint compatible with the Kubernetes
External Metrics API.

This implementation allows using this endpoint and any other server
implementing the Kubernetes External Metrics API, including the k8s
API server itself.

Co-authored-by: Johan Lore <johan.lore@decathlon.com>
Co-authored-by: Maxime Véroone <maxime.veroone@decathlon.com>
Signed-off-by: Johan Lore <johan.lore@decathlon.com>
Signed-off-by: Maxime Véroone <maxime.veroone@decathlon.com>
@mveroone mveroone force-pushed the feat/externalmetrics branch from eeeccfc to 86cc361 Compare December 3, 2025 08:28

@aryan9600 aryan9600 left a comment


thank you for opening this PR!

applicationBearerToken = "token"
)

// ExternalMetricsProvider executes datadog queries

Suggested change
- // ExternalMetricsProvider executes datadog queries
+ // ExternalMetricsProvider fetches metrics from an ExternalMetricsProvider.

bearerToken string

timeout time.Duration
client *http.Client

can we use an ExternalMetricsClient object to fetch the external_metrics.ExternalMetricValueList? we can create one using the NewForConfig function in this package. it takes care of loading the service account token automatically and provides a nice interface to fetch the metrics.


Development

Successfully merging this pull request may close these issues.

Using external metrics from the Kubernetes API server