filter_kubernetes: destroy upstream and TLS context on happy path exit #11738

Open
ShelbyZ wants to merge 1 commit into fluent:master from ShelbyZ:filter-tls-leaks

ShelbyZ commented Apr 22, 2026

Summary

Continuation of the filter_kubernetes fixes in #11730.

Problem:
fetch_pod_service_map creates a per-call upstream and TLS context on each iteration of the background thread loop. The error and failure paths destroy them, but the happy path returns without calling flb_upstream_destroy / flb_tls_destroy, leaking both on every successful fetch.
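
The shape of the bug can be sketched with stand-in functions. This is a minimal model and not the Fluent Bit source: `fake_create`, `fake_destroy`, `fetch_once`, `leaked_after` and the `live_objects` counter are hypothetical names introduced only for illustration, and the `cleanup_on_success` flag switches between the behaviour before and after this change.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of stand-in objects created but not yet destroyed. */
static int live_objects = 0;

/* Hypothetical stand-ins for creating/destroying an upstream or TLS context. */
static void fake_create(void)  { live_objects++; }
static void fake_destroy(void) { live_objects--; }

/*
 * One iteration of the fetch loop (model only).
 * request_ok:         whether the simulated request succeeds.
 * cleanup_on_success: false models the old happy path, true the patched one.
 */
static int fetch_once(bool request_ok, bool cleanup_on_success)
{
    fake_create();              /* per-call TLS context */
    fake_create();              /* per-call upstream */

    if (!request_ok) {
        /* Error path: both objects were already released before the fix. */
        fake_destroy();
        fake_destroy();
        return -1;
    }

    /* Happy path: the fix adds the same two destroy calls here. */
    if (cleanup_on_success) {
        fake_destroy();
        fake_destroy();
    }
    return 0;
}

/* Runs `iterations` successful fetches and returns the objects still live. */
static int leaked_after(int iterations, bool cleanup_on_success)
{
    int i;

    live_objects = 0;
    for (i = 0; i < iterations; i++) {
        int rc = fetch_once(true, cleanup_on_success);
        assert(rc == 0);
        (void) rc;
    }
    return live_objects;
}
```

In this model, every successful iteration without the cleanup leaves two objects behind, so the count grows with the number of fetches; with the cleanup enabled it stays at zero.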

Why this does not affect other plugins:

All of the following conditions have to be true for this to be a real leak:

  1. Per-call upstream: a fresh upstream and TLS context is created on each invocation, not a long-lived pooled one
  2. Called in a loop: runs repeatedly for the process lifetime, so each leak compounds
  3. Background thread: the normal event loop that drains the destroy queue never runs in this thread

fetch_pod_service_map is the only place in the codebase where all three are true.

  • Other plugins use a long-lived upstream that is created once at init and pooled, so condition 1 doesn't apply.
  • out_calyptia creates a per-call upstream but only calls it once at startup, so condition 2 doesn't apply.
  • out_azure_blob creates a per-call upstream in a loop but already correctly destroys both upstream and TLS context on every path.
  • filter_aws assigns the upstream to ctx->aws_ec2_filter_client->upstream, so it is stored on the context and long-lived rather than a per-call local; condition 1 doesn't apply.
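
Condition 3 can be pictured with a toy model. It assumes, as the condition describes, that released resources are parked on a queue that only the event loop drains. The `destroy_queue` struct and the `queue_release`, `queue_drain` and `pending_after` functions are hypothetical and do not correspond to Fluent Bit identifiers.

```c
#include <assert.h>

/* Hypothetical deferred-destroy queue: counts resources awaiting cleanup. */
struct destroy_queue {
    int pending;
};

/* Releasing a resource only parks it on the queue; nothing is freed yet. */
static void queue_release(struct destroy_queue *q)
{
    q->pending++;
}

/* Stand-in for the event loop's periodic drain; returns how many it freed. */
static int queue_drain(struct destroy_queue *q)
{
    int freed = q->pending;

    q->pending = 0;
    return freed;
}

/*
 * Runs `iterations` fetches. When `event_loop_runs` is non-zero the queue is
 * drained after each fetch; otherwise this models a background thread where
 * the drain never happens. Returns the number of resources still pending.
 */
static int pending_after(int iterations, int event_loop_runs)
{
    struct destroy_queue q = { 0 };
    int i;

    assert(iterations >= 0);
    for (i = 0; i < iterations; i++) {
        queue_release(&q);
        if (event_loop_runs) {
            queue_drain(&q);
        }
    }
    return q.pending;
}
```

With the drain running, nothing accumulates; without it, the pending count equals the number of iterations, which is why an explicit destroy on every exit path matters in a thread that has no event loop.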

Testing
Before we can approve your change, please submit the following in a comment:

  • [WIP] Attached Valgrind output that shows no leaks or memory corruption was found

Test data collected across two run durations on master (no fix):

Heap (jemalloc profiling):

  • +5.7 MB at 90 minutes
  • +14.6 MB at 3.5 hours

Growth is linear and sustained: roughly 5 to 6 MB per 90-minute window
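
The linearity claim can be checked with simple arithmetic on the two heap figures above. The helper names `growth_per_minute` and `growth_per_window` are hypothetical and exist only for this sketch; the inputs are the reported +5.7 MB at 90 minutes and +14.6 MB at 3.5 hours (210 minutes).

```c
#include <assert.h>

/* Average growth in MB per minute over a run lasting `minutes` minutes. */
static double growth_per_minute(double grown_mb, double minutes)
{
    assert(minutes > 0.0);
    return grown_mb / minutes;
}

/* The same average rate rescaled to a window of `window_minutes` minutes. */
static double growth_per_window(double grown_mb, double minutes,
                                double window_minutes)
{
    return growth_per_minute(grown_mb, minutes) * window_minutes;
}
```

Calling `growth_per_window(5.7, 90.0, 90.0)` gives 5.7 MB and `growth_per_window(14.6, 210.0, 90.0)` gives about 6.26 MB per 90 minutes. The two rates are close and the longer run is not slower, which is consistent with steady growth and not a one-time warmup cost.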

Source pinned to update_pod_service_map → fetch_pod_service_map → flb_tls_session_create / BIO_ssl_copy_session_id / SSL_CTX_add_custom_ext

RSS:

  • +113 MB at 90 minutes (~1,279 KiB/min)
  • +145 MB at 3.5 hours (~683 KiB/min, decelerating)

The majority of RSS growth (~82%) is the jemalloc retained page pool, not the leak itself.
The leak contributes ~80 KiB/min to RSS growth (the difference between the post-warmup rates of master and of filter_ssl-fix + filter-tls-fix).

[Figure: Master vs Fixes]

[Figure: Fixes]

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Fixed resource cleanup in the Kubernetes AWS plugin to ensure all connection and encryption resources are properly disposed during normal operations. This enhancement ensures consistency between success and error handling paths, preventing potential resource accumulation that could degrade system stability and performance over extended periods in production environments.

Signed-off-by: Shelby Hagman <shelbyzh@amazon.com>

coderabbitai Bot commented Apr 22, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c796d6b-9485-4c24-94c6-1e09886102e7

📥 Commits

Reviewing files that changed from the base of the PR and between 29deec9 and 6a73047.

📒 Files selected for processing (1)
  • plugins/filter_kubernetes/kubernetes_aws.c

Walkthrough

The fetch_pod_service_map function in the Kubernetes AWS filter now calls flb_upstream_destroy() and flb_tls_destroy() during its normal success cleanup path, ensuring pod-association upstream and TLS resources are consistently released, matching existing error-path disposal behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| AWS pod-association cleanup: plugins/filter_kubernetes/kubernetes_aws.c | Added destruction of ctx->aws_pod_association_upstream and ctx->aws_pod_association_tls in the normal cleanup path of fetch_pod_service_map, ensuring resources are freed on both success and error paths; both pointers set to NULL after destruction. |
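
Setting both pointers to NULL after destruction is what makes a repeated cleanup harmless. A minimal sketch of that idiom follows, using a hypothetical `fake_ctx` struct and hypothetical `fake_*` helpers in place of the real context and of flb_upstream_destroy / flb_tls_destroy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context holding the two per-call resources. */
struct fake_ctx {
    void *pod_association_upstream;
    void *pod_association_tls;
};

/* Counts how many objects have actually been destroyed. */
static int destroy_calls = 0;

/* Stand-in destructor: frees the object and records the call. */
static void fake_destroy(void *obj)
{
    free(obj);
    destroy_calls++;
}

/* Allocates both stand-in resources; returns 0 on success, -1 on failure. */
static int fake_ctx_acquire(struct fake_ctx *ctx)
{
    ctx->pod_association_upstream = malloc(16);
    ctx->pod_association_tls = malloc(16);
    if (!ctx->pod_association_upstream || !ctx->pod_association_tls) {
        free(ctx->pod_association_upstream);
        free(ctx->pod_association_tls);
        ctx->pod_association_upstream = NULL;
        ctx->pod_association_tls = NULL;
        return -1;
    }
    return 0;
}

/* Destroys whatever is still held and clears the pointers afterwards. */
static void fake_ctx_release(struct fake_ctx *ctx)
{
    assert(ctx != NULL);
    if (ctx->pod_association_upstream) {
        fake_destroy(ctx->pod_association_upstream);
        ctx->pod_association_upstream = NULL;
    }
    if (ctx->pod_association_tls) {
        fake_destroy(ctx->pod_association_tls);
        ctx->pod_association_tls = NULL;
    }
}
```

Because the pointers are cleared, a second call to `fake_ctx_release` finds nothing to destroy, so a shared cleanup routine can be reached from both the success and the error path without risking a double free.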

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

docs-required, backport to v4.2.x

Suggested reviewers

  • edsiper
  • cosmo0920

Poem

🐰 A pod's association runs deep,
Upstreams and TLS to keep,
But cleanup must flow both ways—
Success and error paths, always! 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title clearly and specifically describes the main change: adding resource cleanup for upstream and TLS context during the success path exit of fetch_pod_service_map function in filter_kubernetes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
