filter_kubernetes: destroy upstream and TLS context on happy path exit#11738
filter_kubernetes: destroy upstream and TLS context on happy path exit#11738ShelbyZ wants to merge 1 commit intofluent:masterfrom
Conversation
Signed-off-by: Shelby Hagman <shelbyzh@amazon.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThe Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary
Continuation of fixing filter_kubernetes - #11730
Problem:
fetch_pod_service_mapcreates a per-call upstream and TLS context each iteration of the background thread loop. The error and failure paths destroy them, but the happy path returned without callingflb_upstream_destroy/flb_tls_destroy, leaking both on every successful fetch.Why this does not effect other plugins:
The following conditions have to be true for this to be a real leak:
fetch_pod_service_mapis the only place in the codebase where all are true.out_calyptiacreates a per-call upstream but only calls it once at startup — condition 2 doesn't apply.out_azure_blobcreates a per-call upstream in a loop but already correctly destroys both upstream and TLS context on every path.filter_awsassigns the upstream to ctx->aws_ec2_filter_client->upstream — it's stored on the context, not a per-call local, long-lived.Testing
Before we can approve your change; please submit the following in a comment:
From the test data collected across two run durations on (master, no fix):
Heap (jemalloc profiling):
Growth is linear and sustained — ~5 MB per 90-minute window
Source pinned to
update_pod_service_map→fetch_pod_service_map→flb_tls_session_create/BIO_ssl_copy_session_id/SSL_CTX_add_custom_extRSS:
The majority of RSS growth (~82%) is the jemalloc retained page pool, not the leak itself
The leak contributes ~80 KiB/min to RSS growth (difference between master and filter_ssl-fix + filter-tls-fix post-warmup rates)
Master vs Fixes

Fixes

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit