filter_kubernetes: make memory growth rigid on filter k8s #11731
Conversation
🚥 Pre-merge checks: 3 passed, 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/filter_kubernetes/kube_meta.c`:
- Around lines 125-129: The token-loading branch can return -1 while leaving `tk == NULL`, causing `get_http_auth_header()` to format "Bearer %s" with a NULL pointer and overwrite `ctx->token`. Update the logic around the token command/file read (the branch that frees `res`, calls `pclose(fp)`, and currently returns -1) so that both the size-limit failure and the alternate token-file failure set a common `ret == -1` condition and return immediately before `get_http_auth_header()` runs. Specifically, ensure the branches using `FLB_KUBE_TOKEN_MAX_SIZE`, `flb_free(res)`, and `pclose(fp)`, as well as the other token-file branch, do not leave `tk` NULL: move the `return -1` into a shared `if (ret == -1)` block after both branches (or return earlier) so `get_http_auth_header()`, `ctx->token`, and `tk` are never used when loading failed.
In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`:
- Around lines 23-24: The method `log_message(self, format, *args)` shadows the built-in name `format`. Rename the parameter (e.g., to `fmt` or `message_format`) to fix the A002 lint error while preserving behavior: update the signature to `log_message(self, fmt, *args)`, adjust any internal references (if any) to the new name, and ensure the method still returns None as before.
- Around lines 29-33: The test currently probes for a free port with a temporary socket and then creates `ThreadingHTTPServer`, allowing a race in between. Instead, construct the server bound to an ephemeral port directly with `ThreadingHTTPServer(("127.0.0.1", 0), _KubeApiHandler)` and read the assigned port from `server.server_address[1]`. Remove the temporary probe socket and any `sock.bind`/`getsockname` calls so the server owns the ephemeral port.
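A self-contained sketch of the race-free pattern, assuming a trivial handler in place of the test's `_KubeApiHandler`. Binding to port 0 lets the OS assign the port atomically at bind time, so nothing can grab it between "probe" and "serve":

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class _Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b"{}")

    def log_message(self, fmt, *args):
        return None


# Bind directly to an ephemeral port; the kernel picks it at bind time.
server = ThreadingHTTPServer(("127.0.0.1", 0), _Handler)
port = server.server_address[1]  # the OS-assigned port, no probe socket needed

thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

status = urllib.request.urlopen(f"http://127.0.0.1:{port}/").status
server.shutdown()
server.server_close()
print(status)
```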
- Around lines 78-105: In `_write_config`, quote the token command and use the active Python interpreter. Replace the unquoted `python3 {script_file}` with a command built from `sys.executable` and a properly quoted path to `script_file` (and any `tmp_path` components) so the string written for `Kube_Token_Command` is safe when passed to `popen()`. The final value written into the config for `Kube_Token_Command` must be a single quoted command string (including the script path) to avoid shell-splitting on spaces or metacharacters.
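A sketch of the quoting helper the comment asks for; `build_token_command` is a hypothetical name, but `shlex.quote` and `sys.executable` are the standard-library tools for exactly this:

```python
import shlex
import sys
from pathlib import Path


def build_token_command(script_file: Path) -> str:
    # Quote both the interpreter and the script path so the resulting
    # string survives popen()/shell evaluation even when tmp_path
    # contains spaces or shell metacharacters.
    return f"{shlex.quote(sys.executable)} {shlex.quote(str(script_file))}"


cmd = build_token_command(Path("/tmp/dir with space/token.py"))
print(cmd)
```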
- Around lines 122-143: In `test_filter_kubernetes_token_command_rejects_multiline_output_over_limit`, ensure the Service is always stopped by moving `service.stop()` into a `finally` block, and avoid snapshotting too early by waiting for both expected log markers: `wait_for_log_contains("failed to run command", timeout=25)` and `wait_for_log_contains("kube token command test", timeout=25)` (in either order) before stopping the service. Update the code around `Service(str(config_file))`, `service.start()`, and `wait_for_log_contains(...)` so cleanup happens in `finally` and both markers are awaited to prevent flakiness.
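A minimal sketch of the try/finally shape the comment asks for. The `Service` class below is a stub standing in for the integration-test helper (its real API lives in the test harness); the point is only that `stop()` runs even when the wait raises:

```python
class Service:
    """Stub standing in for the integration-test Service helper."""

    def __init__(self, config):
        self.config = config
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def wait_for_log_contains(self, text, timeout=25):
        # Simulate the flaky case: the marker never shows up.
        raise TimeoutError(f"never saw {text!r}")


service = Service("fluent-bit.conf")
service.start()
try:
    service.wait_for_log_contains("failed to run command", timeout=25)
    service.wait_for_log_contains("kube token command test", timeout=25)
except TimeoutError:
    pass
finally:
    # stop() always runs, so the Fluent Bit process cannot leak
    # across tests even when the wait times out.
    service.stop()

print(service.running)
```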
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24c2a1b2-e396-486f-b335-f8bc8b00164a
📒 Files selected for processing (2)
plugins/filter_kubernetes/kube_meta.c
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Force-pushed from 9bdc651 to 702e2a6
♻️ Duplicate comments (1)
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py (1)
118-129: ⚠️ Potential issue | 🟡 Minor. Accept-path test still misses the try/finally cleanup.

The rejects test now wraps service.stop() in finally (lines 141-145), but this accept test was left as-is. If wait_for_log_contains raises (e.g., timeout), the Fluent Bit process leaks across tests; _run_kube_api_server will still tear down on context exit, but the service is never stopped.

🧪 Proposed fix

```diff
     service = Service(str(config_file))
-    service.start()
-    log_text = service.wait_for_log_contains("kube token command test", timeout=25)
-    service.stop()
+    service.start()
+    try:
+        log_text = service.wait_for_log_contains("kube token command test", timeout=25)
+    finally:
+        service.stop()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py` around lines 118-129: the accept-path test test_filter_kubernetes_token_command_accepts_multiline_output_over_8kb risks leaking the Service when wait_for_log_contains raises. Wrap the Service.start()/wait_for_log_contains/Service.stop() sequence in a try/finally so Service.stop() is always called, keeping the existing _run_kube_api_server() context; use the test function name and the Service.start(), Service.stop(), and wait_for_log_contains calls to locate where to add the cleanup.
🧹 Nitpick comments (2)
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py (1)
54-60: Minor: wait_for_log_contains reads the log file twice per poll.

The lambda calls read_file(self.flb.log_file) twice on every iteration (once for the predicate, once for the return value). For the huge-output test that polls over 25s with a 0.5s interval while the file may grow into the hundreds of KB, this doubles the I/O. Binding the read once keeps the behavior identical and halves the syscalls.

♻️ Suggested refactor

```diff
     def wait_for_log_contains(self, text, timeout=20):
+        def _check():
+            content = read_file(self.flb.log_file)
+            return content if text in content else None
         return self.service.wait_for_condition(
-            lambda: read_file(self.flb.log_file) if text in read_file(self.flb.log_file) else None,
+            _check,
             timeout=timeout,
             interval=0.5,
             description=f"log text {text!r}",
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py` around lines 54-60: the helper wait_for_log_contains performs read_file(self.flb.log_file) twice each poll. Modify it so the callable passed to service.wait_for_condition reads the file once per invocation (e.g., assign content = read_file(self.flb.log_file), then return content if text in content else None), keeping the same timeout/interval/description and preserving use of self.flb.log_file, service.wait_for_condition, and read_file.
plugins/filter_kubernetes/kube_meta.c (1)
122-154: Rigid growth logic looks correct; consider capping new_capacity at FLB_KUBE_TOKEN_MAX_SIZE.

The size check `len > FLB_KUBE_TOKEN_MAX_SIZE - size - 1` is equivalent to `size + len + 1 > FLB_KUBE_TOKEN_MAX_SIZE` and cannot underflow because `size` is kept strictly below `FLB_KUBE_TOKEN_MAX_SIZE` by construction. The doubling loop is bounded (required_size ≤ 1MB, so new_capacity stops around 2MB, no size_t overflow risk), and cleanup on realloc failure is correct. One minor nit: because the guard above already rejects inputs that would exceed 1MB, the doubling can still allocate up to ~2MB for a token that by policy cannot exceed 1MB. Capping avoids that slack.

♻️ Optional tightening

```diff
     new_capacity = capacity;
     while (new_capacity < required_size) {
         new_capacity *= 2;
     }
+    if (new_capacity > FLB_KUBE_TOKEN_MAX_SIZE) {
+        new_capacity = FLB_KUBE_TOKEN_MAX_SIZE;
+    }
     temp = flb_realloc(res, new_capacity);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/filter_kubernetes/kube_meta.c` around lines 122-154: the growth loop may overshoot and allocate up to ~2x FLB_KUBE_TOKEN_MAX_SIZE. After the `while (new_capacity < required_size)` doubling loop in the token-read code, cap new_capacity to FLB_KUBE_TOKEN_MAX_SIZE before calling flb_realloc, so res/new_capacity cannot exceed the configured token max. Keep the existing pre-check (`len > FLB_KUBE_TOKEN_MAX_SIZE - size - 1`) intact and only add the cap.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`:
- Around line 118-129: The accept-path test
test_filter_kubernetes_token_command_accepts_multiline_output_over_8kb risks
leaking the Service when wait_for_log_contains raises; wrap the
Service.start()/wait_for_log_contains/Service.stop() sequence in a try/finally
so Service.stop() is always called (use the existing _run_kube_api_server()
context and ensure Service.stop() is invoked in the finally block); reference
the test function name and the Service.start(), Service.stop(), and
wait_for_log_contains calls to locate where to add the try/finally cleanup.
---
Nitpick comments:
In `@plugins/filter_kubernetes/kube_meta.c`:
- Around line 122-154: The growth loop may overshoot and allocate up to ~2x
FLB_KUBE_TOKEN_MAX_SIZE; after computing new_capacity in the while (new_capacity
< required_size) doubling loop inside the token-read code, cap new_capacity to
FLB_KUBE_TOKEN_MAX_SIZE (use FLB_KUBE_TOKEN_MAX_SIZE when new_capacity >
FLB_KUBE_TOKEN_MAX_SIZE) before calling flb_realloc so res/new_capacity cannot
exceed the configured token max; keep the existing pre-check (len >
FLB_KUBE_TOKEN_MAX_SIZE - size - 1) intact and only add the cap on new_capacity.
In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`:
- Around line 54-60: The helper wait_for_log_contains performs
read_file(self.flb.log_file) twice each poll; modify wait_for_log_contains so
the lambda used with service.wait_for_condition reads the file once per
invocation (e.g., assign content = read_file(self.flb.log_file) inside the
lambda and then check if text in content and return content or None), keeping
the same timeout/interval/description and preserving use of self.flb.log_file,
service.wait_for_condition and read_file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8aeaf638-b8fa-45de-8aaa-fa32d616e164
📒 Files selected for processing (2)
plugins/filter_kubernetes/kube_meta.c
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py
When using get_token_with_command, this growth mechanism must strictly respect the memory limit.
So we need to check that limit carefully before calling flb_realloc to allocate heap memory.
Enter [N/A] in the box if an item is not applicable to your change.

Testing

Before we can approve your change, please submit the following in a comment:

If this is a change to packaging of containers or native binaries then please confirm it works for all targets. Apply the ok-package-test label to test for all targets (requires a maintainer to do).

Documentation

Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.