
filter_kubernetes: make memory growth rigidly on filter k8s #11731

Open
cosmo0920 wants to merge 2 commits into master from
cosmo0920-make-memory-growth-rigidly-on-filter_k8s

Conversation

Contributor

@cosmo0920 cosmo0920 commented Apr 21, 2026

When using get_token_with_command, this mechanism needs to respect the memory limit strictly. So we need to check that limit carefully before executing flb_realloc to allocate heap memory.
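
For illustration, here is a standalone sketch of the intended pattern: read the token command output in chunks and verify the hard cap before every reallocation. The 1MB limit mirrors FLB_KUBE_TOKEN_MAX_SIZE from this PR, but the helper below uses plain libc (popen/realloc/free instead of flb_realloc/flb_free) and a hypothetical read_token_command() name so it compiles on its own; it is not the actual kube_meta.c code.

    /* Sketch only: bounded growth while reading a token command's output. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define TOKEN_MAX_SIZE (1024 * 1024)  /* 1MB cap, mirroring the limit in this PR */

    static char *read_token_command(const char *cmd)
    {
        FILE *fp;
        char chunk[4096];
        char *buf = NULL;
        char *tmp;
        size_t size = 0;
        size_t capacity = 0;
        size_t required;
        size_t new_capacity;
        size_t len;

        fp = popen(cmd, "r");
        if (fp == NULL) {
            return NULL;
        }

        while ((len = fread(chunk, 1, sizeof(chunk), fp)) > 0) {
            /* refuse oversized output BEFORE growing the heap buffer */
            if (len > TOKEN_MAX_SIZE - size - 1) {
                free(buf);
                pclose(fp);
                return NULL;
            }

            required = size + len + 1;            /* +1 for the NUL terminator */
            if (required > capacity) {
                new_capacity = capacity > 0 ? capacity : sizeof(chunk);
                while (new_capacity < required) {
                    new_capacity *= 2;
                }
                if (new_capacity > TOKEN_MAX_SIZE) {
                    new_capacity = TOKEN_MAX_SIZE; /* never allocate past the cap */
                }
                tmp = realloc(buf, new_capacity);
                if (tmp == NULL) {
                    free(buf);
                    pclose(fp);
                    return NULL;
                }
                buf = tmp;
                capacity = new_capacity;
            }

            memcpy(buf + size, chunk, len);
            size += len;
            buf[size] = '\0';
        }

        pclose(fp);
        return buf;
    }

A caller would pass the configured Kube_Token_Command string and treat a NULL return as a token load failure.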


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0; by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes

    • Improved Kubernetes filter token handling with a strict 1MB limit to prevent oversized command outputs and related memory issues.
    • Enhanced buffering and error handling to fail early on excessive token output and to reliably report token load failures.
  • Tests

    • Added Linux-only integration tests that verify acceptance of large-but-valid token output and rejection of oversized multiline command output.

@cosmo0920 cosmo0920 marked this pull request as ready for review April 21, 2026 08:13
@cosmo0920 cosmo0920 requested a review from edsiper as a code owner April 21, 2026 08:13

coderabbitai Bot commented Apr 21, 2026

📝 Walkthrough
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 7.14%, which is insufficient; the required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check | ❓ Inconclusive | The title references 'memory growth rigidly' but is grammatically awkward and vague about what 'rigidly' means; it doesn't clearly convey that the primary change is enforcing memory size limits on Kubernetes token command execution. | Clarify the title to better reflect the main objective, such as 'filter_kubernetes: enforce memory limits on token command output' or 'filter_kubernetes: add maximum size validation for token command'.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/filter_kubernetes/kube_meta.c`:
- Around line 125-129: The token-loading branch can return -1 while leaving tk
== NULL, causing get_http_auth_header() to format "Bearer %s" with a NULL
pointer and overwrite ctx->token; update the logic around the token command/file
read (the branch that frees res, pclose(fp) and currently returns -1) so that
both the size-limit failure and the alternate token-file failure set a common
ret == -1 condition and return immediately before get_http_auth_header() runs;
specifically, ensure the branches using FLB_KUBE_TOKEN_MAX_SIZE, flb_free(res),
pclose(fp) and the other token-file branch do not leave tk NULL—move the return
-1 into a shared if (ret == -1) block after both branches (or return earlier) so
get_http_auth_header(), ctx->token and tk are never used when loading failed.
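
As a loose illustration of the shared-exit flow this comment asks for (a sketch only: tk, ret, and get_http_auth_header() follow the comment above, while use_token_command, read_command_output(), and load_token_from_file() are hypothetical placeholders, not code from kube_meta.c):

    if (use_token_command) {
        tk = read_command_output();           /* NULL on oversized output or read error */
        if (tk == NULL) {
            ret = -1;
        }
    }
    else {
        ret = load_token_from_file(&tk);      /* also leaves tk NULL on failure */
    }

    /* single exit for both branches: never reach the header formatting with tk == NULL */
    if (ret == -1 || tk == NULL) {
        return -1;
    }

    ret = get_http_auth_header(ctx);          /* only formats "Bearer %s" from a valid token */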

In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`:
- Around line 23-24: The method log_message(self, format, *args) shadows the
built-in name format; rename the parameter (e.g., to fmt or message_format) to
avoid the A002 lint error while preserving behavior — update the method
signature in the log_message definition to use the new parameter name
(log_message(self, fmt, *args)) and adjust any internal references (if any) to
the new name; ensure the method still returns None as before.
- Around line 29-33: The test currently uses a probe socket then creates
ThreadingHTTPServer, allowing a race; instead construct ThreadingHTTPServer
bound to ("127.0.0.1", 0) directly (use ThreadingHTTPServer(("127.0.0.1", 0),
_KubeApiHandler)) and then read the assigned port from server.server_address[1]
to use in the test; remove the temporary probe socket and any
sock.bind/getsockname calls so the server owns the ephemeral port.
- Around line 78-105: In _write_config update the Kube_Token_Command entry so
the token command is quoted and uses the active Python interpreter: replace the
unquoted `python3 {script_file}` with a command built from `sys.executable` and
a properly quoted path to `script_file` (and any tmp_path components) so the
string written for Kube_Token_Command is safe when passed to popen(); ensure the
final value written into the config for `Kube_Token_Command` is a single quoted
command string (including the script path) to avoid shell-splitting on spaces or
metacharacters.
- Around line 122-143: In
test_filter_kubernetes_token_command_rejects_multiline_output_over_limit, ensure
the Service is always stopped by moving service.stop() into a finally block and
avoid snapshotting too early by waiting for both expected log markers: first
wait_for_log_contains("failed to run command", timeout=25) and then
wait_for_log_contains("kube token command test", timeout=25) (or vice versa)
before stopping the service; update the code around Service(str(config_file)),
service.start(), and wait_for_log_contains(...) so cleanup is in finally and
both markers are awaited to prevent flakiness.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24c2a1b2-e396-486f-b335-f8bc8b00164a

📥 Commits

Reviewing files that changed from the base of the PR and between 29deec9 and 9bdc651.

📒 Files selected for processing (2)
  • plugins/filter_kubernetes/kube_meta.c
  • tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py (1)

118-129: ⚠️ Potential issue | 🟡 Minor

Accept-path test still misses the try/finally cleanup.

The rejects test now wraps service.stop() in finally (lines 141-145), but this accept test was left as-is. If wait_for_log_contains raises (e.g., timeout), the Fluent Bit process leaks across tests and _run_kube_api_server will still tear down on context exit but the service is never stopped.

🧪 Proposed fix
         service = Service(str(config_file))
-        service.start()
-        log_text = service.wait_for_log_contains("kube token command test", timeout=25)
-        service.stop()
+        service.start()
+        try:
+            log_text = service.wait_for_log_contains("kube token command test", timeout=25)
+        finally:
+            service.stop()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`
around lines 118 - 129, The accept-path test
test_filter_kubernetes_token_command_accepts_multiline_output_over_8kb risks
leaking the Service when wait_for_log_contains raises; wrap the
Service.start()/wait_for_log_contains/Service.stop() sequence in a try/finally
so Service.stop() is always called (use the existing _run_kube_api_server()
context and ensure Service.stop() is invoked in the finally block); reference
the test function name and the Service.start(), Service.stop(), and
wait_for_log_contains calls to locate where to add the try/finally cleanup.
🧹 Nitpick comments (2)
tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py (1)

54-60: Minor: wait_for_log_contains reads the log file twice per poll.

The lambda calls read_file(self.flb.log_file) twice on every iteration (once for the predicate, once for the return value). For the huge-output test that polls over 25s with 0.5s interval while the file may grow into the hundreds of KB, this doubles the I/O. Binding the read once keeps the behavior identical and halves syscalls.

♻️ Suggested refactor
     def wait_for_log_contains(self, text, timeout=20):
+        def _check():
+            content = read_file(self.flb.log_file)
+            return content if text in content else None
         return self.service.wait_for_condition(
-            lambda: read_file(self.flb.log_file) if text in read_file(self.flb.log_file) else None,
+            _check,
             timeout=timeout,
             interval=0.5,
             description=f"log text {text!r}",
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py`
around lines 54 - 60, The helper wait_for_log_contains performs
read_file(self.flb.log_file) twice each poll; modify wait_for_log_contains so
the lambda used with service.wait_for_condition reads the file once per
invocation (e.g., assign content = read_file(self.flb.log_file) inside the
lambda and then check if text in content and return content or None), keeping
the same timeout/interval/description and preserving use of self.flb.log_file,
service.wait_for_condition and read_file.
plugins/filter_kubernetes/kube_meta.c (1)

122-154: Rigid growth logic looks correct; consider capping new_capacity at FLB_KUBE_TOKEN_MAX_SIZE.

The size check len > FLB_KUBE_TOKEN_MAX_SIZE - size - 1 is equivalent to size + len + 1 > FLB_KUBE_TOKEN_MAX_SIZE and cannot underflow because size is kept strictly below FLB_KUBE_TOKEN_MAX_SIZE by construction. The doubling loop is bounded (required_size ≤ 1MB, so new_capacity stops around 2MB — no size_t overflow risk), and cleanup on realloc failure is correct.

One minor nit: because the guard above already rejects inputs that would exceed 1MB, the doubling can still allocate up to ~2MB for a token that by policy cannot exceed 1MB. Capping avoids that slack.

♻️ Optional tightening
             new_capacity = capacity;

             while (new_capacity < required_size) {
                 new_capacity *= 2;
             }
+            if (new_capacity > FLB_KUBE_TOKEN_MAX_SIZE) {
+                new_capacity = FLB_KUBE_TOKEN_MAX_SIZE;
+            }

             temp = flb_realloc(res, new_capacity);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/filter_kubernetes/kube_meta.c` around lines 122 - 154, The growth
loop may overshoot and allocate up to ~2x FLB_KUBE_TOKEN_MAX_SIZE; after
computing new_capacity in the while (new_capacity < required_size) doubling loop
inside the token-read code, cap new_capacity to FLB_KUBE_TOKEN_MAX_SIZE (use
FLB_KUBE_TOKEN_MAX_SIZE when new_capacity > FLB_KUBE_TOKEN_MAX_SIZE) before
calling flb_realloc so res/new_capacity cannot exceed the configured token max;
keep the existing pre-check (len > FLB_KUBE_TOKEN_MAX_SIZE - size - 1) intact
and only add the cap on new_capacity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8aeaf638-b8fa-45de-8aaa-fa32d616e164

📥 Commits

Reviewing files that changed from the base of the PR and between 9bdc651 and 702e2a6.

📒 Files selected for processing (2)
  • plugins/filter_kubernetes/kube_meta.c
  • tests/integration/scenarios/filter_kubernetes/tests/test_filter_kubernetes_001.py

