Skip to content

in_tail: fix file append cleanup on failure#11749

Open
sandy2008 wants to merge 1 commit intofluent:masterfrom
sandy2008:codex/in-tail-file-append-cleanup
Open

in_tail: fix file append cleanup on failure#11749
sandy2008 wants to merge 1 commit intofluent:masterfrom
sandy2008:codex/in-tail-file-append-cleanup

Conversation

@sandy2008
Copy link
Copy Markdown

@sandy2008 sandy2008 commented Apr 27, 2026

This fixes a tail setup failure path where a file could already be linked into the static/event tracking lists before a later setup step failed. In low-resource cases that left list, hash, and accounting state partly registered, which can make the next scan behave oddly.

What changed:

  • roll back listed files through the normal remove path
  • clean up partially initialized files by hand before they are listed
  • check static/event hash registration failures
  • keep the static file count in sync when removing static files
  • avoid moving a static file to the event list if event hash registration fails

Testing:

  • cmake -S . -B build -DFLB_TESTS_RUNTIME=On -DFLB_TESTS_INTERNAL=On
  • cmake --build build -j8
  • ctest --test-dir build -R in_tail --output-on-failure
  • tests/integration/.venv/bin/python -m pytest tests/integration/scenarios/in_tail/tests/test_in_tail_001.py -q -k "copytruncate_with_stale_writer or follows_rename_rotation or handles_multiple_rename_rotations or multi_file_rapid_rotation or rotate_wait_keeps_old_inode_then_purges_it"
  • Linux Docker: FLUENT_BIT_BINARY=/build/bin/fluent-bit VALGRIND=1 VALGRIND_STRICT=1 /tmp/flb-it-venv/bin/python -m pytest tests/integration/scenarios/in_tail/tests/test_in_tail_001.py -q -k "copytruncate_with_stale_writer or follows_rename_rotation or handles_multiple_rename_rotations or multi_file_rapid_rotation or rotate_wait_keeps_old_inode_then_purges_it"

Summary by CodeRabbit

  • Bug Fixes
    • Improved error handling and resource cleanup in file tail monitoring to prevent resource leaks and ensure graceful failure recovery.
    • More reliable file tracking and registration logic for static and event monitoring modes, including rollback on failures and corrected bookkeeping so file counts and monitoring state remain accurate.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 27, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 141e45bb-52e1-47a8-82ec-38f36658c2cf

📥 Commits

Reviewing files that changed from the base of the PR and between 3c7d8f7 and 1f032cc.

📒 Files selected for processing (2)
  • plugins/in_tail/tail_file.c
  • plugins/in_tail/tail_fs_stat.c

📝 Walkthrough

Walkthrough

Refactors error handling and cleanup in the tail plugin: flb_tail_file_append now uses staged cleanup labels and explicit rollback of registrations/resources; flb_tail_file_remove decrements static-file counters; flb_tail_file_to_event inserts into event hash before list swap; flb_tail_fs_stat_remove always frees the fs backend when present.

Changes

Cohort / File(s) Summary
Tail file append & lifecycle
plugins/in_tail/tail_file.c
Rewrote flb_tail_file_append error paths into multiple err_* labels with staged resource release (fd, hash/list registrations, fs monitoring, ML streams, msgpack/dmode buffers, decompression/encoders). Added allocation checks for docker/multiline/encoder buffers, moved files-opened metric increment to post-encoder creation, and adjusted flb_tail_file_remove/flb_tail_file_to_event to correctly handle static→event transitions and rollback on hash registration failure.
FS stat backend cleanup
plugins/in_tail/tail_fs_stat.c
Changed flb_tail_fs_stat_remove to free the file->fs_backend whenever non-NULL and clear the pointer, making stat backend cleanup independent of file->tail_mode.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant TailFile as Tail File
    participant Hash as Event/Static Hash
    participant FSWatch as FS Watcher
    participant Decoder
    participant Encoder
    participant Metrics

    Caller->>TailFile: flb_tail_file_append(file)
    TailFile->>TailFile: open fd (fd = -1 -> set)
    TailFile->>Hash: register file in hash (static/event)
    alt hash registration fails
        Hash-->>TailFile: error
        TailFile->>FSWatch: rollback fs monitor (if registered)
        TailFile->>Decoder: free decompression ctx (if created)
        TailFile->>Encoder: free encoders/buffers (if created)
        TailFile-->>Caller: return error
    else hash succeeds
        TailFile->>FSWatch: add fs monitoring (if event mode)
        TailFile->>Decoder: create decompression/hash/name contexts
        TailFile->>Encoder: create encoder & buffers
        alt encoder creation fails
            Encoder-->>TailFile: error
            TailFile->>Hash: unregister from hash/list
            TailFile->>FSWatch: rollback fs monitor
            TailFile->>Decoder: free decompression ctx
            TailFile-->>Caller: return error
        else success
            TailFile->>Metrics: increment files-opened metric
            TailFile-->>Caller: success
        end
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested Reviewers

  • edsiper
  • koleini

Poem

🐰 A rabbit guards each opened file,
Hops back on errors with a tidy smile,
Hashes unmade, monitors rewound,
Buffers freed, no leak left found,
I nibble crumbs of cleanup, one hop at a time.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and accurately describes the primary change: fixing file append cleanup on failure paths in the in_tail plugin.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@sandy2008 sandy2008 marked this pull request as ready for review April 27, 2026 10:31
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3c7d8f752a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/in_tail/tail_file.c
@sandy2008 sandy2008 closed this Apr 27, 2026
@sandy2008 sandy2008 reopened this Apr 27, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/in_tail/tail_file.c`:
- Around line 2157-2165: flb_hash_table_add failure leaves fs_backend leaked for
the stat backend because flb_tail_fs_stat_remove only frees fs_backend when
file->tail_mode == FLB_TAIL_EVENT; update flb_tail_fs_stat_remove to free
file->fs_backend unconditionally (or null it after free) so resources allocated
by flb_tail_fs_stat_add are always released on cleanup, and keep existing guards
for other event-specific cleanup; reference functions: flb_tail_fs_stat_add,
flb_tail_fs_stat_remove, flb_tail_fs_remove, symbols: fs_backend, tail_mode,
FLB_TAIL_EVENT, FLB_TAIL_STATIC, and event_hash/flb_hash_table_add to locate the
related code paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0de8aca1-40fa-4a59-b4c3-a72bc02c83ac

📥 Commits

Reviewing files that changed from the base of the PR and between 29deec9 and 3c7d8f7.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c

Comment thread plugins/in_tail/tail_file.c
Refactors the cleanup paths exercised when flb_tail_file_append() fails
partway through file setup, and plugs an fs_backend leak in the
flb_tail_file_to_event() rollback.

Cleanup design:

- flb_tail_file_append() now uses Linux-kernel-style staged-goto cleanup
  labels in reverse acquisition order. Each acquisition pairs with one
  label that undoes exactly that step and falls through to the previous
  one, so adding a new field only requires inserting the acquisition and
  its matching label rather than updating two divergent free paths.
- The matching cmt_files_opened increment moves to just before
  return 0, so any construction failure stays balanced with
  cmt_files_closed (which still fires only from flb_tail_file_remove()
  for fully constructed files).
- Defensive NULL checks on flb_sds_create_size() for dmode_buf and
  dmode_lastline so an OOM in docker mode no longer crashes later.

fs_backend leak fix:

- flb_tail_fs_stat_remove() now releases file->fs_backend whenever
  non-NULL and clears it, instead of gating on tail_mode == FLB_TAIL_EVENT.
  The previous gate leaked the stat-backend allocation in
  flb_tail_file_to_event() when flb_hash_table_add() failed after
  flb_tail_fs_add() had already allocated it but tail_mode was still
  FLB_TAIL_STATIC. The inotify backend already used watch_fd for
  idempotency.

Other fixes preserved from the prior revision:

- flb_hash_table_add() return values are now checked in static/event
  registration paths.
- files_static_count is decremented in flb_tail_file_remove() when a
  static file is removed.
- flb_tail_file_to_event() registers in event_hash before swapping the
  list, so an event-hash failure doesn't leave the file half-promoted.

Testing:

- cmake -S . -B build -DFLB_TESTS_RUNTIME=Off -DFLB_TESTS_INTERNAL=Off -DFLB_DEBUG=On
- cmake --build build -j8 --target fluent-bit-bin

Signed-off-by: Sandy Chen <Yuxuan.Chen@morganstanley.com>
@sandy2008 sandy2008 force-pushed the codex/in-tail-file-append-cleanup branch from 3c7d8f7 to 1f032cc Compare April 27, 2026 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant