
Conversation


@kalavt kalavt commented Dec 24, 2025


Enter [N/A] in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Valgrind output showing that no leaks or memory corruption were found

If this is a change to packaging of containers or native binaries, please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set the ok-package-test label to test all targets (this requires a maintainer).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Parquet streaming encoder & schema API; S3 streaming, multipart, queue and blob subsystems; presigned‑URL S3 auth; Snappy framed compression; new S3 key variables ($FILE_PATH, $FILE_NAME); streaming MsgPack→Parquet and streaming compression processors.
  • Bug Fixes / Reliability

    • Improved multipart robustness, recovery/resume, auth retry, URI/path encoding, buffer handling, and clearer format/compression warnings.
  • Tests

    • Large expansion of unit and runtime tests covering Parquet, S3 (multipart/queue/blob), streaming, formats, compression, and edge cases.
  • Deprecated

    • Legacy Arrow compression alias marked deprecated — use format=parquet.
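
As a rough illustration of that migration (classic-mode config sketch; the bucket name and schema value are placeholders, not taken from this PR):

```ini
# before (deprecated): Parquet selected through the compression option
[OUTPUT]
    name         s3
    match        *
    bucket       my-bucket               # placeholder
    compression  arrow

# after: parquet is a format; compression remains a true codec (gzip/snappy/zstd)
[OUTPUT]
    name         s3
    match        *
    bucket       my-bucket               # placeholder
    format       parquet
    schema_str   {"fields": [...]}       # Arrow JSON schema, elided here
    compression  gzip                    # used as the internal Parquet codec
```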



coderabbitai bot commented Dec 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Walkthrough

Adds a Parquet encoder and Arrow/Parquet discovery; refactors Out S3 into modular subsystems (multipart, auth/presigned, blob, queue, stream, store); extends blob DB and S3 key templating; adds Snappy framed compression and a C++ Parquet streaming implementation; and introduces extensive tests and CI/CMake updates.

Changes

  • Build / Parquet infra (CMakeLists.txt, cmake/parquet.cmake, src/CMakeLists.txt, tests/internal/CMakeLists.txt, .github/workflows/unit-tests.yaml): Add FLB_PARQUET_ENCODER option, new Arrow/Parquet detection module, wire ARROW/PARQUET libs into build/tests, update CI matrix and apt packages.
  • Parquet implementation & API (include/fluent-bit/flb_parquet.h, src/flb_parquet_impl.cpp, tests/internal/parquet.c): New public Parquet C API and opaque schema handle; C++ streaming MsgPack→Parquet implementation; extensive unit tests for conversion/validation.
  • Out S3 core & API surface (plugins/out_s3/s3.c, plugins/out_s3/s3.h, plugins/out_s3/CMakeLists.txt): Migrate to flb_sds_t buffers, change upload/construct/send signatures, add format/schema_str/cached schema, update upload APIs and constants (multipart sizing).
  • Multipart subsystem (plugins/out_s3/s3_multipart.c, plugins/out_s3/s3_multipart.h): New multipart lifecycle: create/initiate, upload part, complete, abort, header builders, MD5/base64 helpers, optimal part sizing and public APIs.
  • Auth / presigned URL client (plugins/out_s3/s3_auth.c, plugins/out_s3/s3_auth.h): New presigned-URL client with URL builders, upstream/TLS setup, presigned fetch helpers for multipart operations.
  • Blob subsystem & DB integration (plugins/out_s3/s3_blob.c, plugins/out_s3/s3_blob.h, include/fluent-bit/flb_blob_db.h, src/flb_blob_db.c, plugins/in_blob/blob_file.c): Blob lifecycle, part registration, delivery notifications, recovery flows; DB schema migration adds s3_key plus new prepared statements and public APIs.
  • Upload queue & orchestration (plugins/out_s3/s3_queue.c, plugins/out_s3/s3_queue.h): New upload queue: enqueue file/part/pending entries, per-entry processing, three-phase recovery, resume logic, and public queue APIs.
  • Streaming & store (plugins/out_s3/s3_stream.c, plugins/out_s3/s3_stream.h, plugins/out_s3/s3_store.c, plugins/out_s3/s3_store.h): Per-record streaming pipeline, streaming compression, fstore refactor, restoration logic, and context-free chunk opener API.
  • S3 plugin wiring & new sources (plugins/out_s3/*.c, plugins/out_s3/*.h): Add/reorder sources: s3_stream.c, s3_blob.c, s3_auth.c, s3_queue.c, s3_multipart.c; new headers and exported functions across out_s3.
  • AWS compression & Snappy (include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, include/fluent-bit/flb_snappy.h, src/flb_snappy.c, plugins/out_kinesis_firehose/firehose.c, tests/internal/aws_compress.c): Rework compression enums (add SNAPPY, remap values, deprecate Arrow mapping), add framed Snappy compression API and tests.
  • AWS util / S3 key templating (include/fluent-bit/flb_aws_util.h, src/aws/flb_aws_util.c, tests/internal/aws_util.c, tests/internal/fuzzers/aws_util_fuzzer.c): Centralize S3 key templating, add $FILE_PATH/$FILE_NAME, update flb_get_s3_key signature (adds file_path), add flb_aws_uri_encode_path, update tests and fuzzers (see the key-format sketch after this list).
  • Blob DB & SQLite APIs (include/fluent-bit/flb_blob_db.h, src/flb_blob_db.c): New SQL statements/prepared-stmt handles and APIs for parts, pending-file retrieval, part status, parts-in-progress, zombie reset; schema upgrade logic and new retrieval helpers.
  • Tests, mocks & helpers (tests/runtime/*, tests/internal/*, tests/include/*, tests/include/aws_client_mock.*): Large test addition: out_s3 runtime suites (config, multipart, format, errors, edge cases, queue, recovery, blob, special features), unit tests, helpers, and AWS mock improvements.
  • CMake / toolchain tweaks (cmake/kafka.cmake, src/aws/compression/CMakeLists.txt, src/aws/CMakeLists.txt, lib/*/CMakeLists.txt): Kafka/SASL flag changes, remove flb-aws-compress INTERFACE wiring, bump some CMake minimums, remove certain linkages, and CI toolchain adjustments.
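
A sketch of the new key variables in use (hypothetical values; $FILE_PATH and $FILE_NAME are assumed to be filled from the blob file being delivered):

```ini
[OUTPUT]
    name           s3
    match          *
    bucket         my-bucket                     # placeholder
    s3_key_format  /blobs/$FILE_PATH/$FILE_NAME  # new variables added by this PR
```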

Sequence Diagram(s)

sequenceDiagram
  participant Plugin as Out S3 plugin
  participant DB as Blob DB
  participant Store as Local Store (fstore)
  participant Auth as Auth service
  participant S3 as AWS S3

  Plugin->>DB: register file metadata & parts
  Plugin->>Store: write chunk / stream file
  Plugin->>Auth: request presigned URL (CREATE)
  Auth-->>Plugin: presigned create URL
  Plugin->>S3: initiate multipart (presigned)
  S3-->>Plugin: upload_id
  loop per part
    Plugin->>Auth: request presigned URL (UPLOAD_PART)
    Auth-->>Plugin: presigned part URL
    Plugin->>S3: upload part (PUT)
    S3-->>Plugin: ETag
    Plugin->>DB: mark part uploaded / store ETag
  end
  Plugin->>Auth: request presigned URL (COMPLETE)
  Auth-->>Plugin: presigned complete URL
  Plugin->>S3: complete multipart
  S3-->>Plugin: success/failure
  Plugin->>DB: notify delivery or trigger recovery

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • edsiper
  • fujimotos
  • patrick-stephens
  • niedbalski
  • celalettin1286

Poem

🐰 I hopped through CMake and tiny queues,
Parquet carrots gleamed in rows of code,
Parts stitched by presigned lights and muse,
Blobs and streams now sing along the road,
Tests bloom — the rabbit dances, light and bold!

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 58.65%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title 's3_output: add parquet format support' clearly and concisely describes the main change: adding Parquet format support to the S3 output plugin.




@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from b4c0a26 to a821fec on December 24, 2025 14:01
@kalavt kalavt changed the title Refactor AWS S3 parquet s3_output: add parquet file support Dec 24, 2025
@kalavt kalavt changed the title s3_output: add parquet file support s3_output: add parquet format support Dec 24, 2025

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (10)
src/CMakeLists.txt (1)

525-528: Clarify SASL library linking scope.

The comments indicate "KAFKA SPECIFIC" for SASL library linking, but the linking appears in both the shared library and binary targets. This suggests it's a general requirement when FLB_SASL_ENABLED is true, not specific to this Parquet PR.

Consider updating the comment to be more accurate:

🔎 Suggested comment improvement
-  # KAFKA SPECIFIC - Link SASL library for shared library
+  # Link SASL library when SASL support is enabled (used by Kafka and other plugins)
   if(FLB_SASL_ENABLED)
     target_link_libraries(fluent-bit-shared sasl2)
   endif()
-  # KAFKA SPECIFIC - Link SASL library for binary
+  # Link SASL library when SASL support is enabled (used by Kafka and other plugins)
   if(FLB_SASL_ENABLED)
     target_link_libraries(fluent-bit-bin sasl2)
   endif()

Also applies to: 600-603

examples/s3_output/run-s3-parquet-test.sh (1)

38-39: Add error checking for fluent-bit binary and configuration.

The script assumes the Fluent Bit binary and configuration file exist without verification, which could lead to confusing errors.

🔎 Proposed pre-flight checks
+# Verify fluent-bit binary exists
+if [ ! -f ./build/bin/fluent-bit ]; then
+  echo "Error: fluent-bit binary not found at ./build/bin/fluent-bit"
+  echo "Please build Fluent Bit first:"
+  echo "  cd ../../build && cmake .. && make"
+  exit 1
+fi
+
+# Verify configuration file exists
+if [ ! -f fluent-bit-s3-parquet.conf ]; then
+  echo "Error: Configuration file fluent-bit-s3-parquet.conf not found"
+  exit 1
+fi
+
+# Check AWS credentials
+if [ -z "$AWS_ACCESS_KEY_ID" ] || [ -z "$AWS_SECRET_ACCESS_KEY" ]; then
+  echo "Warning: AWS credentials not set. Fluent Bit may fail to upload to S3."
+  echo "Set credentials with: export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=..."
+  echo ""
+fi
+
 # Run Fluent Bit
 ./build/bin/fluent-bit -c fluent-bit-s3-parquet.conf
examples/s3_output/README.md (1)

112-114: Consider adding language identifiers to fenced code blocks.

Static analysis detected some fenced code blocks without language specifiers, which improves syntax highlighting and readability.

🔎 Minor markdown improvements
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached

-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/      # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/   # Case 2: Large files via Multipart Upload

-```
+```text
 [parquet] Data quality summary for 53206 records:
 [parquet] Missing non-nullable fields (defaults used):
 [parquet]   field='level' count=52506
 [parquet]   field='timestamp' count=52506

Also applies to: 177-179, 270-272

examples/s3_output/athena-queries.sql (2)

38-45: Use placeholder bucket names in example documentation.

The hardcoded bucket name `s3-bucket-kafka-sink` appears throughout the file. Consider using a placeholder like `<your-bucket-name>` or `my-fluent-bit-bucket` to make it clearer that users should substitute their own bucket name.

---

99-99: Document the placeholder date format.

The hardcoded date `20251224` is used in all example queries. Consider adding a note at the top of the query section (around line 83) explaining that users should replace this with their actual date, similar to the note at line 346.

cmake/parquet.cmake (2)

151-163: Consider allowing default paths as fallback.

Using `NO_DEFAULT_PATH` exclusively may miss Arrow/Parquet installations in standard system locations not explicitly listed. Consider removing `NO_DEFAULT_PATH` or adding a second search without it as a fallback to improve detection success on systems with non-standard installations.


🔎 Suggested approach

   find_path(ARROW_INCLUDE_DIR
     NAMES arrow/api.h
     PATHS ${SEARCH_PATHS}
     PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
-    NO_DEFAULT_PATH
   )

Or add a fallback search without NO_DEFAULT_PATH if the first search fails.


223-226: Unicode characters in CMake messages may cause issues.

The checkmark (✓) and cross (✗) Unicode characters in status messages may not render correctly on all terminals or CI systems. Consider using ASCII alternatives like [OK] and [MISSING].

🔎 Proposed fix
 if(ARROW_FOUND AND PARQUET_FOUND)
-  message(STATUS "✓ Arrow found: ${ARROW_LIBRARIES}")
+  message(STATUS "[OK] Arrow found: ${ARROW_LIBRARIES}")
   message(STATUS "  Include dirs: ${ARROW_INCLUDE_DIRS}")
-  message(STATUS "✓ Parquet found: ${PARQUET_LIBRARIES}")
+  message(STATUS "[OK] Parquet found: ${PARQUET_LIBRARIES}")
   message(STATUS "  Include dirs: ${PARQUET_INCLUDE_DIRS}")
 else()
-  message(STATUS "✗ Arrow/Parquet not found")
+  message(STATUS "[MISSING] Arrow/Parquet not found")
examples/s3_output/fluent-bit-s3-parquet.conf (1)

145-145: Very long schema_str line reduces readability.

The schema_str value is a single long JSON line that's difficult to read and maintain. Consider adding a comment above it with a formatted version of the schema for documentation purposes, or noting in the README that multi-line values would need escaping.

src/flb_parquet_impl.cpp (1)

250-251: Appending nullptr to BinaryBuilder may cause undefined behavior.

The append_default_value for BINARY type passes nullptr with size 0. While this may work, it's safer to use an empty byte array explicitly.

🔎 Proposed fix
         case arrow::Type::BINARY:
-            return static_cast<arrow::BinaryBuilder*>(builder)->Append(static_cast<const uint8_t*>(nullptr), 0).ok();
+            return static_cast<arrow::BinaryBuilder*>(builder)->Append(reinterpret_cast<const uint8_t*>(""), 0).ok();
plugins/out_s3/s3.c (1)

49-56: Duplicate forward declarations.

These forward declarations at lines 49-56 duplicate those at lines 86-91. Remove the duplicates to avoid confusion.

🔎 Proposed fix
-/* Forward declarations */
-static struct multipart_upload *get_upload(struct flb_s3 *ctx,
-                                           const char *tag, int tag_len);
-static struct multipart_upload *create_upload(struct flb_s3 *ctx,
-                                              const char *tag, int tag_len,
-                                              time_t file_first_log_time);
-static flb_sds_t flb_pack_msgpack_extract_log_key(void *out_context, const char *data,
-                                                  uint64_t bytes, struct flb_config *config);
-
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b0c1ae and b4c0a26.

📒 Files selected for processing (19)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_parquet.h
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
💤 Files with no reviewable changes (5)
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • examples/s3_output/README.md
  • include/fluent-bit/flb_parquet.h
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • cmake/parquet.cmake
  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • cmake/parquet.cmake
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3.c
🧬 Code graph analysis (3)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
src/flb_zstd.c (1)
  • flb_zstd_compress (33-57)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_put_object (1663-1800)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c

[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter

(syntaxError)

🪛 LanguageTool
examples/s3_output/README.md

[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md

112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


270-270: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (14)
include/fluent-bit/flb_parquet.h (1)

82-85: LGTM! Well-documented API with clear ownership semantics.

The function signature and documentation are comprehensive and clear. The API design properly communicates:

  • Memory ownership (caller must free with flb_free)
  • Error handling (NULL on failure)
  • Schema format expectations (Arrow JSON schema)
  • Supported types and examples
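
A minimal usage sketch consistent with that contract (the inline schema JSON is illustrative, and the compression id is assumed to be one of the FLB_AWS_COMPRESS_* values):

```c
#include <fluent-bit/flb_parquet.h>
#include <fluent-bit/flb_mem.h>

/* sketch: convert a raw msgpack buffer into a Parquet buffer the caller owns */
static int convert_chunk(const void *mp_buf, size_t mp_size)
{
    size_t out_size = 0;
    void *parquet;

    parquet = flb_msgpack_raw_to_parquet(mp_buf, mp_size,
                                         "{\"fields\": [{\"name\": \"level\", "
                                         "\"type\": \"utf8\"}]}",  /* illustrative schema */
                                         0 /* compression codec id */,
                                         &out_size);
    if (parquet == NULL) {
        return -1;   /* NULL on failure per the header docs */
    }

    /* ... hand (parquet, out_size) to the uploader ... */

    flb_free(parquet);   /* caller must free with flb_free */
    return 0;
}
```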
include/fluent-bit/aws/flb_aws_compress.h (2)

33-46: Clear deprecation notice and migration guidance.

The documentation clearly separates true compression algorithms from format converters and provides excellent guidance:

  • ARROW marked as deprecated with rationale
  • PARQUET (5) directs users to use format=parquet instead
  • Clear distinction between compression and format conversion

This will help users migrate away from the deprecated patterns.


30-31: No backward compatibility issue—this renumbering is safe.

The compression constants are internal implementation details used only at runtime. Configuration files use string values (compression=gzip, compression=snappy, compression=zstd) which are mapped to numeric constants via flb_aws_compression_get_type(). Since the numeric constant values are never persisted to storage and users never interact with them directly, renumbering from FLB_AWS_COMPRESS_ZSTD=4 to FLB_AWS_COMPRESS_ZSTD=3 does not break backward compatibility.

Likely an incorrect or invalid review comment.

CMakeLists.txt (1)

284-286: LGTM! Consistent encoder enablement pattern.

When FLB_ALL is enabled, both Avro and Parquet encoders are automatically enabled, following the existing pattern for optional features. This provides a consistent user experience.

src/CMakeLists.txt (1)

157-166: LGTM! Proper C++ integration for Parquet implementation.

The Parquet encoder support is correctly integrated:

  • Source file added conditionally based on FLB_PARQUET_ENCODER
  • Explicit CXX language property set for the C++ implementation file
  • Follows the existing pattern for optional features like Avro encoder
src/aws/flb_aws_compress.c (2)

31-37: LGTM! Correct adapter pattern for Snappy compression.

The wrapper function properly adapts flb_snappy_compress (which uses char* pointers) to the standard compression interface (which uses void* pointers). This follows the same pattern as the existing flb_gzip_compress and flb_zstd_compress functions shown in the relevant code snippets.
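
The adapter shape being described, as a sketch (the function name is assumed from the pattern, not copied from the diff):

```c
/* sketch: adapt flb_snappy_compress (char * buffers) to the uniform
 * void * callback signature used by the compression options table */
static int flb_aws_compress_snappy(void *in_data, size_t in_len,
                                   void **out_data, size_t *out_len)
{
    return flb_snappy_compress((char *) in_data, in_len,
                               (char **) out_data, out_len);
}
```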


46-76: Excellent documentation and clean separation of concerns.

The updated comments clearly distinguish:

  1. True compression algorithms (gzip, snappy, zstd)
  2. Deprecated format converters (ARROW, PARQUET)
  3. Guidance to use format=parquet instead of compression=parquet

The removal of Arrow/Parquet compression options from this module aligns with the PR's goal of decoupling format conversion from the compression subsystem.

examples/s3_output/README.md (1)

1-518: Exceptional documentation! Comprehensive and production-ready guidance.

This README provides outstanding documentation that covers:

  • Clear test scenarios with realistic use cases
  • Detailed explanation of upload triggers (upload_timeout vs total_file_size)
  • Schema design best practices with nullable/non-nullable guidance
  • Data quality monitoring and troubleshooting
  • Performance testing methodology
  • Athena integration examples

The documentation will significantly help users understand Parquet encoder usage in production scenarios.

plugins/out_s3/s3.h (1)

234-235: LGTM - API signature updated for SDS-based buffers.

The signature change from char* body to flb_sds_t body aligns with the SDS-based buffering approach used throughout the refactored code, ensuring consistent memory management.

examples/s3_output/fluent-bit-s3-parquet.conf (1)

104-117: LGTM - Well-documented PutObject configuration.

The configuration clearly documents the upload strategy, timeout behavior, and use case. The comments explaining the difference between upload_timeout and total_file_size are helpful for users.

src/flb_parquet_impl.cpp (1)

696-710: LGTM - Exception handling with proper cleanup.

The exception handlers correctly destroy the msgpack unpacked result and free the output buffer if allocated, preventing memory leaks on error paths.

plugins/out_s3/s3.c (3)

752-774: LGTM - Deprecated compression values handled gracefully.

The code properly warns users about deprecated arrow/parquet compression values while still supporting them by mapping to the new format=parquet approach. The build-time guard ensures helpful error messages when Parquet isn't available.


1583-1630: LGTM - Parquet conversion with proper error handling.

The Parquet conversion path validates schema_str is present, handles conversion failures gracefully, and properly manages memory by converting the Parquet buffer to flb_sds_t for consistent lifecycle management.


3951-3962: Storing raw msgpack improves efficiency for Parquet.

Storing raw msgpack data and deferring format conversion to upload time is the right approach for columnar formats like Parquet, which benefit from batching records together for better compression and schema inference.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugins/out_s3/s3.c (1)

3701-3725: Leak of chunk SDS in unit_test_flush() path.

In unit_test_flush() you now treat chunk as flb_sds_t, but never destroy it:

s3_store_buffer_put(..., chunk, ...);
ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
...
ret = upload_data(..., buffer, ...);
flb_sds_destroy(buffer);
FLB_OUTPUT_RETURN(ret);

construct_request_buffer() only reads from chunk and does not take ownership, so unit_test_flush() should flb_sds_destroy(chunk) once that call returns. As written, every test‑mode flush leaks the SDS buffer.

Suggested minimal fix
 static void unit_test_flush(void *out_context, struct s3_file *upload_file,
                             const char *tag, int tag_len, flb_sds_t chunk,
                             int chunk_size, struct multipart_upload *m_upload_file,
                             time_t file_first_log_time)
 {
     int ret;
     flb_sds_t buffer = NULL;
     size_t buffer_size;
     struct flb_s3 *ctx = out_context;

     s3_store_buffer_put(ctx, upload_file, tag, tag_len,
                         chunk, (size_t) chunk_size, file_first_log_time);
-    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
+    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
+    flb_sds_destroy(chunk);
     if (ret < 0) {
         flb_plg_error(ctx->ins, "Could not construct request buffer for %s",
                       upload_file->file_path);
         FLB_OUTPUT_RETURN(FLB_RETRY);
     }

     ret = upload_data(ctx, upload_file, m_upload_file, buffer, buffer_size, tag, tag_len);
     flb_sds_destroy(buffer);

     FLB_OUTPUT_RETURN(ret);
 }
♻️ Duplicate comments (7)
examples/s3_output/run-s3-parquet-test.sh (1)

6-16: Logic issue: directories created then immediately cleaned.

The script creates directories (lines 8-11) and then immediately removes their contents (line 15). This sequence is confusing and redundant. If the intent is to ensure clean directories exist, reverse the order or use mkdir -p alone (which is idempotent).

🔎 Proposed fix
 # Create temporary directories
 STORE_DIR="/tmp/fluent-bit-test"
+
+# Clean old data first (if exists)
+if [ -d "$STORE_DIR" ]; then
+  rm -rf "$STORE_DIR"
+fi
+
+# Create fresh directories
 mkdir -p "$STORE_DIR/parquet-gzip-simple"
 mkdir -p "$STORE_DIR/json-snappy-complex-low"
 mkdir -p "$STORE_DIR/json-gzip-complex-medium"
 mkdir -p "$STORE_DIR/json-zstd-complex-high"
 echo "✓ Created store directories"
-
-# Clean old data
-rm -rf "$STORE_DIR"/*/
-echo "✓ Cleaned old data"

This also addresses the safety concern from the previous review about rm -rf on a potentially unset variable, by checking directory existence first.

plugins/out_s3/s3.h (1)

123-124: Unused apply_outer_compression field in struct flb_s3.

apply_outer_compression is not referenced anywhere in plugins/out_s3/s3.c (compression behavior is driven by ctx->format != FLB_S3_FORMAT_PARQUET instead), so this field is dead state and adds confusion. Consider removing it or wiring it into the compression path; otherwise future readers might assume it actually controls outer compression.

CMakeLists.txt (1)

202-202: Update Parquet C++ standard from C++11 to C++17.

The Parquet encoder block enables C++ and forces CMAKE_CXX_STANDARD 11, but current Apache Arrow/Parquet C++ releases require at least C++17. Keeping this at 11 will either fail the build or subtly misconfigure Arrow compilation when FLB_PARQUET_ENCODER is ON. Please bump this to 17 (and consider aligning the earlier simdutf CMAKE_CXX_STANDARD as well).

You should double‑check against the Arrow/Parquet version you target (their build docs list the minimum C++ standard) and confirm that setting CMAKE_CXX_STANDARD 17 is required when FLB_PARQUET_ENCODER is enabled.

Also applies to: 278-287, 1311-1316
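
A sketch of the suggested change, assuming the standard only needs to be raised when the encoder is enabled:

```cmake
# sketch: raise the C++ standard for the Parquet encoder build, since current
# Apache Arrow/Parquet C++ releases require at least C++17
if(FLB_PARQUET_ENCODER)
  set(CMAKE_CXX_STANDARD 17)
  set(CMAKE_CXX_STANDARD_REQUIRED ON)
endif()
```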

plugins/out_s3/s3.c (1)

4210-4217: compression config map still writes a string into an int field.

The compression entry is declared as FLB_CONFIG_MAP_STR but its offset points to struct flb_s3::compression (an int):

{
  FLB_CONFIG_MAP_STR, "compression", NULL,
  0, FLB_TRUE, offsetof(struct flb_s3, compression),
  ...
},

During flb_output_config_map_set(), this causes the parser to store a char * into an int slot, corrupting compression and adjacent fields. You now parse compression explicitly via flb_output_get_property() in cb_s3_init(), so this map binding is both unsafe and unnecessary.

Recommend either:

  • Removing the struct offset (FLB_FALSE, 0) and treating compression as a pure property, or
  • Introducing a char *compression_str field and mapping the string there.
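
A sketch of the second option (compression_str is the hypothetical field name suggested above; cb_s3_init() would keep parsing the string into the int):

```c
/* sketch: bind the raw config string to a dedicated char * field */
{
 FLB_CONFIG_MAP_STR, "compression", NULL,
 0, FLB_TRUE, offsetof(struct flb_s3, compression_str),
 "Compression type: gzip, snappy or zstd"
},
```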
src/flb_parquet_impl.cpp (3)

138-151: Add overflow handling for uint64 → int64 in convert_to_int64().

MSGPACK_OBJECT_POSITIVE_INTEGER values are cast directly from uint64_t to int64_t:

if (obj->type == MSGPACK_OBJECT_POSITIVE_INTEGER) {
    return builder->Append(static_cast<int64_t>(obj->via.u64));
}

Values greater than INT64_MAX will silently wrap, corrupting data in the Parquet file. This mirrors the earlier int32 bug you already guarded against.

You should add an overflow check similar to convert_to_int32() and either clamp, append a default, or return an error so the caller can account for it.

Please confirm how Arrow/Parquet is expected to handle out‑of‑range uint64 values (e.g., whether to clamp, treat as error, or promote to unsigned logical type) based on the versions you target, and align this converter with that behavior.
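
A guard in the spirit of the existing int32 check (returning an error is shown as one possible policy; requires <limits>):

```cpp
// sketch: reject uint64 values that do not fit in int64 before appending
if (obj->type == MSGPACK_OBJECT_POSITIVE_INTEGER) {
    if (obj->via.u64 > static_cast<uint64_t>(std::numeric_limits<int64_t>::max())) {
        return arrow::Status::Invalid("uint64 value exceeds int64 range");
        // alternatively: clamp, or append the field's default value
    }
    return builder->Append(static_cast<int64_t>(obj->via.u64));
}
```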


213-226: Timestamp conversion ignores schema time unit.

convert_to_timestamp() always treats the input as seconds and multiplies by 1,000,000 to get microseconds:

return builder->Append(static_cast<int64_t>(obj->via.u64) * 1000000);

However, when parsing the schema you support "unit": "s"|"ms"|"us"|"ns" and build arrow::timestamp(time_unit) accordingly. The converter never looks at that unit, so a schema declaring ms or ns will still receive values scaled as if they were seconds→µs.

You likely want to:

  • Downcast type (or the builder’s type) to arrow::TimestampType,
  • Inspect its unit(),
  • Apply the appropriate multiplier/divider so the stored values actually match the declared unit.

Double‑check the Arrow TimestampType semantics for units (SECOND, MILLI, MICRO, NANO) and ensure the scaling here matches how your upstream logs encode timestamps (seconds vs ms vs ns).

Also applies to: 386-424, 505-555
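
One possible shape for the fix (assumes the inputs are epoch seconds, as the current code does):

```cpp
// sketch: scale the input to the unit declared by the schema's TimestampType
auto ts_type = std::static_pointer_cast<arrow::TimestampType>(builder->type());
int64_t value = static_cast<int64_t>(obj->via.u64);
switch (ts_type->unit()) {
    case arrow::TimeUnit::SECOND: break;
    case arrow::TimeUnit::MILLI:  value *= 1000LL;       break;
    case arrow::TimeUnit::MICRO:  value *= 1000000LL;    break;
    case arrow::TimeUnit::NANO:   value *= 1000000000LL; break;
}
return builder->Append(value);
```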


301-321: Storing msgpack_object instances across msgpack_unpack_next() iterations may rely on msgpack zone semantics.

You collect msgpack_object instances into a std::vector and process them after the unpack loop:

std::vector<msgpack_object> records;
...
while (msgpack_unpack_next(&result, ...) == MSGPACK_UNPACK_SUCCESS) {
    ...
    records.push_back(array.ptr[i]);    // or result.data
}
...
for (auto &record : records) {
    ...
}

These msgpack_objects contain pointers into the msgpack_unpacked zone. This is fine only if msgpack_unpack_next() never invalidates existing zone allocations while reusing the same msgpack_unpacked across iterations; some msgpack documentation/patterns assume you consume each object immediately.

If the zone is reused or compacted between iterations, records could end up with dangling pointers once more data is unpacked.

Please verify against the msgpack‑c docs for msgpack_unpacked and msgpack_unpack_next() whether it is safe to keep msgpack_object copies after subsequent msgpack_unpack_next() calls, or whether you must either:

  • Process each record inside the loop, or
  • Deep‑copy the record into your own storage (including map/array/string contents) before the next unpack.

Also applies to: 474-555
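
A sketch of the "consume immediately" alternative (process_record() is a hypothetical helper standing in for the per-record builder appends):

```cpp
// sketch: handle each record while the unpacked zone that backs it is
// still guaranteed valid, instead of collecting msgpack_object copies
msgpack_unpacked result;
size_t off = 0;

msgpack_unpacked_init(&result);
while (msgpack_unpack_next(&result, static_cast<const char *>(in_buf), in_size, &off)
       == MSGPACK_UNPACK_SUCCESS) {
    process_record(result.data);   // hypothetical: append into the Arrow builders here
}
msgpack_unpacked_destroy(&result);
```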

🧹 Nitpick comments (9)
examples/s3_output/run-s3-parquet-test.sh (1)

38-39: Relative paths may cause execution failures.

The script uses relative paths for both the binary (./build/bin/fluent-bit) and config file (fluent-bit-s3-parquet.conf). This will fail unless run from the repository root.

Consider adding a working directory check or using $SCRIPT_DIR to make paths relative to the script location.

🔎 Proposed improvement
+# Get script directory for relative paths
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
 # Run Fluent Bit
-./build/bin/fluent-bit -c fluent-bit-s3-parquet.conf
+"$REPO_ROOT/build/bin/fluent-bit" -c "$SCRIPT_DIR/fluent-bit-s3-parquet.conf"
examples/s3_output/athena-queries.sql (2)

38-44: Hardcoded bucket name - consider adding placeholder instructions.

The S3 bucket s3-bucket-kafka-sink is hardcoded throughout. While acceptable for examples, consider adding a comment at the top noting that users should replace this with their own bucket name.


99-101: Hardcoded date filter in example queries.

All queries use day='20251224' which matches "today's date" per the context. For a reusable example, consider using a dynamic date or adding a note about updating this value.

examples/s3_output/README.md (3)

112-114: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified. This block shows a logical expression.

🔎 Proposed fix
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached

---

177-181: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified for the directory tree.

🔎 Proposed fix

-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/      # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/   # Case 2: Large files via Multipart Upload

---

270-275: Add language specifier to fenced code block.

Per markdownlint, fenced code blocks should have a language specified for log output examples.

🔎 Proposed fix

-```
+```text
 [parquet] Data quality summary for 53206 records:
 [parquet] Missing non-nullable fields (defaults used):
 [parquet]   field='level' count=52506
 [parquet]   field='timestamp' count=52506

cmake/parquet.cmake (1)

150-163: Consider adding default path fallback for Arrow discovery.

The `NO_DEFAULT_PATH` option prevents CMake from searching standard system paths. This might miss Arrow installations in non-standard but common locations not listed in `SEARCH_PATHS`. Consider removing `NO_DEFAULT_PATH` or adding a secondary search without it as a fallback.

🔎 Proposed improvement

     find_path(ARROW_INCLUDE_DIR
       NAMES arrow/api.h
       PATHS ${SEARCH_PATHS}
       PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
-      NO_DEFAULT_PATH
     )

     find_library(ARROW_LIBRARY
       NAMES ${ARROW_LIB_NAMES}
       PATHS ${SEARCH_PATHS}
       PATH_SUFFIXES ${LIB_SUFFIXES}
-      NO_DEFAULT_PATH
     )

Alternatively, keep NO_DEFAULT_PATH for the first search and add a fallback search without it if the first fails.

include/fluent-bit/flb_parquet.h (2)

23-25: Consider removing unused include.

The header includes <fluent-bit/flb_sds.h> but the function signature doesn't use flb_sds_t. If this is not used elsewhere in the implementation exposed by this header, consider removing it to reduce compilation dependencies.

#!/bin/bash
# Check if flb_sds types are used in the parquet implementation or header
rg -n "flb_sds" src/flb_parquet_impl.cpp include/fluent-bit/flb_parquet.h

82-85: Missing extern "C" guard for C++ compatibility.

Since this header will be included by C++ code (flb_parquet_impl.cpp), consider adding extern "C" guards to ensure proper C linkage when compiled as C++.

🔎 Proposed fix
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 void *flb_msgpack_raw_to_parquet(const void *in_buf, size_t in_size,
                                   const char *schema_str,
                                   int compression,
                                   size_t *out_size);

+#ifdef __cplusplus
+}
+#endif
+
 #endif
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4c0a26 and a821fec.

📒 Files selected for processing (19)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_parquet.h
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
💤 Files with no reviewable changes (5)
  • src/aws/compression/arrow/compress.h
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/unit-tests.yaml
  • examples/s3_output/fluent-bit-s3-parquet.conf
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
  • src/flb_parquet_impl.cpp
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
  • plugins/out_s3/s3.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/CMakeLists.txt
  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • src/CMakeLists.txt
  • CMakeLists.txt
  • cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • src/CMakeLists.txt
  • CMakeLists.txt
  • cmake/parquet.cmake
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • src/CMakeLists.txt
  • CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_parquet.h
  • examples/s3_output/README.md
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3.c
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • cmake/parquet.cmake
🧬 Code graph analysis (4)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
src/flb_zstd.c (1)
  • flb_zstd_compress (33-57)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (2)
  • flb_msgpack_raw_to_parquet (265-711)
  • flb_msgpack_raw_to_parquet (265-268)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_put_object (1663-1800)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c

[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter

(syntaxError)

🪛 LanguageTool
examples/s3_output/README.md

[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md

112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


270-270: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (19)
src/aws/flb_aws_compress.c (2)

31-37: LGTM! Wrapper function correctly adapts the signature.

The wrapper properly casts between void* and char* to adapt flb_snappy_compress's signature to the uniform compression interface. This follows the same pattern used for gzip and zstd.


45-76: Good refactoring to clarify compression algorithms vs format converters.

The updated comments clearly distinguish true compression algorithms (gzip, snappy, zstd) from format converters (Arrow/Parquet). Removing ARROW/PARQUET from the runtime compression_options table while keeping deprecated macros in the header is a clean backward-compatible approach.

One minor observation: the comment on line 59 ("FLB_AWS_COMPRESS_NONE which is 0 is reserved for array footer") is now orphaned after the blank line. Consider moving it closer to the footer entry on line 77.

include/fluent-bit/aws/flb_aws_compress.h (2)

33-46: Good deprecation strategy with clear documentation.

The comments clearly explain that ARROW is deprecated and PARQUET should use format=parquet instead. Keeping the macros prevents compilation errors for any downstream code that references them.


28-31: No backward compatibility issue from ZSTD renumbering.

Compression type configuration uses string keywords ("gzip", "snappy", "zstd"), not numeric values. The numeric constants (0, 1, 2, 3, etc.) are internal implementation details used only at runtime for comparisons and branching. The flb_aws_compression_get_type() function converts configuration strings to these numeric types at startup. Since numeric values are never persisted, serialized, or exposed to external systems, renumbering internal constants has no impact on user configurations or backward compatibility.
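
A minimal sketch of that startup-time mapping, with table entries assumed from the description above rather than copied from flb_aws_compress.c:

```c
#include <string.h>
#include <strings.h>  /* strcasecmp */

#define FLB_AWS_COMPRESS_NONE   0
#define FLB_AWS_COMPRESS_GZIP   1
#define FLB_AWS_COMPRESS_SNAPPY 2
#define FLB_AWS_COMPRESS_ZSTD   3

/* Map a configuration keyword to its internal numeric type; the numbers
 * never leave the process, so renumbering them is backward compatible. */
static int compression_get_type(const char *keyword)
{
    static const struct { const char *name; int type; } options[] = {
        { "gzip",   FLB_AWS_COMPRESS_GZIP   },
        { "snappy", FLB_AWS_COMPRESS_SNAPPY },
        { "zstd",   FLB_AWS_COMPRESS_ZSTD   },
        { NULL,     FLB_AWS_COMPRESS_NONE   }  /* array footer */
    };
    int i;

    for (i = 0; options[i].name != NULL; i++) {
        if (strcasecmp(keyword, options[i].name) == 0) {
            return options[i].type;
        }
    }
    return -1;  /* unknown keyword */
}
```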

cmake/parquet.cmake (1)

222-226: Good: Clear status reporting with emoji indicators.

The status messages with ✓ and ✗ symbols provide clear visual feedback during CMake configuration. The inclusion of library paths and include directories aids debugging.

src/CMakeLists.txt (3)

157-166: LGTM! Parquet encoder integration follows existing patterns.

The Parquet encoder source is conditionally added and marked as C++ correctly. This follows the same pattern as the AVRO encoder block above.


525-528: SASL linking additions appear correct for Kafka integration.

The SASL library is conditionally linked when FLB_SASL_ENABLED is set, properly handling both the shared library and binary targets. This follows standard CMake patterns.

Also applies to: 600-605


398-405: The code is correctly guarded. When if(FLB_PARQUET_ENCODER) at line 399 is true, it means the parent CMakeLists.txt verified that both ARROW_FOUND and PARQUET_FOUND were true (lines 1320-1328), and it would have set FLB_PARQUET_ENCODER OFF otherwise. The variables ARROW_LIBRARIES and PARQUET_LIBRARIES are set and exported with PARENT_SCOPE in cmake/parquet.cmake, ensuring they are available in the subdirectory scope.

include/fluent-bit/flb_parquet.h (1)

27-85: Excellent API documentation with comprehensive schema examples.

The documentation thoroughly covers:

  • Parameter descriptions with format specifications
  • Supported type mappings
  • Usage example with proper memory management
  • Links to Arrow documentation

This level of documentation is exemplary for a public API.
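
For orientation, a hypothetical caller following that contract could look like this; the parameter order of flb_msgpack_raw_to_parquet() is an assumption here, not copied from the header:

```c
#include <stddef.h>

/* Assumed prototypes, for illustration only; see flb_parquet.h / flb_mem.h. */
int flb_msgpack_raw_to_parquet(const char *mp_buf, size_t mp_size,
                               const char *schema_str, int compression,
                               void **out_buf, size_t *out_size);
void flb_free(void *ptr);

static int convert_chunk(const char *mp_buf, size_t mp_size,
                         const char *schema_str, int compression)
{
    void *out_buf = NULL;
    size_t out_size = 0;

    if (flb_msgpack_raw_to_parquet(mp_buf, mp_size, schema_str,
                                   compression, &out_buf, &out_size) != 0) {
        return -1;
    }

    /* ... hand out_buf/out_size to the uploader ... */

    flb_free(out_buf);  /* caller owns the returned buffer */
    return 0;
}
```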

plugins/out_s3/s3.h (1)

197-205: Format fields, macros, and s3_put_object prototype look consistent.

format/schema_str fields, FLB_S3_FORMAT_JSON/PARQUET macros, and the updated s3_put_object prototype (using flb_sds_t buffer) all match their usage in plugins/out_s3/s3.c and cleanly expose JSON vs Parquet behavior.

Also applies to: 234-235

plugins/out_s3/s3.c (8)

192-199: Correctly skipping Content-Encoding for Parquet payloads.

Conditioning the Content-Encoding header on (compression == gzip|zstd) && format != FLB_S3_FORMAT_PARQUET ensures Parquet objects are not wrapped in an outer HTTP compression layer while still advertising compression for JSON payloads. This matches the “internal only” compression model for Parquet and looks good.

Also applies to: 227-239
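
Reduced to its essence, the gate looks like this (a sketch; the macro values are placeholders, not the plugin's actual definitions):

```c
#define FLB_AWS_COMPRESS_GZIP  1   /* placeholder values */
#define FLB_AWS_COMPRESS_ZSTD  3
#define FLB_S3_FORMAT_PARQUET  1

/* Advertise Content-Encoding only for HTTP-level compression of JSON
 * payloads; Parquet compresses internally, so the header is skipped. */
static int needs_content_encoding(int compression, int format)
{
    return (compression == FLB_AWS_COMPRESS_GZIP ||
            compression == FLB_AWS_COMPRESS_ZSTD) &&
           format != FLB_S3_FORMAT_PARQUET;
}
```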


803-823: Parquet format path: good separation of internal vs outer compression, but hard‑fails without schema_str.

The Parquet branch in construct_request_buffer() correctly:

  • Avoids outer HTTP compression and uses ctx->compression only as the internal Parquet codec.
  • Requires schema_str and bails out early with a clear error if it’s missing.
  • Converts the Arrow buffer to flb_sds_t and frees the Arrow‑allocated memory.

This behavior is sound: format‑specific requirements are enforced once at conversion time, and buffer ownership is cleanly transferred to Fluent Bit.

Also applies to: 1515-1631


1217-1277: SDS compression path in upload_data() is generally correct, but relies solely on format to skip outer compression.

The new upload_data() signature and logic look good overall:

  • body is now flb_sds_t, with body_to_send/size_to_send correctly switched to the compressed SDS when compression succeeds.
  • The compression block is gated by ctx->compression != FLB_AWS_COMPRESS_NONE && ctx->format != FLB_S3_FORMAT_PARQUET, so Parquet payloads are never double‑compressed.
  • All error paths that allocate compressed_body also destroy it, and both PutObject and multipart paths clean up compressed_body after use.

No functional issues seen here; the behavior matches the intended “outer compression only for non‑Parquet” design.

Also applies to: 1325-1392


1442-1513: SDS buffer lifecycle around construct_request_buffer() / upload_data() is consistent (except unit tests).

  • put_all_chunks(), send_upload_request(), and cb_s3_upload() all:
    • Call construct_request_buffer() to obtain an flb_sds_t buffer.
    • Pass buffer into s3_put_object() / upload_data().
    • Always flb_sds_destroy(buffer) afterward, regardless of success/failure.

This gives a single clear owner for each formatted buffer and avoids leaks in the main upload paths.

Also applies to: 1991-2015, 3440-3488, 3701-3724
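
The caller-side shape of that pattern, sketched with assumed prototypes:

```c
#include <fluent-bit/flb_sds.h>

/* Prototypes assumed for illustration; the real signatures in s3.c differ. */
struct flb_s3;
struct s3_file;
struct multipart_upload;
int construct_request_buffer(struct flb_s3 *ctx, flb_sds_t new_data,
                             struct s3_file *chunk,
                             flb_sds_t *out_buf, size_t *out_size);
int upload_data(struct flb_s3 *ctx, struct s3_file *chunk,
                struct multipart_upload *m_upload,
                flb_sds_t body, size_t body_size,
                const char *tag, int tag_len);

static int send_one(struct flb_s3 *ctx, struct s3_file *chunk,
                    struct multipart_upload *m_upload,
                    const char *tag, int tag_len)
{
    flb_sds_t buffer = NULL;
    size_t buffer_size = 0;
    int ret;

    ret = construct_request_buffer(ctx, NULL, chunk, &buffer, &buffer_size);
    if (ret < 0) {
        return -1;               /* nothing to free on failure */
    }
    ret = upload_data(ctx, chunk, m_upload, buffer, buffer_size, tag, tag_len);
    flb_sds_destroy(buffer);     /* destroyed regardless of upload outcome */
    return ret;
}
```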


1515-1661: Error handling and locking in construct_request_buffer() look solid.

  • Handles the illegal (new_data == NULL && chunk == NULL) combination explicitly.
  • For file‑backed chunks, locks the s3_file only after a successful read, and reliably unlocks on every error path (realloc failure, Parquet schema error, conversion failure).
  • Frees the intermediate msgpack buffer in all paths (success and failure).
  • Ensures formatted_data/formatted_size are both valid before returning and destroys the SDS on conversion failure.

This is a good, consistent ownership pattern between the file store, msgpack buffer, and formatted SDS.


1663-1800: s3_put_object() SDS refactor is behavior‑preserving.

Switching the body parameter to flb_sds_t integrates cleanly:

  • MD5 calculation and HTTP request still operate on a contiguous buffer (body, body_size).
  • URI/key construction, seq_index updates, error paths, and logging remain unchanged.
  • No new ownership expectations are imposed on callers (they still manage the SDS lifetime).

3953-3963: Raw msgpack buffering in cb_s3_flush() is a good fit for Parquet.

Creating chunk as an SDS copy of the raw msgpack and deferring JSON/Parquet conversion to construct_request_buffer():

  • Avoids double conversion when retries occur.
  • Makes it easy to plug in additional formats later, since the on‑disk representation stays msgpack.
  • Keeps buffer ownership straightforward via buffer_chunk() / send_upload_request().

No issues spotted here.

Also applies to: 4036-4085


4219-4231: New format, schema_str, upload_timeout, and bearer token options are wired correctly.

  • format is exposed as a documented string option and interpreted via flb_output_get_property("format", ...).
  • schema_str is mapped into struct flb_s3::schema_str and used by the Parquet conversion path.
  • upload_timeout is now a proper FLB_CONFIG_MAP_TIME option and is referenced consistently in timeout checks and timer setup.
  • authorization_endpoint_bearer_token follows the existing auth option patterns.

These mappings and docs align with how the fields are used later in the file.

Also applies to: 4381-4386, 4406-4414

src/flb_parquet_impl.cpp (1)

265-299: Overall Parquet conversion pipeline and memory ownership look good.

  • Parameter validation, compression mapping, schema parsing (via cJSON), and Arrow schema construction are all checked with clear error logs.
  • Builders are created per field, and all failure paths free cJSON resources and destroy the msgpack zone before returning.
  • Record conversion tracks data‑quality stats (missing fields, type failures, int32 overflow, float→int clamps) and emits a concise summary only when issues exist.
  • Parquet writing uses BufferOutputStream (with auto‑growing buffer), closes the writer, then copies the final Arrow buffer into flb_malloc‑managed memory (output_buffer), which is freed on all exception paths.

The high‑level control flow and ownership model are solid; the remaining issues are mainly around numeric edge cases and msgpack object lifetime.

Also applies to: 331-446, 449-611, 616-692

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from a821fec to 5e868fc Compare December 24, 2025 14:26
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
CMakeLists.txt (1)

1311-1316: Parquet encoder should request C++17, not C++11

Arrow/Parquet C++ now require at least C++17; forcing CMAKE_CXX_STANDARD 11 in the Parquet block will fail with current Arrow toolchains or silently misconfigure builds. Please bump this to 17 (and keep it consistent with any other CMAKE_CXX_STANDARD overrides).

if(FLB_PARQUET_ENCODER)
  enable_language(CXX)
  set(CMAKE_CXX_STANDARD 17)
  include(cmake/parquet.cmake)
  ...
endif()
#!/bin/bash
# Verify Arrow C++ minimum standard in your environment
rg -n "CMAKE_CXX_STANDARD" CMakeLists.txt cmake -S
cmake/parquet.cmake (1)

149-157: Fix Arrow include search (include/arrow suffix still produces wrong include dir)

In the manual search:

find_path(ARROW_INCLUDE_DIR
  NAMES arrow/api.h
  PATHS ${SEARCH_PATHS}
  PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
  NO_DEFAULT_PATH
)

using include/arrow as a suffix causes ARROW_INCLUDE_DIR to resolve to ${prefix}/include/arrow, so the later version check:

if(EXISTS "${ARROW_INCLUDE_DIRS}/arrow/util/config.h")

looks for ${prefix}/include/arrow/arrow/util/config.h, which is incorrect for standard Arrow layouts (${prefix}/include/arrow/...).

Drop the include/arrow suffix and rely on ${INCLUDE_SUFFIXES} (typically just include), e.g.:

find_path(ARROW_INCLUDE_DIR
  NAMES arrow/api.h
  PATHS ${SEARCH_PATHS}
  PATH_SUFFIXES ${INCLUDE_SUFFIXES}
  NO_DEFAULT_PATH
)

The Parquet search’s include/parquet suffix is fine and should remain.

Also applies to: 198-216

plugins/out_s3/s3.c (1)

744-775: Backward compatibility broken: compression=arrow|parquet now fails instead of mapping to format=parquet

The check at line 752 can never be true. When flb_aws_compression_get_type(tmp) is called with "arrow" or "parquet", it returns -1 because these strings are not in the compression_options table in flb_aws_compress.c (which only includes gzip, snappy, and zstd). This causes the code to hit the "Unknown compression type" error path instead of handling these deprecated aliases.

The constants FLB_AWS_COMPRESS_ARROW (4) and FLB_AWS_COMPRESS_PARQUET (5) are defined but unreachable, contradicting the documented deprecated behavior and the inline comments claiming support.

Handle these strings explicitly before calling flb_aws_compression_get_type:

tmp = flb_output_get_property("compression", ins);
if (tmp) {
    if (strcasecmp(tmp, "arrow") == 0 || strcasecmp(tmp, "parquet") == 0) {
        flb_plg_warn(ctx->ins,
                     "DEPRECATED: compression=%s is deprecated. Use format=parquet instead.", tmp);
#ifdef FLB_HAVE_PARQUET_ENCODER
        ctx->format = FLB_S3_FORMAT_PARQUET;
#else
        flb_plg_error(ctx->ins,
                      "Parquet format is not supported in this build. "
                      "Rebuild with -DFLB_PARQUET_ENCODER=On.");
        return -1;
#endif
    }
    else {
        ret = flb_aws_compression_get_type(tmp);
        if (ret == -1) {
            flb_plg_error(ctx->ins, "Unknown compression type: %s", tmp);
            return -1;
        }
        ctx->compression = ret;
    }
}
🧹 Nitpick comments (3)
plugins/out_s3/s3.c (2)

49-57: Remove duplicate static declarations for get_upload / create_upload

Both functions are declared twice (once near the top as forward declarations and again before their definitions). This is harmless but noisy and can confuse readers; you only need one forward declaration (or none if you reorder definitions).

Also applies to: 86-92


1218-1277: SDS-based outer compression flow is sound, but pre/post compression logging is now misleading

The refactor in upload_data to:

  • Use body_to_send / size_to_send (possibly compressed) while leaving the original body / body_size unchanged, and
  • Wrap the compressed payload in an flb_sds_t (compressed_body) with consistent cleanup,

looks correct and avoids raw malloc/free handling in callers.

However, the informational log in the "too small, using PutObject" branch still prints:

flb_plg_info(ctx->ins,
    "Pre-compression upload_chunk_size= %zu, After compression, chunk is only %zu bytes, ...",
    preCompress_size, body_size);

while the actual payload on the wire is size_to_send, not body_size. For Parquet, this message can also trigger even though no outer compression runs.

Consider:

  • Basing the "after compression" size on size_to_send, and
  • Gating this log on ctx->format != FLB_S3_FORMAT_PARQUET to avoid implying any outer compression there.

Also applies to: 1300-1314, 1330-1392
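
One possible shape for the corrected message, shown as a fragment (size_to_send and the surrounding fields come from the discussion above):

```c
if (ctx->use_put_object == FLB_FALSE &&
    ctx->format != FLB_S3_FORMAT_PARQUET &&
    (ctx->compression == FLB_AWS_COMPRESS_GZIP ||
     ctx->compression == FLB_AWS_COMPRESS_ZSTD)) {
    flb_plg_info(ctx->ins,
                 "Pre-compression upload_chunk_size=%zu, after compression "
                 "chunk is only %zu bytes; too small, using PutObject",
                 preCompress_size, size_to_send);
}
```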

examples/s3_output/README.md (1)

63-72: Add language identifiers to fenced code blocks for better tooling support

The README content is solid and matches the new S3/Parquet behavior, but several fenced blocks (config snippets, shell commands, JSON schemas, SQL) are missing language tags. Adding them will quiet markdownlint and improve rendering/IDE support, e.g.:

```ini
[OUTPUT]
    Name s3
    ...
```

```sh
cmake ..
make -j8
```

```json
{ "fields": [ ... ] }
```

```sql
SELECT ...;
```

Also applies to: 85-95, 144-148, 152-156, 160-164, 178-181, 209-218, 241-251, 296-314, 341-349, 361-365, 384-426, 432-438, 442-456, 460-466, 484-495

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a821fec and 5e868fc.

📒 Files selected for processing (17)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
💤 Files with no reviewable changes (5)
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • .github/workflows/unit-tests.yaml
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • src/CMakeLists.txt
  • cmake/parquet.cmake
  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • src/CMakeLists.txt
  • cmake/parquet.cmake
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • src/CMakeLists.txt
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • src/CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • examples/s3_output/README.md
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • cmake/parquet.cmake
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3.c

🔇 Additional comments (11)
plugins/out_s3/s3.h (1)

123-199: New format/schema fields and S3 format macros look consistent with implementation

Appending format, schema_str, the JSON/PARQUET macros, and the updated s3_put_object prototype matches how s3.c now routes JSON vs Parquet and SDS bodies. No structural or API consistency issues here.

Also applies to: 201-204, 233-234

plugins/out_s3/s3.c (7)

192-239: Header logic correctly skips outer Content-Encoding for Parquet

The updated create_headers now omits Content-Encoding when ctx->format == FLB_S3_FORMAT_PARQUET, matching the fact that Parquet uses its own internal compression. This aligns headers with the new upload semantics.


777-823: Format selection and Parquet encoder guards are consistent with new design

The new format parsing and #ifdef FLB_HAVE_PARQUET_ENCODER checks correctly:

  • Default to JSON unless Parquet was set (including via legacy compression mapping once fixed).
  • Reject format=parquet builds that lack FLB_PARQUET_ENCODER.
  • For Parquet builds, keep ctx->compression for internal Parquet compression only and skip outer compression.

This matches the intended separation of JSON vs Parquet and internal-vs-outer compression.


1663-1800: Updated s3_put_object signature and SDS handling are consistent

The new s3_put_object taking flb_sds_t body, size_t body_size is wired correctly:

  • Callers pass SDS buffers and later destroy them (or the owning wrappers).
  • MD5 computation and HTTP request use the SDS pointer as a regular char *.
  • Error paths free uri and preserve seq_index semantics via the decrement_index label.

No memory lifetime issues are apparent around the new body type.


1991-2015: SDS buffer ownership in send_upload_request and unit_test_flush is now explicit

Both paths:

  • Create an SDS buffer via construct_request_buffer,
  • Pass it into upload_data,
  • Always flb_sds_destroy(buffer) after the call,

which removes the previous mixed ownership around raw buffers and avoids leaks in error paths.

Also applies to: 3706-3724


3953-4089: Flush path correctly preserves raw MsgPack and defers formatting

cb_s3_flush now:

  • Stores raw MsgPack into an SDS (chunk),
  • Uses s3_store_file_get/buffer_chunk to persist raw MsgPack,
  • Only converts to JSON/Parquet inside construct_request_buffer right before upload.

This is the right abstraction for supporting multiple output formats (JSON vs Parquet) without duplicating buffer logic.


4210-4231: Compression/format/schema/upload_timeout config docs largely match behavior, but note legacy compression caveat

The updated config map entries:

  • Document compression as none|gzip|snappy|zstd with legacy arrow/parquet values deprecated,
  • Introduce format and schema_str (required for format=parquet),
  • Add a documented upload_timeout parameter, and
  • Add authorization_endpoint_bearer_token,

which aligns with the new logic in cb_s3_init and the Parquet encoder path—once the legacy compression=arrow|parquet handling is fixed as noted above.

Also applies to: 4381-4413


1450-1660: Raw MsgPack buffering + deferred JSON/Parquet conversion is correctly implemented

The new construct_request_buffer / put_all_chunks flow:

  • Reads raw MsgPack from s3_store_file_read,
  • Optionally appends in-memory MsgPack (new_data) via flb_realloc + memcpy,
  • Converts once per upload to either:
    • log_key projection,
    • JSON lines, or
    • Parquet via flb_msgpack_raw_to_parquet (with schema_str and ctx->compression),
  • Properly cleans up all internal allocations (msgpack buffer, intermediate formatted data) on all error paths before returning,
  • Never sets *out_buf on error, ensuring callers receive only valid or NULL pointers, and
  • Returns an flb_sds_t which all four callers properly destroy on the success path.

Chunk locking/unlocking on error paths is balanced throughout. Memory management is sound.

src/CMakeLists.txt (1)

157-166: Parquet encoder and SASL library wiring in src/CMakeLists.txt look correct

  • flb_parquet_impl.cpp is only built when FLB_PARQUET_ENCODER is enabled and is explicitly marked as C++, matching the Arrow/Parquet dependency.
  • ARROW_LIBRARIES / PARQUET_LIBRARIES are only appended to FLB_DEPS under the same flag, consistent with cmake/parquet.cmake.
  • SASL is linked into both fluent-bit-shared and fluent-bit-bin only when FLB_SASL_ENABLED is set, so non-Kafka or non-SASL builds remain unaffected.

Also applies to: 398-405, 525-528, 600-603

include/fluent-bit/aws/flb_aws_compress.h (1)

23-47: Compression enum layout is fine; ensure callers don’t rely on ARROW/PARQUET in flb_aws_compression_get_type

The new assignments (NONE=0, GZIP=1, SNAPPY=2, ZSTD=3, ARROW=4 [deprecated], PARQUET=5 [“use format=parquet instead”]) are internally consistent and safe as long as all code uses the macros, not raw integers.

Given flb_aws_compress.c now only recognizes gzip|snappy|zstd, ARROW/PARQUET are effectively compile-time constants for legacy conditionals; callers must not expect flb_aws_compression_get_type() to ever return these values anymore. The S3 plugin currently still does (and should be fixed as noted in s3.c).

src/aws/flb_aws_compress.c (1)

24-38: SNAPPY support is wired correctly; note that ARROW/PARQUET are no longer valid compression types

  • flb_snappy_compress_wrapper correctly adapts flb_snappy_compress to the common (void **out_data, size_t *out_len) signature and is registered under "snappy" in compression_options.
  • The options table now only contains true compression algorithms (gzip/snappy/zstd); ARROW/PARQUET have been removed, so flb_aws_compression_get_type("arrow") / "parquet" will now return -1.

This matches the new separation between compression and format conversion, but requires callers (notably the S3 plugin) to treat "arrow" / "parquet" as legacy format aliases rather than compression types—see the earlier comment in s3.c.

Also applies to: 57-78

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 5e868fc to 2555471 Compare December 24, 2025 15:03
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
examples/s3_output/athena-queries.sql (1)

168-181: Replace PERCENTILE with approx_percentile for Athena compatibility.

Athena does not support the PERCENTILE(duration_ms, 0.95) syntax. Use approx_percentile(duration_ms, 0.95) instead.

🔎 Proposed fix
 SELECT
     path,
     method,
     COUNT(*) as request_count,
     AVG(duration_ms) as avg_duration_ms,
-    PERCENTILE(duration_ms, 0.95) as p95_duration_ms,
+    approx_percentile(duration_ms, 0.95) as p95_duration_ms,
     MAX(duration_ms) as max_duration_ms
 FROM access_logs_parquet
cmake/parquet.cmake (1)

149-170: Remove the include/arrow suffix from Arrow header search.

The PATH_SUFFIXES include/arrow in find_path for Arrow headers is incorrect. When searching for arrow/api.h, CMake would find it at ${prefix}/include/arrow/arrow/api.h which doesn't exist. The correct include directory should be ${prefix}/include so that #include <arrow/api.h> resolves correctly.

🔎 Proposed fix
   # Search for Arrow
   if(NOT ARROW_FOUND)
     find_path(ARROW_INCLUDE_DIR
       NAMES arrow/api.h
       PATHS ${SEARCH_PATHS}
-      PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
+      PATH_SUFFIXES ${INCLUDE_SUFFIXES}
       NO_DEFAULT_PATH
     )
plugins/out_s3/s3.c (1)

744-824: Legacy compression path should explicitly default to GZIP.

The comment at line 764 says "Keep default gzip compression" but ctx->compression remains FLB_AWS_COMPRESS_NONE (set at line 745). This contradicts user expectations for backward compatibility.

🔎 Suggested fix
 #ifndef FLB_HAVE_PARQUET_ENCODER
             flb_plg_error(ctx->ins,
                          "Parquet format is not supported in this build. "
                          "Rebuild with -DFLB_PARQUET_ENCODER=On.");
             return -1;
 #else
             ctx->format = FLB_S3_FORMAT_PARQUET;
-            /* Keep default gzip compression */
+            /* Default to gzip compression for backward compatibility */
+            if (ctx->compression == FLB_AWS_COMPRESS_NONE) {
+                ctx->compression = FLB_AWS_COMPRESS_GZIP;
+            }
 #endif
🧹 Nitpick comments (3)
examples/s3_output/README.md (2)

112-114: Add language specifier to fenced code block.

The code block at line 112 is missing a language specifier, which affects syntax highlighting and linting.

🔎 Proposed fix
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached

177-181: Add language specifier to fenced code block.

The code block showing S3 upload paths is missing a language specifier.

🔎 Proposed fix
-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/      # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/   # Case 2: Large files via Multipart Upload

src/flb_parquet_impl.cpp (1)

708-736: Memory management is correct, but ownership should be documented.

The function allocates output_buffer with flb_malloc (line 710) and copies Arrow data into it (line 716). Exception handlers properly clean up on error (line 726).

Suggestion: Add a comment in the function documentation (or in the header file) explicitly stating that the caller must free the returned buffer with flb_free().
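
A possible wording for that ownership note, purely illustrative:

```c
/*
 * flb_msgpack_raw_to_parquet(): on success, *out_buf points to a Parquet
 * file image allocated with flb_malloc(); ownership transfers to the
 * caller, who must release it with flb_free(). On failure, *out_buf is
 * not modified and nothing needs to be freed.
 */
```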

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e868fceaf3521954dee257936901981215314ea and 2555471660ff1c95ba1231130e46af6cf36d9017.

📒 Files selected for processing (19)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_parquet.h
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
💤 Files with no reviewable changes (5)
  • src/aws/compression/arrow/compress.h
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
🚧 Files skipped from review as they are similar to previous changes (4)
  • include/fluent-bit/flb_parquet.h
  • src/aws/flb_aws_compress.c
  • examples/s3_output/run-s3-parquet-test.sh
  • examples/s3_output/fluent-bit-s3-parquet.conf

🧰 Additional context used
🧬 Code graph analysis (2)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_put_object (1663-1800)


🔇 Additional comments (28)
.github/workflows/unit-tests.yaml (2)

56-79: CI matrix configuration for Parquet encoder looks correct.

The Parquet encoder test configuration properly:

  • Uses the new FLB_PARQUET_ENCODER flag
  • Excludes the Clang compiler (consistent with other specialized build options like coverage)
  • Aligns with the broader transition from Arrow GLib to C++ Arrow

117-125: Apache Arrow setup uses correct packages and official distribution source.

The configuration correctly:

  • Uses packages.apache.org as the official distribution source
  • Installs libarrow-dev and libparquet-dev (C++ libraries) instead of the GLib variants
  • Aligns the step name and condition with the new FLB_PARQUET_ENCODER flag

include/fluent-bit/aws/flb_aws_compress.h (2)

33-46: Good documentation distinguishing compression from format conversion.

The comment block clearly explains:

  • The distinction between actual compression algorithms and file format conversions
  • Arrow's deprecation status, with guidance not to use it in new code
  • That Parquet should be used via format=parquet rather than compression=parquet

This helps prevent user confusion about the difference between compression and format conversion.

24-46: The concern about backward compatibility from renumbering constants is not applicable. The compression type configuration uses string-based names ("gzip", "snappy", "zstd", "parquet") via an internal lookup table, not numeric values. The enum constants are purely internal implementation details that never get serialized, persisted, or exposed to external code. Renumbering these internal values has no backward-compatibility impact.

examples/s3_output/athena-queries.sql (1)

1-80: Well-structured Athena table definitions and query examples.

The SQL file provides comprehensive examples covering:

  • Partition projection for automatic partition discovery
  • Both JSON and Parquet format table definitions
  • Realistic query patterns for log analysis

This is valuable documentation for users implementing the new Parquet format support.

examples/s3_output/README.md (1)

1-80: Excellent documentation for Parquet format support.

The README provides comprehensive coverage of:

  • Upload methods (PutObject vs Multipart)
  • Format and compression options
  • Schema definition and type handling with best practices
  • Troubleshooting guidance
  • Performance testing scenarios

This is high-quality documentation that will help users adopt the new Parquet format feature.

plugins/out_s3/s3.h (2)

195-203: Format and schema fields properly added for Parquet support.

The new struct members and constants are well organized:

  • format field for output format selection
  • schema_str for the user-defined Parquet schema
  • Clear format type constants with JSON=0 as the default

233-234: Updated s3_put_object signature uses the SDS buffer type.

The function signature change from a raw buffer to flb_sds_t aligns with the SDS-based buffer handling throughout the S3 upload flow.

src/CMakeLists.txt (3)

157-166: Parquet encoder C++ integration correctly configured.

The Parquet encoder integration:

  • Adds flb_parquet_impl.cpp under the FLB_PARQUET_ENCODER guard
  • Properly marks the file as C++ using set_source_files_properties
  • Follows the existing pattern for conditional source inclusion

398-405: Parquet library dependencies properly linked.

ARROW_LIBRARIES and PARQUET_LIBRARIES are correctly added to FLB_DEPS when FLB_PARQUET_ENCODER is enabled, ensuring proper linkage with the detected Arrow/Parquet libraries from cmake/parquet.cmake.

525-528: Verify SASL linking changes are intentionally part of this PR.

The SASL library linking for Kafka support (sasl2) is added here but appears unrelated to the Parquet encoder feature. Please confirm this change is intentional for this PR or whether it should live in a separate commit.

Also applies to: 600-605

cmake/parquet.cmake (1)

1-83: Well-designed cross-platform library detection module.

The parquet.cmake module provides robust library detection with:

  • Three fallback methods (pkg-config → CMake config → manual search)
  • Platform-specific path handling for Windows, macOS (including Apple Silicon), and Linux
  • Proper handling of both shared and static library targets
  • Clear status messaging

CMakeLists.txt (3)

202-202: New FLB_PARQUET_ENCODER option properly defined.

The option follows the existing naming convention for encoders (FLB_AVRO_ENCODER) and is disabled by default, which is appropriate for an optional feature requiring external dependencies.

284-286: FLB_ALL properly enables both encoders.

The "Enable all features" mode now consistently enables both the Avro and Parquet encoders.

1311-1329: Parquet encoder integration is well structured.

The configuration block:

  • Enables C++ language support as required by Arrow/Parquet
  • Uses the C++17 standard (addressing the past review comment about Arrow 10.0+ requiring C++17)
  • Integrates with cmake/parquet.cmake for library detection
  • Gracefully disables the feature with warnings when dependencies are missing

src/flb_parquet_impl.cpp (6)

1-40: LGTM: Headers and includes are properly organized.

The extern "C" wrapper correctly isolates C headers from C++ code, and all necessary dependencies are included.

41-94: LGTM: Converter class design is clean and comprehensive.

The statistics tracking for overflows and type conversions will help users identify data quality issues.

139-158: Overflow handling correctly implemented.

The uint64-to-int64 overflow check at lines 142-146 properly addresses the previous review concern. Values exceeding INT64_MAX are clamped and tracked in statistics.
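
The clamping amounts to something like the following sketch (the converter's actual member names differ):

```c
#include <stdint.h>

/* Clamp an unsigned 64-bit value into the int64 column range and count
 * the event so it can surface in the conversion summary. */
static int64_t clamp_u64_to_i64(uint64_t v, uint64_t *overflow_count)
{
    if (v > (uint64_t) INT64_MAX) {
        (*overflow_count)++;
        return INT64_MAX;
    }
    return (int64_t) v;
}
```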

220-257: Timestamp unit handling correctly implemented.

The function now properly reads the target time unit from the builder's type (lines 221-223) and applies the appropriate multiplier. This addresses the previous review concern.

Note: The code assumes input timestamps are in seconds (line 225). Ensure this is documented in the public API, or consider accepting a parameter for the input unit.
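
Conceptually the scaling is as follows (a sketch in C; the real code reads Arrow's TimeUnit from the builder, and the enum here is a placeholder):

```c
#include <stdint.h>

enum time_unit { UNIT_SECOND, UNIT_MILLI, UNIT_MICRO, UNIT_NANO };

/* Scale a seconds-based input timestamp to the schema's target unit. */
static int64_t scale_seconds(int64_t ts_seconds, enum time_unit target)
{
    switch (target) {
    case UNIT_SECOND: return ts_seconds;
    case UNIT_MILLI:  return ts_seconds * 1000LL;
    case UNIT_MICRO:  return ts_seconds * 1000000LL;
    case UNIT_NANO:   return ts_seconds * 1000000000LL;
    }
    return ts_seconds;
}
```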

458-547: Excellent: Immediate record processing avoids dangling-pointer issues.

The lambda processes each msgpack record immediately within the unpack loop (lines 469-546), which correctly addresses the past review comment about storing msgpack_object pointers that become invalid. The use of string_view for field lookup (line 493) also avoids unnecessary string copies.
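
The safe pattern looks roughly like this sketch, using the msgpack-c streaming API with the builder hookup elided:

```c
#include <msgpack.h>

/* Convert each record while it is still backed by the unpacker's zone,
 * instead of stashing msgpack_object pointers for later use. */
static int process_all(const char *buf, size_t size)
{
    msgpack_unpacked result;
    size_t off = 0;

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, buf, size, &off)
           == MSGPACK_UNPACK_SUCCESS) {
        /* feed result.data into the Arrow builders right here */
    }
    msgpack_unpacked_destroy(&result);
    return 0;
}
```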

332-443: Schema parsing is robust with good error messages.

The JSON schema parsing properly validates required fields (lines 365-377) and provides helpful error messages that include the problematic schema content (lines 337-338, 347-348). Timestamp unit parsing (lines 410-420) correctly handles all standard Arrow time units.

<details>
<summary>plugins/out_s3/s3.c (7)</summary><blockquote>

`42-57`: **LGTM: Proper conditional compilation for Parquet support.**

The FLB_HAVE_PARQUET_ENCODER guard ensures the header is only included when Parquet support is enabled.

---

`195-229`: **Correct: Content-Encoding header skipped for Parquet.**

Lines 196-197 and 228-229 properly skip the Content-Encoding header when `format=parquet`, since Parquet handles compression internally within the file format. This avoids double-compression overhead.

---

`1244-1277`: **LGTM: Compression properly skipped for Parquet with correct memory management.**

The check at line 1245 ensures Parquet format bypasses outer compression. The compressed data is properly converted to flb_sds_t (lines 1260-1272) for consistent memory management, and cleanup is handled in all code paths (lines 1333-1335, 1362-1364, 1377-1379, 1388-1391).

---

`1582-1630`: **Parquet conversion properly integrated with good error handling.**

The Parquet conversion block (lines 1582-1630) correctly:
- Validates schema_str is provided (lines 1588-1596)
- Calls the Parquet encoder with compression setting (lines 1598-1603)
- Converts the result to flb_sds_t for consistent memory management (lines 1615-1627)
- Frees the intermediate parquet_buffer (line 1617)
- Handles all error paths with proper cleanup




Verify that the schema_str requirement is documented in user-facing documentation and error messages are clear enough for users to understand the schema format.

---

`1663-1664`: **LGTM: Signature change to flb_sds_t improves consistency.**

The s3_put_object signature change to accept flb_sds_t (line 1664) aligns with the overall refactoring for consistent SDS-based buffer management.

---

`3953-3962`: **Excellent architectural decision with clear documentation.**

The comment at lines 3953-3956 clearly explains the design choice to store raw msgpack and defer format conversion until upload. This is particularly beneficial for Parquet, which can batch multiple records into optimized columnar storage.

---

`4210-4231`: **Config map correctly defined with comprehensive documentation.**

The configuration parameters are properly defined:
- `compression` and `format` use offset 0 (lines 4211, 4220), indicating manual parsing in cb_s3_init, which is correct
- `schema_str` is mapped to `ctx->schema_str` (line 4228)
- Descriptions clearly document supported values, defaults, and usage (lines 4212-4217, 4221-4225, 4229-4231)

**Note:** The past review comment about compression type mismatch is not applicable here since offset 0 means no automatic field mapping.
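
For context, an offset-0 entry versus an auto-mapped entry looks roughly like this excerpt-style sketch (not standalone code); the entry layout follows Fluent Bit's flb_config_map convention, and struct flb_s3 with its schema_str member is assumed from this plugin:

```c
/* offset 0 + FLB_FALSE: no auto-mapping; cb_s3_init parses the value */
{
 FLB_CONFIG_MAP_STR, "format", "json",
 0, FLB_FALSE, 0,
 "Output format: 'json' (default) or 'parquet'"
},

/* non-zero offset + FLB_TRUE: value written directly into the context */
{
 FLB_CONFIG_MAP_STR, "schema_str", NULL,
 0, FLB_TRUE, offsetof(struct flb_s3, schema_str),
 "JSON schema definition used when format=parquet"
},
```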

</blockquote></details>

</blockquote></details>

</details>


@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 664cb4e to eff671c on December 24, 2025 16:00

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/out_s3/s3.c (2)

1217-1324: Use size_to_send (compressed size) for routing decisions and logging

Inside upload_data(), after optional compression you introduce:

flb_sds_t body_to_send = body;
size_t size_to_send = body_size;
...
preCompress_size = body_size;
body_to_send = compressed_body;
size_to_send = temp_size;

but the subsequent logic still uses the original body_size for decisions and logging:

  • Timeout/size checks for whether to goto put_object vs multipart.
  • The “Pre-compression ... After compression ...” log still uses body_size for the “after” value.
  • Existing upload path compares body_size to MIN_CHUNKED_UPLOAD_SIZE.

With compression enabled, this means routing decisions and logs are based on the uncompressed size, which diverges from the actual payload size now stored in size_to_send. It also breaks the intent of the informational log (both values end up identical).

Switching those uses of body_size to size_to_send preserves the old semantics (decisions based on the payload actually sent) and fixes the log.

Suggested diff
-        else if (body_size >= ctx->file_size) {
+        else if (size_to_send >= ctx->file_size) {
@@
-        else if(body_size > MIN_CHUNKED_UPLOAD_SIZE) {
+        else if(size_to_send > MIN_CHUNKED_UPLOAD_SIZE) {
@@
-        else {
-            if ((ctx->use_put_object == FLB_FALSE && (ctx->compression == FLB_AWS_COMPRESS_GZIP || ctx->compression == FLB_AWS_COMPRESS_ZSTD))) {
-                flb_plg_info(ctx->ins, "Pre-compression upload_chunk_size= %zu, After compression, chunk is only %zu bytes, "
-                                       "the chunk was too small, using PutObject to upload", preCompress_size, body_size);
-            }
+        else {
+            if (ctx->use_put_object == FLB_FALSE &&
+                (ctx->compression == FLB_AWS_COMPRESS_GZIP ||
+                 ctx->compression == FLB_AWS_COMPRESS_ZSTD)) {
+                flb_plg_info(ctx->ins,
+                             "Pre-compression upload_chunk_size= %zu, "
+                             "After compression, chunk is only %zu bytes, "
+                             "the chunk was too small, using PutObject to upload",
+                             preCompress_size, size_to_send);
+            }
@@
-        if (body_size < MIN_CHUNKED_UPLOAD_SIZE) {
+        if (size_to_send < MIN_CHUNKED_UPLOAD_SIZE) {

1450-1500: SDS lifetime is mostly correct; fix leak in unit_test_flush

The SDS refactor is generally consistent:

  • construct_request_buffer() now always returns flb_sds_t and callers destroy buffers with flb_sds_destroy().
  • send_upload_request(), put_all_chunks(), and cb_s3_upload() correctly treat the returned buffer as an SDS and free it once upload_data() completes.
  • upload_data() cleans up the optional compressed_body SDS in all exit paths.

One leak remains in unit_test_flush():

s3_store_buffer_put(ctx, upload_file, tag, tag_len,
                    chunk, (size_t) chunk_size, file_first_log_time);
ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
...
ret = upload_data(ctx, upload_file, m_upload_file, buffer, buffer_size, tag, tag_len);
/* buffer destroyed, but chunk is never freed */

Unlike buffer_chunk() and other call sites, chunk is never destroyed here, so every unit‑test flush leaks one SDS buffer when FLB_S3_PLUGIN_UNDER_TEST is set.

You can mirror buffer_chunk() and free chunk immediately after construct_request_buffer():

Suggested diff
-    s3_store_buffer_put(ctx, upload_file, tag, tag_len,
-                        chunk, (size_t) chunk_size, file_first_log_time);
-    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
-    if (ret < 0) {
+    s3_store_buffer_put(ctx, upload_file, tag, tag_len,
+                        chunk, (size_t) chunk_size, file_first_log_time);
+    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
+    flb_sds_destroy(chunk);
+    if (ret < 0) {
         flb_plg_error(ctx->ins, "Could not construct request buffer for %s",
                       upload_file->file_path);
         FLB_OUTPUT_RETURN(FLB_RETRY);
     }

Also applies to: 1991-2015, 3701-3724

🧹 Nitpick comments (9)
examples/s3_output/run-s3-parquet-test.sh (2)

1-5: Consider adding shell script safety options.

For test scripts, adding safety options helps catch errors early:

🔎 Suggested improvement
 #!/bin/bash
+set -euo pipefail
 
 echo "=== Fluent Bit S3 Comprehensive Test ==="
 echo ""

37-39: Cleanup section appears incomplete.

The script echoes "Cleaning up..." but doesn't perform any actual cleanup. If cleanup is intentionally omitted (e.g., to allow inspection of test artifacts), consider adding a comment explaining this. Otherwise, add the cleanup logic:

🔎 Suggested completion
 # Cleanup
 echo ""
 echo "Cleaning up..."
+# Uncomment below to remove test artifacts after run
+# rm -rf "$STORE_DIR"
+echo "Test artifacts preserved in $STORE_DIR for inspection"
examples/s3_output/README.md (3)

112-114: Add language specifier to fenced code block.

Per markdownlint MD040, fenced code blocks should have a language specified for proper syntax highlighting.

<details>
<summary>🔎 Suggested fix</summary>

```diff
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
```

</details>

---

`177-181`: **Add language specifier to fenced code block.**

<details>
<summary>🔎 Suggested fix</summary>

```diff
-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/      # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/   # Case 2: Large files via Multipart Upload
```

</details>

---

`270-275`: **Add language specifier to fenced code block.**

<details>
<summary>🔎 Suggested fix</summary>

```diff
-```
+```text
 [parquet] Data quality summary for 53206 records:
 [parquet] Missing non-nullable fields (defaults used):
 [parquet]   field='level' count=52506
 [parquet]   field='timestamp' count=52506
```

</details>

</blockquote></details>
<details>
<summary>cmake/parquet.cmake (1)</summary><blockquote>

`149-171`: **Manual include-path search is fine but extra `parquet` suffix is probably unnecessary**

The manual search for Parquet uses:

```cmake
PATH_SUFFIXES ${INCLUDE_SUFFIXES} parquet
```

Given `INCLUDE_SUFFIXES` is already `include`, this will check both `${prefix}/include/parquet/...` (via `include` + the `parquet/api/...` name) and `${prefix}/parquet/...`, which is a bit unusual. It’s not harmful, but simplifying to just `${INCLUDE_SUFFIXES}` would avoid probing odd `${prefix}/parquet` trees and be more in line with the Arrow search just above.

Also applies to: 173-195

CMakeLists.txt (1)

679-687: Consider unifying the C++ standard to avoid C++11/C++17 mix

The simdutf block sets CMAKE_CXX_STANDARD 11, while the Parquet block later sets it to 17. CMake will honor the last assignment (17 when Parquet is enabled), but this mix of settings is slightly confusing and could surprise future refactors.

You might want to:

  • Drop the explicit set(CMAKE_CXX_STANDARD 11) in the simdutf block, or
  • Set it to 17 there as well when Parquet support is enabled.

Functionally it works today; this is mainly about clarity and avoiding subtle build surprises.

Also applies to: 1311-1316

plugins/out_s3/s3.c (2)

42-57: Avoid duplicate forward declarations for get_upload / create_upload

get_upload and create_upload are forward‑declared twice (once in the new “Forward declarations” block and again just before their definitions). This is harmless but unnecessary noise.

You can keep the new declaration for flb_pack_msgpack_extract_log_key and drop the duplicated get_upload / create_upload prototypes to reduce clutter.

Also applies to: 86-91


744-823: Format/compression config flow is solid; consider early schema_str validation

The three‑step flow in cb_s3_init:

  • Defaults compression to NONE, parses the compression property, and treats arrow|parquet as legacy values that map to format=parquet (with compile‑time checks around FLB_HAVE_PARQUET_ENCODER).
  • Adds an explicit format option (json/parquet) that overrides the legacy compression mapping.
  • Validates that format=parquet is not allowed when Parquet support isn’t compiled in.

This is a good cleanup and keeps legacy configs working.

However, schema_str is only validated later inside construct_request_buffer() and flb_msgpack_raw_to_parquet, which means a misconfigured format=parquet without schema_str will only surface on first flush, failing every chunk.

To fail fast and give clearer feedback, you could add an init‑time check like:

#ifdef FLB_HAVE_PARQUET_ENCODER
if (ctx->format == FLB_S3_FORMAT_PARQUET && ctx->schema_str == NULL) {
    flb_plg_error(ctx->ins,
                  "format=parquet requires 'schema_str' to be set");
    return -1;
}
#endif

right after the format/validation block.

Also applies to: 4210-4231

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 664cb4e and eff671c.

📒 Files selected for processing (18)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
💤 Files with no reviewable changes (5)
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/s3_output/athena-queries.sql
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • src/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/parquet.cmake
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
  • src/CMakeLists.txt
  • plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
  • src/CMakeLists.txt
  • plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
  • src/CMakeLists.txt
  • plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
  • src/CMakeLists.txt
  • plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • src/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • examples/s3_output/README.md
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • include/fluent-bit/aws/flb_aws_compress.h
  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
  • src/CMakeLists.txt
  • plugins/out_s3/s3.c
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3.c
🧬 Code graph analysis (2)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_put_object (1663-1800)
plugins/out_s3/s3.c (6)
src/flb_output.c (1)
  • flb_output_get_property (1108-1111)
src/aws/flb_aws_compress.c (2)
  • flb_aws_compression_get_type (80-97)
  • flb_aws_compression_compress (99-116)
src/flb_sds.c (2)
  • flb_sds_create_len (58-76)
  • flb_sds_destroy (389-399)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_multipart.c (1)
  • upload_part (685-799)
src/flb_pack.c (1)
  • flb_pack_msgpack_to_json_format (1169-1450)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c

[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter

(syntaxError)

🪛 LanguageTool
examples/s3_output/README.md

[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md

112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


270-270: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (18)
plugins/out_kinesis_firehose/firehose.c (1)

496-501: LGTM! Documentation updated to reflect new compression options.

The help text correctly lists the now-supported compression algorithms: gzip, snappy, and zstd. This aligns with the changes in flb_aws_compress.c that add Snappy support and remove Arrow/Parquet compression options.

src/aws/flb_aws_compress.c (2)

31-37: LGTM! Snappy wrapper correctly adapts the function signature.

The wrapper properly casts void * to char * as required by flb_snappy_compress, and the return value is passed through directly.
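
A minimal sketch of such a wrapper, assuming the char*-based flb_snappy_compress prototype implied above:

```c
#include <stddef.h>
#include <fluent-bit/flb_snappy.h>   /* flb_snappy_compress (assumed prototype) */

/* Adapt the void*-based compression callback signature used by the
 * compression_options table to flb_snappy_compress's char* buffers. */
static int compress_snappy(void *in_data, size_t in_len,
                           void **out_data, size_t *out_len)
{
    return flb_snappy_compress((char *) in_data, in_len,
                               (char **) out_data, out_len);
}
```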


57-78: Well-structured separation of compression algorithms from format converters.

The updated compression_options table cleanly includes only true compression algorithms (gzip, snappy, zstd), with clear documentation explaining that Arrow/Parquet are format converters handled separately via the format parameter.

include/fluent-bit/aws/flb_aws_compress.h (1)

24-46: Well-documented deprecation and backward compatibility approach.

The header cleanly separates compression algorithms from format converters, retains deprecated values for backward compatibility, and provides clear guidance to use format=parquet instead of compression=parquet. Based on learnings, this follows the established pattern where Arrow/Parquet use conditional compilation guards unlike ZSTD which is always available.

examples/s3_output/fluent-bit-s3-parquet.conf (2)

143-145: Parquet schema correctly matches the access log data structure.

The schema_str defines fields that align with the dummy access log inputs: timestamp, level, message, method, path, status, duration_ms, client_ip, and user_agent. Using int64 for timestamp and status, and float64 for duration_ms, is an appropriate choice for avoiding overflow issues, as noted in the README.
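
An illustrative schema_str for those fields could look like the sketch below; the JSON key names ("fields", "name", "type", "nullable") are assumptions about the encoder's schema format, so treat the README and flb_parquet.h as authoritative:

```c
/* Illustrative only: key names are assumed, not confirmed by this diff */
static const char *schema_str =
    "{\"fields\":["
    "{\"name\":\"timestamp\",\"type\":\"int64\"},"
    "{\"name\":\"level\",\"type\":\"string\",\"nullable\":true},"
    "{\"name\":\"message\",\"type\":\"string\",\"nullable\":true},"
    "{\"name\":\"method\",\"type\":\"string\",\"nullable\":true},"
    "{\"name\":\"path\",\"type\":\"string\",\"nullable\":true},"
    "{\"name\":\"status\",\"type\":\"int64\"},"
    "{\"name\":\"duration_ms\",\"type\":\"float64\"},"
    "{\"name\":\"client_ip\",\"type\":\"string\",\"nullable\":true},"
    "{\"name\":\"user_agent\",\"type\":\"string\",\"nullable\":true}"
    "]}";
```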


1-117: Comprehensive and well-documented test configuration.

The configuration provides excellent coverage of both upload scenarios (PutObject vs Multipart) with realistic traffic patterns. The inline comments clearly explain the rationale for each setting, making this a valuable reference for users.

.github/workflows/unit-tests.yaml (2)

56-57: LGTM! Build flag renamed to reflect Parquet encoder focus.

The change from -DFLB_ARROW=On to -DFLB_PARQUET_ENCODER=On correctly reflects the shift from Arrow-based compression to Parquet encoder support.


117-125: Apache Arrow C++ package names are correct.

The packages libarrow-dev and libparquet-dev are the correct C++ development packages for Apache Arrow and Parquet, providing the required headers and libraries for the flb_parquet_impl.cpp implementation.

plugins/out_s3/s3.h (3)

196-199: New Parquet format fields properly integrated into struct.

The format and schema_str fields provide clean support for runtime format selection and user-defined Parquet schemas. The placement at the end of the struct with clear section comments improves readability.


201-203: LGTM! Format type macros clearly defined.

The FLB_S3_FORMAT_JSON (0) and FLB_S3_FORMAT_PARQUET (1) macros provide a clean enum-like interface for format selection.


233-235: Function signature change verified as consistent across the codebase.

All s3_put_object call sites (s3.c:1330 and s3.c:1494) have been properly updated to use the new flb_sds_t buffer parameter. The declaration in s3.h and implementation in s3.c both align with the new signature.

src/CMakeLists.txt (2)

157-166: Parquet encoder source and dependency wiring look consistent

flb_parquet_impl.cpp is only added when FLB_PARQUET_ENCODER is enabled and is explicitly marked as C++ via set_source_files_properties, with Arrow/Parquet libs added to FLB_DEPS under the same flag. This keeps Parquet support fully optional and avoids impacting non-Parquet builds.

Also applies to: 398-405


525-528: SASL linkage for Kafka/MSK is correctly scoped to SASL-enabled builds

Conditionally linking sasl2 into fluent-bit-shared and fluent-bit-bin under FLB_SASL_ENABLED matches how Kafka/MSK SASL is detected and avoids dragging SASL into builds that don’t need it.

Also applies to: 600-606

CMakeLists.txt (1)

200-202: Parquet encoder option and C++17 enabling are wired correctly

FLB_PARQUET_ENCODER is now a dedicated option, enabled by FLB_ALL, and the Parquet block:

  • Calls enable_language(CXX) and sets CMAKE_CXX_STANDARD 17 when the encoder is requested.
  • Delegates Arrow/Parquet discovery to cmake/parquet.cmake.
  • Only defines FLB_HAVE_PARQUET_ENCODER and adds include directories when both Arrow and Parquet are found, otherwise it warns and flips FLB_PARQUET_ENCODER back to OFF.

This keeps Parquet strictly opt‑in and prevents partial configuration when dependencies are missing.

Also applies to: 278-287, 1311-1329

plugins/out_s3/s3.c (3)

97-121: Header handling for compression and Parquet looks correct

The new zstd Content-Encoding header and the guards in create_headers() to skip Content-Encoding when ctx->format == FLB_S3_FORMAT_PARQUET align with the intended behavior:

  • GZIP/ZSTD still get the appropriate header for JSON output.
  • Parquet relies on its internal codec only, without an outer content‑encoding header.

Also applies to: 183-239


1516-1661: SDS-based request/flush pipeline is a nice cleanup

The move to flb_sds_t for all request buffers (construct_request_buffer(), upload_data(), send_upload_request(), put_all_chunks(), cb_s3_upload(), cb_s3_flush()) significantly simplifies memory management:

  • All conversions (JSON / log_key / Parquet) produce SDS and are freed in a single place per call chain.
  • construct_request_buffer() now cleanly owns and frees the intermediate msgpack buffer and unlocks chunks on all error paths.
  • Callers always destroy the returned SDS (buffer) after upload_data()/s3_put_object() regardless of success or failure, avoiding mixed free() vs flb_sds_destroy() semantics.

This should make future format additions (or tweaks to Parquet encoding) much less error‑prone.

Also applies to: 1943-2015, 3440-3488, 3953-4068


4210-4231: Config map docs for compression/format/schema/upload_timeout/auth look coherent

The updated config map entries:

  • Clarify that compression supports none|gzip|snappy|zstd and that arrow|parquet are deprecated legacy values.
  • Introduce an explicit format option (json / parquet) and schema_str for Parquet, with docs that match the runtime behavior.
  • Document upload_timeout semantics more clearly and add authorization_endpoint_bearer_token.

These descriptions match the new logic in cb_s3_init() and construct_request_buffer().

Also applies to: 4382-4413

cmake/parquet.cmake (1)

249-258: No action needed—the URLs in the code already match the official Apache Arrow documentation.

The URLs currently use https://packages.apache.org/artifactory/arrow/…, which is the correct format per official Apache Arrow installation guides. The /artifactory/arrow/ segment is not obsolete; it is the official repository path for both APT and RPM packages. Removing it would break the URLs.

Likely an incorrect or invalid review comment.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
examples/s3_output/README.md (1)

112-114: Consider adding language specifiers to fenced code blocks.

Several code blocks in this documentation lack language specifiers, which can affect syntax highlighting. Consider adding appropriate language identifiers (e.g., text, ini, json) for better readability.

tests/internal/parquet.c (1)

27-31: Consider including the header instead of redefining compression constants.

These compression type constants are duplicated from flb_aws_compress.h. If the values change in the source header, these tests will silently use incorrect values.

🔎 Proposed fix
-/* Compression types from flb_aws_compress.h */
-#define FLB_AWS_COMPRESS_NONE    0
-#define FLB_AWS_COMPRESS_GZIP    1
-#define FLB_AWS_COMPRESS_SNAPPY  2
-#define FLB_AWS_COMPRESS_ZSTD    3
+#include <fluent-bit/aws/flb_aws_compress.h>

If the header cannot be included due to dependencies, consider adding a comment noting that these values must be kept in sync with the source file.

src/flb_parquet_impl.cpp (2)

178-197: Unnecessary try/catch block around strtol.

The try/catch block on lines 195-197 is unnecessary since strtol is a C function that does not throw exceptions. The errno and endptr checks already handle all error cases. The same pattern appears in other string parsing functions.

🔎 Proposed fix
         else if (obj->type == MSGPACK_OBJECT_STR) {
-            try {
                 std::string str(obj->via.str.ptr, obj->via.str.size);
                 char* endptr;
                 errno = 0;
                 long val = std::strtol(str.c_str(), &endptr, 10);

                 if (endptr == str.c_str() || *endptr != '\0' || errno == ERANGE) {
                     return arrow::Status::Invalid("Cannot parse string to int32");
                 }

                 if (val > INT32_MAX || val < INT32_MIN) {
                     int32_overflow_by_field[field_name]++;
                     return builder->Append(val > INT32_MAX ? INT32_MAX : INT32_MIN);
                 }

                 string_parsed_to_int_by_field[field_name]++;
                 return builder->Append(static_cast<int32_t>(val));
-            } catch (...) {
-                return arrow::Status::Invalid("Cannot parse string to int32");
-            }
         }

Note: The std::string constructor can throw std::bad_alloc, so if you want to be defensive against memory allocation failures, you could keep a narrower catch or let it propagate to the outer try/catch in flb_msgpack_raw_to_parquet.


619-622: Prefer ValueUnsafe() or check status before ValueOrDie().

ValueOrDie() will terminate the program if the status is not OK, but here the status is already checked on line 618. Using ValueUnsafe() or MoveValueUnsafe() would be more semantically correct since you've already verified success.

🔎 Proposed fix
         if (!schema_result.ok()) {
             flb_error("[parquet] Failed to parse schema: %s\n", schema_result.status().ToString().c_str());
             return NULL;
         }
-        auto schema = schema_result.ValueOrDie();
+        auto schema = std::move(schema_result).ValueUnsafe();

This pattern should be applied to other ValueOrDie() calls where the status is already checked (lines 635, 859, 877, 899, 922).

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eff671c and c99a15c.

📒 Files selected for processing (8)
  • CMakeLists.txt
  • cmake/kafka.cmake
  • examples/s3_output/README.md
  • lib/avro/CMakeLists.txt
  • lib/jansson-e23f558/CMakeLists.txt
  • src/flb_parquet_impl.cpp
  • tests/internal/CMakeLists.txt
  • tests/internal/parquet.c
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • examples/s3_output/README.md
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • CMakeLists.txt
  • cmake/kafka.cmake
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • CMakeLists.txt
  • cmake/kafka.cmake
  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • CMakeLists.txt
  • cmake/kafka.cmake
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • CMakeLists.txt
  • cmake/kafka.cmake
  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • CMakeLists.txt
🧬 Code graph analysis (1)
tests/internal/parquet.c (1)
src/flb_parquet_impl.cpp (2)
  • flb_msgpack_raw_to_parquet (579-952)
  • flb_msgpack_raw_to_parquet (579-582)
🪛 LanguageTool
examples/s3_output/README.md

[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~737-~737: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md

112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


286-286: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


294-294: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


305-305: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


315-315: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


328-328: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


344-344: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


362-362: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


436-436: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


447-447: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


542-542: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (10)
lib/avro/CMakeLists.txt (1)

19-19: CMake version bump aligns with project-wide requirements.

The minimum CMake version increase to 3.12 is consistent with modern CMake practices and aligns with the Parquet/Arrow integration requirements introduced in this PR.

lib/jansson-e23f558/CMakeLists.txt (1)

567-569: Test target disabled with documented rationale.

Commenting out the check target to avoid conflicts with other libraries is acceptable. The comment explains the rationale clearly.

cmake/kafka.cmake (1)

35-42: SASL configuration logic is well-structured.

The changes correctly enable built-in SASL features (PLAIN/SCRAM/OAUTHBEARER) unconditionally while making Cyrus-SASL optional for GSSAPI/Kerberos. The comments clearly explain the rationale for each setting.

tests/internal/parquet.c (1)

74-103: Comprehensive test coverage with good structure.

The test suite is well-organized into logical categories with good coverage of edge cases, type conversions, and error handling. The consistent pattern of setup/execute/verify/cleanup is followed throughout.
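
A condensed sketch of that pattern follows; the flb_msgpack_raw_to_parquet prototype and the schema key names are assumptions pieced together from this review (the real declaration lives in include/fluent-bit/flb_parquet.h), and TEST_CHECK stands in for the test harness's assertion macro:

```c
#include <msgpack.h>
#include <fluent-bit/flb_mem.h>   /* flb_free */

/* Assumed prototype; see include/fluent-bit/flb_parquet.h for the real one */
extern int flb_msgpack_raw_to_parquet(const void *buf, size_t size,
                                      const char *schema_str, int compression,
                                      void **out_buf, size_t *out_size);

static void test_single_record(void)
{
    msgpack_sbuffer sbuf;
    msgpack_packer pck;
    void *out = NULL;
    size_t out_size = 0;
    int ret;

    /* setup: pack a single {"status": 200} record */
    msgpack_sbuffer_init(&sbuf);
    msgpack_packer_init(&pck, &sbuf, msgpack_sbuffer_write);
    msgpack_pack_map(&pck, 1);
    msgpack_pack_str(&pck, 6);
    msgpack_pack_str_body(&pck, "status", 6);
    msgpack_pack_int64(&pck, 200);

    /* execute */
    ret = flb_msgpack_raw_to_parquet(sbuf.data, sbuf.size,
            "{\"fields\":[{\"name\":\"status\",\"type\":\"int64\"}]}",
            FLB_AWS_COMPRESS_NONE, &out, &out_size);

    /* verify */
    TEST_CHECK(ret == 0 && out != NULL && out_size > 0);

    /* cleanup */
    flb_free(out);
    msgpack_sbuffer_destroy(&sbuf);
}
```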

CMakeLists.txt (2)

284-287: Good documentation for FLB_PARQUET_ENCODER exclusion from FLB_ALL.

The comment explaining that Parquet encoder requires explicit enabling due to external dependencies is helpful for users and maintainers.


1337-1355: Parquet encoder integration is well-structured.

The Parquet encoder block properly:

  • Enables C++ with C++17 standard (required for Arrow/Parquet)
  • Delegates discovery to cmake/parquet.cmake, which comprehensively handles detection across platforms with three fallback methods (pkg-config, CMake Config, manual search)
  • Gracefully disables the feature with clear warnings if dependencies are not found

The cmake/parquet.cmake module correctly exports all required variables (ARROW_FOUND, PARQUET_FOUND, ARROW_INCLUDE_DIRS, PARQUET_INCLUDE_DIRS, ARROW_LIBRARIES, PARQUET_LIBRARIES) to the parent scope, with robust platform-specific library search paths, version validation, and helpful installation instructions.

src/flb_parquet_impl.cpp (4)

39-42: Forward declaration approach is reasonable.

The forward declaration of flb_msgpack_to_json_str to avoid C++ compilation conflicts with mpack.h is a pragmatic solution. Consider adding a brief comment explaining why the full header cannot be included.
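
The pattern amounts to a guarded declaration like the sketch below; the parameter list is inferred from the call flb_msgpack_to_json_str(256, obj, FLB_FALSE) quoted later in this review, so treat it as an assumption rather than the actual prototype:

```c
/* Forward-declare only the helper instead of including flb_pack.h,
 * whose mpack.h include clashes with C++ compilation. msgpack.h is
 * already included by the implementation, so msgpack_object is known. */
#ifdef __cplusplus
extern "C" {
#endif

char *flb_msgpack_to_json_str(size_t size, const msgpack_object *obj,
                              int formatted);

#ifdef __cplusplus
}
#endif
```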


650-727: Record processing logic is well-designed.

The on-the-fly processing approach addresses the previous review comment about dangling pointers. The lambda properly handles:

  • Non-map records by appending nulls
  • Missing fields with nullable vs non-nullable handling
  • Conversion failures with appropriate fallbacks
  • Statistics tracking for data quality monitoring

939-951: Exception handling properly cleans up resources.

The catch blocks correctly free output_buffer if it was allocated before the exception. Note that output_buffer is only assigned on line 926, so the null check is appropriate; the variable is already initialized to nullptr at its declaration on line 586, which keeps the check safe.


863-865: RecordBatch creation uses correct row count.

The record_count variable correctly tracks the number of processed records and is used when creating the RecordBatch, ensuring consistency between the arrays and the batch metadata.

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch 2 times, most recently from a409e9f to 37a602a on December 25, 2025 07:28

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
examples/s3_output/README.md (1)

486-486: Typo already flagged in previous review.

This issue was previously identified: "2020-203c" should be "2020-2030".

tests/internal/CMakeLists.txt (1)

136-141: Missing Parquet library linkage will cause linker errors.

The Parquet test file is correctly added to UNIT_TESTS_FILES, but the prepare_unit_tests function (lines 234-236) only links Avro libraries when FLB_AVRO_ENCODER is enabled. There's no corresponding linkage for ARROW_LIBRARIES and PARQUET_LIBRARIES when FLB_PARQUET_ENCODER is enabled.

🔎 Proposed fix in prepare_unit_tests function

Add the following after line 236:

       if(FLB_AVRO_ENCODER)
         target_link_libraries(${source_file_we} avro-static jansson)
       endif()
+
+      if(FLB_PARQUET_ENCODER)
+        target_link_libraries(${source_file_we} ${ARROW_LIBRARIES} ${PARQUET_LIBRARIES})
+      endif()

       add_test(NAME ${source_file_we}
🧹 Nitpick comments (5)
examples/s3_output/README.md (2)

465-465: Fix spacing in time reference.

Line 465 uses "60min" which should be "60-minute" or "60 minutes" for consistency with similar references elsewhere in the document (e.g., "5 minutes" at line 127).

🔎 Proposed fix
- **Fallback**: `upload_timeout=60m` (acts as safety net)
+ **Fallback**: `upload_timeout=60m` (acts as safety net for 60 minutes)

112-112: Add language identifiers to fenced code blocks.

Markdown linting expects language specifications on all code blocks for syntax highlighting and consistency. Add language identifiers to the three blocks without them:

  • Line 112: Plain-text block showing a formula/logic
  • Line 177: S3 file path tree structure
  • Line 270: Log output example
🔎 Proposed fixes

At line 112, specify a language (or use plain):

-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
-```
+```

At line 177, specify a language for the tree structure:

-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/      # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/   # Case 2: Large files via Multipart Upload
-```
+```

At line 270, specify a language for log output:

-```
+```log
 [parquet] Data quality summary for 53206 records:
 [parquet] Missing non-nullable fields (defaults used):
 [parquet]   field='level' count=52506
 [parquet]   field='timestamp' count=52506
-```
+```

Also applies to: 177-177, 270-270

include/fluent-bit/flb_parquet.h (1)

23-25: Consider removing unused includes.

The flb_sds.h include appears unnecessary since the API uses void* and size_t rather than flb_sds_t. The msgpack.h include is also not required in the public header since input is typed as const void*. Only stdlib.h (for size_t) is strictly needed.

Removing unused includes reduces header dependencies and compile times.

🔎 Proposed fix
-#include <fluent-bit/flb_sds.h>
-#include <msgpack.h>
 #include <stdlib.h>
cmake/parquet.cmake (1)

222-228: Unicode symbols may not render on all terminals.

The checkmarks (✓, ✗) in status messages are visually helpful but may not display correctly on terminals without UTF-8 support or in CI logs. Consider using ASCII alternatives like [OK] / [FAIL] for broader compatibility.

🔎 Alternative using ASCII
 if(ARROW_FOUND AND PARQUET_FOUND)
-  message(STATUS "✓ Arrow found: ${ARROW_LIBRARIES}")
+  message(STATUS "[OK] Arrow found: ${ARROW_LIBRARIES}")
   message(STATUS "  Include dirs: ${ARROW_INCLUDE_DIRS}")
-  message(STATUS "✓ Parquet found: ${PARQUET_LIBRARIES}")
+  message(STATUS "[OK] Parquet found: ${PARQUET_LIBRARIES}")
   message(STATUS "  Include dirs: ${PARQUET_INCLUDE_DIRS}")
 else()
-  message(STATUS "✗ Arrow/Parquet not found")
+  message(STATUS "[NOT FOUND] Arrow/Parquet not found")
src/flb_parquet_impl.cpp (1)

51-61: Consider documenting the buffer size parameter.

The 256 parameter in flb_msgpack_to_json_str appears to be an initial buffer size hint. Adding a brief comment or using a named constant would improve code clarity.

🔎 Suggested improvement
 std::string msgpack_object_to_json_string(const msgpack_object* obj) {
     /* Use Fluent Bit's built-in msgpack to JSON converter */
-    char *json_str = flb_msgpack_to_json_str(256, obj, FLB_FALSE);
+    /* Initial buffer size hint for JSON conversion */
+    char *json_str = flb_msgpack_to_json_str(256, obj, FLB_FALSE);
     if (!json_str) {
         return "{}";
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c99a15c and a409e9f.

📒 Files selected for processing (25)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/kafka.cmake
  • cmake/parquet.cmake
  • examples/s3_output/README.md
  • examples/s3_output/athena-queries.sql
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • examples/s3_output/run-s3-parquet-test.sh
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_parquet.h
  • lib/avro/CMakeLists.txt
  • lib/jansson-e23f558/CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • tests/internal/CMakeLists.txt
  • tests/internal/parquet.c
💤 Files with no reviewable changes (5)
  • src/aws/CMakeLists.txt
  • src/aws/compression/arrow/CMakeLists.txt
  • src/aws/compression/arrow/compress.c
  • src/aws/compression/arrow/compress.h
  • src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (6)
  • examples/s3_output/fluent-bit-s3-parquet.conf
  • .github/workflows/unit-tests.yaml
  • examples/s3_output/athena-queries.sql
  • plugins/out_kinesis_firehose/firehose.c
  • src/CMakeLists.txt
  • examples/s3_output/run-s3-parquet-test.sh
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_parquet.h
  • examples/s3_output/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • cmake/kafka.cmake
  • plugins/out_s3/s3.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/flb_parquet_impl.cpp
  • plugins/out_s3/s3.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • src/aws/flb_aws_compress.c
  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • cmake/kafka.cmake
  • CMakeLists.txt
  • cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • cmake/kafka.cmake
  • CMakeLists.txt
  • cmake/parquet.cmake
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • cmake/parquet.cmake
🧬 Code graph analysis (6)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
  • s3_put_object (1663-1800)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (2)
  • flb_msgpack_raw_to_parquet (579-952)
  • flb_msgpack_raw_to_parquet (579-582)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
src/flb_zstd.c (1)
  • flb_zstd_compress (33-57)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3.c (5)
src/aws/flb_aws_compress.c (2)
  • flb_aws_compression_get_type (80-97)
  • flb_aws_compression_compress (99-116)
src/flb_sds.c (2)
  • flb_sds_create_len (58-76)
  • flb_sds_destroy (389-399)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_multipart.c (1)
  • upload_part (685-799)
src/flb_pack.c (1)
  • flb_pack_msgpack_to_json_format (1169-1450)
tests/internal/parquet.c (2)
src/flb_parquet_impl.cpp (2)
  • flb_msgpack_raw_to_parquet (579-952)
  • flb_msgpack_raw_to_parquet (579-582)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c

[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter

(syntaxError)

🪛 LanguageTool
examples/s3_output/README.md

[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md

112-112: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


270-270: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (23)
examples/s3_output/README.md (1)

1-512: Excellent comprehensive documentation for S3 Parquet testing.

This new README provides a thorough, well-structured guide covering realistic test scenarios (two input sources, two output configurations), upload trigger mechanics, schema design best practices, data quality considerations, and Athena integration. The pedagogical tone helps users understand why decisions matter (e.g., nullable vs non-nullable fields, PutObject vs Multipart). The inclusion of expected behavior verification steps and troubleshooting guidance adds practical value. This fulfills the docs-required label requirement.

src/aws/flb_aws_compress.c (3)

31-37: LGTM!

The wrapper function correctly adapts the flb_snappy_compress signature to match the generic compression interface. The pointer casts are safe and the return value is properly propagated.


66-70: The FLB_AWS_COMPRESS_SNAPPY constant is properly defined.

The constant is defined in include/fluent-bit/aws/flb_aws_compress.h:30 with value 2, and the flb_snappy_compress_wrapper function is correctly defined in src/aws/flb_aws_compress.c:32. The array entry is correctly integrated.


26-26: No action needed. The include statement is correct as written.

Snappy is a core bundled dependency (located at lib/snappy-fef67ac/) built unconditionally in CMakeLists.txt. Unlike Arrow/Parquet which required #ifdef FLB_HAVE_ARROW guards as optional components, Snappy has no FLB_HAVE_SNAPPY conditional guards anywhere in the codebase and is included unconditionally across 12 existing files (http_server, flb_snappy.c, multiple plugins, etc.). The include at line 26 follows the established pattern.

lib/avro/CMakeLists.txt (1)

19-19: CMake version bump looks appropriate.

Bumping to CMake 3.12 aligns with the broader build system changes in this PR and enables modern CMake features. This is a reasonable minimum version for 2024+ projects.

lib/jansson-e23f558/CMakeLists.txt (2)

1-1: CMake version bump consistent with the PR.

Aligns with the 3.12 minimum version used across other CMake files in this PR.


567-569: Reasonable approach to avoid target name collisions.

Disabling the check target prevents conflicts with other libraries (e.g., nghttp2) that may define the same target name. The comment adequately explains the rationale.

plugins/out_s3/s3.h (2)

195-203: Clean addition of format support.

The new format field and schema_str pointer cleanly separate output format concerns from compression. The FLB_S3_FORMAT_JSON / FLB_S3_FORMAT_PARQUET constants follow a clear naming convention consistent with Fluent Bit patterns.


233-234: Function prototype aligns with SDS-based buffer handling.

The use of flb_sds_t buffer is consistent with Fluent Bit's string handling patterns. Note that the parameter names differ slightly from the implementation in s3.c (create_time vs file_first_log_time, buffer vs body), but this is cosmetic and doesn't affect functionality.

include/fluent-bit/flb_parquet.h (1)

27-85: Excellent documentation and clean API design.

The header provides comprehensive documentation including:

  • Clear parameter descriptions
  • Schema format specification with supported types
  • Usage examples
  • Memory ownership semantics (caller frees with flb_free)

This level of documentation is valuable for a public API.

cmake/parquet.cmake (1)

269-278: Variable exports look correct.

All discovered variables are properly exported to the parent scope, including conditional export of ARROW_VERSION. This ensures the discovery results are available to the main CMakeLists.txt.

cmake/kafka.cmake (2)

9-48: Clean simplification of SASL configuration.

The refactoring correctly recognizes that librdkafka has built-in support for SASL/PLAIN, SASL/SCRAM, and SASL/OAUTHBEARER without external dependencies. Only SASL/GSSAPI (Kerberos) requires cyrus-sasl.

Key improvements:

  • FLB_SASL_ENABLED is always ON (correct for built-in mechanisms)
  • FLB_SASL_OAUTHBEARER_ENABLED is always ON (built into librdkafka)
  • FLB_KAFKA_MSK_IAM_ENABLED is always ON (depends on OAuth Bearer)
  • WITH_SASL_CYRUS is conditional on finding cyrus-sasl (for Kerberos only)

This removes unnecessary complexity while maintaining full functionality.


77-81: Helpful feature summary for build diagnostics.

The summary block clearly shows which Kafka features are enabled, aiding troubleshooting of build configurations.

include/fluent-bit/aws/flb_aws_compress.h (1)

24-46: Well-documented enum reorganization with clear deprecation guidance.

The separation of compression algorithms (NONE=0, GZIP=1, SNAPPY=2, ZSTD=3) from format conversions (ARROW=4, PARQUET=5) improves semantic clarity. The deprecation comments provide clear migration paths.

The codebase exclusively uses symbolic constants (FLB_AWS_COMPRESS_*) rather than hardcoded numeric values, so the enum value changes are transparent to all callers and pose no breaking change risk.

Likely an incorrect or invalid review comment.

CMakeLists.txt (2)

1091-1111: Well-designed Homebrew libyaml detection for macOS.

The automatic detection of libyaml via Homebrew (brew --prefix libyaml) is a helpful developer experience improvement for macOS users. The graceful fallback sequence (pkg-config → Homebrew → compile test) ensures broad compatibility.


1337-1355: Parquet encoder integration looks correct.

The Parquet encoder setup properly:

  • Enables C++ and sets C++17 standard (addressing past review feedback)
  • Includes cmake/parquet.cmake for library detection
  • Guards all configuration behind FLB_PARQUET_ENCODER
  • Provides clear warnings when dependencies are missing
tests/internal/parquet.c (2)

33-66: Excellent test organization and coverage.

The test suite provides comprehensive coverage across 38 tests in 9 well-documented categories. The structured organization with clear category headers makes the test suite maintainable and easy to navigate.


74-103: Consistent and correct test pattern.

All tests follow a clean pattern: construct msgpack input, invoke the conversion function, assert success, and cleanup resources. The consistent use of flb_free for output buffers ensures no memory leaks.

src/flb_parquet_impl.cpp (3)

649-727: Immediate record processing correctly avoids dangling pointers.

Processing each record immediately during the msgpack unpack loop (rather than storing msgpack_object* pointers for later) correctly addresses the lifetime issue flagged in previous reviews. The msgpack_unpacked zone is reused on each iteration, so this approach ensures data integrity.
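For reference, the safe pattern looks like the following sketch; it uses the standard msgpack-c API, and process_record() is a hypothetical stand-in for the converter's Arrow append logic.

#include <stdio.h>
#include <msgpack.h>

/* Hypothetical per-record handler standing in for the Arrow append logic */
static void process_record(const msgpack_object *obj)
{
    msgpack_object_print(stdout, *obj);
    printf("\n");
}

static void convert_all(const char *buf, size_t len)
{
    msgpack_unpacked result;
    size_t off = 0;

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, buf, len, &off) == MSGPACK_UNPACK_SUCCESS) {
        /* result.data lives in a zone that is reused by the next call to
         * msgpack_unpack_next(), so it must be consumed here rather than
         * stored as a msgpack_object pointer for later */
        process_record(&result.data);
    }
    msgpack_unpacked_destroy(&result);
}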


579-952: Robust implementation with comprehensive error handling.

The main conversion function demonstrates solid engineering:

  • Input validation and compression mapping
  • Proper Arrow/Parquet API usage with status checking
  • Immediate record processing to avoid lifetime issues
  • Statistics tracking for data quality monitoring
  • Clean exception handling with resource cleanup
  • Correct memory management between Arrow and Fluent Bit allocators

775-847: Fix missing newlines in debug log statements.

All flb_debug calls in the statistics output section are missing trailing \n characters, which will cause log messages to run together incorrectly. Each flb_debug call should end with a newline.

🔎 Proposed fix for debug log formatting
         if (has_issues) {
-            flb_debug("[parquet] Data quality summary for %zu records:", record_count);
+            flb_debug("[parquet] Data quality summary for %zu records:\n", record_count);

             if (!converter.complex_to_string_by_field.empty()) {
-                flb_debug("[parquet] Complex types serialized to JSON string:");
+                flb_debug("[parquet] Complex types serialized to JSON string:\n");
                 for (const auto& pair : converter.complex_to_string_by_field) {
-                    flb_debug("[parquet]   field='%s' count=%zu",
-                            pair.first.c_str(), pair.second);
+                    flb_debug("[parquet]   field='%s' count=%zu\n",
+                            pair.first.c_str(), pair.second);
                 }
             }

             if (!converter.string_parsed_to_int_by_field.empty()) {
-                flb_debug("[parquet] Strings parsed to integers:");
+                flb_debug("[parquet] Strings parsed to integers:\n");
                 for (const auto& pair : converter.string_parsed_to_int_by_field) {
-                    flb_debug("[parquet]   field='%s' count=%zu",
-                            pair.first.c_str(), pair.second);
+                    flb_debug("[parquet]   field='%s' count=%zu\n",
+                            pair.first.c_str(), pair.second);
                 }
             }

             /* Apply similar fixes to remaining flb_debug calls in this section */

Apply this pattern to all remaining flb_debug calls through line 847.

Likely an incorrect or invalid review comment.

plugins/out_s3/s3.c (2)

744-824: Verify legacy Parquet compression behavior.

The legacy compression=arrow/parquet path sets format=parquet but leaves ctx->compression = FLB_AWS_COMPRESS_NONE (initialized at line 745). A previous review comment suggested this should default to GZIP for backward compatibility. The comment at line 764 says "Keep default no compression," but if the old Arrow/Parquet path applied GZIP by default, this represents a breaking change.

Please verify whether the legacy compression path should default to GZIP when format=parquet is set via the deprecated compression=arrow/parquet values.

Based on the previous review comment at lines 744-824.
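If verification shows the old path did gzip its output, the fix might look like the sketch below; this is hypothetical and hinges on that open question, with the constant names taken from this PR.

/* Hypothetical sketch: restore a gzip default for the deprecated aliases.
 * Only correct if the pre-refactor arrow/parquet path compressed with gzip. */
if (strcasecmp(tmp, "arrow") == 0 || strcasecmp(tmp, "parquet") == 0) {
    flb_plg_warn(ctx->ins,
                 "compression=%s is deprecated, use format=parquet instead", tmp);
    ctx->format = FLB_S3_FORMAT_PARQUET;
    ctx->compression = FLB_AWS_COMPRESS_GZIP;  /* instead of FLB_AWS_COMPRESS_NONE */
}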


42-44: LGTM: Parquet integration is well-implemented.

The Parquet format support is thoughtfully designed:

  1. Clean separation: Parquet uses internal compression (passed to flb_msgpack_raw_to_parquet), while outer compression is correctly skipped for Parquet format throughout the codebase.

  2. Efficient buffering: The architectural change to store raw msgpack data (lines 3953-3962) instead of pre-converting to JSON is a smart optimization that:

    • Avoids repeated conversions when buffering
    • Enables efficient batching for columnar formats
    • Defers format conversion until upload time (line 1598)
  3. Proper memory management: All SDS buffers are consistently managed with flb_sds_create_len and flb_sds_destroy. Arrow-allocated Parquet buffers are correctly freed with flb_free (line 1617) after copying to SDS.

  4. Good error handling: Schema validation (lines 1588-1596), conversion error checking (lines 1605-1613), and cleanup on all error paths are properly implemented.

Also applies to: 1520-1661, 3953-3962

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 37a602a to 7f32dd4 Compare December 25, 2025 07:32
@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from f52b618 to b5a7522 Compare January 13, 2026 06:00
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 11

🤖 Fix all issues with AI agents
In @plugins/out_s3/s3_auth.c:
- Around line 380-385: The allocated buffer full_url created via
flb_sds_create_size can leak if flb_sds_printf fails; update the failure path
after calling flb_sds_printf to destroy full_url (and url_path) before returning
-1. Specifically, in the block handling flb_sds_printf returning NULL, call
flb_sds_destroy(full_url) and flb_sds_destroy(url_path) and then return -1 so
the original allocation is freed; keep the rest of the logic around
flb_sds_printf, full_url, and url_path unchanged.
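A minimal sketch of that failure path, assuming the flb_sds API as used elsewhere in the tree (the format arguments here are placeholders):

/* Sketch: free both buffers when the printf-style append fails */
if (flb_sds_printf(&full_url, "%s%s", base_url, url_path) == NULL) {
    flb_sds_destroy(full_url);   /* buffer from flb_sds_create_size() */
    flb_sds_destroy(url_path);
    return -1;
}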

In @plugins/out_s3/s3_queue.h:
- Around line 89-97: The comment above s3_queue_recover_all lists phases as
"Phase 1, Phase 2, Phase 3" but the implementation uses phases numbered 0, 1, 2;
update the documentation in the block comment to use the same zero-based
numbering as the function (Phase 0: Cleanup..., Phase 1: State transitions...,
Phase 2: Queue rebuild...) or alternatively change the implementation's phase
constants to 1..3 to match the prose — ensure the numbering in the comment and
in s3_queue_recover_all are consistent and reference the same phase identifiers.

In @plugins/out_s3/s3_store.c:
- Around line 360-383: The stream name formats differ between branches; make
them consistent by using the Windows-safe format with hyphens for time
separators ("%Y-%m-%dT%H-%M-%S") in both the FLB_SYSTEM_WINDOWS and POSIX code
paths inside create_stream_name, updating the strftime call in the POSIX branch
(and leave the existing Windows branch format) so generated stream names are
identical across platforms while still avoiding colons on Windows.
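Concretely, both branches could share one call; a minimal sketch, assuming the usual strftime locals (gmtime_r comes from flb_compat.h on Windows):

#include <time.h>

/* Sketch: one stream-name timestamp format that is valid on both platforms */
static void make_stream_name(char *out, size_t size)
{
    time_t now = time(NULL);
    struct tm tm_buf;

    gmtime_r(&now, &tm_buf);
    /* hyphens in the time fields avoid ':', which is illegal in Windows names */
    strftime(out, size - 1, "%Y-%m-%dT%H-%M-%S", &tm_buf);
}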

In @plugins/out_s3/s3_stream.c:
- Around line 443-458: When allocation or initialization fails after partially
setting up the stream context (e.g., failure of flb_malloc for ctx->read_buffer
or msgpack_unpacker_init), ensure you clean up any already-created resources
(free ctx->read_buffer, unlink/free temp_path, close temp_fp and msgpack_fp if
opened) and deinitialize msgpack state; call or emulate stream_context_destroy
behavior before returning error and correctly set ctx->unpacker_initialized and
ctx->result_initialized only when those inits succeed so the cleanup path knows
what to tear down.
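One possible shape for that init path, sketched with the flag names from the comment above; the buffer-size constant and the destroy helper's exact behavior are assumptions:

/* Sketch: flag what was built so one cleanup path can tear it all down */
ctx->read_buffer = flb_malloc(READ_CHUNK_SIZE);      /* size is illustrative */
if (ctx->read_buffer == NULL) {
    goto error;
}

if (!msgpack_unpacker_init(&ctx->unpacker, MSGPACK_UNPACKER_INIT_BUFFER_SIZE)) {
    goto error;
}
ctx->unpacker_initialized = FLB_TRUE;

msgpack_unpacked_init(&ctx->result);
ctx->result_initialized = FLB_TRUE;

return 0;

error:
    /* frees read_buffer, closes any opened files, unlinks temp_path, and
     * de-inits msgpack state guarded by the *_initialized flags */
    stream_context_destroy(ctx);
    return -1;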

In @src/aws/flb_aws_compress.c:
- Around line 32-43: The comment above flb_snappy_compress_wrapper incorrectly
cites "RFC 7093"; update the comment to remove the RFC reference and instead
reference the correct Snappy framing specification used by
flb_snappy_compress_framed_data (e.g., "Snappy framing format as documented by
Google / framing_format.txt"). Ensure the comment still explains that the framed
format supports streaming/concatenation and is safe for chunked compression, but
replace "RFC 7093" with the correct Snappy framing spec reference and wording.

In @src/flb_snappy.c:
- Around line 119-131: The comment block above FLB_SNAPPY_MAX_BLOCK_SIZE
incorrectly cites "RFC 7093" as the Snappy framing format; update the header
comment in flb_snappy.c to reference the correct spec (Google's Snappy framing
format / Snappy framing format documentation) and remove or replace the RFC 7093
mention, keeping the rest of the description (stream identifier, frame layout,
max block size) intact so the comment accurately documents the source and
format.

In @tests/include/s3_test_helpers.h:
- Around line 340-365: The s3_test_is_safe_temp_path function fails to accept
Windows-style paths because it only checks for '/' after the temp dir prefix and
in fallback prefixes; update the checks in s3_test_is_safe_temp_path to also
accept backslash ('\\') as a separator (and consider the case where the prefix
is the entire path without a trailing separator) and extend the fallback
comparisons for "/tmp/" and "/var/tmp/" to also match "\\tmp\\" and
"\\var\\tmp\\" (and their variants without trailing separators) so Windows temp
paths are treated as safe.
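A separator-tolerant prefix check might look like this sketch (the helper name is hypothetical):

#include <string.h>

/* Sketch: treat '/' and '\' as equivalent separators after the prefix */
static int path_has_prefix(const char *path, const char *prefix)
{
    size_t len = strlen(prefix);

    if (strncmp(path, prefix, len) != 0) {
        return 0;
    }
    /* accept "<prefix>", "<prefix>/..." and "<prefix>\..." */
    return path[len] == '\0' || path[len] == '/' || path[len] == '\\';
}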

In @tests/internal/parquet.c:
- Line 1119: Remove the unused local variable `ret` from the test function
`test_destructive_invalid_compression`; locate the declaration `int ret;` in
that function and delete it (or if it was intended to capture a return value,
replace its usage with direct assertions or use the appropriate variable),
ensuring the function compiles without unused variable warnings.
- Around line 54-57: The local duplicate defines for FLB_AWS_COMPRESS_NONE,
FLB_AWS_COMPRESS_GZIP, FLB_AWS_COMPRESS_SNAPPY, and FLB_AWS_COMPRESS_ZSTD should
be removed and the shared definitions used instead: add an #include
<fluent-bit/aws/flb_aws_compress.h> to the top of the file and delete the local
#define lines so the code uses the canonical constants from flb_aws_compress.h.
🧹 Nitpick comments (18)
src/flb_blob_db.c (1)

308-341: Schema migration handles the common case well but could be more robust.

The migration logic correctly detects and ignores the "duplicate column name" error for existing databases. However, the error message string comparison is locale-dependent and could fail on SQLite builds with different language settings.

Consider using a more robust approach that queries PRAGMA table_info(blob_files) first to check if the column exists, avoiding reliance on error message text.

♻️ Alternative approach using PRAGMA table_info
     /*
      * Schema upgrade for existing databases: add s3_key column if not exists.
-     * This ensures backward compatibility with databases created before the s3_key
-     * column was added. We only ignore the "duplicate column name" error;
-     * other errors (permissions, disk space, etc.) are propagated.
      */
     {
         char *errmsg = NULL;
+        int column_exists = 0;
+        sqlite3_stmt *stmt = NULL;
+        
+        /* Check if column already exists */
+        result = sqlite3_prepare_v2(db->handler,
+                                    "PRAGMA table_info(blob_files);",
+                                    -1, &stmt, NULL);
+        if (result == SQLITE_OK) {
+            while (sqlite3_step(stmt) == SQLITE_ROW) {
+                const char *col_name = (const char *)sqlite3_column_text(stmt, 1);
+                if (col_name && strcmp(col_name, "s3_key") == 0) {
+                    column_exists = 1;
+                    break;
+                }
+            }
+            sqlite3_finalize(stmt);
+        }
+        
+        if (!column_exists) {
             result = sqlite3_exec(db->handler,
                               "ALTER TABLE blob_files ADD COLUMN s3_key TEXT DEFAULT '';",
                               NULL, NULL, &errmsg);
-        if (result != SQLITE_OK) {
-            /*
-             * SQLITE_ERROR with "duplicate column name" is expected when
-             * the column already exists - this is the success case for upgrades.
-             * Any other error indicates a real problem.
-             */
-            if (result == SQLITE_ERROR && errmsg != NULL &&
-                strstr(errmsg, "duplicate column name") != NULL) {
-                /* Column already exists - this is expected, ignore */
-            }
-            else {
-                /* Real error - clean up and return */
+            if (result != SQLITE_OK) {
                 if (errmsg != NULL) {
                     sqlite3_free(errmsg);
                 }
                 flb_sqldb_close(db);
                 return FLB_BLOB_DB_ERROR_FILE_TABLE_CREATION;
             }
+            if (errmsg != NULL) {
+                sqlite3_free(errmsg);
+            }
         }
-        if (errmsg != NULL) {
-            sqlite3_free(errmsg);
-        }
     }
tests/include/aws_client_mock.c (1)

296-301: Redundant condition in macro expansion.

The condition && response_config->config_parameter != FLB_AWS_CLIENT_MOCK_SET_DATA on line 298 is redundant because this else if block will only be reached if none of the previous conditions matched, and FLB_AWS_CLIENT_MOCK_SET_DATA is already handled in the preceding else if block (lines 260-289).

While this doesn't cause incorrect behavior, it adds unnecessary runtime overhead.

Suggested fix
 #define EXPAND_CLIENT_RESPONSE_PARAMETER(lower, UPPER, type)                           \
-        else if (response_config->config_parameter == FLB_AWS_CLIENT_MOCK_SET_##UPPER  \
-                 && response_config->config_parameter != FLB_AWS_CLIENT_MOCK_SET_DATA) \
+        else if (response_config->config_parameter == FLB_AWS_CLIENT_MOCK_SET_##UPPER) \
         {                                                                              \
             c->resp.lower = CONVERT_##type((char *)val1);                              \
         }

Note: If the X-macro definition file includes a DATA entry that would generate FLB_AWS_CLIENT_MOCK_SET_DATA, the condition may be intentional to prevent the macro from applying to DATA. In that case, keep the condition but add a comment explaining why.

tests/include/s3_test_helpers.h (1)

407-420: Buffer size may be insufficient for long Windows paths.

The buffer size of 300 bytes may be too small for Windows paths, which can exceed 260 characters with long temp directory paths plus the suffix. Consider using a larger buffer or MAX_PATH on Windows.

Suggested fix
 static inline void s3_test_cleanup_temp_db(const char *db_path)
 {
-    char buf[300];
+    char buf[512];  /* Match template_path size in other functions */
     if (!db_path) {
         return;
     }
tests/runtime/out_s3_edge_cases.c (1)

118-123: Redundant assertion after teardown.

The TEST_CHECK(ret == 0) at line 123 re-checks ret which was already verified at line 112. Since flb_stop() and flb_destroy() don't modify ret, this check is misleading—it doesn't verify that teardown succeeded.

This pattern repeats throughout the file (lines 210, 294, 377, 463, 548, 636, 719, 805, 891, 976, 1057, 1123). While not a bug, these could be removed or replaced with meaningful assertions about cleanup success.

tests/internal/fuzzers/aws_util_fuzzer.c (1)

72-76: API call updated correctly for new signature.

Passing NULL for the new file_path parameter is appropriate since this fuzzer focuses on format string, tag, and delimiter parsing. The implementation correctly handles NULL by skipping file path substitutions.

Consider extending fuzzer coverage to also test the file_path parameter with fuzzer-generated paths to ensure robustness of $FILE_PATH and $FILE_NAME substitutions.
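A possible extension is sketched below; slice() stands in for whatever helper the fuzzer already uses to carve NUL-terminated strings from the input, and the flb_get_s3_key parameter order shown is an assumption.

/* Hypothetical sketch: also exercise the file_path argument with fuzz data */
char *file_path = slice(&data, &size, 60);   /* helper name is hypothetical */
if (file_path != NULL) {
    flb_sds_t key = flb_get_s3_key(format, time(NULL), tag,
                                   tag_delimiter, 0, file_path);
    if (key != NULL) {
        flb_sds_destroy(key);
    }
    flb_free(file_path);
}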

tests/internal/parquet.c (1)

916-927: Use flb_malloc/flb_free for consistency.

Fluent Bit codebase conventions use flb_malloc and flb_free wrappers for memory allocation. This allows for consistent memory tracking and potential future instrumentation.

♻️ Suggested fix
-    char *long_str = (char *)malloc(long_str_size);
+    char *long_str = (char *)flb_malloc(long_str_size);

And at line 944:

-    free(long_str);
+    flb_free(long_str);
plugins/out_s3/s3_auth.c (1)

230-230: Consider using DEBUG level for successful URL retrieval.

Logging at INFO level for every presigned URL retrieval could be verbose in production environments with high throughput. Consider using DEBUG level for success messages.

♻️ Suggested change
-    flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully");
+    flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");
include/fluent-bit/flb_parquet.h (1)

23-25: Minor: stdlib.h may be redundant.

The size_t type is already available from msgpack.h or flb_sds.h. Consider removing stdlib.h unless it's needed for other definitions.

tests/runtime/out_s3_queue.c (1)

133-135: Consider checking flb_lib_push return values for consistency.

These calls don't check the return value, unlike other tests in the same file (e.g., lines 69-70, 198-199). While test failures would eventually be caught by mock verification, consistent error checking improves debuggability.

Suggested fix
     /* Push multiple chunks to test worker distribution */
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
plugins/out_s3/s3_blob.c (1)

429-435: Minor indentation issue.

The flb_plg_error on line 430 has inconsistent indentation.

Suggested fix
         ret = flb_blob_db_file_insert(&ctx->blob_db, event_chunk->tag, source,
                                       ctx->endpoint, file_path, file_size);
         if (ret == -1) {
-        flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
-                      file_path, file_size);
+            flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
+                          file_path, file_size);
             cfl_sds_destroy(file_path);
src/aws/flb_aws_util.c (1)

1063-1071: Minor optimization opportunity.

The strlen(tag_delimiter) is called on each loop iteration. While not a performance concern for typical use, hoisting it outside the loop would be cleaner.

Suggested optimization
+    size_t delimiter_len = strlen(tag_delimiter);
     /* Check if delimiter(s) specified exists in the tag. */
-    for (i = 0; i < strlen(tag_delimiter); i++) {
+    for (i = 0; i < delimiter_len; i++) {
         if (strchr(tag, tag_delimiter[i])) {
             ret = 1;
             break;
         }
     }
src/flb_parquet_impl.cpp (1)

913-921: Consider documenting why total_file_size is unused.

The parameter is silenced with (void)total_file_size, but it's unclear why it's part of the API signature if unused. Consider adding a brief comment explaining the intended future use or removing the parameter if not needed.

plugins/out_s3/s3_stream.c (1)

81-123: Variable current_offset is redundant.

The variable current_offset is incremented, but its only use is a break condition that the remaining counter already covers; remaining alone suffices to track the bytes left to process.

🔧 Suggested fix
-    current_offset = offset_start;
     remaining = (offset_end > 0) ? (offset_end - offset_start) : -1;
 
     while (1) {
@@ -113,9 +112,7 @@
         compressed_chunk = NULL;
 
         if (remaining > 0) {
             remaining -= bytes_read;
-            current_offset += bytes_read;
-            if (remaining <= 0 || current_offset >= offset_end) {
+            if (remaining <= 0) {
                 break;
             }
         }
plugins/out_s3/s3_store.c (1)

244-272: Potential integer underflow in size calculation.

In calculate_chunk_data_size, the subtraction on line 271 could theoretically underflow if st.st_size were smaller than CIO_FILE_HEADER_MIN + meta_size. The check on line 267 guards against that case, but it compares off_t (signed) with size_t (unsigned), so the implicit conversion makes the guard less trustworthy than it looks.

🔧 Suggested fix for type safety
-    if (st.st_size <= CIO_FILE_HEADER_MIN + meta_size) {
+    if (st.st_size < 0 || (size_t)st.st_size <= CIO_FILE_HEADER_MIN + meta_size) {
         return 0;
     }
 
-    return st.st_size - CIO_FILE_HEADER_MIN - meta_size;
+    return (size_t)st.st_size - CIO_FILE_HEADER_MIN - meta_size;
plugins/out_s3/s3_queue.c (1)

343-349: Zero-size file check could indicate real issues.

The is_queue_entry_valid function rejects entries where upload_file->size <= 0 (line 343), but this check uses <= which would also reject size 0. A zero-size file might be a valid edge case (empty log file), or it might indicate a bug. Consider whether this should log at error level rather than warn.

plugins/out_s3/s3_multipart.c (2)

1166-1187: Exponential backoff has overly complex overflow handling.

The backoff calculation is hard to follow. The check if (shift_exp > 30) on line 1174 compares the multiplier produced by shift_exp = 1ULL << (part_attempt - 1) against the 30-second cap, conflating the power-of-two multiplier with the limit in seconds. The intent is evidently to cap the delay at 30 seconds, but the nested branches obscure it.

🔧 Simplified backoff calculation
             /* Apply exponential backoff before retry */
             if (part_attempt < max_attempts) {
-                uint64_t shift_exp;
-                /* Prevent overflow in shift operation */
-                if (part_attempt - 1 >= 30) {
-                    backoff_ms = 30000;  /* Max out immediately if exponent too large */
-                }
-                else {
-                    shift_exp = 1ULL << (part_attempt - 1);
-                    if (shift_exp > 30) {
-                        backoff_ms = 30000;
-                    }
-                    else {
-                        backoff_ms = 1000 * (int)shift_exp;
-                        if (backoff_ms > 30000) {
-                            backoff_ms = 30000;  /* Cap at 30 seconds */
-                        }
-                    }
-                }
+                /* Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 30s */
+                int exponent = (part_attempt - 1 < 5) ? part_attempt - 1 : 5;
+                backoff_ms = 1000 * (1 << exponent);
+                if (backoff_ms > 30000) {
+                    backoff_ms = 30000;
+                }
                 flb_plg_info(ctx->ins, "Retrying part upload after %d ms", backoff_ms);
                 sleep_ms(backoff_ms);
             }

1256-1262: Size mismatch check uses mixed types.

The comparison m_upload->bytes != (size_t)file_stat.st_size compares size_t with a cast of off_t. On platforms where off_t is 64-bit and size_t is 32-bit, this could truncate large file sizes. However, since this runs after successfully uploading all parts, the practical impact is minimal.
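A width-safe variant of the check might read:

/* Sketch: compare in a common 64-bit type instead of casting off_t to size_t */
if (file_stat.st_size < 0 ||
    (uint64_t) m_upload->bytes != (uint64_t) file_stat.st_size) {
    /* size mismatch */
}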

plugins/out_s3/s3.h (1)

180-181: Format constants should be an enum for type safety.

Using #define for format constants works but an enum would provide better type safety and IDE support.

🔧 Suggested improvement
-#define FLB_S3_FORMAT_JSON     0
-#define FLB_S3_FORMAT_PARQUET  1
+enum flb_s3_format {
+    FLB_S3_FORMAT_JSON = 0,
+    FLB_S3_FORMAT_PARQUET = 1
+};
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f52b618 and b5a7522.

📒 Files selected for processing (51)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_blob_db.h
  • include/fluent-bit/flb_parquet.h
  • include/fluent-bit/flb_snappy.h
  • plugins/in_blob/blob_file.c
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/aws/flb_aws_util.c
  • src/flb_blob_db.c
  • src/flb_parquet_impl.cpp
  • src/flb_signv4.c
  • src/flb_snappy.c
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (3)
  • src/aws/compression/CMakeLists.txt
  • tests/runtime/out_s3.c
  • src/aws/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (7)
  • src/flb_signv4.c
  • include/fluent-bit/flb_snappy.h
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/in_blob/blob_file.c
  • tests/runtime/out_s3_config.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3_blob.h
🧰 Additional context used
🧠 Learnings (18)
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • tests/internal/aws_compress.c
  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
  • tests/internal/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • plugins/out_kinesis_firehose/firehose.c
  • tests/internal/aws_compress.c
  • plugins/out_s3/s3_stream.c
  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • plugins/out_kinesis_firehose/firehose.c
  • tests/internal/aws_compress.c
  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • plugins/out_kinesis_firehose/firehose.c
  • tests/internal/aws_compress.c
  • plugins/out_s3/s3_stream.c
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • plugins/out_kinesis_firehose/firehose.c
  • tests/internal/aws_compress.c
  • plugins/out_s3/s3_stream.c
  • src/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • tests/internal/aws_compress.c
  • plugins/out_s3/s3_store.c
  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
  • tests/include/s3_test_helpers.h
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_parquet.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • tests/internal/aws_compress.c
  • src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/out_s3/s3_stream.c
  • src/aws/flb_aws_util.c
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/flb_parquet_impl.cpp
  • src/CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3.h
🧬 Code graph analysis (23)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
  • stream_compress_file (142-213)
  • stream_process_msgpack_file (561-629)
  • stream_json_processor (631-659)
  • stream_log_key_processor (734-763)
include/fluent-bit/flb_aws_util.h (1)
src/aws/flb_aws_util.c (2)
  • flb_get_s3_key (1262-1344)
  • flb_aws_uri_encode_path (1445-1481)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_multipart.c (1)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_clear_generator_instance (61-64)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (8)
  • flb_parquet_validate_schema (853-876)
  • flb_parquet_validate_schema (853-855)
  • flb_parquet_schema_create (878-903)
  • flb_parquet_schema_create (878-880)
  • flb_parquet_schema_destroy (905-911)
  • flb_parquet_schema_destroy (905-905)
  • flb_msgpack_to_parquet_streaming (913-1239)
  • flb_msgpack_to_parquet_streaming (913-918)
tests/runtime/out_s3_format.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (428-436)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (407-420)
  • s3_test_cleanup_temp_store_dir (374-399)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (32-37)
  • flb_aws_client_get_mock_generator (127-130)
  • flb_aws_client_mock_clear_generator_instance (61-64)
plugins/out_s3/s3_auth.h (1)
plugins/out_s3/s3_auth.c (3)
  • s3_auth_init_endpoint (79-128)
  • s3_auth_request_presigned_url (173-260)
  • s3_auth_fetch_presigned_url (359-400)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (533-569)
src/flb_snappy.c (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_blob.c (7)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_abort (737-799)
  • s3_multipart_check_upload_exists (805-879)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-400)
src/flb_input_blob.c (1)
  • flb_input_blob_delivery_notification_destroy (222-231)
src/flb_blob_db.c (1)
  • flb_blob_file_update_remote_id (586-617)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1262-1344)
src/flb_log_event_decoder.c (1)
  • flb_log_event_decoder_next (309-456)
plugins/out_s3/s3_queue.c (1)
  • s3_queue_add_pending_file (260-273)
plugins/out_s3/s3_multipart.c (3)
include/fluent-bit/flb_compat.h (1)
  • usleep (129-133)
plugins/out_s3/s3.c (1)
  • write_seq_index (127-147)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-400)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (308-316)
  • s3_queue_buffer_chunk (356-378)
  • s3_queue_add_file (117-131)
  • s3_queue_add_part (133-203)
  • s3_queue_add_pending_file (260-273)
  • s3_queue_add_file_unlocked (59-112)
  • s3_queue_add_pending_file_unlocked (209-255)
  • s3_queue_process_entry (748-807)
  • s3_queue_recover_all (850-895)
  • s3_queue_recover_from_database (988-1076)
  • s3_queue_recover_from_fstore (1141-1201)
plugins/out_s3/s3_stream.c (3)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_compress (109-126)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (533-569)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
  • flb_sds_destroy (389-399)
  • flb_sds_create (78-90)
src/flb_fstore.c (8)
  • flb_fstore_file_delete (259-277)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_meta_set (74-108)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_destroy (527-564)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
src/aws/flb_aws_util.c (4)
src/flb_http_client.c (1)
  • flb_http_client_destroy (1821-1828)
src/flb_sds.c (3)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_destroy (389-399)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
include/fluent-bit/flb_mem.h (1)
  • flb_calloc (84-96)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (4)
  • s3_format_chunk (1196-1337)
  • s3_upload_file (1015-1120)
  • s3_initiate_multipart_upload (898-1013)
  • s3_plugin_under_test (73-79)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1262-1344)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-417)
  • s3_multipart_upload_new (450-499)
  • s3_multipart_upload_destroy (419-444)
  • s3_multipart_initiate (501-558)
  • s3_multipart_upload_part (560-652)
  • s3_multipart_complete (654-735)
  • s3_multipart_abort (737-799)
  • s3_multipart_abort_with_url (1021-1042)
  • s3_multipart_check_upload_exists (805-879)
  • s3_multipart_upload_file_part (893-1015)
  • s3_multipart_upload_file (1202-1293)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1262-1344)
include/fluent-bit/flb_blob_db.h (1)
src/flb_blob_db.c (18)
  • flb_blob_file_update_s3_key (619-650)
  • flb_blob_file_update_s3_key (2053-2058)
  • flb_blob_db_file_get_next_aborted (822-953)
  • flb_blob_db_file_get_next_aborted (1934-1945)
  • flb_blob_db_file_fetch_part_ids (1479-1553)
  • flb_blob_db_file_fetch_part_ids (2026-2033)
  • flb_blob_db_file_fetch_part_count (1555-1584)
  • flb_blob_db_file_fetch_part_count (2035-2039)
  • flb_blob_db_file_fetch_all_parts (1586-1672)
  • flb_blob_db_file_fetch_all_parts (2060-2069)
  • flb_blob_db_file_get_next_pending (1678-1806)
  • flb_blob_db_file_get_next_pending (2041-2051)
  • flb_blob_db_file_part_check_uploaded (1812-1847)
  • flb_blob_db_file_part_check_uploaded (2071-2076)
  • flb_blob_db_file_parts_in_progress (1098-1133)
  • flb_blob_db_file_parts_in_progress (2078-2083)
  • flb_blob_db_reset_zombie_parts (1850-1874)
  • flb_blob_db_reset_zombie_parts (2085-2088)
src/flb_blob_db.c (3)
src/flb_sqldb.c (2)
  • flb_sqldb_lock (173-178)
  • flb_sqldb_unlock (180-185)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_create (113-125)
  • cfl_sds_destroy (127-137)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (133-137)

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from b5a7522 to 56d51f6 Compare January 13, 2026 15:33
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/aws/flb_aws_util.c (1)

1011-1340: $FILE_NAME/$FILE_PATH handling should be Windows-aware (backslash separators and leading backslashes).
Right now, prefix stripping and filename extraction only consider '/'. On Windows log paths, $FILE_NAME will likely fail to resolve, and $FILE_PATH may retain leading backslashes.

Suggested direction
@@
-    while (*valid_file_path == '.' || *valid_file_path == '/') {
+    while (*valid_file_path == '.' || *valid_file_path == '/' || *valid_file_path == '\\') {
         valid_file_path++;
     }
@@
-        file_name = strrchr(valid_file_path, '/');
+        file_name = strrchr(valid_file_path, '/');
+        if (!file_name) {
+            file_name = strrchr(valid_file_path, '\\');
+        }
🤖 Fix all issues with AI agents
In @plugins/out_s3/s3_queue.h:
- Line 26: The header declares s3_queue_remove as void but the implementation in
s3_queue.c defines int s3_queue_remove, causing a type mismatch; update the
declaration in s3_queue.h to match the implementation's return type (change void
to int for s3_queue_remove(struct flb_s3 *ctx, struct upload_queue *entry)) or
alternatively change the implementation to return void—ensure both the prototype
in s3_queue.h and the function signature in s3_queue.c for s3_queue_remove use
the same return type.

In @src/aws/flb_aws_util.c:
- Around line 851-1009: The offset parsing in process_timezone_patterns uses
atoi and then multiplies offset_hours by 3600 which can overflow; replace this
with strtol parsing of bracket_start..bracket_end, verify the entire offset
substring was consumed, check for errors/ERANGE, then clamp the parsed long to a
safe range (e.g., -48..+48 or -24..+24 per policy) before converting to an
integer and computing seconds; update variables offset_str/offset_hours to use
the validated value and on parse error fall back to the existing invalid-pattern
behavior (logging and appending the raw pattern) to avoid UB.
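A minimal sketch of that validated parse, assuming offset_str points at a NUL-terminated copy of the digits inside the brackets and treating the +/-24 clamp as policy:

#include <errno.h>
#include <stdlib.h>

char *endptr = NULL;
long hours;
int offset_seconds;

errno = 0;
hours = strtol(offset_str, &endptr, 10);
if (errno == ERANGE || endptr == offset_str || *endptr != '\0' ||
    hours < -24 || hours > 24) {
    /* fall back to the existing invalid-pattern handling */
}
else {
    offset_seconds = (int) hours * 3600;  /* cannot overflow after the clamp */
}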

In @src/flb_parquet_impl.cpp:
- Around line 511-610: parse_schema_from_json currently silently coerces unknown
type names to arrow::utf8() via TYPE_FACTORY_MAP fallback, allowing typos like
"int" to produce incorrect schemas; change this to treat unknown types as errors
by checking TYPE_FACTORY_MAP.find(type_name) and if not found return
arrow::Status::Invalid with a clear message including the unknown type name
instead of assigning arrow::utf8(), and apply the same change to the analogous
parsing logic located in the other block (lines ~853-903) so both parsing paths
fail on unrecognized types rather than defaulting to utf8.
- Around line 913-1239: The error cleanup path currently unlinks output_file
without first closing the parquet::arrow::FileWriter (writer) and the
arrow::io::OutputStream (output_stream); update the catch/cleanup section to
check if writer and output_stream are non-null and call writer->Close() and
output_stream->Close() (logging any errors) before calling unlink(output_file),
mirroring the successful-path close logic (use close_status/stream_close_status
checks and flb_error messages) so Windows can release file handles prior to
unlinking.

In @tests/runtime/out_s3_blob.c:
- Around line 136-155: The immediate stat(db_path) after flb_start(ctx) can race
with async DB creation; wrap the stat check in a short retry loop (e.g., up to a
few attempts with small sleeps/usleep) before invoking TEST_CHECK to avoid
flakes: after calling flb_start(ctx) repeatedly call stat(db_path, &st) until it
returns 0 or the retry limit is reached, then use TEST_CHECK(ret == 0) and keep
the existing TEST_MSG; reference the existing flb_start, stat, db_path, and
TEST_CHECK symbols when implementing the loop.
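A minimal sketch of such a loop (the 50 x 100 ms budget is arbitrary):

/* Sketch: poll for the DB file instead of a single stat() call */
int attempt;

ret = -1;
for (attempt = 0; attempt < 50; attempt++) {
    ret = stat(db_path, &st);
    if (ret == 0) {
        break;
    }
    usleep(100 * 1000);  /* 100 ms between attempts, ~5 s total */
}
TEST_CHECK(ret == 0);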

In @tests/runtime/out_s3_error_handling.c:
- Around lines 6-7: The file includes the POSIX-only headers `<sys/stat.h>`
and `<unistd.h>`, which break Windows builds. Either switch to the
cross-platform helpers/macros provided by `s3_test_helpers.h` (include that
header and use its macros/functions instead of direct POSIX calls), or wrap the
includes and any usages in `#ifndef _WIN32` / `#endif` guards so Windows
compiles without them; update any code that relied on POSIX-only functions to
use the helper abstractions or the guarded alternatives.

In @tests/runtime/out_s3_multipart.c:
- Around lines 438-493: The loop calling flb_lib_push in
flb_test_multiple_data_pushes ignores its return value. Capture the result of
each flb_lib_push call and assert it (e.g., with TEST_CHECK) on every
iteration, keeping the existing JSON_TD and S3_TEST_SLEEP_MS calls unchanged,
so a failed push is detected immediately; if a push fails, fail the test or
break the loop to avoid confusing "mock not exhausted" outcomes.
🧹 Nitpick comments (25)
tests/runtime/out_s3_edge_cases.c (6)

8-11: Including .c file directly may cause linker issues.

Line 9 includes aws_client_mock.c directly. This is unconventional and can cause duplicate symbol errors if this translation unit is linked with others that also include or compile that .c file. Consider compiling aws_client_mock.c separately and linking it, or ensure the included .c uses static linkage for all symbols.

 #include "../include/aws_client_mock.h"
-#include "../include/aws_client_mock.c"
 #include "../include/s3_test_helpers.h"

Then ensure aws_client_mock.c is added to the test build target in CMakeLists.


88-93: Redundant assertion — ret is unchanged since the earlier check.

ret was assigned at line 81 and verified at line 82. By line 93, ret hasn't been modified, making this second TEST_CHECK(ret == 0) redundant. The same pattern appears in nearly all other test functions.

If the intent is to confirm the test reached the cleanup phase without crashing, a simple comment suffices — or assert something meaningful like verifying the mock chain was fully consumed.

♻️ Suggested improvement
-    /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
+    /* Test passed: no crash occurred during empty flush */

170-180: Consider asserting that the mock chain was actually consumed.

The test pushes data and waits via S3_TEST_WAIT_MOCK_EXHAUSTED, but the only assertion is the reused ret == 0 from startup. Adding an explicit assertion that the expected HTTP calls occurred would strengthen the test.

For example, if the mock framework exposes a call count or exhaustion status, assert on it:

/* After S3_TEST_WAIT_MOCK_EXHAUSTED */
TEST_CHECK(flb_aws_client_mock_is_exhausted() == FLB_TRUE);

246-251: Test claims "max concurrent uploads" but doesn't stress concurrency.

The test sets "workers", "10" and pushes a single JSON_TD record. This configuration verifies the plugin accepts the setting, but doesn't exercise concurrent upload behavior. Consider pushing multiple distinct data batches rapidly or adding commentary that this is a configuration-acceptance test only.


836-848: "Maximum" boundary test uses modest values, not actual S3 maximums.

S3 supports up to 10,000 parts per multipart upload and 5 GB per part. Using "total_file_size", "100M" and "upload_chunk_size", "50M" tests acceptance of larger-than-default values but doesn't approach true limits. Consider renaming to "config_boundary_large" or documenting that actual S3 maximums aren't tested here.


957-1099: flb_test_rapid_start_stop effectively tests two sequential configurations, not rapid cycling.

The test runs two complete start/stop cycles with full cleanup between them. A true rapid start/stop stress test would loop many times (e.g., 10–50 iterations) with minimal delay. Consider renaming to flb_test_sequential_config_change or adding a loop for actual rapid cycling if that's the intent.

plugins/out_s3/s3_auth.h (1)

42-47: Consider using s3_presigned_url_type_t for type safety.

The url_type parameter is declared as int but accepts values from the s3_presigned_url_type_t enum defined above. Using the enum type directly would improve type safety and self-documentation.

Suggested fix
 /* Unified presigned URL fetcher - works for both standard and blob uploads */
 int s3_auth_fetch_presigned_url(struct flb_s3 *ctx,
                                  flb_sds_t *result_url,
-                                 int url_type,
+                                 s3_presigned_url_type_t url_type,
                                  const char *s3_key,
                                  const char *upload_id,
                                  int part_number);
tests/include/s3_test_helpers.h (3)

5-6: Copyright year inconsistency.

The copyright year here is 2015-2024, but other files in this PR (e.g., s3_auth.h) use 2015-2026. Consider aligning for consistency.


417-430: Potential buffer overflow with long paths.

The buffer size of 300 bytes may be insufficient when db_path is already close to PATH_MAX (typically 4096). If db_path is near the limit, appending -journal (8 chars) could cause truncation.

Suggested fix
 static inline void s3_test_cleanup_temp_db(const char *db_path)
 {
-    char buf[300];
+    char buf[PATH_MAX];
     if (!db_path) {
         return;
     }

464-480: External function dependency not declared in this header.

flb_aws_client_mock_generator_count_unused_requests() is called but not declared in this header. This relies on the including translation unit to also include aws_client_mock.h or similar.

Consider adding a forward declaration or documenting this dependency in the header comment.
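
If the implicit dependency is kept, a hedged one-liner makes it explicit (the exact signature is an assumption; mirror whatever aws_client_mock.h declares):

/* provided by aws_client_mock.c; see aws_client_mock.h */
int flb_aws_client_mock_generator_count_unused_requests(void);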

tests/runtime/out_s3_error_handling.c (1)

9-10: Including a .c file directly is unconventional.

Including aws_client_mock.c directly compiles the mock implementation into this translation unit. While this pattern works, it's unusual and can cause issues with multiple definitions if the same .c file is included elsewhere. Consider linking the mock as a separate compilation unit if not already handled by the build system.

plugins/out_s3/s3_store.c (1)

511-519: Silent early return may mask data corruption.

When fsf->data != s3_file, the function returns 0 (success) without any logging. This could hide bugs where the fsf->data pointer was corrupted or incorrectly reassigned. Consider adding a warning log.

Suggested fix
     if (fsf->data != s3_file) {
+        flb_plg_warn(ctx->ins, "s3_file mismatch: fsf->data=%p, s3_file=%p",
+                     fsf->data, s3_file);
         return 0;
     }
plugins/out_s3/s3.h (1)

82-82: Consider documenting the etags array sizing.

The etags[10000] array matches S3_AWS_MAX_PARTS (10000), which aligns with AWS S3's maximum parts per multipart upload. This consumes ~80KB per active upload (10000 × 8 bytes). Consider adding a brief comment linking this to the constraint.

Suggested documentation
-    flb_sds_t etags[10000];
+    flb_sds_t etags[10000];  /* Sized to match S3_AWS_MAX_PARTS */
plugins/out_s3/s3_auth.c (2)

155-171: Consider checking return values from header functions.

The function ignores return values from flb_http_add_header, flb_http_basic_auth, and flb_http_bearer_auth. While failures are rare, propagating errors could aid debugging.


230-230: Consider reducing log verbosity for routine operations.

Logging at info level for every successful presigned URL retrieval may be noisy in production with frequent uploads. Consider using debug level instead.

Suggested change
-    flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully");
+    flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");
plugins/out_s3/s3_blob.c (1)

429-435: Minor indentation inconsistency.

The flb_plg_error call at line 430 appears to have inconsistent indentation with the surrounding code block.

Suggested fix
         if (ret == -1) {
-        flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
-                      file_path, file_size);
+            flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
+                          file_path, file_size);
             cfl_sds_destroy(file_path);
             cfl_sds_destroy(source);
             continue;
tests/runtime/out_s3_multipart.c (1)

17-507: Tests are useful, but many only assert “mock exhausted”; add at least one state assertion for failure paths.
E.g., for create/upload/complete failure tests, consider asserting expected blob-db state (aborted flag / delivery_attempts / part_count) or store_dir cleanup, otherwise regressions could still “exhaust the mock” but leave the system in a bad state.

tests/runtime/out_s3_blob.c (1)

157-570: Delivery-attempt / freshness / chunk-size tests currently only validate parsing; consider adding one behavioral assertion.
A small follow-up could force a specific failure (e.g., UploadPart 500) and then assert delivery_attempts increments / aborted transitions / retry selection from DB. That would make these tests regression-resistant.

src/aws/flb_aws_util.c (2)

176-231: Credential-refresh retry is good, but ensure no retry-storm interactions with retry_requests.
Now a request can be attempted: initial → auto-retry (if retry_requests) → refresh+retry (for auth error). That might be desired, but please confirm it won’t amplify load in high-error scenarios.


1435-1477: flb_aws_uri_encode_path() is correct in spirit, but per-char flb_sds_printf() is expensive.
Consider a small hex-lookup + direct flb_sds_cat() of 3 bytes for encoded chars to avoid repeated vsnprintf calls.
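
A minimal sketch of the lookup approach, assuming the existing character classification stays as-is and only the emission of an encoded byte changes:

#include <fluent-bit/flb_sds.h>

static const char hex_table[] = "0123456789ABCDEF";

static flb_sds_t append_encoded_byte(flb_sds_t buf, unsigned char c)
{
    char enc[3];

    enc[0] = '%';
    enc[1] = hex_table[c >> 4];
    enc[2] = hex_table[c & 0x0f];

    /* one flb_sds_cat() of 3 bytes instead of one flb_sds_printf() per char */
    return flb_sds_cat(buf, enc, 3);
}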

src/flb_parquet_impl.cpp (1)

644-851: validate_parquet_file() float/double comparisons via std::to_string can be flaky.
If tests compare textual float values, minor representation differences can fail across platforms/libstdc++ versions. Consider numeric comparison with tolerance when expected_type is float/double.
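
A minimal sketch of a tolerance-based comparison the validation path could use for float/double columns; the 1e-6 relative epsilon is an illustrative choice:

#include <math.h>

static int doubles_close(double expected, double actual)
{
    double scale = fmax(fabs(expected), fabs(actual));

    return fabs(expected - actual) <= 1e-6 * fmax(scale, 1.0);
}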

tests/internal/aws_util.c (1)

47-77: Add a Windows-path test case (C:\... or \var\log\...) for $FILE_NAME extraction.
Given current implementation only searches for /, this is likely to regress on Windows; having an internal unit test will lock in expected behavior once fixed.

Also applies to: 402-533

tests/runtime/out_s3_format.c (1)

19-902: Consider asserting at least one content property (compressed magic / extracted key-only payload) instead of only “upload happened”.
Right now, most cases would pass as long as the multipart sequence occurs, even if the payload is wrong.

include/fluent-bit/flb_blob_db.h (1)

208-219: SQL_GET_NEXT_PENDING_FILE uses both DISTINCT and GROUP BY; one is likely redundant.
Not a correctness issue here (blob_files columns are functionally dependent on f.id), but simplifying would reduce confusion.

src/flb_blob_db.c (1)

1849-1874: Consider using a prepared statement for consistency.

The function uses sqlite3_exec with a raw SQL string, which differs from the prepared statement pattern used elsewhere in this file. While functionally correct and acceptable for a startup/recovery operation, a prepared statement would provide:

  • Consistency with the rest of the codebase
  • Marginally better performance if called multiple times

This is a minor suggestion and not blocking.

♻️ Optional: Use prepared statement

Add to prepare_stmts():

/* reset zombie parts */
result = sqlite3_prepare_v2(context->db->handler,
                            "UPDATE blob_parts SET in_progress = 0 "
                            "WHERE uploaded = 0 AND in_progress = 1", -1,
                            &context->stmt_reset_zombie_parts,
                            NULL);
if (result != SQLITE_OK) {
    return FLB_BLOB_DB_ERROR_PREPARING_STATEMENT_RESET_ZOMBIE_PARTS;
}

Then update the function:

 int flb_blob_db_reset_zombie_parts(struct flb_blob_db *context)
 {
-    const char *sql;
+    sqlite3_stmt *statement;
     int result;

     if (!context || !context->db) {
         return FLB_BLOB_DB_ERROR_INVALID_BLOB_DB_CONTEXT;
     }

-    sql = "UPDATE blob_parts SET in_progress = 0 "
-          "WHERE uploaded = 0 AND in_progress = 1";
+    statement = context->stmt_reset_zombie_parts;

     flb_sqldb_lock(context->db);

-    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, NULL);
+    result = sqlite3_step(statement);
+    sqlite3_reset(statement);

     flb_sqldb_unlock(context->db);

-    if (result != SQLITE_OK) {
+    if (result != SQLITE_DONE) {
         context->last_error = result;
         return -1;
     }

     return FLB_BLOB_DB_SUCCESS;
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5a7522 and 56d51f6.

📒 Files selected for processing (47)
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_blob_db.h
  • include/fluent-bit/flb_parquet.h
  • include/fluent-bit/flb_snappy.h
  • plugins/in_blob/blob_file.c
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • src/aws/flb_aws_compress.c
  • src/aws/flb_aws_util.c
  • src/flb_blob_db.c
  • src/flb_parquet_impl.cpp
  • src/flb_signv4.c
  • src/flb_snappy.c
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
✅ Files skipped from review due to trivial changes (1)
  • tests/runtime/out_s3_config.c
🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/internal/CMakeLists.txt
  • plugins/in_blob/blob_file.c
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • plugins/out_s3/s3_stream.h
  • tests/include/aws_client_mock.h
  • src/flb_signv4.c
  • tests/runtime/CMakeLists.txt
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_kinesis_firehose/firehose.c
👮 Files not reviewed due to content moderation or server errors (7)
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_queue.c
  • tests/runtime/out_s3_queue.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_multipart.c
  • include/fluent-bit/flb_parquet.h
  • tests/internal/parquet.c
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/flb_snappy.c
  • tests/internal/aws_compress.c
  • tests/internal/parquet.c
  • include/fluent-bit/flb_snappy.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/flb_snappy.c
  • tests/internal/aws_compress.c
  • tests/internal/parquet.c
  • include/fluent-bit/flb_snappy.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • include/fluent-bit/aws/flb_aws_compress.h
  • tests/internal/aws_compress.c
  • tests/internal/parquet.c
  • include/fluent-bit/flb_snappy.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/flb_snappy.c
  • tests/internal/aws_compress.c
  • tests/internal/parquet.c
  • include/fluent-bit/flb_snappy.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • src/aws/flb_aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • src/flb_snappy.c
  • tests/internal/aws_compress.c
  • tests/internal/parquet.c
  • include/fluent-bit/flb_snappy.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • tests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • src/flb_parquet_impl.cpp
  • include/fluent-bit/flb_parquet.h
  • tests/internal/aws_compress.c
  • tests/include/s3_test_helpers.h
  • plugins/out_s3/s3_store.c
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • src/flb_parquet_impl.cpp
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • src/flb_parquet_impl.cpp
  • tests/internal/aws_compress.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_snappy.h
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • src/aws/flb_aws_util.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_stream.c
🧬 Code graph analysis (20)
plugins/out_s3/s3_auth.h (1)
plugins/out_s3/s3_auth.c (3)
  • s3_auth_init_endpoint (79-128)
  • s3_auth_request_presigned_url (173-260)
  • s3_auth_fetch_presigned_url (359-402)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (534-570)
src/aws/flb_aws_compress.c (3)
src/flb_snappy.c (1)
  • flb_snappy_compress_framed_data (134-255)
src/flb_gzip.c (1)
  • flb_gzip_compress (157-252)
src/flb_zstd.c (1)
  • flb_zstd_compress (33-57)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1262-1340)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (8)
  • flb_parquet_validate_schema (853-876)
  • flb_parquet_validate_schema (853-855)
  • flb_parquet_schema_create (878-903)
  • flb_parquet_schema_create (878-880)
  • flb_parquet_schema_destroy (905-911)
  • flb_parquet_schema_destroy (905-905)
  • flb_msgpack_to_parquet_streaming (913-1239)
  • flb_msgpack_to_parquet_streaming (913-918)
plugins/out_s3/s3_auth.c (7)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_utils.c (1)
  • flb_utils_url_split (1531-1667)
src/tls/flb_tls.c (1)
  • flb_tls_create (183-232)
src/flb_upstream.c (2)
  • flb_upstream_create_url (416-481)
  • flb_upstream_conn_get (711-844)
src/flb_http_client.c (5)
  • flb_http_add_header (1015-1047)
  • flb_http_basic_auth (1394-1398)
  • flb_http_bearer_auth (1406-1441)
  • flb_http_client (866-911)
  • flb_http_do (1645-1705)
src/flb_sds.c (5)
  • flb_sds_copy (260-281)
  • flb_sds_create_len (58-76)
  • flb_sds_destroy (389-399)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
src/aws/flb_aws_util.c (1)
  • flb_aws_uri_encode_path (1441-1477)
src/aws/flb_aws_util.c (3)
src/flb_http_client.c (1)
  • flb_http_client_destroy (1821-1828)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_create_len (58-76)
  • flb_sds_printf (336-387)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
tests/internal/aws_compress.c (3)
src/flb_snappy.c (2)
  • flb_snappy_compress_framed_data (134-255)
  • flb_snappy_uncompress_framed_data (257-481)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_get_type (87-108)
plugins/out_s3/s3_queue.c (9)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (3)
  • s3_store_buffer_put (207-242)
  • s3_store_file_unlock (579-584)
  • s3_store_file_lock (572-577)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-402)
plugins/out_s3/s3_multipart.c (1)
  • s3_multipart_upload_file_part (893-1015)
src/flb_blob_db.c (2)
  • flb_blob_db_file_part_update_remote_id (1258-1289)
  • flb_blob_file_update_remote_id (586-617)
plugins/out_s3/s3.c (1)
  • s3_format_chunk (1196-1337)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_len (88-91)
  • cfl_sds_destroy (127-137)
plugins/out_s3/s3_blob.c (1)
  • s3_blob_notify_delivery (77-114)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1262-1340)
include/fluent-bit/flb_blob_db.h (1)
src/flb_blob_db.c (12)
  • flb_blob_file_update_s3_key (619-650)
  • flb_blob_file_update_s3_key (2053-2058)
  • flb_blob_db_file_get_next_aborted (822-953)
  • flb_blob_db_file_get_next_aborted (1934-1945)
  • flb_blob_db_file_fetch_part_count (1555-1584)
  • flb_blob_db_file_fetch_part_count (2035-2039)
  • flb_blob_db_file_fetch_all_parts (1586-1672)
  • flb_blob_db_file_fetch_all_parts (2060-2069)
  • flb_blob_db_file_get_next_pending (1678-1806)
  • flb_blob_db_file_get_next_pending (2041-2051)
  • flb_blob_db_reset_zombie_parts (1850-1874)
  • flb_blob_db_reset_zombie_parts (2085-2088)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (5)
  • cb_s3_upload (1340-1607)
  • s3_format_chunk (1196-1337)
  • s3_upload_file (1015-1120)
  • s3_initiate_multipart_upload (898-1013)
  • s3_plugin_under_test (73-79)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (143-147)
include/fluent-bit/flb_snappy.h (1)
src/flb_snappy.c (1)
  • flb_snappy_compress_framed_data (134-255)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-417)
  • s3_multipart_upload_new (450-499)
  • s3_multipart_upload_destroy (419-444)
  • s3_multipart_initiate (501-558)
  • s3_multipart_upload_part (560-652)
  • s3_multipart_complete (654-735)
  • s3_multipart_abort (737-799)
  • s3_multipart_abort_with_url (1021-1042)
  • s3_multipart_check_upload_exists (805-879)
  • s3_multipart_upload_file_part (893-1015)
  • s3_multipart_upload_file (1202-1293)
tests/runtime/out_s3_queue.c (4)
tests/include/s3_test_helpers.h (4)
  • s3_test_next (438-446)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (6)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_lib_push (843-870)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
  • flb_sds_destroy (389-399)
  • flb_sds_create (78-90)
src/flb_fstore.c (8)
  • flb_fstore_file_delete (259-277)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_meta_set (74-108)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_destroy (527-564)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
src/flb_blob_db.c (3)
src/flb_sqldb.c (3)
  • flb_sqldb_close (129-149)
  • flb_sqldb_lock (173-178)
  • flb_sqldb_unlock (180-185)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_create (113-125)
  • cfl_sds_destroy (127-137)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (307-315)
  • s3_queue_buffer_chunk (355-377)
  • s3_queue_add_file (116-130)
  • s3_queue_add_part (132-202)
  • s3_queue_add_pending_file (259-272)
  • s3_queue_add_file_unlocked (58-111)
  • s3_queue_add_pending_file_unlocked (208-254)
  • s3_queue_process_entry (747-806)
  • s3_queue_recover_all (849-894)
  • s3_queue_recover_from_database (987-1075)
  • s3_queue_recover_from_fstore (1140-1200)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmake/kafka.cmake (1)

37-55: Non-Windows OAuth Bearer is forced ON without checking TLS; this violates Kafka security best practices, and the adjacent comment is misleading.

Kafka's OAuth Bearer should be used over TLS (SASL_SSL) in production—tokens must be protected in transit. The current code forces FLB_SASL_OAUTHBEARER_ENABLED ON on non-Windows without checking FLB_TLS, and the comment at line 40–42 (claiming "needs both SSL and Cyrus SASL") contradicts the actual code behavior. Additionally, FLB_KAFKA_MSK_IAM_ENABLED is unconditionally set to ON, even if OAUTHBEARER is disabled.

Gate OAUTHBEARER and MSK IAM on TLS availability to align with security guidance:

Proposed patch
else()
-  # Non-Windows platforms: OAuth Bearer is built-in, always enabled
-  set(FLB_SASL_OAUTHBEARER_ENABLED ON)
+  # Non-Windows platforms: OAuth Bearer requires TLS for secure token transport
+  if(FLB_TLS)
+    set(FLB_SASL_OAUTHBEARER_ENABLED ON)
+  else()
+    set(FLB_SASL_OAUTHBEARER_ENABLED OFF)
+  endif()
 endif()
 
-# MSK IAM requires OAuth Bearer support (which is always available now)
-set(FLB_KAFKA_MSK_IAM_ENABLED ON)
+# MSK IAM requires OAuth Bearer support
+set(FLB_KAFKA_MSK_IAM_ENABLED ${FLB_SASL_OAUTHBEARER_ENABLED})
🤖 Fix all issues with AI agents
In `@include/fluent-bit/aws/flb_aws_compress.h`:
- Around lines 29-30: The comment incorrectly cites "RFC 7093 Framing Format".
Update the comment near flb_snappy_compress_framed_data() to reference the
correct Snappy framing specification (Google Snappy's framing_format.txt)
instead of RFC 7093, e.g., "Snappy framing format (Google Snappy
framing_format.txt)", and keep the note that streaming/concatenation is handled
via flb_snappy_compress_framed_data().

In `@plugins/out_s3/s3_store.c`:
- Around lines 198-233: When flb_fstore_file_append returns non-zero in
s3_store_buffer_put, delete the partially created buffer file before returning
-1 so it does not become a stuck entry: call the existing cleanup function
(e.g., s3_store_file_delete(ctx, s3_file) or cleanup_failed_file(ctx, s3_file))
and release any associated resources. Do not adjust ctx->current_buffer_size
(the size was never incremented), but ensure the s3_file->fsf and s3_file
references are removed so the file is not retried.
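
A minimal sketch of the failure path, assuming s3_store_file_delete() both removes the fstore file and frees the wrapper, per the suggestion above:

ret = flb_fstore_file_append(s3_file->fsf, data, bytes);
if (ret != 0) {
    flb_plg_error(ctx->ins, "could not append to buffer file");
    s3_store_file_delete(ctx, s3_file);   /* drop the partial file */
    return -1;                            /* size was never incremented */
}
ctx->current_buffer_size += bytes;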

In `@src/flb_parquet_impl.cpp`:
- Around lines 959-960: The outer variables output_stream
(std::shared_ptr<arrow::io::FileOutputStream>) and writer
(std::unique_ptr<parquet::arrow::FileWriter>) are shadowed by local auto
declarations when the results of output_stream_result.ValueOrDie() and
writer_result.ValueOrDie() are assigned, so the cleanup path still sees null
outer variables and never closes the handles. Remove the auto shadowing and
assign directly to the existing outer variables (e.g., output_stream =
std::move(output_stream_result).ValueOrDie(); and writer =
std::move(writer_result).ValueOrDie();), ensuring the types match, so the error
cleanup that calls writer->Close() and output_stream->Close() operates on the
actual objects.

In `@tests/runtime/out_s3_blob.c`:
- Around lines 138-151: The retry loop uses the POSIX-only usleep(100000);
replace it with the cross-platform S3_TEST_SLEEP_MS(100) macro (from
s3_test_helpers.h) to sleep 100 ms, and ensure s3_test_helpers.h is included at
the top of the file if it is not already, so the macro is available.

In `@tests/runtime/out_s3_format.c`:
- Around lines 17-264: Several tests call flb_lib_push() and ignore its return
value. In flb_test_json_format_basic, flb_test_json_with_date_key, and
flb_test_json_streaming_conversion, store the return of each flb_lib_push(...)
call in a local int (e.g., push_ret) and assert success with the existing test
helper (e.g., TEST_CHECK(push_ret == 0) or TEST_CHECK(push_ret >= 0), as
appropriate) immediately after each call, so the test fails on push errors.

In `@tests/runtime/out_s3_multipart.c`:
- Around lines 125-174: The tests call flb_lib_push(...) without checking its
return value, which can hide failures and lead to misleading "mock not
exhausted" errors. In flb_test_upload_part_success and the other referenced
test functions, capture the return of flb_lib_push into a local int (e.g., ret
or rc) and assert it succeeded via TEST_CHECK (or an appropriate check)
immediately after the call, so every flb_lib_push invocation is validated.
♻️ Duplicate comments (5)
tests/internal/parquet.c (1)

289-333: Temp file paths are collision-prone (/tmp, predictable names). This can break parallel test runs and leave stale artifacts.

Consider using mkstemp() (or a shared Fluent Bit test temp helper, if available) instead of hard-coded /tmp/flb_test_*.{msgpack,parquet}.

Proposed direction (make filenames unique)
@@
 static int init_test_context(test_context *ctx, const char *test_name)
 {
@@
-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_%ld.msgpack", test_name, (long) getpid());
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_%ld.parquet", test_name, (long) getpid());
@@
 }

Also applies to: 751-786, 1178-1222, 1424-1506, 1508-1603, 1605-1647, 1649-1704, 1706-1774, 1776-1870, 1872-1939, 1941-2016

tests/runtime/out_s3_multipart.c (1)

6-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).

If these runtime tests are linked into a single executable, including the .c file in multiple test translation units will cause duplicate definitions. Prefer compiling aws_client_mock.c once and including only the header.

#!/bin/bash
# Verify whether aws_client_mock.c is being compiled as a separate TU and/or included by multiple tests.
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "aws_client_mock\.c" -S tests/ CMakeLists.txt cmake/ || true
tests/runtime/out_s3_config.c (2)

6-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).

#!/bin/bash
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "add_executable|add_library|target_sources" tests/ CMakeLists.txt cmake/ -S | rg -n "out_s3_config|aws_client_mock" || true

18-83: These tests don’t validate s3_key_format/endpoint output without URI/path assertions.

Right now they mostly prove “multipart happened”, not that the configured key/endpoint was used. Add expect(URI, ...) / host assertions (as supported by the mock framework) so regressions in expansion don’t slip through.

Also applies to: 85-150, 152-230, 364-429

tests/runtime/out_s3_format.c (1)

8-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).

#!/bin/bash
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "aws_client_mock\.c" -S tests/ CMakeLists.txt cmake/ || true
🧹 Nitpick comments (10)
plugins/out_s3/s3_multipart.c (1)

36-65: Make sleep_ms() overflow-safe and type-appropriate across platforms.

On POSIX, usleep(milliseconds * 1000) can overflow int math; on Windows Sleep() takes DWORD. Since callers can grow milliseconds with retry settings, prefer uint32_t + a capped conversion (or use nanosleep on POSIX).

Proposed diff
-static inline void sleep_ms(int milliseconds) {
-    usleep(milliseconds * 1000);
-}
+static inline void sleep_ms(uint32_t milliseconds) {
+    /* Cap to avoid overflow and overly long sleeps */
+    if (milliseconds > 30000) {
+        milliseconds = 30000;
+    }
+    usleep((useconds_t) milliseconds * 1000u);
+}
plugins/out_s3/s3_auth.c (1)

155-171: Check return codes from header/auth helpers (silent auth/header failures).
flb_http_add_header, flb_http_basic_auth, and flb_http_bearer_auth return status; currently ignored. If any fail, you’ll send requests without expected headers/auth but still log “retrieved successfully” on 200s.

Consider making setup_http_client_headers() fail fast if any call returns -1.

Also applies to: 173-260
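
A minimal sketch of a fail-fast variant; the header values and the bearer-token field name are illustrative assumptions, not the plugin's actual fields:

static int setup_http_client_headers(struct flb_s3 *ctx,
                                     struct flb_http_client *c)
{
    int ret;

    ret = flb_http_add_header(c, "Accept", 6, "application/json", 16);
    if (ret == -1) {
        return -1;
    }

    /* hypothetical field name standing in for the configured token */
    if (ctx->presigned_url_bearer_token) {
        ret = flb_http_bearer_auth(c, ctx->presigned_url_bearer_token);
        if (ret == -1) {
            return -1;
        }
    }

    return 0;
}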

plugins/out_s3/s3_blob.c (1)

124-143: Avoid network calls while holding the blob DB lock.
s3_blob_recover_state() holds flb_blob_db_lock() (Line 132) while recover_stale_files() / handle_aborted_files() can call abort_multipart_upload() (which does HTTP). This can block DB operations for extended periods and risks lock contention during startup.

Prefer: fetch N rows + needed fields → unlock → perform abort/validation calls → re-lock to update state.

Also applies to: 145-343
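
A minimal sketch of the suggested scoping; fetch_stale_batch() and mark_file_aborted() are hypothetical stand-ins for the blob-db fetch and state-update calls, and the lock-call form is assumed:

flb_blob_db_lock(&ctx->blob_db);
ret = fetch_stale_batch(ctx, batch, BATCH_MAX, &count);   /* copy rows out */
flb_blob_db_unlock(&ctx->blob_db);

for (i = 0; i < count; i++) {
    abort_multipart_upload(ctx, &batch[i]);   /* HTTP, no DB lock held */
}

flb_blob_db_lock(&ctx->blob_db);
for (i = 0; i < count; i++) {
    mark_file_aborted(ctx, batch[i].file_id); /* persist the new state */
}
flb_blob_db_unlock(&ctx->blob_db);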

tests/runtime/out_s3_queue.c (1)

7-8: Including .c file directly is unusual but acceptable for tests.

While including .c files directly can cause symbol conflicts in larger builds, this pattern is acceptable for isolated test files where the mock implementation needs to be compiled together with the test. Consider verifying that this doesn't cause issues if multiple test files are linked together.

If you encounter linker errors about duplicate symbols when running multiple S3 tests together, consider compiling aws_client_mock.c as a separate object file and linking it.

tests/internal/parquet.c (3)

69-136: ChunkIO header CRC fields are always zero; tests may get brittle if validation is enabled.

If ChunkIO validation/CRC checks get turned on for tests in the future, this helper could start failing unexpectedly. At minimum, consider documenting that CRC checks are disabled, or compute/write CRC32 for the “content section” to match the layout described in the comment.


251-286: Surface error_msg on schema parse failures to make test failures actionable.

Right now schema parse failures just return -1 with no context; printing/logging error_msg (or returning it to the caller) would significantly reduce debugging time.


350-2079: Cleanup isn’t guaranteed if TEST_CHECK aborts the test function.

If TEST_CHECK() short-circuits (common in test frameworks), failures may leak temp files and buffers. Consider a goto cleanup; pattern per test (or a helper macro) so cleanup runs on all paths.
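
A minimal sketch of the pattern, assuming an acutest-style TEST_CHECK that reports failure and returns a value:

void flb_test_example(void)
{
    flb_ctx_t *ctx = NULL;
    int started = FLB_FALSE;

    ctx = flb_create();
    if (!TEST_CHECK(ctx != NULL)) {
        goto cleanup;
    }

    /* ... flb_service_set(), flb_input(), flb_output(), config ... */

    if (!TEST_CHECK(flb_start(ctx) == 0)) {
        goto cleanup;
    }
    started = FLB_TRUE;

    /* ... push data, wait, assert ... */

cleanup:
    if (started) {
        flb_stop(ctx);
    }
    if (ctx) {
        flb_destroy(ctx);
    }
    /* remove temp db / store dir here as well */
}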

tests/runtime/out_s3_multipart.c (1)

14-67: Make cleanup robust on failures (temp db/dir + mock generator).

Consider a cleanup: label that always clears the mock generator and deletes temp artifacts even if a TEST_CHECK aborts early.

Also applies to: 69-123, 227-327, 329-380, 382-434, 436-492

tests/runtime/out_s3_config.c (1)

152-230: Avoid fixed sleeps (S3_TEST_SLEEP_MS(2000))—slow and can still be flaky.

If the goal is to force a second upload for $INDEX, prefer forcing rotation deterministically (e.g., smaller total_file_size, or explicitly waiting for the first multipart cycle to complete before pushing again).

tests/runtime/out_s3_format.c (1)

432-509: Make format explicit where the test intent is “JSON + compression/log_key”.

E.g. flb_test_compression_snappy, flb_test_log_key_extraction, and flb_test_log_key_with_compression don’t set "format", "json". If the plugin default ever changes, these tests may start validating the wrong behavior.

Also applies to: 511-679

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56d51f6 and 0e0ad38.

📒 Files selected for processing (54)
  • .github/workflows/unit-tests.yaml
  • CMakeLists.txt
  • cmake/kafka.cmake
  • cmake/parquet.cmake
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_blob_db.h
  • include/fluent-bit/flb_parquet.h
  • include/fluent-bit/flb_snappy.h
  • plugins/in_blob/blob_file.c
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • src/CMakeLists.txt
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/aws/flb_aws_util.c
  • src/flb_blob_db.c
  • src/flb_parquet_impl.cpp
  • src/flb_signv4.c
  • src/flb_snappy.c
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (3)
  • tests/runtime/out_s3.c
  • src/aws/CMakeLists.txt
  • src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (11)
  • plugins/in_blob/blob_file.c
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • plugins/out_kinesis_firehose/firehose.c
  • cmake/parquet.cmake
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • tests/include/s3_test_helpers.h
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_auth.h
  • tests/runtime/out_s3_error_handling.c
  • include/fluent-bit/flb_parquet.h
🧰 Additional context used
🧠 Learnings (18)
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • tests/internal/CMakeLists.txt
  • src/flb_parquet_impl.cpp
  • cmake/kafka.cmake
  • tests/internal/aws_compress.c
  • CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • cmake/kafka.cmake
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/flb_snappy.h
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • src/flb_parquet_impl.cpp
  • cmake/kafka.cmake
  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/flb_snappy.h
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • cmake/kafka.cmake
  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/flb_snappy.h
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • cmake/kafka.cmake
  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • include/fluent-bit/flb_snappy.h
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • cmake/kafka.cmake
  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
  • CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_parquet_impl.cpp
  • cmake/kafka.cmake
  • tests/internal/aws_compress.c
  • CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • .github/workflows/unit-tests.yaml
  • src/CMakeLists.txt
  • src/flb_parquet_impl.cpp
  • tests/internal/aws_compress.c
  • plugins/out_s3/s3_store.c
  • CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_snappy.h
  • include/fluent-bit/flb_aws_util.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • include/fluent-bit/flb_snappy.h
  • src/CMakeLists.txt
  • src/aws/flb_aws_compress.c
  • src/flb_snappy.c
  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/CMakeLists.txt
  • src/flb_parquet_impl.cpp
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.

Applied to files:

  • src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/out_s3/s3_stream.c
  • src/aws/flb_aws_util.c
🧬 Code graph analysis (21)
plugins/out_s3/s3_auth.c (6)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_utils.c (1)
  • flb_utils_url_split (1531-1667)
src/flb_upstream.c (1)
  • flb_upstream_create_url (416-481)
src/flb_output.c (1)
  • flb_output_upstream_set (1634-1725)
src/flb_http_client.c (4)
  • flb_http_add_header (1015-1047)
  • flb_http_basic_auth (1394-1398)
  • flb_http_bearer_auth (1406-1441)
  • flb_http_client (866-911)
src/aws/flb_aws_util.c (1)
  • flb_aws_uri_encode_path (1490-1526)
include/fluent-bit/flb_snappy.h (1)
src/flb_snappy.c (1)
  • flb_snappy_compress_framed_data (134-255)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
  • stream_compress_file (142-213)
  • stream_process_msgpack_file (563-631)
  • stream_json_processor (633-661)
  • stream_log_key_processor (736-765)
src/flb_signv4.c (1)
src/flb_sds.c (2)
  • flb_sds_destroy (389-399)
  • flb_sds_create_len (58-76)
src/aws/flb_aws_compress.c (1)
src/flb_snappy.c (1)
  • flb_snappy_compress_framed_data (134-255)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_snappy.c (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_edge_cases.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
include/fluent-bit/flb_aws_util.h (1)
src/aws/flb_aws_util.c (2)
  • flb_get_s3_key (1311-1389)
  • flb_aws_uri_encode_path (1490-1526)
tests/internal/aws_compress.c (2)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_get_type (87-108)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-417)
  • s3_multipart_upload_new (450-499)
  • s3_multipart_upload_destroy (419-444)
  • s3_multipart_initiate (501-558)
  • s3_multipart_upload_part (560-652)
  • s3_multipart_complete (654-735)
  • s3_multipart_abort (737-799)
  • s3_multipart_abort_with_url (1021-1042)
  • s3_multipart_check_upload_exists (805-879)
  • s3_multipart_upload_file_part (893-1015)
  • s3_multipart_upload_file (1202-1293)
src/aws/flb_aws_util.c (2)
src/flb_sds.c (5)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_destroy (389-399)
  • flb_sds_create_len (58-76)
  • flb_sds_printf (336-387)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (521-557)
plugins/out_s3/s3_store.c (3)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
  • flb_sds_destroy (389-399)
  • flb_sds_create (78-90)
src/flb_fstore.c (4)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
include/fluent-bit/flb_blob_db.h (1)
src/flb_blob_db.c (18)
  • flb_blob_file_update_s3_key (619-650)
  • flb_blob_file_update_s3_key (2053-2058)
  • flb_blob_db_file_get_next_aborted (822-953)
  • flb_blob_db_file_get_next_aborted (1934-1945)
  • flb_blob_db_file_fetch_part_ids (1479-1553)
  • flb_blob_db_file_fetch_part_ids (2026-2033)
  • flb_blob_db_file_fetch_part_count (1555-1584)
  • flb_blob_db_file_fetch_part_count (2035-2039)
  • flb_blob_db_file_fetch_all_parts (1586-1672)
  • flb_blob_db_file_fetch_all_parts (2060-2069)
  • flb_blob_db_file_get_next_pending (1678-1806)
  • flb_blob_db_file_get_next_pending (2041-2051)
  • flb_blob_db_file_part_check_uploaded (1812-1847)
  • flb_blob_db_file_part_check_uploaded (2071-2076)
  • flb_blob_db_file_parts_in_progress (1098-1133)
  • flb_blob_db_file_parts_in_progress (2078-2083)
  • flb_blob_db_reset_zombie_parts (1850-1874)
  • flb_blob_db_reset_zombie_parts (2085-2088)
plugins/out_s3/s3_blob.c (4)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_upload_new (450-499)
  • s3_multipart_abort (737-799)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-402)
src/flb_input_blob.c (1)
  • flb_input_blob_file_get_info (31-112)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_multipart.c (5)
plugins/out_s3/s3.c (1)
  • write_seq_index (127-147)
src/flb_sds.c (6)
  • flb_sds_create_len (58-76)
  • flb_sds_create (78-90)
  • flb_sds_create_size (92-95)
  • flb_sds_destroy (389-399)
  • flb_sds_printf (336-387)
  • flb_sds_cat (120-141)
src/aws/flb_aws_util.c (4)
  • flb_aws_uri_encode_path (1490-1526)
  • flb_get_s3_key (1311-1389)
  • flb_aws_xml_get_val (623-655)
  • flb_aws_print_xml_error (586-610)
src/flb_http_client.c (1)
  • flb_http_client_destroy (1821-1828)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-402)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (307-316)
  • s3_queue_buffer_chunk (356-378)
  • s3_queue_add_file (116-130)
  • s3_queue_add_part (132-202)
  • s3_queue_add_pending_file (259-272)
  • s3_queue_add_file_unlocked (58-111)
  • s3_queue_add_pending_file_unlocked (208-254)
  • s3_queue_process_entry (748-807)
  • s3_queue_recover_all (850-895)
  • s3_queue_recover_from_database (988-1076)
  • s3_queue_recover_from_fstore (1141-1201)
src/flb_blob_db.c (3)
src/flb_sqldb.c (2)
  • flb_sqldb_lock (173-178)
  • flb_sqldb_unlock (180-185)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_create (113-125)
  • cfl_sds_destroy (127-137)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph>
Signed-off-by: Arbin <arbin.cheng@coins.ph>
@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 0e0ad38 to 2aa4b4b Compare January 14, 2026 14:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/aws/flb_aws_util.c (1)

796-832: BUG: replace_uri_tokens() returns SDS with incorrect length header (likely 0).

replace_uri_tokens() allocates with flb_sds_create_size() but then writes bytes directly and returns without calling flb_sds_len_set(). Any consumer using flb_sds_len(result) (common pattern) will see the wrong length and can truncate/omit content.

Proposed fix
 static char* replace_uri_tokens(const char* original_string, const char* current_word,
                          const char* new_word)
 {
     char *result;
     int i = 0;
@@
     result[i] = '\0';
+    /* Keep SDS header in sync with the actual string length */
+    flb_sds_len_set(result, i);
     return result;
 }
🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_auth.c`:
- Around line 393-394: The concatenation using flb_sds_printf into full_url with
ctx->authorization_endpoint_url and url_path can produce a double slash when
authorization_endpoint_url ends with '/' because url_path (from
build_presigned_url_path) always begins with '/'; update the logic in the code
around the flb_sds_printf call (referencing tmp, full_url,
ctx->authorization_endpoint_url, url_path, and build_presigned_url_path) to
ensure there is exactly one slash between the endpoint and path — either by
trimming a trailing '/' from ctx->authorization_endpoint_url before formatting
or by skipping the leading '/' on url_path when concatenating so flb_sds_printf
never forms "//".
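
A minimal sketch of the single-slash join, assuming url_path always begins with '/' as build_presigned_url_path guarantees (variable names follow the instruction above):

/* sketch: join endpoint and path with exactly one '/' between them */
const char *path = url_path;
size_t ep_len = strlen(ctx->authorization_endpoint_url);

if (ep_len > 0 && ctx->authorization_endpoint_url[ep_len - 1] == '/') {
    path++;  /* url_path starts with '/', skip it so "//" never forms */
}

tmp = flb_sds_printf(&full_url, "%s%s", ctx->authorization_endpoint_url, path);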

In `@plugins/out_s3/s3_blob.c`:
- Around line 433-504: s3_blob_process_events can leak allocated cfl_sds_t
values when flb_input_blob_file_get_info() returns -1 but has already allocated
source or file_path; update the error branch after the
flb_input_blob_file_get_info call to defensively free any non-NULL source and
file_path using cfl_sds_destroy(source) and cfl_sds_destroy(file_path) (and then
set them to NULL) before continuing, so partial allocations are not leaked.
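
A sketch of that defensive branch, assuming source and file_path are cfl_sds_t variables initialized to NULL before the call:

/* sketch: after flb_input_blob_file_get_info() returns -1 */
if (ret == -1) {
    if (source != NULL) {
        cfl_sds_destroy(source);     /* partial allocation from the failed call */
        source = NULL;
    }
    if (file_path != NULL) {
        cfl_sds_destroy(file_path);
        file_path = NULL;
    }
    continue;                        /* skip this record, nothing leaked */
}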

In `@plugins/out_s3/s3_multipart.c`:
- Around line 408-412: The storage_class header block leaves the header count
variable n unchanged, which will leave s3_headers/insertion state inconsistent;
in the conditional where ctx->storage_class is assigned to s3_headers[n] (using
storage_class_header and setting val/val_len), increment n (e.g., n++)
immediately after that assignment so the header index is advanced for subsequent
headers and keeps s3_headers/n consistent.
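
A sketch of the corrected block, assuming storage_class_header is a template struct flb_aws_header as used elsewhere in the plugin:

/* sketch: populate the optional header, then advance the count */
if (ctx->storage_class != NULL) {
    s3_headers[n] = storage_class_header;     /* copies key/key_len */
    s3_headers[n].val = ctx->storage_class;
    s3_headers[n].val_len = strlen(ctx->storage_class);
    n++;    /* keep the index consistent for headers added after this one */
}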

In `@plugins/out_s3/s3_queue.c`:
- Around line 911-917: The code uses access(file_path, F_OK) in s3_queue.c
(checking file existence) which lacks Windows portability; add a compatibility
wrapper in flb_compat.h (similar to existing realpath/basename adapters) or wrap
the call with `#ifdef FLB_SYSTEM_WINDOWS` to call _access() on Windows toolchains,
then replace the direct access(...) usage in the block that logs
file_path/file_id and calls flb_blob_db_file_delete(&ctx->blob_db, file_id) to
use that wrapper (e.g., flb_compat_access or conditional _access) so older
MSVC/UCRT toolchains work correctly.
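
A sketch of such a wrapper in the style of the existing flb_compat.h adapters; flb_access is a hypothetical name:

/* hypothetical compat wrapper for file-existence checks */
#ifdef FLB_SYSTEM_WINDOWS
#include <io.h>
#ifndef F_OK
#define F_OK 0
#endif
#define flb_access(path, mode) _access((path), (mode))
#else
#include <unistd.h>
#define flb_access(path, mode) access((path), (mode))
#endif

/* usage at the call site in s3_queue.c */
if (flb_access(file_path, F_OK) != 0) {
    flb_blob_db_file_delete(&ctx->blob_db, file_id);
}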

In `@plugins/out_s3/s3_store.h`:
- Line 32: Replace all accesses to the renamed struct field file_path with the
new name stream_path in s3_queue.c (update expressions like store->file_path,
ctx->file_path, etc. to store->stream_path or ctx->stream_path). Ensure you
update every occurrence that references the field (all member accesses and any
string operations using that member), keep the flb_sds_t usage intact, and
rebuild to confirm there are no remaining references to file_path.

In `@plugins/out_s3/s3_stream.c`:
- Around line 379-440: The rename() call moves temp_template to final_path while
ctx->temp_fp remains open, which breaks on Windows; close the FILE*
(ctx->temp_fp) before calling rename(), set ctx->temp_fp to NULL, then perform
the rename and only after a successful rename set ctx->temp_path to final_path
(and reopen the file for further writes only if later logic requires keeping it
open); ensure error-handling paths still close the stream and clean up
temp_template/final_path (use the same flb_plg_error / flb_errno cleanup
pattern) and reference the existing symbols: ctx->temp_fp, temp_template,
final_path, rename(), ctx->temp_path, s3_unlink, flb_free, flb_sds_destroy.
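
A minimal sketch of the suggested ordering (not the plugin's full cleanup path):

/* sketch: Windows cannot rename a file that is still open */
fclose(ctx->temp_fp);
ctx->temp_fp = NULL;

if (rename(temp_template, final_path) != 0) {
    flb_errno();
    /* leave temp_template for the existing cleanup path to remove */
    return -1;
}

ctx->temp_path = final_path;   /* adopt the new path only after success */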

In `@tests/include/s3_test_helpers.h`:
- Around line 137-194: The comment incorrectly calls mkstemps() "POSIX"; update
the comment near the mkstemps usage to remove the POSIX label and instead
describe mkstemps as a BSD/Glibc extension (available on Linux glibc and macOS),
and make the same change in the comment inside s3_test_create_temp_db_path
(tests/include/s3_test_helpers.h) so both references accurately state mkstemps
is a BSD/Glibc extension rather than POSIX.

In `@tests/internal/aws_compress.c`:
- Around line 77-85: The test comment incorrectly cites "RFC 7093" as the Snappy
framing spec; update the comment block in tests/internal/aws_compress.c to
reference the correct specification consistent with src/flb_snappy.c (e.g., the
Snappy framing format spec or the correct RFC/URL used there) so that the
description for flb_snappy_compress_framed_data and
flb_snappy_uncompress_framed_data matches the corrected wording in
src/flb_snappy.c.

In `@tests/runtime/out_s3_edge_cases.c`:
- Around line 7-11: Remove the direct inclusion of aws_client_mock.c from the
test and replace it with a proper header-based interface: create
aws_client_mock.h that contains declarations for every public function and any
needed types/extern variables that are defined in aws_client_mock.c (the same
symbols the tests call), add include guards, and have tests include
aws_client_mock.h (not the .c); then update the test build rules so
aws_client_mock.c is compiled once and linked into the test binary alongside the
test object (keep the existing includes of s3_test_helpers.h and s3.h
unchanged). Ensure function names and types declared in aws_client_mock.h
exactly match the definitions in aws_client_mock.c so symbols like the mock
init/teardown and any mocked S3 call functions resolve at link time.
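
A sketch of the header's shape; the prototypes shown are illustrative and must be copied verbatim from the definitions in aws_client_mock.c:

/* aws_client_mock.h -- declarations only; the implementation in
 * aws_client_mock.c is compiled once and linked into each test binary */
#ifndef AWS_CLIENT_MOCK_H
#define AWS_CLIENT_MOCK_H

struct flb_aws_client_mock_request_chain;                  /* illustrative */

void flb_aws_client_mock_configure_generator(
         struct flb_aws_client_mock_request_chain *chain); /* illustrative */
struct flb_aws_client_generator *flb_aws_client_get_mock_generator(void);
void flb_aws_client_mock_clear_generator_instance(void);

#endif /* AWS_CLIENT_MOCK_H */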
♻️ Duplicate comments (7)
src/flb_parquet_impl.cpp (1)

981-991: CRITICAL: arrow::MakeBuilder() argument order likely wrong (compile failure on Arrow 11+).

You call arrow::MakeBuilder(field->type(), arrow::default_memory_pool()) (Line 984, Line 1114). For Arrow 11+, the common/expected order is pool first (or a different overload).

Proposed fix
-            auto builder_result = arrow::MakeBuilder(field->type(), arrow::default_memory_pool());
+            auto builder_result = arrow::MakeBuilder(arrow::default_memory_pool(), field->type());
@@
-                            auto builder_result = arrow::MakeBuilder(field->type(),
-                                                                    arrow::default_memory_pool());
+                            auto builder_result = arrow::MakeBuilder(arrow::default_memory_pool(),
+                                                                    field->type());

Also applies to: 1110-1120

tests/runtime/out_s3_multipart.c (1)

7-8: Stop including aws_client_mock.c directly.

Same issue as other test files - include only the header to avoid duplicate symbol definitions.

tests/runtime/out_s3_format.c (1)

9-10: Stop including aws_client_mock.c directly.

Same issue as other test files.

tests/runtime/out_s3_blob.c (1)

7-8: Stop including aws_client_mock.c directly.

Same issue as other test files.

tests/runtime/out_s3_config.c (1)

7-8: Stop including aws_client_mock.c directly.

Same issue as other test files.

tests/runtime/out_s3_error_handling.c (1)

44-101: Environment variable cleanup may still leak across tests (FLB_S3_PLUGIN_UNDER_TEST).
If tests share a process, leaving it set can affect subsequent suites. Consider adding a helper to unset it in teardown.

Also applies to: 104-161, 164-234, 237-294, 297-354, 357-422, 425-484, 487-623, 625-691
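
One possible teardown helper (the helper name is hypothetical):

/* hypothetical helper: keep FLB_S3_PLUGIN_UNDER_TEST from leaking */
static void s3_test_unset_plugin_env(void)
{
#ifdef _WIN32
    _putenv("FLB_S3_PLUGIN_UNDER_TEST=");   /* empty value removes it */
#else
    unsetenv("FLB_S3_PLUGIN_UNDER_TEST");
#endif
}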

tests/internal/parquet.c (1)

289-333: Temp paths are collision-prone (/tmp/flb_test_<name>.*) — switch to mkstemp()/PID suffix.
Current naming will collide under parallel test runs and is not portable.

Proposed direction (mkstemp-based unique files)
 static int init_test_context(test_context *ctx, const char *test_name)
 {
@@
-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    /* Prefer mkstemp to avoid collisions (or add getpid() at minimum). */
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_msgpack_XXXXXX", test_name);
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_parquet_XXXXXX", test_name);
+    int fd1 = mkstemp(ctx->msgpack_file);
+    if (fd1 != -1) { close(fd1); }
+    int fd2 = mkstemp(ctx->parquet_file);
+    if (fd2 != -1) { close(fd2); }

Also applies to: 752-786, 1872-1939

🧹 Nitpick comments (12)
tests/internal/fuzzers/aws_util_fuzzer.c (1)

73-73: Consider fuzzing the new file_path parameter for better coverage.

The API update is correct. However, passing only NULL for the new file_path parameter limits fuzz coverage. The flb_get_s3_key function has significant logic for $FILE_PATH and $FILE_NAME template substitution that won't be exercised.

♻️ Suggested enhancement
+    char *file_path = NULL;
+    if (size > 250) {
+        file_path = get_null_terminated(50, &data, &size);
+    }
+
     if (format && tag && tag_delimiter) {
         if (!initialization_crutch()) {
             flb_sds_t s3_key_format = NULL;
-            s3_key_format = flb_get_s3_key(format, t, tag, tag_delimiter, 0, NULL);
+            s3_key_format = flb_get_s3_key(format, t, tag, tag_delimiter, 0, file_path);
             if (s3_key_format) {
                 flb_sds_destroy(s3_key_format);
             }

Don't forget to free file_path in the cleanup section.

plugins/out_s3/s3_auth.c (1)

230-230: Consider using debug log level for routine success messages.

flb_plg_info on every successful presigned URL retrieval may flood logs in high-throughput scenarios. Consider flb_plg_debug for routine operations.

♻️ Suggested change
-    flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully");
+    flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");
plugins/out_s3/s3_multipart.h (1)

52-107: Prefer const-correctness for read-only buffers/URLs.

Several APIs accept char *pre_signed_url / char *body but appear read-only; consider switching to const char * (and const char *body) to prevent accidental mutation and clarify ownership.

plugins/out_s3/s3_multipart.c (2)

62-64: Integer overflow risk in sleep_ms for POSIX.

The expression milliseconds * 1000 can overflow a 32-bit int when milliseconds > 2147483 (about 35 minutes). While current usage with backoff_ms capped at 30000ms is safe, this is a latent bug if the function is reused elsewhere.

🔧 Proposed fix to use safer types
 /* Cross-platform sleep wrapper */
 static inline void sleep_ms(int milliseconds) {
-    usleep(milliseconds * 1000);
+    usleep((useconds_t)milliseconds * 1000);
 }

Alternatively, since flb_compat.h already provides a cross-platform usleep() for Windows (lines 128-132 in the relevant snippet), consider using that directly instead of defining a new wrapper.


117-138: strchr does not respect size bounds if response is not NUL-terminated.

The function receives size but strchr at lines 118, 124, and 135 will scan until NUL regardless of size. If response is not NUL-terminated, this could read out of bounds. The bounds checks at lines 130-132 and 136-138 help but only after strchr has already scanned past size.

Consider using memchr instead of strchr to respect the size parameter:

🔧 Proposed fix using memchr
     /* Find "ETag:" header in response */
-    etag_header = strstr(response, "ETag:");
+    etag_header = flb_strnstr(response, "ETag:", size);
     if (!etag_header) {
         return NULL;
     }

+    size_t remaining = size - (etag_header - response);
+
     /* Find the opening quote after "ETag:" */
-    start_quote = strchr(etag_header, '\"');
+    start_quote = memchr(etag_header, '\"', remaining);
     if (!start_quote) {
         return NULL;
     }
tests/runtime/out_s3_queue.c (1)

7-8: Stop including aws_client_mock.c directly.

Including a .c file creates duplicate symbol definitions if these tests are ever compiled into a single binary or linked together. This pattern is repeated across all S3 test files in this PR.

Include only the header and ensure the mock implementation is compiled/linked separately.

♻️ Proposed fix
 `#include` "../include/aws_client_mock.h"
-#include "../include/aws_client_mock.c"
tests/runtime/out_s3_multipart.c (1)

387-440: Note: flb_test_abort_multipart_success and flb_test_complete_multipart_failure have identical mock chains.

Both tests use the same sequence: Create → UploadPart → Complete(500) → Abort(DELETE 204). While this correctly tests abort behavior, consider adding a comment to flb_test_complete_multipart_failure clarifying it also validates the abort path, or differentiate the tests more clearly.

tests/internal/parquet.c (2)

69-136: ChunkIO header CRC32 fields are hardcoded to 0; consider computing or using ChunkIO helpers.
If ChunkIO validation is enabled (now or later), these tests may become brittle.


407-441: Very large record-count tests may cause CI timeouts; consider gating or trimming.
The 70k/131k loops are valuable, but they’re expensive. Consider an env-gated “long tests” mode or reducing counts while still covering boundaries.

Also applies to: 1872-1939
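
A sketch of env-gating, using a hypothetical FLB_TESTS_LONG switch:

#include <stdlib.h>

/* hypothetical gate: run the 70k/131k boundary loops only on demand */
static int test_record_count(int full_count)
{
    if (getenv("FLB_TESTS_LONG") != NULL) {
        return full_count;    /* e.g. 131072 for full boundary coverage */
    }
    return 1024;              /* fast default for routine CI runs */
}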

plugins/out_s3/s3_blob.c (1)

183-235: Consider skipping MPU abort for stale files when s3_key is unknown.
Regenerating the key for time-based formats is explicitly warned as potentially wrong; abort requests will likely fail and create noise. Prefer “best effort” only when s3_key is present (or rely on S3 MPU expiration).

tests/runtime/out_s3_edge_cases.c (1)

18-98: Make flb_test_empty_data_upload deterministic: assert mock chain state instead of sleeping.
Right now it’s “no crash” only and timing-based. Consider asserting that no requests were consumed (or don’t install a chain at all).

src/flb_blob_db.c (1)

1849-1874: Consider capturing error details from sqlite3_exec.

The function uses sqlite3_exec but ignores the error message parameter. While functionally correct, capturing the error message could aid debugging.

Optional enhancement for better error diagnostics
 int flb_blob_db_reset_zombie_parts(struct flb_blob_db *context)
 {
     const char *sql;
+    char *errmsg = NULL;
     int result;

     if (!context || !context->db) {
         return FLB_BLOB_DB_ERROR_INVALID_BLOB_DB_CONTEXT;
     }

     sql = "UPDATE blob_parts SET in_progress = 0 "
           "WHERE uploaded = 0 AND in_progress = 1";

     flb_sqldb_lock(context->db);

-    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, NULL);
+    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, &errmsg);

     flb_sqldb_unlock(context->db);

     if (result != SQLITE_OK) {
         context->last_error = result;
+        if (errmsg != NULL) {
+            sqlite3_free(errmsg);
+        }
         return -1;
     }

     return FLB_BLOB_DB_SUCCESS;
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e0ad38 and 2aa4b4b.

📒 Files selected for processing (47)
  • include/fluent-bit/aws/flb_aws_compress.h
  • include/fluent-bit/flb_aws_util.h
  • include/fluent-bit/flb_blob_db.h
  • include/fluent-bit/flb_parquet.h
  • include/fluent-bit/flb_snappy.h
  • plugins/in_blob/blob_file.c
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • src/aws/flb_aws_compress.c
  • src/aws/flb_aws_util.c
  • src/flb_blob_db.c
  • src/flb_parquet_impl.cpp
  • src/flb_signv4.c
  • src/flb_snappy.c
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/fuzzers/aws_util_fuzzer.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
✅ Files skipped from review due to trivial changes (1)
  • plugins/out_s3/s3.h
🚧 Files skipped from review as they are similar to previous changes (5)
  • include/fluent-bit/flb_snappy.h
  • tests/include/aws_client_mock.h
  • plugins/out_kinesis_firehose/firehose.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.h
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • src/flb_snappy.c
  • src/flb_parquet_impl.cpp
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
  • src/flb_snappy.c
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef FLB_HAVE_ZSTD`) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • src/flb_snappy.c
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef FLB_HAVE_ARROW` guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
  • src/flb_snappy.c
  • src/flb_parquet_impl.cpp
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/internal/parquet.c
  • src/flb_snappy.c
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
  • include/fluent-bit/aws/flb_aws_compress.h
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • src/aws/flb_aws_util.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_s3_error_handling.c
  • src/flb_parquet_impl.cpp
  • plugins/out_s3/s3_store.c
  • tests/include/s3_test_helpers.h
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.

Applied to files:

  • src/flb_parquet_impl.cpp
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.

Applied to files:

  • src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • src/flb_parquet_impl.cpp
  • src/aws/flb_aws_compress.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • src/flb_parquet_impl.cpp
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.

Applied to files:

  • include/fluent-bit/flb_aws_util.h
🧬 Code graph analysis (23)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_multipart.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_signv4.c (1)
src/flb_sds.c (2)
  • flb_sds_destroy (389-399)
  • flb_sds_create_len (58-76)
plugins/in_blob/blob_file.c (1)
plugins/in_blob/blob_db.c (1)
  • blob_db_file_insert (164-196)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (8)
  • flb_parquet_validate_schema (860-883)
  • flb_parquet_validate_schema (860-862)
  • flb_parquet_schema_create (885-910)
  • flb_parquet_schema_create (885-887)
  • flb_parquet_schema_destroy (912-918)
  • flb_parquet_schema_destroy (912-912)
  • flb_msgpack_to_parquet_streaming (920-1274)
  • flb_msgpack_to_parquet_streaming (920-925)
tests/runtime/out_s3_error_handling.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_snappy.c (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
  • stream_compress_file (142-213)
  • stream_process_msgpack_file (563-631)
  • stream_json_processor (633-661)
  • stream_log_key_processor (736-765)
plugins/out_s3/s3_auth.c (5)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_utils.c (1)
  • flb_utils_url_split (1531-1667)
src/flb_upstream.c (1)
  • flb_upstream_create_url (416-481)
src/flb_http_client.c (4)
  • flb_http_add_header (1015-1047)
  • flb_http_basic_auth (1394-1398)
  • flb_http_bearer_auth (1406-1441)
  • flb_http_client (866-911)
src/aws/flb_aws_util.c (1)
  • flb_aws_uri_encode_path (1490-1526)
tests/runtime/out_s3_edge_cases.c (4)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (9)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_input_set (305-335)
  • flb_output (279-289)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_queue.c (5)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_sds.c (3)
  • flb_sds_create_len (58-76)
  • flb_sds_create (78-90)
  • flb_sds_destroy (389-399)
plugins/out_s3/s3_store.c (2)
  • s3_store_buffer_put (199-240)
  • s3_store_file_delete (508-533)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-407)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_len (88-91)
  • cfl_sds_destroy (127-137)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (438-446)
  • s3_test_create_temp_db_path (137-194)
  • s3_test_create_temp_store_dir (204-257)
  • s3_test_cleanup_temp_db (417-430)
  • s3_test_cleanup_temp_store_dir (384-409)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/aws/flb_aws_compress.c (1)
src/flb_snappy.c (1)
  • flb_snappy_compress_framed_data (134-255)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (143-147)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (307-316)
  • s3_queue_buffer_chunk (356-378)
  • s3_queue_add_file (116-130)
  • s3_queue_add_part (132-202)
  • s3_queue_add_pending_file (259-272)
  • s3_queue_add_file_unlocked (58-111)
  • s3_queue_add_pending_file_unlocked (208-254)
  • s3_queue_process_entry (748-807)
  • s3_queue_recover_all (850-895)
  • s3_queue_recover_from_database (988-1076)
  • s3_queue_recover_from_fstore (1141-1201)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
include/fluent-bit/flb_aws_util.h (1)
src/aws/flb_aws_util.c (2)
  • flb_get_s3_key (1311-1389)
  • flb_aws_uri_encode_path (1490-1526)
tests/internal/aws_compress.c (1)
src/flb_snappy.c (2)
  • flb_snappy_compress_framed_data (134-255)
  • flb_snappy_uncompress_framed_data (257-481)
plugins/out_s3/s3_stream.c (4)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_compress (110-127)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_create (78-90)
  • flb_sds_printf (336-387)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (535-571)
src/flb_blob_db.c (3)
src/flb_sqldb.c (3)
  • flb_sqldb_close (129-149)
  • flb_sqldb_lock (173-178)
  • flb_sqldb_unlock (180-185)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_create (113-125)
  • cfl_sds_destroy (127-137)
include/fluent-bit/flb_mem.h (1)
  • flb_calloc (84-96)
plugins/out_s3/s3_multipart.c (4)
include/fluent-bit/flb_compat.h (1)
  • usleep (129-133)
plugins/out_s3/s3.c (1)
  • write_seq_index (127-147)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-407)

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 2aa4b4b to 01929d6 Compare January 14, 2026 17:00

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_blob.c`:
- Around line 442-531: flb_blob_db_file_insert() returns int64_t so avoid
truncation and incorrect error checks: replace the current int ret =
flb_blob_db_file_insert(...) usage with assigning its return directly to the
existing int64_t file_id (i.e. file_id = flb_blob_db_file_insert(...)) and
change the error check from (ret == -1) to (file_id < 0); keep using the int
variable ret for subsequent calls that return int (e.g., s3_blob_register_parts
and s3_queue_add_pending_file). Ensure all error-handling branches use file_id
for deletion/logging when appropriate.
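
A sketch of the corrected assignment; the insert call's arguments are illustrative, the return-type handling is the point:

int64_t file_id;

file_id = flb_blob_db_file_insert(&ctx->blob_db, tag, source, file_path,
                                  file_size);      /* args illustrative */
if (file_id < 0) {                                 /* was: ret == -1 */
    flb_plg_error(ctx->ins, "blob db insert failed for %s", file_path);
    continue;
}
/* keep a separate int for calls that really return int, e.g.
 * s3_blob_register_parts() and s3_queue_add_pending_file() */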

In `@plugins/out_s3/s3_store.c`:
- Around line 553-589: In flb_chunk_file_open, fix the meta_len logging and make
the fseek safe: log meta_len using the correct unsigned 16-bit specifier (e.g.,
cast to unsigned and use %hu or use PRIu16) instead of %d, and before fseek
validate meta_len against the actual remaining file size (use fseek/ftell or
fstat to get file length and compute remaining bytes) to ensure meta_len is not
larger than available bytes; if validation fails, log an error and close/return
NULL. Also ensure the fseek call uses a signed/long offset type appropriate for
fseek and handle any overflow when converting meta_len to the seek offset.
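
A sketch of the validation; the ctx/fp names are illustrative, and PRIu16 requires <inttypes.h>:

/* sketch: bound the seek by what is actually left in the file */
long here = ftell(fp);
long end;

fseek(fp, 0, SEEK_END);
end = ftell(fp);
fseek(fp, here, SEEK_SET);

flb_plg_debug(ctx->ins, "chunk meta_len=%" PRIu16, meta_len);

if ((long) meta_len > end - here) {
    flb_plg_error(ctx->ins, "meta_len exceeds remaining file size");
    fclose(fp);
    return NULL;
}
fseek(fp, (long) meta_len, SEEK_CUR);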

In `@plugins/out_s3/s3_stream.c`:
- Around line 60-140: The cleanup path in stream_compress_file_chunked may leak
compressed_chunk if an upstream compressor (e.g., flb_gzip_compress) returns -1
without freeing its output; update the cleanup block in
stream_compress_file_chunked to defensively free compressed_chunk (check if
compressed_chunk is non-NULL then call flb_free) in addition to freeing
read_buffer so any partially-allocated compressed_chunk is released on error or
early return.
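
A sketch of the defensive cleanup block:

cleanup:
    /* defensive: a compressor may return -1 after allocating output */
    if (compressed_chunk != NULL) {
        flb_free(compressed_chunk);
        compressed_chunk = NULL;
    }
    if (read_buffer != NULL) {
        flb_free(read_buffer);
        read_buffer = NULL;
    }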

In `@tests/include/s3_test_helpers.h`:
- Around line 206-270: The Windows build is missing linkage for the RPC/OLE
functions used in s3_test_create_temp_store_dir() (CoCreateGuid, UuidToStringA,
RpcStringFreeA); update the test target's target_link_libraries invocation to
add ole32 and rpcrt4 when WIN32 is true (or conditionally for the S3 test
target) so the linker can resolve those symbols; keep the existing
fluent-bit-static, ${CMAKE_THREAD_LIBS_INIT} and ${SYSTEMD_LIB} entries and only
add ole32 and rpcrt4 inside the WIN32 branch.
♻️ Duplicate comments (2)
plugins/out_s3/s3_stream.c (1)

319-337: Delete/recreate on Windows reintroduces a TOCTOU race
GetTempFileName() creates the file, then DeleteFile() + _O_EXCL recreates it. Another process can potentially win that window. Better: open the already-created file (no delete), or switch to CreateFile(..., CREATE_NEW, ...) on a generated unique name without creating it first.
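
A sketch of the race-free alternative; candidate_path is a hypothetical generated unique name:

#include <windows.h>

/* sketch: atomic create, no delete/recreate window */
HANDLE h = CreateFileA(candidate_path, GENERIC_WRITE, 0, NULL,
                       CREATE_NEW,            /* fails if the file exists */
                       FILE_ATTRIBUTE_NORMAL, NULL);
if (h == INVALID_HANDLE_VALUE && GetLastError() == ERROR_FILE_EXISTS) {
    /* regenerate candidate_path and retry */
}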

tests/runtime/out_s3_config.c (1)

18-430: Tests don’t currently assert the generated S3 key/URI, so they won’t catch key_format regressions
Most s3_key_format_* tests only assert HTTP method. To validate $TAG/$INDEX/$UUID/%Y... expansion, add mock expectations for request URI/path (and ideally host/port for endpoint tests).

🧹 Nitpick comments (11)
plugins/out_s3/s3_auth.c (1)

230-230: Consider lowering log level from info to debug for routine operations.

Logging at info level on every successful presigned URL retrieval may produce excessive log volume in production, especially with multipart uploads that fetch multiple URLs per upload. Consider using flb_plg_debug for routine success messages and reserving info for significant state changes.

Suggested change
-    flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully");
+    flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");
tests/internal/parquet.c (1)

906-941: Use flb_malloc/flb_free for consistency with Fluent Bit memory management.

The test uses standard malloc() and free() while the rest of the codebase uses flb_malloc() and flb_free(). For consistency with Fluent Bit's memory allocation patterns, consider using the Fluent Bit wrappers.

Suggested change
     const size_t long_str_size = 100000;  /* 100KB string */
-    char *long_str = (char *)malloc(long_str_size);
+    char *long_str = (char *)flb_malloc(long_str_size);

     if (!long_str) {
         TEST_CHECK(0);  /* Memory allocation failed */
         return;
     }
-    free(long_str);
+    flb_free(long_str);
     cleanup_test_context(&ctx);
plugins/out_s3/s3_store.c (5)

116-140: Buffer warning threshold logic looks inverted at first glance; consider simplifying
space_remaining/limit < (1.0 - 0.8) is equivalent to “used > 80%”. It’s correct, but harder to read than new_total > limit * 0.8.
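
The equivalent, more direct form (names as in the comment above):

/* warn once usage exceeds 80% of the configured limit */
if (new_total > limit * 0.8) {
    flb_plg_warn(ctx->ins, "store_dir usage above 80%% of limit");
}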


383-407: You can likely drop the platform split by using gmtime_r() from flb_compat.h
This file already includes flb_compat.h, which provides a gmtime_r wrapper on Windows. Calling gmtime_r unconditionally would reduce platform-specific code here.


260-295: Use PATH_SEPARATOR macro instead of hardcoded "/" for Windows compatibility

The snprintf(chunk_path, sizeof(chunk_path), "%s/%s", ...) call uses a hardcoded "/" separator. Fluent Bit already provides PATH_SEPARATOR macro (defined in include/fluent-bit/calyptia/calyptia_constants.h) for platform-aware path construction. Follow the pattern used in plugins/out_file/file.c and use:

snprintf(chunk_path, sizeof(chunk_path), "%s" PATH_SEPARATOR "%s", stream_path, chunk_name)

This ensures consistency across the codebase and prevents edge cases on Windows.


33-49: Use FLB_SYSTEM_WINDOWS consistently for Windows detection across the file

Line 34 uses _WIN32 to define FLB_PATH_MAX, but line 278 uses FLB_SYSTEM_WINDOWS for the same purpose (stat vs _stat). Fluent Bit defines FLB_SYSTEM_WINDOWS via CMake when CMAKE_SYSTEM_NAME matches "Windows" and uses it as the standard platform macro throughout the codebase. Prefer FLB_SYSTEM_WINDOWS in both places to maintain consistency with the project convention and avoid potential divergence if code is built outside the standard CMake flow.


142-215: Append failure handling for existing files needs safeguards

When flb_fstore_file_append() fails on an existing buffer file (one that was passed in rather than newly created), the file is left in a partial/corrupted state with no cleanup or status flag. Callers only receive a warning. Consider either: (a) locking the file to prevent reuse, or (b) deleting and allowing the pipeline to recreate a fresh buffer. The existing locked field and lock/unlock functions provide infrastructure for option (a).

plugins/out_s3/s3_queue.h (1)

23-102: Header should directly include type dependencies (stdint.h, time.h)
This header uses uint64_t and time_t but only includes <sys/types.h>. Relying on s3.h transitive includes is brittle.

Proposed fix
 `#include` "s3.h"
 `#include` <sys/types.h>  /* for off_t */
+ `#include` <stdint.h>    /* for uint64_t */
+ `#include` <time.h>      /* for time_t */
tests/runtime/out_s3_config.c (1)

687-835: Looped scenario tests: consider including scenario name in WAIT label
S3_TEST_WAIT_MOCK_EXHAUSTED(5, "storage_class_variations") inside a loop makes failures harder to pinpoint. Consider suffixing with classes[i] / acls[i] in the label.

Also applies to: 762-835
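
If the macro accepts a runtime string, a per-iteration label could look like:

char label[128];

snprintf(label, sizeof(label), "storage_class_variations:%s", classes[i]);
S3_TEST_WAIT_MOCK_EXHAUSTED(5, label);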

plugins/out_s3/s3_blob.c (1)

507-509: Prefer a provided tag length over strlen(tag)
You pass strlen(event_chunk->tag) into s3_queue_add_pending_file(). If event_chunk already provides a tag length (common in Fluent Bit event APIs), use it to avoid surprises with embedded NULs or non-string tags.

#!/bin/bash
# Check struct flb_event_chunk definition for tag length fields.
rg -n --type=h 'struct\s+flb_event_chunk\b' include -C 3
tests/runtime/out_s3_edge_cases.c (1)

80-92: Minor: Redundant assertion at line 92.

The assertion TEST_CHECK(ret == 0) at line 92 checks the same ret value from flb_start() that was already verified at line 82. This is redundant but not harmful. The comment "Verify no crash occurred - reaching here is the test" suggests the intent is to verify the test didn't crash, which is implicitly true by reaching that point.

Suggested simplification
     /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
+    TEST_MSG("Empty flush completed without crash");
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2aa4b4b and 01929d6.

📒 Files selected for processing (33)
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/internal/CMakeLists.txt
  • tests/include/aws_client_mock.h
  • plugins/out_s3/s3_stream.h
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3_multipart.h
  • tests/internal/aws_compress.c
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef FLB_HAVE_ZSTD`) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef FLB_HAVE_ARROW` guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/include/s3_test_helpers.h
  • plugins/out_s3/s3_store.c
  • tests/runtime/out_s3_error_handling.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
🧬 Code graph analysis (13)
tests/runtime/out_s3_multipart.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_stream.c (4)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_compress (110-127)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_create (78-90)
  • flb_sds_printf (336-387)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-589)
tests/runtime/out_s3_format.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_blob.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (3)
  • flb_create (143-225)
  • flb_start (983-994)
  • flb_stop (1011-1055)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (143-147)
plugins/out_s3/s3_blob.c (6)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-423)
src/flb_input_blob.c (1)
  • flb_input_blob_delivery_notification_destroy (222-231)
src/flb_blob_db.c (2)
  • flb_blob_db_lock (423-428)
  • flb_blob_db_unlock (430-435)
src/flb_log_event_decoder.c (2)
  • flb_log_event_decoder_init (103-120)
  • flb_log_event_decoder_next (309-456)
plugins/out_s3/s3_queue.c (1)
  • s3_queue_add_pending_file (269-282)
tests/runtime/out_s3_config.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
  • flb_sds_destroy (389-399)
  • flb_sds_create (78-90)
src/flb_fstore.c (7)
  • flb_fstore_file_delete (259-277)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_destroy (527-564)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
  • s3_blob_register_parts (394-427)
  • s3_blob_process_events (442-531)
  • s3_blob_recover_state (171-190)
  • s3_blob_notify_delivery (124-161)
tests/runtime/out_s3_edge_cases.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (7)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_start (983-994)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
  • flb_lib_push (843-870)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-589)
tests/internal/aws_util.c (3)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
src/flb_sds.c (1)
  • flb_sds_destroy (389-399)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
  • initialization_crutch (27-36)
🔇 Additional comments (76)
tests/runtime/CMakeLists.txt (1)

238-246: LGTM! Good modular test organization.

The expansion from a single test file to nine focused test files improves test maintainability and provides better isolation for different S3 plugin aspects (config, multipart, format, error handling, edge cases, special features, queue, recovery, blob).

tests/include/aws_client_mock.c (3)

2-10: Good documentation for the inclusion pattern.

The comment clearly explains that this .c file is directly included in test files rather than compiled separately, which prevents symbol conflicts when tests are built as standalone executables.


56-64: Good double-free protection pattern.

Clearing the instance pointer before calling destroy prevents double-free scenarios when the generator is destroyed multiple times or when the S3 plugin cleanup has already freed the mock.
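
A minimal sketch of that pattern, with hypothetical names standing in for the mock API:

#include <stdlib.h>

struct mock_client { int dummy; };                 /* placeholder type */

static struct mock_client *mock_instance = NULL;   /* global, as in the mock */

static void mock_destroy(struct mock_client *m)
{
    free(m);
}

static void mock_clear_instance(void)
{
    /* Clear the global before destroying, so a second cleanup path
     * (e.g. plugin teardown) sees NULL instead of a dangling pointer */
    struct mock_client *tmp = mock_instance;

    mock_instance = NULL;
    if (tmp != NULL) {
        mock_destroy(tmp);
    }
}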


266-299: DATA field handling correctly allocates memory for flb_http_client_destroy cleanup.

The implementation properly:

  1. Searches for DATA_SIZE or DATA_LEN to determine allocation size
  2. Falls back to strlen() if size not specified
  3. Allocates, copies, and null-terminates the data
  4. Sets all three response fields (data, data_len, data_size)

One minor observation: if flb_malloc fails at line 291, c->resp.data remains NULL. This is acceptable: the code doesn't set the other fields in that case, and the caller receives a response with no data, which is a reasonable failure mode.

plugins/out_s3/s3_auth.c (3)

1-128: Well-structured authorization endpoint initialization.

The implementation correctly:

  • Parses URL and handles failures with proper cleanup
  • Creates TLS context only for HTTPS endpoints
  • Properly cleans up TLS context if upstream creation fails
  • Uses flb_output_upstream_set to wire the upstream to the plugin instance

262-357: Solid URL path builder with proper encoding and cleanup.

The implementation:

  • URL-encodes both S3 key and upload ID
  • Strips leading slash from key to avoid double slashes
  • Has comprehensive error cleanup paths
  • Returns the formatted path via flb_sds_printf result

359-423: Complete presigned URL fetch flow with proper resource management.

The implementation correctly handles all edge cases:

  • Returns success (0) with NULL result when endpoint is not configured
  • Validates upstream initialization before use
  • Handles double-slash prevention for endpoint/path concatenation
  • Properly destroys both url_path and full_url on all paths
tests/internal/parquet.c (4)

1-53: Good header organization with proper includes and extern "C" linkage.

The file correctly includes all necessary standard headers (stdio, stdlib, string, stdint, math.h) and establishes C++ interoperability for the validate_parquet_file function.


69-136: ChunkIO header construction follows the specification correctly.

The implementation properly sets:

  • Magic bytes (0xC1, 0x00) at bytes 0-1
  • Content length (big-endian) at bytes 10-13
  • Metadata length (big-endian) at bytes 22-23

The CRC32 fields are left zeroed, which is acceptable for test scenarios where validation is typically disabled.
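
A sketch of that header layout, using only the offsets quoted above (not taken from the chunkio headers):

#include <stdint.h>
#include <string.h>

static void write_chunk_header(uint8_t hdr[24],
                               uint32_t content_len, uint16_t meta_len)
{
    memset(hdr, 0, 24);                      /* CRC32 and padding stay zeroed */
    hdr[0] = 0xC1;                           /* magic bytes */
    hdr[1] = 0x00;

    hdr[10] = (uint8_t) (content_len >> 24); /* content length, big-endian */
    hdr[11] = (uint8_t) (content_len >> 16);
    hdr[12] = (uint8_t) (content_len >> 8);
    hdr[13] = (uint8_t) (content_len);

    hdr[22] = (uint8_t) (meta_len >> 8);     /* metadata length, big-endian */
    hdr[23] = (uint8_t) (meta_len);
}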


1941-2016: Proper use of standard NAN and INFINITY macros.

Using macros from <math.h> instead of division-by-zero expressions avoids potential SIGFPE traps when floating-point exceptions are enabled.


2081-2156: Excellent test coverage across multiple categories.

The TEST_LIST provides comprehensive coverage:

  • 9 functional tests for basic operations and type conversions
  • 12 data quality tests for NULL handling and schema mismatches
  • 9 scale/boundary tests including batch boundaries
  • 1 optimized compression test covering all 4 types
  • 10 negative/error tests
  • 3 real-world scenario tests

The organization into categories with comments aids maintainability.

tests/include/s3_test_helpers.h (6)

14-43: Good cross-platform abstraction layer.

The header provides clean platform-specific macros for common operations (sleep, unlink, rmdir, mkdir, close) and appropriate includes for both Windows and POSIX environments.


138-196: Correct implementation of unique temp file creation.

The Windows path uses GetTempFileNameA for atomic file creation, while POSIX uses mkstemps (correctly documented as a BSD/Glibc extension). Both approaches ensure uniqueness and avoid race conditions.
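
For reference, a POSIX-side sketch of this approach (path layout and helper name are illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int make_temp_db_path(char *out, size_t size)
{
    int fd;
    char tmpl[] = "/tmp/flb_s3_test_XXXXXX.db";

    /* mkstemps() is the BSD/glibc extension noted above: the XXXXXX
     * span is replaced atomically while the ".db" suffix is kept */
    fd = mkstemps(tmpl, 3);   /* 3 = length of the ".db" suffix */
    if (fd < 0) {
        return -1;
    }
    close(fd);                /* keep the unique path; the test reopens it */
    snprintf(out, size, "%s", tmpl);
    return 0;
}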


272-298: Safe nftw callback implementation.

The callback correctly handles different file types (FTW_F, FTW_SL, FTW_SLN for files/symlinks, FTW_D/FTW_DP for directories) and uses FTW_DEPTH to ensure children are visited before parents.
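
A condensed sketch of that callback shape (flag handling as described above; error reporting trimmed):

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

static int rm_cb(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void) sb;
    (void) ftwbuf;

    if (typeflag == FTW_D || typeflag == FTW_DP) {
        return rmdir(path);   /* directories: children are already gone */
    }
    return unlink(path);      /* FTW_F, FTW_SL, FTW_SLN */
}

static int remove_tree(const char *root)
{
    /* FTW_DEPTH visits children before parents; FTW_PHYS avoids
     * following symlinks out of the tree */
    return nftw(root, rm_cb, 16, FTW_DEPTH | FTW_PHYS);
}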


397-422: Secure directory cleanup without command injection.

Replacing system("rm -rf ...") with platform-specific APIs (nftw on POSIX, FindFirstFile/RemoveDirectoryA on Windows) eliminates the command injection vulnerability while maintaining functionality.


451-459: Correct Windows/POSIX environment variable handling.

Uses _putenv_s on Windows and setenv on POSIX, avoiding the portability issue from the previous review.
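
The split reduces to a small macro along these lines (macro name hypothetical):

#ifdef _WIN32
#include <stdlib.h>
#define test_setenv(k, v)  _putenv_s((k), (v))
#else
#include <stdlib.h>
#define test_setenv(k, v)  setenv((k), (v), 1)   /* 1 = overwrite existing */
#endif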


477-509: Good polling-based wait pattern for mock exhaustion.

The s3_test_poll_mock_exhausted function and S3_TEST_WAIT_MOCK_EXHAUSTED macro provide a deterministic alternative to sleep-based waiting, ensuring tests fail explicitly if mock chains aren't consumed rather than timing out silently.
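
The essence of the pattern, assuming a hypothetical done() predicate in place of the real unused-request counter:

#include <unistd.h>

static int wait_for(int (*done)(void), int max_polls, int interval_ms)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        if (done()) {
            return 0;                              /* condition met */
        }
        usleep((useconds_t) interval_ms * 1000);
    }
    return -1;   /* explicit failure instead of a silent timeout */
}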

tests/internal/aws_util.c (4)

47-68: Well-structured test coverage for new $FILE_PATH and $FILE_NAME variables.

The new macros comprehensively test the S3 key generation with file path/name substitution, including important edge cases:

  • Relative paths (./logs/file.log)
  • Triple-dot prefixes (...hidden/file.log)
  • Parent traversal (../../../etc/passwd)
  • Hidden files (.hidden)
  • Mixed dots and slashes (.///logs/./file.log)

This provides good coverage for the path sanitization logic that strips leading dots and slashes.
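
Conceptually, the sanitization the tests exercise behaves like this sketch (illustrative only, not the flb_get_s3_key() implementation):

static const char *sanitize_rel_path(const char *path)
{
    /* Skip any run of '.' and '/' at the front, so inputs such as
     * "../../../etc/passwd" or ".///logs/./file.log" cannot climb
     * out of the configured key prefix */
    while (*path == '.' || *path == '/') {
        path++;
    }
    return path;
}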


200-201: Correct signature update for existing tests.

The additional NULL parameter correctly indicates that these tests don't exercise the new $FILE_PATH/$FILE_NAME substitution feature, maintaining backward compatibility while extending the API.


402-533: LGTM - New file path tests follow established patterns.

All eight new test functions correctly:

  • Initialize configuration with initialization_crutch()
  • Set up UTC time context
  • Call flb_get_s3_key with appropriate parameters
  • Verify expected output
  • Clean up with flb_sds_destroy

550-558: Test registration looks correct.

All new test functions are properly registered in the test list.

plugins/out_s3/s3_queue.c (9)

22-30: Good Windows portability wrapper for file access checks.

The flb_access macro provides cross-platform compatibility for file existence checks. This addresses the portability concern for the access() function.
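
The wrapper presumably reduces to something like this (exact definition assumed):

#ifdef _WIN32
#include <io.h>
#define flb_access(path, mode)  _access((path), (mode))
#else
#include <unistd.h>
#define flb_access(path, mode)  access((path), (mode))
#endif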


68-121: Clean memory management in queue entry creation.

The function properly handles allocation failures by cleaning up previously allocated resources before returning -1. The documentation clearly states the caller must hold the lock.


142-212: LGTM - Proper resource cleanup on allocation failures.

The cascading cleanup pattern correctly frees previously allocated resources when any allocation fails, preventing memory leaks.


444-474: ETag memory management correctly implemented.

The code properly:

  1. Destroys the ETag SDS after saving to database (lines 452-454)
  2. Cleans up any allocated ETag on upload failure (lines 470-474)

This addresses the previously identified memory leak.


598-603: Proper validation of part_count before allocation.

The bounds check (1–10,000) correctly validates part_count before allocating and using it, preventing issues with corrupt or excessive values. This aligns with AWS S3's maximum parts limit.
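
The guard amounts to a range check of this shape before any part_count-sized allocation (sketch, not the plugin's exact code):

#include <stdint.h>

#define S3_AWS_MAX_PARTS 10000   /* AWS S3 hard limit per multipart upload */

static int validate_part_count(int64_t part_count)
{
    if (part_count < 1 || part_count > S3_AWS_MAX_PARTS) {
        return -1;   /* corrupt or excessive value: refuse to allocate */
    }
    return 0;
}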


1021-1045: Recovery loop properly advances through pending files.

The loop correctly avoids infinite iteration by:

  1. Marking parts as in_progress (line 1042) to exclude them from subsequent queries
  2. Resetting in_progress when skipping due to endpoint mismatch (line 1037)
  3. Processing or cleaning up each file before the next iteration

792-797: Proper NULL check before database operation.

The check if (ctx->blob_db.db != NULL) at line 793 ensures the database call is only made when the database is configured, preventing potential null pointer issues.


721-728: Correct NULL check before database operation in failure path.

The if (ctx->blob_db.db != NULL) check ensures the database operation is only performed when the database is configured.


1158-1218: LGTM - Fstore recovery properly handles chunk states.

The function correctly:

  • Skips the metadata stream
  • Respects chunk lock state
  • Handles retry limit exceeded by marking inactive
  • Locks files before queue addition and unlocks on failure
plugins/out_s3/s3.h (5)

32-33: Forward declaration correctly added for Parquet schema.

The forward declaration allows the header to reference struct flb_parquet_schema without including the full Parquet header, keeping dependencies minimal.


35-36: MAX_FILE_SIZE correctly reflects AWS S3 50TB limit.

The value 54975581388800ULL equals 50 TiB (50 × 1024⁴), which corresponds to the AWS S3 multipart ceiling (5 GiB max part size × 10,000 max parts).


52-74: Well-designed upload queue entry structure.

The structure cleanly supports both:

  • DB-tracked uploads (via file_id, part_db_id, part_id)
  • Non-DB uploads (via upload_file pointer)

The needs_upload_creation flag enables deferred multipart upload initialization during recovery.


150-157: Good state management and concurrency additions.

The new fields properly support:

  • initial_upload_done: Tracks first upload completion
  • is_exiting: Graceful shutdown signaling
  • needs_recovery: Recovery state tracking
  • upload_queue_lock: Thread-safe queue access

175-177: Parquet integration fields properly added.

The format, schema_str, and cached_arrow_schema fields provide the necessary hooks for Parquet output support.

plugins/out_s3/s3_blob.h (1)

1-43: Clean header with well-documented public API.

The header properly declares the blob management API with:

  • Appropriate include guards
  • Minimal necessary includes
  • Clear documentation comments for each function
  • Signatures matching the implementations in s3_blob.c
plugins/out_s3/s3_store.h (2)

51-52: Useful context-free chunk file opener.

The function provides a clean interface for opening and validating chunk files, with proper header validation and metadata skipping. The "context-free" design allows usage from C++ code (important for Parquet integration).


32-32: Field rename from file_path to stream_path in struct s3_file.

The rename better reflects the field's purpose as a stream directory path. This is distinct from upload_queue.file_path (the actual file path for blob uploads). All references throughout the codebase have been properly updated.

plugins/out_s3/s3_store.c (1)

83-114: Safe list deletion during iteration; good hardening around NULL fsf->data
Deleting the file entry under mk_list_foreach_safe() is the right pattern, and returning only unlocked matches keeps semantics clear.

plugins/out_s3/s3_stream.c (1)

480-515: MsgPack unpack loop handling is much safer now
Handling EXTRA_BYTES as “success with remaining data” and explicitly erroring on MSGPACK_UNPACK_NOMEM_ERROR avoids the common infinite-loop footgun.

plugins/out_s3/s3_blob.c (1)

192-244: This review comment is incorrect; the DB helper functions already clean up on error returns.

Both flb_blob_db_file_get_next_stale and flb_blob_db_file_get_next_aborted return -1 on error and properly clean up all dynamically allocated output parameters (sds strings) before returning:

  • flb_blob_db_file_get_next_stale: On error (exists == -1), cleans up and NULLs *path, *remote_id, and *tag.
  • flb_blob_db_file_get_next_aborted: On error (exists == -1), cleans up and NULLs all five output parameters (*path, *source, *remote_id, *file_tag, *s3_key). Additionally, this function initializes all output pointers to NULL upfront for added safety.

The nested allocation pattern in both functions ensures that if any cfl_sds_create() fails, all previously allocated strings are cleaned up before the error path returns control to the caller. When the caller's loop breaks on ret != 1 (including error case -1), the output pointers are either NULL or have already been properly managed by the helper function.

No memory leak exists in these call sites.

Likely an incorrect or invalid review comment.

tests/runtime/out_s3_queue.c (1)

1-12: Test file structure looks good.

The test module follows the established pattern for S3 plugin tests with proper includes, mock chain configuration, and resource cleanup. The seven test cases cover key queue functionality including basic operations, multi-worker scenarios, retry mechanisms, timeouts, concurrency, and shutdown behavior.

Note: Line 8 directly includes aws_client_mock.c rather than linking it separately. This pattern is consistent across all S3 test files and works because each test is compiled as a separate executable, avoiding symbol collision. While not ideal from an architectural standpoint, this is an existing pattern in the codebase.

tests/runtime/out_s3_blob.c (2)

1-12: Test module for blob database configuration looks well-structured.

The test suite covers key blob database configuration scenarios including custom paths, delivery attempt limits (file-level, part-level, and unlimited), freshness thresholds, and chunk sizing. Each test validates that the configuration is accepted and a successful upload completes.


137-151: Good implementation of retry loop for async DB creation.

The retry loop correctly handles the race condition where the database file may not exist immediately after flb_start(). Using S3_TEST_SLEEP_MS ensures cross-platform compatibility.

tests/runtime/out_s3_format.c (2)

1-15: Comprehensive format and compression test suite.

The test module provides good coverage of JSON formatting options, various compression algorithms (gzip, zstd, snappy), log_key extraction, and invalid configuration handling. All flb_lib_push return values are properly checked.


851-910: Good negative test case for invalid configuration.

The test correctly expects flb_start() to fail with an invalid compression type and properly handles cleanup regardless of the outcome. The defensive check at lines 900-902 ensures flb_stop() is only called if start unexpectedly succeeded.

tests/runtime/out_s3_error_handling.c (3)

14-41: Well-defined AWS error response templates.

The error response templates are properly structured XML matching real AWS S3 error formats. This enables realistic testing of error handling paths for various failure scenarios.


67-76: Good defensive NULL handling with early bailout.

The pattern of checking for NULL after s3_test_create_temp_* functions and returning early prevents potential NULL pointer dereferences. This pattern is consistently applied across all test functions in this file.


743-756: Test list is complete and properly structured.

All 10 error handling tests are registered with appropriate names. The list is properly NULL-terminated for the test framework.

tests/runtime/out_s3_edge_cases.c (3)

1-13: Good inclusion of flb_compat.h for cross-platform compatibility.

The inclusion of flb_compat.h at line 4 helps ensure cross-platform compatibility for system calls and types. This addresses previous portability concerns.


938-1079: Good lifecycle stability test with proper isolation.

The rapid start/stop test correctly creates separate mock chains, temp paths, and contexts for each cycle. The call to s3_test_next() at line 1014 ensures unique temporary paths for the second cycle. This validates that the plugin handles repeated initialization/shutdown cleanly.


1081-1096: Test list is complete.

All 12 edge case tests are properly registered with descriptive names.

plugins/out_s3/s3_multipart.c (19)

36-65: LGTM! Cross-platform abstractions are well implemented.

The Windows-specific O_BINARY flag addresses binary data corruption concerns (e.g., Parquet files), and the sleep_ms() wrapper properly handles platform differences between Sleep() (Windows, milliseconds) and usleep() (POSIX, microseconds).


106-148: LGTM! ETag extraction now correctly preserves quotes.

The implementation properly includes the surrounding double-quotes in the extracted ETag value, which is required by the S3 CompleteMultipartUpload API. Bounds checking is also correct.
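
An illustrative extraction that keeps the quotes (not the plugin's actual parser):

#include <stddef.h>
#include <string.h>

static int extract_quoted_etag(const char *val, size_t val_len,
                               char *out, size_t out_size)
{
    const char *start;
    const char *end;
    size_t len;

    start = memchr(val, '"', val_len);
    if (start == NULL) {
        return -1;
    }
    end = memchr(start + 1, '"', val_len - (size_t) (start - val) - 1);
    if (end == NULL) {
        return -1;
    }
    len = (size_t) (end - start) + 1;   /* keep both quote characters */
    if (len + 1 > out_size) {
        return -1;                      /* bounds check before copying */
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return (int) len;
}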


164-216: LGTM! Part size calculation correctly enforces AWS S3 constraints.

The function properly handles the tradeoffs between user configuration, file size, and AWS hard limits (5 MiB minimum, 5 GiB maximum, 10000 parts maximum).
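
The tradeoff logic reduces to a clamp of roughly this shape (constants per the AWS limits; a sketch, not the plugin's exact algorithm):

#include <stdint.h>

#define S3_PART_MIN  (5ULL * 1024 * 1024)           /* 5 MiB AWS minimum  */
#define S3_PART_MAX  (5ULL * 1024 * 1024 * 1024)    /* 5 GiB AWS maximum  */
#define S3_MAX_PARTS 10000ULL                       /* parts per upload   */

static uint64_t optimal_part_size(uint64_t file_size, uint64_t preferred)
{
    uint64_t size = preferred;
    uint64_t needed = (file_size + S3_MAX_PARTS - 1) / S3_MAX_PARTS;

    if (size < needed) {
        size = needed;        /* grow: otherwise the upload runs out of parts */
    }
    if (size < S3_PART_MIN) {
        size = S3_PART_MIN;   /* AWS rejects smaller non-final parts */
    }
    if (size > S3_PART_MAX) {
        size = S3_PART_MAX;
    }
    return size;
}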


218-248: LGTM! MD5 base64 encoding now properly NUL-terminates output.

The function correctly ensures the output buffer is NUL-terminated (Lines 238-241) with a bounds check to prevent buffer overflow.


250-299: LGTM! URI construction handles both presigned URLs and standard paths correctly.

The function properly URL-encodes the S3 key and handles memory cleanup on error paths.


301-355: LGTM! CompleteMultipartUpload XML payload construction is robust.

The function properly validates all ETags before building the XML and handles memory cleanup on all error paths.


408-413: LGTM! Storage class header now correctly increments the header count.

The n++ at line 412 properly advances the header index after setting the storage_class header, addressing the previous review concern.


420-445: LGTM! Resource cleanup is thorough and handles NULL gracefully.


451-500: LGTM! Multipart upload structure initialization with proper error handling.

The function correctly manages the sequential index persistence and provides appropriate cleanup on all error paths.


502-559: LGTM! CreateMultipartUpload initiation handles the API flow correctly.


561-653: LGTM! UploadPart implementation correctly handles part uploads with optional MD5 verification.


655-736: LGTM! CompleteMultipartUpload handles success and NoSuchUpload error appropriately.

The function properly returns distinct error codes for different failure modes, enabling callers to handle upload expiration gracefully.


738-800: LGTM! AbortMultipartUpload correctly handles the abort flow.


802-880: LGTM! ListParts validation correctly distinguishes between transient and definitive failures.

The function now properly returns -1 for transient errors (network failures, unexpected status codes) and 0 for definitive NoSuchUpload responses, allowing callers to implement appropriate retry logic.


894-1016: LGTM! File part upload handles 32-bit platform safety and efficient I/O.

The overflow guards using uint64_t casting and the direct-to-buffer read pattern address previous review concerns effectively.


1018-1043: LGTM! Abort helper provides consistent presigned URL handling with fallback.

The function gracefully falls back to aborting without a presigned URL if the URL fetch fails, ensuring cleanup attempts aren't blocked by transient auth endpoint issues.


1045-1081: LGTM! Initialization helper properly sets up the multipart upload structure.


1083-1201: LGTM! Part upload orchestration with proper AWS limits and overflow-safe backoff.

The function correctly enforces the 10000-part AWS limit, ensures at least one retry attempt, and uses overflow-safe exponential backoff calculation.
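
Overflow-safe exponential backoff usually looks like this sketch (base and cap values are illustrative):

#include <stdint.h>

static uint64_t backoff_ms(int attempt, uint64_t base_ms, uint64_t cap_ms)
{
    uint64_t delay = base_ms;
    int i;

    for (i = 0; i < attempt; i++) {
        if (delay > cap_ms / 2) {
            return cap_ms;    /* doubling again would pass the cap */
        }
        delay *= 2;
    }
    return delay;
}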


1203-1294: LGTM! Complete multipart upload orchestration with proper error handling.

The function correctly validates uploaded byte count against file size and ensures resources are cleaned up on all paths.

tests/runtime/out_s3_multipart.c (5)

1-13: LGTM! Test includes provide necessary mock and helper functionality.

Including aws_client_mock.c directly (line 8) is acceptable in test code to simplify mock linkage.


14-67: LGTM! Test follows best practices for isolation and cleanup.

The test properly uses unique temporary paths, condition-based waiting (S3_TEST_WAIT_MOCK_EXHAUSTED), and thorough cleanup. The flb_lib_push() return value is correctly validated.


69-123: LGTM! Metadata test validates ACL header propagation.


125-498: LGTM! All test functions follow consistent patterns with proper validation and cleanup.

The flb_test_multiple_data_pushes test correctly validates each flb_lib_push() return value in the loop (lines 485-486), addressing the previous review concern.


500-512: LGTM! Test list is complete and well-organized.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from 01929d6 to bb051a2 on January 15, 2026 08:42
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
plugins/out_s3/s3_store.h (1)

32-32: Field rename from file_path to stream_path.

This semantic rename better reflects the purpose (stream directory path). Ensure all references in s3_queue.c have been updated to use stream_path as flagged in the previous review.

tests/runtime/out_s3_edge_cases.c (1)

7-8: Including implementation file directly is non-standard.

Including aws_client_mock.c instead of a header violates best practices. Create aws_client_mock.h with function declarations and link the implementation at build time.

🧹 Nitpick comments (8)
tests/include/aws_client_mock.c (1)

266-299: Consider extracting data length lookup to avoid repeated iteration.

The inner loop at lines 275-286 scans all config parameters each time FLB_AWS_CLIENT_MOCK_SET_DATA is encountered. While functionally correct, this could be extracted into a helper or the loop could break earlier to improve clarity.

♻️ Optional: Extract helper function
+static size_t get_response_data_len(struct flb_aws_client_mock_response *response)
+{
+    int j;
+    for (j = 0; j < response->length; ++j) {
+        if (response->config_parameters[j].config_parameter ==
+            FLB_AWS_CLIENT_MOCK_SET_DATA_SIZE ||
+            response->config_parameters[j].config_parameter ==
+            FLB_AWS_CLIENT_MOCK_SET_DATA_LEN) {
+            return (size_t)(uintptr_t)response->config_parameters[j].config_value;
+        }
+    }
+    return 0;
+}
plugins/out_s3/s3_multipart.h (1)

94-100: Inconsistent parameter type: flb_sds_t vs char *.

s3_multipart_upload_file_part uses flb_sds_t pre_signed_url while the other multipart functions use char *pre_signed_url. Consider plain char * for consistency with the rest of the API, unless flb_sds_t is specifically required here.

♻️ Suggested fix for consistency
 int s3_multipart_upload_file_part(struct flb_s3 *ctx,
                                    const char *file_path,
                                    off_t offset_start,
                                    off_t offset_end,
                                    struct multipart_upload *m_upload,
-                                   flb_sds_t pre_signed_url);
+                                   char *pre_signed_url);
plugins/out_s3/s3.h (1)

192-197: Inconsistent indentation in the s3_initiate_multipart_upload declaration.

The file_path and tag parameters in s3_initiate_multipart_upload already carry const qualifiers; only the continuation lines of the declaration are misaligned, which obscures the signature.

♻️ Optional: Fix the indentation
 int s3_initiate_multipart_upload(struct flb_s3 *ctx,
-                                           uint64_t file_id,
-                                           const char *file_path,
-                                           const char *tag,
-                                           int tag_len);
+                                 uint64_t file_id,
+                                 const char *file_path,
+                                 const char *tag,
+                                 int tag_len);

plugins/out_s3/s3_auth.c (2)

155-171: Consider checking return values from HTTP header functions.

The return values from flb_http_add_header(), flb_http_basic_auth(), and flb_http_bearer_auth() are not checked. While these functions rarely fail, authentication header failures could cause silent authorization issues.

♻️ Suggested improvement
 static int setup_http_client_headers(struct flb_s3 *ctx,
                                       struct flb_http_client *client)
 {
-    flb_http_add_header(client, "Accept", 6, "text/plain", 10);
-    flb_http_add_header(client, "User-Agent", 10, "Fluent-Bit", 10);
+    int ret;
+
+    ret = flb_http_add_header(client, "Accept", 6, "text/plain", 10);
+    if (ret < 0) {
+        return -1;
+    }
+    ret = flb_http_add_header(client, "User-Agent", 10, "Fluent-Bit", 10);
+    if (ret < 0) {
+        return -1;
+    }
 
     if (ctx->authorization_endpoint_username &&
         ctx->authorization_endpoint_password) {
-        flb_http_basic_auth(client, ctx->authorization_endpoint_username,
-                           ctx->authorization_endpoint_password);
+        ret = flb_http_basic_auth(client, ctx->authorization_endpoint_username,
+                                  ctx->authorization_endpoint_password);
+        if (ret < 0) {
+            return -1;
+        }
     }
     else if (ctx->authorization_endpoint_bearer_token) {
-        flb_http_bearer_auth(client, ctx->authorization_endpoint_bearer_token);
+        ret = flb_http_bearer_auth(client, ctx->authorization_endpoint_bearer_token);
+        if (ret < 0) {
+            return -1;
+        }
     }
 
     return 0;
 }

207-207: Check return value of setup_http_client_headers().

If header setup fails (especially authentication headers), the request should not proceed.

♻️ Suggested fix
-    setup_http_client_headers(ctx, http_client);
+    ret = setup_http_client_headers(ctx, http_client);
+    if (ret < 0) {
+        flb_plg_error(ctx->ins, "Failed to setup HTTP client headers");
+        goto cleanup;
+    }
tests/internal/parquet.c (1)

288-304: Tempfile paths are collision-prone and not portable.

The current implementation uses fixed /tmp/flb_test_*.{msgpack,parquet} paths based only on test_name. This can cause issues with:

  1. Parallel test execution: Tests with the same name running concurrently will collide
  2. Stale files: Failed tests may leave files that affect subsequent runs
  3. Windows portability: /tmp doesn't exist on Windows

Consider using PID + random suffix or mkstemp()-style unique paths:

snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
         "/tmp/flb_test_%s_%ld_%d.msgpack", test_name, (long)getpid(), rand());

Or better, use a cross-platform temp directory helper similar to s3_test_create_temp_db_path() from s3_test_helpers.h.

tests/runtime/out_s3_edge_cases.c (2)

91-92: Redundant assertion.

ret was already checked at line 81 and hasn't been modified. This check provides no additional value.

Suggested fix
-    /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
-
     s3_test_cleanup_temp_db(db_path);

1072-1073: Redundant assertion.

ret from flb_lib_push was already checked at line 1065. This duplicate check adds no value.

Suggested fix
-    /* Verify second configuration worked - flb_lib_push succeeded */
-    TEST_CHECK(ret >= 0);
-
     s3_test_cleanup_temp_db(db_path);
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01929d6 and bb051a2.

📒 Files selected for processing (33)
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/internal/aws_compress.c
  • tests/internal/CMakeLists.txt
  • tests/runtime/out_s3_queue.c
  • plugins/out_s3/s3_auth.h
  • tests/include/aws_client_mock.h
  • tests/runtime/out_s3_format.c
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3_queue.h
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/out_s3/s3_store.c
  • tests/runtime/out_s3_error_handling.c
  • tests/include/s3_test_helpers.h
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_multipart.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/out_s3/s3_stream.c
🧬 Code graph analysis (14)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (2)
src/flb_fstore.c (7)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_meta_set (74-108)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_destroy (527-564)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
tests/runtime/out_s3_multipart.c (4)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (1)
  • flb_lib_push (843-870)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-632)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
  • s3_blob_register_parts (403-436)
  • s3_blob_process_events (451-538)
  • s3_blob_recover_state (171-199)
  • s3_blob_notify_delivery (124-161)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-418)
  • s3_multipart_upload_new (451-500)
  • s3_multipart_upload_destroy (420-445)
  • s3_multipart_initiate (502-559)
  • s3_multipart_upload_part (561-653)
  • s3_multipart_complete (655-736)
  • s3_multipart_abort (738-800)
  • s3_multipart_abort_with_url (1022-1043)
  • s3_multipart_check_upload_exists (806-880)
  • s3_multipart_upload_file_part (894-1016)
  • s3_multipart_upload_file (1203-1294)
plugins/out_s3/s3_auth.c (5)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_utils.c (1)
  • flb_utils_url_split (1531-1667)
src/flb_upstream.c (2)
  • flb_upstream_create_url (416-481)
  • flb_upstream_conn_get (711-844)
src/flb_http_client.c (6)
  • flb_http_add_header (1015-1047)
  • flb_http_basic_auth (1394-1398)
  • flb_http_bearer_auth (1406-1441)
  • flb_http_client (866-911)
  • flb_http_do (1645-1705)
  • flb_http_client_destroy (1821-1828)
src/aws/flb_aws_util.c (1)
  • flb_aws_uri_encode_path (1490-1526)
tests/runtime/out_s3_edge_cases.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (451-459)
  • s3_test_create_temp_db_path (138-196)
  • s3_test_create_temp_store_dir (206-270)
  • s3_test_cleanup_temp_db (430-443)
  • s3_test_cleanup_temp_store_dir (397-422)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (6)
  • flb_create (143-225)
  • flb_input (266-276)
  • flb_start (983-994)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
  • flb_lib_push (843-870)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (5)
  • cb_s3_upload (1340-1607)
  • s3_format_chunk (1196-1337)
  • s3_upload_file (1015-1120)
  • s3_initiate_multipart_upload (898-1013)
  • s3_plugin_under_test (73-79)
plugins/out_s3/s3_stream.c (4)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_compress (110-127)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_create (78-90)
  • flb_sds_printf (336-387)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-632)
plugins/out_s3/s3_queue.c (8)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (3)
  • s3_store_buffer_put (217-258)
  • s3_store_file_unlock (641-646)
  • s3_store_file_lock (634-639)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-423)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_upload_file_part (894-1016)
  • s3_multipart_complete (655-736)
src/flb_blob_db.c (2)
  • flb_blob_db_lock (423-428)
  • flb_blob_db_unlock (430-435)
plugins/out_s3/s3.c (1)
  • s3_format_chunk (1196-1337)
lib/cfl/src/cfl_sds.c (2)
  • cfl_sds_len (88-91)
  • cfl_sds_destroy (127-137)
plugins/out_s3/s3_blob.c (1)
  • s3_blob_notify_delivery (124-161)
plugins/out_s3/s3_multipart.c (5)
include/fluent-bit/flb_compat.h (1)
  • usleep (129-133)
plugins/out_s3/s3.c (1)
  • write_seq_index (127-147)
include/fluent-bit/flb_mem.h (2)
  • flb_calloc (84-96)
  • flb_free (126-128)
src/flb_http_client.c (1)
  • flb_http_client_destroy (1821-1828)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-423)
🔇 Additional comments (84)
tests/include/aws_client_mock.c (4)

2-10: Good documentation for test mock usage pattern.

The added note clearly explains why this .c file is directly included rather than compiled separately, helping prevent future build issues.


54-64: LGTM: Proper double-free prevention.

Clearing flb_aws_client_mock_instance before calling flb_aws_client_mock_destroy prevents double-free scenarios when the mock was already freed by S3 plugin cleanup.


66-74: Appropriate addition for test isolation.

This function enables safe reconfiguration between tests when the S3 plugin has already freed the mock client via flb_destroy().


306-308: Macro condition correctly excludes DATA from standard expansion.

The additional check ensures DATA is handled by the special case above rather than the generic macro expansion.

plugins/out_s3/s3_store.h (1)

51-52: Context-free chunk file opener enables C++ Parquet integration.

The implementation in s3_store.c properly validates the chunk header, metadata length bounds, and handles all error paths with appropriate cleanup. This design allows safe usage from the Parquet encoder without requiring S3 context.

plugins/out_s3/s3_multipart.h (2)

1-24: Well-structured header with appropriate includes.

The header establishes a clean API surface for multipart uploads with proper include guards and dependency on s3.h and sys/types.h for off_t.


77-88: Critical documentation for recovery correctness.

The warning about using the persisted s3_key rather than regenerating it is essential. Key formats with timestamps or sequence numbers would produce different keys, causing recovery failures. This documentation prevents a subtle but serious bug.

tests/internal/aws_util.c (3)

47-68: Comprehensive test macros for file path templating.

Good coverage of the new $FILE_PATH and $FILE_NAME placeholders with edge cases for path sanitization (relative paths, parent traversal, hidden files, mixed dots/slashes).


402-533: Well-structured new tests for file path key formatting.

The eight new test functions follow the established pattern and cover:

  • Basic $FILE_PATH substitution
  • Basic $FILE_NAME substitution
  • Combined $FILE_PATH-$FILE_NAME
  • Edge cases with various dot/slash prefixes

62-63: Verify the sanitization behavior prevents directory traversal.

The test confirms that ../../../etc/passwd is sanitized by stripping leading ../ components, resulting in etc/passwd, which is then safely inserted into the logs/$FILE_PATH format to produce logs/etc/passwd. This prevents attackers from using relative paths to escape the intended S3 object key namespace.

plugins/out_s3/s3.h (6)

29-33: Appropriate includes and forward declaration.

The pthread.h inclusion supports the new upload_queue_lock mutex, and the forward declaration for flb_parquet_schema correctly enables the pointer field without requiring the full Parquet header.


41-46: S3 multipart constants align with AWS limits.

The constants correctly reflect AWS S3 constraints: 5 MiB minimum, 5 GiB maximum per part, and 10,000 maximum parts. The default of 100 MiB is a reasonable balance.


52-74: Well-designed upload queue structure for DB-tracked multipart uploads.

The structure cleanly separates file identification (file_id, part_db_id, part_id), file location (file_path, offsets), S3 metadata (s3_key, upload_id), and retry state. The needs_upload_creation flag enables lazy multipart initiation.


82-82: Fixed-size etags array matches S3_AWS_MAX_PARTS.

The etags[10000] array size matches the S3_AWS_MAX_PARTS constant, ensuring sufficient storage for all possible parts in a multipart upload.


150-157: Thread-safety additions for concurrent workers.

The upload_queue_lock mutex and is_exiting flag enable safe operation when multiple workers are configured. The needs_recovery flag supports atomic recovery coordination as shown in cb_s3_upload.


175-178: Parquet integration fields properly added.

The format field enables runtime format selection, schema_str holds the user-provided schema configuration, and cached_arrow_schema caches the parsed schema to avoid re-parsing in coroutines (preventing yyjson stack overflow as noted in s3_format_chunk).

plugins/out_s3/s3_blob.h (1)

1-43: LGTM - Well-structured header file.

The header provides a clean public API for blob handling with proper include guards and documentation comments. The function declarations align with their implementations in s3_blob.c.

plugins/out_s3/s3_auth.c (1)

359-423: Well-structured presigned URL fetch function.

The function properly handles:

  • Early return when authorization endpoint is not configured
  • Validation that upstream is initialized before use
  • URL path construction with correct slash handling for all edge cases
  • Proper cleanup on all error paths
plugins/out_s3/s3_multipart.c (5)

36-65: Good cross-platform I/O abstraction.

The platform-specific macros correctly handle:

  • Binary mode file I/O on Windows (O_BINARY flag prevents text mode corruption for Parquet files)
  • Cross-platform sleep wrapper using Sleep() on Windows and usleep() on POSIX

106-148: ETag extraction correctly preserves quotes.

The function now extracts the ETag including surrounding quotes (required by AWS CompleteMultipartUpload API) and includes proper bounds checking.


922-946: Thorough overflow protection for 32-bit platforms.

The code properly guards against:

  • Negative offsets
  • Invalid offset ranges (end <= start)
  • 32-bit size_t truncation when casting from off_t
  • Defensive truncation verification

This addresses a critical correctness concern for 32-bit systems.
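
The guards amount to something like this sketch (not the exact plugin code):

#include <stdint.h>
#include <sys/types.h>

static int range_to_len(off_t start, off_t end, size_t *out_len)
{
    uint64_t len;

    if (start < 0 || end <= start) {
        return -1;                 /* negative or inverted range */
    }
    len = (uint64_t) end - (uint64_t) start;
    if (len > (uint64_t) SIZE_MAX) {
        return -1;                 /* would truncate where size_t is 32-bit */
    }
    *out_len = (size_t) len;
    if ((uint64_t) *out_len != len) {
        return -1;                 /* defensive truncation verification */
    }
    return 0;
}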


1099-1105: AWS 10000 parts limit properly enforced.

The check at line 1101 prevents exceeding the AWS S3 maximum of 10000 parts per multipart upload.


1203-1294: Complete multipart upload workflow with proper error handling.

The function:

  • Uses platform-specific stat struct declaration
  • Calls s3_multipart_abort_with_url() on all error paths after initiation
  • Verifies uploaded bytes match file size
  • Properly cleans up resources in all cases
plugins/out_s3/s3_blob.c (3)

41-122: Well-structured abort operation with proper S3 key handling.

The function:

  • Validates file_remote_id before proceeding
  • Prefers stored s3_key over regeneration (critical for time-based key formats)
  • Properly cleans up pre_signed_url in error paths
  • Logs warnings when falling back to key regeneration

285-286: Good handling of FLB_OUT_RETRY_UNLIMITED.

The check at line 285 correctly handles the unlimited retry case by comparing against FLB_OUT_RETRY_UNLIMITED before the numeric comparison.


451-538: Proper error handling for blob event processing.

The function correctly:

  • Uses int64_t file_id to receive flb_blob_db_file_insert() return value without truncation
  • Checks file_id < 0 to catch all negative error codes (not just -1)
  • Defensively frees partial allocations when flb_input_blob_file_get_info() fails
  • Cleans up database entries on registration/queue failures
plugins/out_s3/s3_queue.c (6)

22-30: Good cross-platform file existence check.

The flb_access wrapper properly handles the portability difference between _access() on Windows and access() on POSIX systems.


444-474: Proper ETag memory management.

The code correctly:

  • Frees the ETag SDS after persisting to database on success (lines 452-454)
  • Frees any allocated ETag on failure path (lines 470-474)
  • Uses bounds checking before array access

598-603: AWS part count validation properly enforced.

The validation ensures part_count is within valid range (1-10000) before allocating and copying ETags, preventing out-of-bounds access and excessive allocation.


1042-1055: Endpoint mismatch handling corrected.

The code now properly resets the in_progress flag before skipping files with an endpoint mismatch (line 1047), preventing those files from being hidden until zombie cleanup.

The loop correctly terminates because:

  1. Each file's parts are marked in_progress=1 (line 1052) immediately after retrieval
  2. flb_blob_db_file_get_next_pending excludes files with in-progress parts
  3. This prevents the same file from being returned on subsequent iterations

1167-1228: Correct fstore recovery for legacy buffered chunks.

The function properly:

  • Skips metadata streams
  • Respects file lock state
  • Enforces retry limits
  • Adds files without DB tracking (file_id=0) for backward compatibility

719-740: Retry limit handling consistent with plugin convention.

The retry limit check at line 721 uses direct numeric comparison (entry->retry_counter >= ctx->ins->retry_limit), which is consistent with the S3 plugin's established pattern. Based on learnings, this is intentional for consistency within the plugin.

tests/runtime/CMakeLists.txt (2)

238-246: LGTM! Expanded S3 test coverage.

The replacement of a single S3 test with nine specialized test targets provides comprehensive coverage across configuration, multipart, format, error handling, edge cases, queue, recovery, and blob functionality.


300-306: LGTM! Correct Windows library linkage.

The conditional linkage of ole32 and rpcrt4 libraries for S3 tests on Windows is correct. These are required for GUID generation (CoCreateGuid, UuidToStringA, RpcStringFreeA) used in s3_test_helpers.h for creating unique temp paths.

plugins/out_s3/s3_store.c (6)

23-49: LGTM! Well-designed cross-platform path length handling.

The FLB_PATH_MAX constant correctly handles:

  • Windows: MAX_PATH / _MAX_PATH with 260 fallback
  • POSIX: PATH_MAX from limits.h with 4096 fallback

The inclusion of <chunkio/cio_file_st.h> properly sources the chunk format constants from the canonical location.
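
The fallback ladder presumably looks like this (a sketch; the real constant is private to s3_store.c):

#ifdef _WIN32
#  include <windows.h>
#  ifdef MAX_PATH
#    define FLB_PATH_MAX MAX_PATH
#  elif defined(_MAX_PATH)
#    define FLB_PATH_MAX _MAX_PATH
#  else
#    define FLB_PATH_MAX 260
#  endif
#else
#  include <limits.h>
#  ifdef PATH_MAX
#    define FLB_PATH_MAX PATH_MAX
#  else
#    define FLB_PATH_MAX 4096
#  endif
#endif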


116-140: LGTM! Proactive buffer monitoring.

The check_buffer_space function provides good observability:

  • Early return for unlimited buffer (store_dir_limit_size == 0)
  • Clear error when buffer is full
  • Warning at 80% threshold to allow for proactive intervention

142-159: LGTM! Centralized cleanup reduces code duplication.

The cleanup_failed_file helper properly handles cleanup of both the s3_file context (including stream_path SDS) and the underlying flb_fstore_file, with clear logging of the failure reason.


260-295: LGTM! Correct cross-platform stat implementation.

The calculate_chunk_data_size function properly uses:

  • _stat / struct _stat on Windows
  • stat / struct stat on POSIX

The path truncation check at line 273 correctly guards against buffer overflow.


383-407: LGTM! Correct cross-platform gmtime implementation.

The create_stream_name function properly handles the different signatures:

  • Windows: gmtime_s(struct tm*, const time_t*) returns 0 on success
  • POSIX: gmtime_r(const time_t*, struct tm*) returns pointer on success

The consistent hyphen-based time format (%Y-%m-%dT%H-%M-%S) ensures cross-platform compatibility.


553-632: LGTM! Robust chunk file validation.

The flb_chunk_file_open function implements thorough validation:

  • Magic byte verification (CIO_FILE_ID_00, CIO_FILE_ID_01)
  • File size validation before seeking past metadata
  • Proper error handling with file closure on all failure paths
  • Safe metadata length validation against remaining file bytes (lines 613-619)

The use of canonical chunkio constants ensures compatibility with the file format specification.

tests/runtime/out_s3_multipart.c (5)

1-13: LGTM! Well-structured test file with proper dependencies.

The includes provide all necessary components:

  • AWS mock framework for simulating S3 responses
  • Test helpers for temp path management
  • S3 plugin internals for multipart testing

15-67: LGTM! Comprehensive basic multipart test.

The test properly:

  • Creates unique temp paths using s3_test_create_temp_db_path and s3_test_create_temp_store_dir
  • Configures mock responses for complete multipart flow (POST create → PUT upload → POST complete)
  • Validates flb_lib_push return value
  • Uses S3_TEST_WAIT_MOCK_EXHAUSTED for deterministic waiting
  • Performs full cleanup including mock generator, temp DB, and store directory

229-277: LGTM! Good failure scenario coverage.

The flb_test_create_multipart_failure test correctly configures only the initial POST with a 500 error to verify error handling for CreateMultipartUpload failures.


442-498: LGTM! Multiple data pushes with return value validation.

The loop at lines 484-488 correctly:

  • Captures flb_lib_push return value
  • Validates each push with TEST_CHECK(ret >= 0)
  • Uses S3_TEST_SLEEP_MS(100) between pushes for timing

This addresses the previous review concern about ignoring flb_lib_push results.


500-512: LGTM! Comprehensive TEST_LIST coverage.

The test list covers the full multipart lifecycle: creation, metadata, upload, completion, failure handling, retry behavior, and abort scenarios.

tests/internal/parquet.c (4)

20-34: LGTM! Complete include set.

All required standard headers are explicitly included (stdio.h, stdlib.h, string.h, stdint.h, math.h), and the compression constants are properly sourced from flb_aws_compress.h.


69-136: LGTM! Complete ChunkIO header initialization.

The write_msgpack_to_chunk_file function now properly initializes all header fields:

  • Magic bytes (0xC1, 0x00)
  • Content length in big-endian (bytes 10-13)
  • Metadata length in big-endian (bytes 22-23)
  • Reserved fields zeroed

This matches the ChunkIO specification in lib/chunkio/include/chunkio/cio_file_st.h.
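
For reference, a self-contained sketch of producing such a header (offsets as listed above; the helper name is hypothetical):

#include <stdio.h>
#include <stdint.h>

static int write_chunk_header(FILE *fp, uint32_t content_len,
                              uint16_t meta_len)
{
    unsigned char hdr[24] = {0};          /* reserved fields stay zeroed */

    hdr[0] = 0xC1;                        /* magic byte 0 */
    hdr[1] = 0x00;                        /* magic byte 1 */
    hdr[10] = (content_len >> 24) & 0xFF; /* content length, big-endian */
    hdr[11] = (content_len >> 16) & 0xFF;
    hdr[12] = (content_len >> 8) & 0xFF;
    hdr[13] = content_len & 0xFF;
    hdr[22] = (meta_len >> 8) & 0xFF;     /* metadata length, big-endian */
    hdr[23] = meta_len & 0xFF;

    return fwrite(hdr, 1, sizeof(hdr), fp) == sizeof(hdr) ? 0 : -1;
}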


1941-2016: LGTM! Safe NaN/Infinity generation.

Using NAN, INFINITY, and -INFINITY macros from math.h is the correct approach, avoiding potential SIGFPE from division-by-zero if FP exceptions are enabled.
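
The recommended pattern in miniature:

#include <math.h>

/* macro-based specials avoid the runtime 0.0/0.0 division, which can
 * raise SIGFPE when floating-point exceptions are enabled */
static const double specials[] = { NAN, INFINITY, -INFINITY };

static int is_special(double v)
{
    return isnan(v) || isinf(v);      /* classification helpers, math.h */
}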


2081-2156: LGTM! Well-organized test registry.

The TEST_LIST is clearly categorized with 44 tests covering:

  • Functional tests (9)
  • Data quality tests (12)
  • Scale & boundary tests (9)
  • Compression tests (1 consolidated)
  • Negative tests (10)
  • Real-world scenarios (3)

This provides comprehensive coverage of the Parquet encoder functionality.

tests/runtime/out_s3_blob.c (4)

1-13: LGTM! Consistent test file structure.

The includes match the pattern from out_s3_multipart.c, providing all necessary components for mock-based S3 blob testing.


84-164: LGTM! Robust database creation verification.

The flb_test_blob_database_custom_path test properly handles the async nature of database creation:

  • Retry loop with max_retries = 10 (up to ~1s total wait)
  • Uses S3_TEST_SLEEP_MS(100) for cross-platform sleep
  • Clear TEST_MSG on assertion for debugging

This addresses the previous review concern about flaky stat() checks.


166-373: LGTM! Comprehensive delivery attempt configuration tests.

The tests verify that various delivery attempt configurations are accepted:

  • file_delivery_attempt_limit with finite value (3)
  • part_delivery_attempt_limit with finite value (5)
  • Unlimited attempts with -1

The comment block (lines 166-172) clearly documents that these tests verify configuration acceptance, not actual retry behavior.


585-595: LGTM! Complete test coverage for blob functionality.

The TEST_LIST covers all key blob database features:

  • Basic and custom path configuration
  • File and part delivery attempt limits
  • Unlimited retry configuration
  • Upload freshness thresholds
  • Chunk size configuration
plugins/out_s3/s3_stream.h (1)

1-112: Well-structured streaming interface header.

The header defines a clean callback-based API for per-record streaming processing. The documentation comments are thorough, the include dependencies are appropriate (including the previously-flagged <sys/types.h> for off_t), and the stream_processor_context struct provides useful tracking for processed records and bytes written.

tests/include/s3_test_helpers.h (2)

1-511: Comprehensive cross-platform test helper utilities.

The implementation addresses all previously flagged concerns:

  • Safe directory cleanup using platform APIs instead of system()
  • Windows path separator handling in safe path validation
  • Atomic temp file creation using GetTempFileNameA (Windows) and mkstemps/mkdtemp (POSIX)
  • Deterministic polling-based mock exhaustion via S3_TEST_WAIT_MOCK_EXHAUSTED

The helper macros and functions provide good test isolation and cross-platform compatibility.


206-270: Windows library linking for GUID functions is properly configured.

The implementation correctly uses CoCreateGuid(), UuidToStringA(), and RpcStringFreeA() for unique directory naming on Windows. The required libraries ole32 and rpcrt4 are properly linked in tests/runtime/CMakeLists.txt (line 304) with appropriate Windows-only and S3-test-only conditionals.

tests/runtime/out_s3_config.c (4)

19-83: Well-structured test with proper mock chain and cleanup.

The test correctly:

  • Sets up mock chain with METHOD expectations
  • Creates isolated temp paths via s3_test_create_temp_db_path and s3_test_create_temp_store_dir
  • Uses deterministic S3_TEST_WAIT_MOCK_EXHAUSTED instead of fixed sleep
  • Cleans up all resources including mock generator, temp files, and allocated memory

777-835: Good use of HEADER expectation for ACL validation.

The flb_test_canned_acl_options test properly validates that the x-amz-acl header is set correctly for each ACL option using expect(HEADER, "x-amz-acl", acls[i]). This is a good pattern that other configuration tests could adopt to verify their respective headers.


1282-1300: Complete TEST_LIST registration.

All 15 test functions are properly registered in the TEST_LIST for execution.


7-8: This pattern is intentional and documented. The aws_client_mock.c file is explicitly designed to be directly included in test files, with each test built as a standalone executable. The header file clearly states: "Usage: #include both .h and .c files in test files" and notes the limitation that multiple tests should not be compiled into a single executable. The static functions in aws_client_mock.c provide file-scope isolation, preventing symbol conflicts. This same pattern is consistently used across 10+ test files in the codebase (out_s3_*.c, filter_aws.c, etc.), confirming it is an established approach rather than a mistake.

tests/runtime/out_s3_error_handling.c (4)

14-42: Well-organized AWS error response templates.

The error response macros provide realistic S3 error XML payloads for testing various error scenarios. These are properly formatted and cover key error codes that the S3 plugin should handle.


44-106: Proper NULL handling with early bailout.

The test correctly handles potential NULL returns from s3_test_create_temp_db_path and s3_test_create_temp_store_dir with an explicit early return (lines 71-75), preventing NULL pointer dereferences in subsequent code. This pattern is consistently applied across all tests in this file.


381-452: SlowDown throttling test documents plugin behavior.

The comment on lines 392-393 clarifies important plugin behavior: "S3 plugin removes file from queue on first failure without retry". This documentation is valuable for understanding the expected test outcome and plugin semantics.


743-756: Complete TEST_LIST for error handling scenarios.

All 10 error handling test functions are properly registered, covering authentication, resource, throttling, server errors, retry behavior, and cleanup scenarios.

plugins/out_s3/s3_stream.c (9)

20-45: Clean platform abstraction for file operations.

The platform-specific macros (s3_unlink, s3_close, s3_fdopen) provide a clean abstraction layer. The Windows implementation correctly uses DeleteFile for unlink and handles the different file descriptor semantics.


60-143: Chunked compression with proper resource cleanup.

The stream_compress_file_chunked function:

  • Allocates read buffer with proper NULL check
  • Handles remaining bytes calculation correctly
  • Frees compressed_chunk after each successful write
  • Includes defensive cleanup in the cleanup section for both buffers
  • Checks for read errors via ferror()

The cleanup block at lines 134-140 now properly frees compressed_chunk if it's non-NULL, addressing the past review concern about potential leaks from flb_gzip_compress.
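
A hedged sketch of the chunked loop, assuming a compressor shaped like flb_gzip_compress(in, in_len, &out, &out_len) whose output members may safely be concatenated (true for gzip and framed Snappy):

#include <stdio.h>
#include <fluent-bit/flb_mem.h>
#include <fluent-bit/flb_gzip.h>

static int compress_chunked(FILE *in, FILE *out, size_t total, size_t chunk)
{
    unsigned char *buf;
    void *cbuf = NULL;
    size_t clen;
    size_t n;
    int ret = -1;

    buf = flb_malloc(chunk);
    if (buf == NULL) {
        return -1;
    }
    while (total > 0) {
        n = total < chunk ? total : chunk;   /* remaining-bytes handling */
        if (fread(buf, 1, n, in) != n) {
            goto cleanup;                    /* short read or ferror() */
        }
        if (flb_gzip_compress(buf, n, &cbuf, &clen) != 0) {
            goto cleanup;
        }
        if (fwrite(cbuf, 1, clen, out) != clen) {
            goto cleanup;
        }
        flb_free(cbuf);                      /* free after each write */
        cbuf = NULL;
        total -= n;
    }
    ret = 0;
cleanup:
    flb_free(cbuf);                          /* defensive: may be NULL */
    flb_free(buf);
    return ret;
}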


145-216: Robust offset validation and error handling.

The function properly validates:

  • offset_start < 0 (line 169)
  • offset_end > 0 && offset_end <= offset_start (line 174)

And cleans up the output file on failure (lines 211-213). The compression type logging (lines 155-158) now correctly handles the "unknown" case.


424-453: Windows-compatible rename handling.

The code correctly closes the file handle before calling rename() (lines 428-429) to support Windows, where renaming an open file fails. The file is then reopened in append mode after successful rename. This addresses the past review concern about Windows compatibility.


471-490: Proper resource cleanup on initialization failure.

The code now correctly calls stream_context_destroy() when buffer allocation (line 475) or unpacker initialization (line 481) fails, preventing resource leaks. The unpacker_initialized and result_initialized flags ensure the destroy function only cleans up what was actually allocated.


492-527: Correct msgpack unpacker return code handling.

The function now properly handles all msgpack_unpacker_next() return codes:

  • MSGPACK_UNPACK_SUCCESS and MSGPACK_UNPACK_EXTRA_BYTES: Process the unpacked record
  • MSGPACK_UNPACK_NOMEM_ERROR: Log error and return -1
  • MSGPACK_UNPACK_PARSE_ERROR: Log error and return -1
  • MSGPACK_UNPACK_CONTINUE: Loop condition handles this naturally

The comment at lines 503-507 correctly explains that both SUCCESS and EXTRA_BYTES indicate a complete message was unpacked.
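
A minimal sketch of this dispatch using the msgpack-c API (buffer feeding elided):

#include <msgpack.h>

static int drain_unpacker(msgpack_unpacker *unp, msgpack_unpacked *result)
{
    msgpack_unpack_return ret;

    while ((ret = msgpack_unpacker_next(unp, result)) !=
           MSGPACK_UNPACK_CONTINUE) {
        switch (ret) {
        case MSGPACK_UNPACK_SUCCESS:
        case MSGPACK_UNPACK_EXTRA_BYTES:
            /* a complete object is in result->data; hand it to the
             * record processor here */
            break;
        case MSGPACK_UNPACK_NOMEM_ERROR:
        case MSGPACK_UNPACK_PARSE_ERROR:
        default:
            return -1;                /* unrecoverable: stop the stream */
        }
    }
    return 0;                         /* CONTINUE: need more input */
}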


591-659: Clean entry point with proper success handling for empty input.

The function correctly returns success (0) with NULL output for empty input (lines 610-615) and empty processor output (lines 639-645), rather than treating these as errors. This matches the expected "no-op" behavior noted in past reviews.


661-689: JSON processor with proper error propagation.

The stream_json_processor now returns -1 on JSON conversion failure (lines 673-676), properly propagating the error to the caller. This addresses the past review concern about silently dropping records.


691-793: Well-structured log_key processor.

The log_key extraction logic is well organized:

  • find_log_key_in_map: Searches for the configured log key in the map
  • write_string_value: Writes string/binary values directly
  • write_json_value: Converts non-string values to JSON
  • stream_log_key_processor: Orchestrates the flow

The function correctly handles missing keys by returning 0 (skip record) rather than failing.
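
A hedged sketch of the map-lookup step (plain msgpack-c object traversal; the helper name is illustrative):

#include <string.h>
#include <msgpack.h>

static const msgpack_object *find_key(const msgpack_object *map,
                                      const char *key, size_t key_len)
{
    uint32_t i;
    const msgpack_object_kv *kv;

    if (map->type != MSGPACK_OBJECT_MAP) {
        return NULL;
    }
    for (i = 0; i < map->via.map.size; i++) {
        kv = &map->via.map.ptr[i];
        /* STR and BIN keys share the same ptr/size layout in the union */
        if ((kv->key.type == MSGPACK_OBJECT_STR ||
             kv->key.type == MSGPACK_OBJECT_BIN) &&
            kv->key.via.str.size == key_len &&
            memcmp(kv->key.via.str.ptr, key, key_len) == 0) {
            return &kv->val;
        }
    }
    return NULL;
}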

tests/runtime/out_s3_edge_cases.c (11)

104-184: LGTM!

Test structure is correct with proper mock setup, data push, wait, and cleanup.


189-266: LGTM!

Concurrent upload test with 10 workers is properly configured.


271-347: LGTM!

Minimal timeout test follows the established pattern correctly.


353-431: LGTM!

Good edge case coverage verifying timeout triggers before reaching file size threshold.


436-514: LGTM!

Good coverage for S3 key format with tag segment placeholders.


519-601: LGTM!

Tag delimiter configuration test is well structured.


606-682: LGTM!

Custom store directory test follows established patterns.


687-766: LGTM!

Minimum boundary configuration test is correctly implemented.


771-850: LGTM!

Maximum boundary configuration test properly validates large config values while keeping timeout practical.


855-933: LGTM!

Compression with small data test correctly uses sizeof(...) - 1 for payload length.


1082-1096: LGTM!

Test list correctly registers all 12 test functions with proper NULL termination.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from bb051a2 to ff0643c Compare January 15, 2026 09:31


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_blob.c`:
- Around lines 424-429: the error check after flb_blob_db_file_part_insert
currently compares ret == -1, which will not catch the DB error constant.
Change the check to detect any error return (e.g. if (ret < 0) or if
(ret != FLB_BLOB_DB_SUCCESS)), keeping the existing
flb_plg_error(ctx->ins, ...) and return -1 path, so insertion failures from
flb_blob_db_file_part_insert (which returns
FLB_BLOB_DB_ERROR_FILE_PART_INSERT on error) are handled properly, matching
the flb_blob_db_file_insert pattern used elsewhere.
♻️ Duplicate comments (4)
tests/internal/parquet.c (1)

234-348: Test context framework reduces boilerplate effectively.

The test_context structure and associated helpers (init_test_context, run_conversion, cleanup_test_context, ctx_pack_json, ctx_validate) provide a clean abstraction for test setup and teardown. The flb_msgpack_raw_to_parquet_file_streaming wrapper correctly manages schema lifecycle.

However, the tempfile paths remain collision-prone (as noted in a previous review). Consider adding PID or using mkstemp() for concurrent test safety.
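
A POSIX-only sketch of the mkstemp() approach the comment suggests (helper name hypothetical):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* template must be writable and end in XXXXXX; mkstemp() creates and
 * opens the file atomically, eliminating path collisions */
static int make_temp_parquet(char *path_out, size_t out_size)
{
    char tmpl[] = "/tmp/parquet_test_XXXXXX";
    int fd;

    fd = mkstemp(tmpl);
    if (fd < 0) {
        return -1;
    }
    close(fd);                        /* path stays reserved on disk */
    snprintf(path_out, out_size, "%s", tmpl);
    return 0;
}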

tests/runtime/out_s3_format.c (1)

9-12: Including .c file directly is non-standard but works for this test structure.

Line 10 includes aws_client_mock.c directly. While unconventional and would cause linker errors if multiple test files were compiled into a single executable, this approach is documented in aws_client_mock.h (lines 6-8) as a known limitation. Since each S3 test file is compiled as a separate executable (per CMakeLists.txt), this works correctly.

For future maintainability, consider compiling aws_client_mock.c as a separate object and linking it, but this would require build system changes.

tests/runtime/out_s3_error_handling.c (1)

7-8: Direct inclusion of implementation file is non-standard.

Including aws_client_mock.c directly rather than linking against it is an architectural smell, though consistent with other test files in this PR. This works because each test is compiled separately, avoiding linker conflicts.

Consider creating a proper header (aws_client_mock.h) with declarations and linking the implementation at build time for better separation of concerns.

tests/runtime/out_s3_edge_cases.c (1)

7-8: Direct inclusion of implementation file.

Same pattern as other test files in this PR - including aws_client_mock.c directly. Consistent but non-standard.

🧹 Nitpick comments (8)
tests/internal/parquet.c (1)

905-941: Consider using flb_malloc/flb_free for consistency.

The test uses standard malloc/free for the 100KB string allocation while other parts of Fluent Bit use flb_malloc/flb_free. For consistency with the codebase memory management patterns, consider using the Fluent Bit wrappers.

♻️ Suggested change
-    char *long_str = (char *)malloc(long_str_size);
+    char *long_str = flb_malloc(long_str_size);
     ...
-    free(long_str);
+    flb_free(long_str);
plugins/out_s3/s3_blob.h (1)

1-43: LGTM - Clean API header.

The header correctly declares the blob handling API with proper include guards and minimal dependencies. Function signatures match their implementations in s3_blob.c.

Minor: The parameter alignment in s3_blob_notify_delivery has inconsistent indentation (extra leading spaces on lines 37-41). Consider aligning parameters consistently:

♻️ Optional formatting fix
 /* Send delivery notification to input plugin */
 int s3_blob_notify_delivery(struct flb_s3 *ctx,
-                                        struct flb_config *config,
-                                        cfl_sds_t source,
-                                        cfl_sds_t file_path,
-                                        uint64_t file_id,
-                                        int success);
+                            struct flb_config *config,
+                            cfl_sds_t source,
+                            cfl_sds_t file_path,
+                            uint64_t file_id,
+                            int success);
tests/include/s3_test_helpers.h (1)

138-195: LGTM - Atomic temp file creation.

Good use of platform-specific atomic file creation (GetTempFileNameA on Windows, mkstemp on POSIX) to avoid race conditions in temp path generation.

Consider using strncpy or snprintf instead of strcpy at line 187 for defense-in-depth, even though the buffer size is correctly controlled:

♻️ Optional defensive change
     path = flb_malloc(strlen(template_path) + 1);
     if (path) {
-        strcpy(path, template_path);
+        snprintf(path, strlen(template_path) + 1, "%s", template_path);
     }
plugins/out_s3/s3_auth.h (1)

41-47: Consider using the enum type instead of int for url_type parameter.

The s3_auth_fetch_presigned_url function takes int url_type but should use s3_presigned_url_type_t for better type safety and self-documentation.

♻️ Suggested change
 /* Unified presigned URL fetcher - works for both standard and blob uploads */
 int s3_auth_fetch_presigned_url(struct flb_s3 *ctx,
                                  flb_sds_t *result_url,
-                                 int url_type,
+                                 s3_presigned_url_type_t url_type,
                                  const char *s3_key,
                                  const char *upload_id,
                                  int part_number);
plugins/out_s3/s3_stream.h (1)

80-88: Consider clarifying the out_buf parameter documentation.

The comment "Output buffer (FILE: marker for temp file)" on line 76 is unclear. It would help to explain what "FILE: marker" means and how callers should use this parameter.

plugins/out_s3/s3_store.c (1)

391-401: Consider using the gmtime_r wrapper for consistency.

A gmtime_r wrapper is available in flb_compat.h that handles the parameter order conversion for Windows (gmtime_s). Using it would eliminate the platform-specific preprocessor block and provide a single consistent code path across platforms.

tests/runtime/out_s3_config.c (2)

1-13: Including .c file directly is unconventional but acceptable for test scaffolding.

The direct inclusion of aws_client_mock.c at line 8 is unusual (typically .c files are compiled separately). This pattern works for isolated test files but can cause issues with multiple translation units. If this test file is part of a larger test suite compiled together, consider moving to proper compilation unit separation.


762-835: Good use of HEADER expectation for ACL verification.

Unlike the storage_class_variations test which only checks METHOD, this test includes expect(HEADER, "x-amz-acl", acls[i]) at line 782, which actually validates that the configured canned_acl value is sent in the request headers. This is a more thorough test.

Consider adding similar HEADER expectations to flb_test_storage_class_variations to verify the x-amz-storage-class header is actually sent.
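
A hedged sketch of what that could look like, reusing the expect()/set() macros from tests/include/aws_client_mock.h inside a test body (the method constant and storage class value here are assumptions):

/* per-iteration mock: require the storage class header on the request */
struct flb_aws_client_mock_request_chain *chain =
    FLB_AWS_CLIENT_MOCK(
        response(
            expect(METHOD, FLB_HTTP_PUT),
            expect(HEADER, "x-amz-storage-class", "STANDARD_IA"),
            set(STATUS, 200),
            set(PAYLOAD, ""),
            set(PAYLOAD_SIZE, 0)));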

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bb051a2 and ff0643c.

📒 Files selected for processing (33)
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/include/aws_client_mock.c
  • plugins/out_s3/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_s3_error_handling.c
  • plugins/out_s3/s3_store.c
  • tests/include/s3_test_helpers.h
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/internal/parquet.c
  • plugins/out_s3/s3_stream.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.

Applied to files:

  • plugins/out_s3/s3_stream.c
🧬 Code graph analysis (17)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
  • s3_blob_register_parts (403-436)
  • s3_blob_process_events (451-538)
  • s3_blob_recover_state (171-199)
  • s3_blob_notify_delivery (124-161)
tests/include/aws_client_mock.h (1)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_clear_generator_instance (71-74)
plugins/out_s3/s3_blob.c (6)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-423)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_abort (738-800)
  • s3_multipart_check_upload_exists (806-880)
src/flb_input_blob.c (1)
  • flb_input_blob_delivery_notification_destroy (222-231)
src/flb_blob_db.c (2)
  • flb_blob_db_lock (423-428)
  • flb_blob_file_update_remote_id (586-617)
plugins/out_s3/s3_queue.c (1)
  • s3_queue_add_pending_file (269-282)
tests/runtime/out_s3_edge_cases.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-632)
plugins/out_s3/s3_auth.h (1)
plugins/out_s3/s3_auth.c (3)
  • s3_auth_init_endpoint (79-128)
  • s3_auth_request_presigned_url (173-260)
  • s3_auth_fetch_presigned_url (359-423)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (317-326)
  • s3_queue_buffer_chunk (366-388)
  • s3_queue_add_file (126-140)
  • s3_queue_add_part (142-212)
  • s3_queue_add_pending_file (269-282)
  • s3_queue_add_file_unlocked (68-121)
  • s3_queue_add_pending_file_unlocked (218-264)
  • s3_queue_process_entry (765-824)
  • s3_queue_recover_all (867-922)
  • s3_queue_recover_from_database (1015-1103)
  • s3_queue_recover_from_fstore (1168-1228)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_format.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-418)
  • s3_multipart_upload_new (451-500)
  • s3_multipart_upload_destroy (420-445)
  • s3_multipart_initiate (502-559)
  • s3_multipart_upload_part (561-653)
  • s3_multipart_complete (655-736)
  • s3_multipart_abort (738-800)
  • s3_multipart_abort_with_url (1022-1043)
  • s3_multipart_check_upload_exists (806-880)
  • s3_multipart_upload_file_part (894-1016)
  • s3_multipart_upload_file (1203-1294)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
  • flb_sds_create_size (92-95)
  • flb_sds_printf (336-387)
  • flb_sds_destroy (389-399)
  • flb_sds_create (78-90)
src/flb_fstore.c (7)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_meta_set (74-108)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
  • flb_fstore_destroy (527-564)
  • flb_fstore_stream_create (335-402)
  • flb_fstore_file_inactive (238-256)
include/fluent-bit/flb_mem.h (2)
  • flb_free (126-128)
  • flb_calloc (84-96)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
plugins/out_s3/s3_stream.c (5)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_compress (110-127)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/flb_sds.c (5)
  • flb_sds_destroy (389-399)
  • flb_sds_create_size (92-95)
  • flb_sds_cat (120-141)
  • flb_sds_create (78-90)
  • flb_sds_printf (336-387)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-632)
tests/internal/processor.c (1)
  • processor (66-126)
tests/runtime/out_s3_config.c (4)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (7)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_start (983-994)
  • flb_lib_push (843-870)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
  • stream_compress_file (147-218)
  • stream_process_msgpack_file (593-661)
  • stream_json_processor (663-691)
  • stream_log_key_processor (766-795)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (143-147)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
🔇 Additional comments (111)
tests/internal/parquet.c (9)

1-52: Well-structured test infrastructure with clear organization.

The file header, includes, and external function declaration are well-organized. The #ifdef __cplusplus guards around validate_parquet_file are appropriate for C++ linkage.


54-136: Chunk file header implementation correctly follows ChunkIO format.

The write_msgpack_to_chunk_file function properly implements the 24-byte ChunkIO header with:

  • Magic bytes (0xC1, 0x00)
  • Content length (big-endian)
  • Metadata length (big-endian)

The detailed comments documenting the format are helpful for future maintenance.


350-440: Functional tests cover core conversion paths thoroughly.

test_basic_conversion, test_multiple_records, and test_large_record_count provide good coverage of:

  • Single record conversion with validation
  • Multi-record batches (100 records)
  • Large datasets (70,000 records) to trigger batch boundaries

The use of field_expectation arrays with strategic validation points (first, middle, last records) is a good testing pattern.


442-594: Type conversion tests comprehensively cover conversion paths.

The tests validate all major type conversions:

  • Boolean from int/float/string
  • Integer from float/string/bool
  • Float from int/string/bool
  • String from all types including objects/arrays (JSON serialization)

The validation of both type and value for each conversion path ensures the encoder handles coercion correctly.


1006-1086: Negative tests properly validate error handling.

The destructive tests cover important error cases:

  • Truncated/corrupted msgpack data
  • Invalid JSON schema syntax
  • Empty schema (no fields)

Each test correctly validates that the conversion returns -1 on failure.


1267-1422: Schema mismatch tests are critical for crash prevention.

These tests (test_edge_schema_more_fields, test_edge_schema_less_fields, test_edge_field_name_mismatch, test_edge_all_fields_missing) validate important edge cases where schema doesn't match data. The comment "Critical for crash fix validation" indicates these tests guard against potential segfaults or buffer overruns in the encoder.


1872-1939: Batch boundary tests are thorough.

Testing at 65535, 65536, 65537, and 131072 records ensures the Parquet writer handles row group boundaries correctly. The validation of boundary records (first of batch, last of batch, cross-batch) is appropriate.


1941-2016: Special float value handling correctly uses standard macros.

Using NAN, INFINITY, and -INFINITY from <math.h> is the correct approach, as division-by-zero can trigger SIGFPE when floating-point exceptions are enabled.


2081-2156: Test registry is well-organized with clear categorization.

The TEST_LIST provides excellent organization with 6 categories:

  1. Functional tests (9)
  2. Data quality tests (12)
  3. Scale & boundary tests (9)
  4. Compression tests (1, consolidated)
  5. Negative tests (10)
  6. Real-world scenarios (3)

The comments explaining optimization choices (e.g., consolidating compression tests) are helpful.

tests/include/aws_client_mock.h (2)

6-8: LGTM - Improved documentation clarity.

The updated header comment clearly documents the usage pattern, limitations (no multi-test executables), and thread safety concerns. This is helpful for developers using the mock infrastructure.


192-197: LGTM - Essential for proper mock lifecycle management.

This function correctly addresses the use-after-free scenario where flb_destroy() frees the mock client internally, and a subsequent call to configure_generator() would otherwise attempt to destroy an already-freed instance. The documentation clearly explains when to use this.

tests/runtime/CMakeLists.txt (2)

300-306: LGTM - Addresses Windows build requirements.

The conditional linking of ole32 and rpcrt4 for S3 tests on Windows correctly resolves the linker dependencies for GUID generation functions (CoCreateGuid, UuidToStringA, RpcStringFreeA) used in s3_test_helpers.h.


238-246: LGTM - Good test modularization.

Splitting the monolithic S3 test into focused test suites (config, multipart, format, error_handling, edge_cases, special_features, queue, recovery, blob) improves maintainability and allows parallel test execution. Windows-specific linking for ole32 and rpcrt4 is correctly scoped to S3 tests only.

tests/include/s3_test_helpers.h (7)

1-43: LGTM - Well-designed cross-platform abstractions.

The platform-specific macros cleanly abstract away differences between Windows and POSIX systems. The include structure properly guards Windows-specific headers.


81-121: LGTM - Test isolation counter and temp directory helper.

The temp directory resolution follows standard precedence (TMPDIR → TMP → TEMP → /tmp), which is appropriate for cross-platform test execution.


205-268: LGTM - GUID-based directory uniqueness on Windows.

Good use of CoCreateGuid on Windows for globally unique directory names, addressing the previous review concern about counter-based collisions across translation units.


270-342: LGTM - Safe recursive directory removal.

The implementations correctly handle recursive removal using platform-native APIs (nftw with FTW_DEPTH | FTW_PHYS on POSIX, FindFirstFile/FindNextFile on Windows), avoiding the command injection vulnerability from the previous system("rm -rf ...") approach.
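
For illustration, a minimal nftw()-based removal in the same spirit:

#define _XOPEN_SOURCE 500
#include <ftw.h>
#include <stdio.h>

static int rm_cb(const char *path, const struct stat *sb,
                 int typeflag, struct FTW *ftwbuf)
{
    (void) sb; (void) typeflag; (void) ftwbuf;
    return remove(path);              /* handles files and empty dirs */
}

static int rmtree(const char *root)
{
    /* FTW_DEPTH visits children before their directory;
     * FTW_PHYS avoids following symlinks */
    return nftw(root, rm_cb, 16, FTW_DEPTH | FTW_PHYS);
}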


351-386: LGTM - Safe path validation.

The validation correctly checks both path separators (/ and \) and includes reasonable fallback paths.


449-457: LGTM - Cross-platform environment variable setting.

Correctly uses _putenv_s on Windows and setenv on POSIX. Based on learnings, Fluent Bit only supports MSVC on Windows, so this is sufficient.
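
A minimal sketch of such a wrapper:

#include <stdlib.h>

static int test_setenv(const char *name, const char *value)
{
#ifdef _WIN32
    return _putenv_s(name, value);    /* MSVC CRT; always overwrites */
#else
    return setenv(name, value, 1);    /* 1 = overwrite existing value */
#endif
}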


475-507: LGTM - Deterministic mock polling.

The polling-based approach is superior to fixed sleep() calls as it makes tests deterministic and fail-fast when the mock chain isn't consumed as expected.
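
A sketch of what such a polling wait can expand to inside a test body (names taken from the mock helpers referenced elsewhere in this review; timings are illustrative):

int waited_ms = 0;

/* poll until the mock chain is fully consumed, or give up after 5s */
while (flb_aws_client_mock_generator_count_unused_requests() > 0 &&
       waited_ms < 5000) {
    S3_TEST_SLEEP_MS(50);
    waited_ms += 50;
}
TEST_CHECK(flb_aws_client_mock_generator_count_unused_requests() == 0);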

tests/runtime/out_s3_format.c (6)

18-96: LGTM - Well-structured test with proper cleanup.

The test correctly:

  • Uses s3_test_next() for isolation
  • Checks all return values (TEST_CHECK)
  • Uses S3_TEST_WAIT_MOCK_EXHAUSTED instead of fixed sleep
  • Follows proper cleanup sequence including flb_aws_client_mock_clear_generator_instance() before freeing resources

180-267: LGTM - Good test for aggregation behavior.

The streaming conversion test correctly validates that multiple flb_lib_push calls are aggregated into a single chunk. The comment at lines 195-199 clearly documents the expected behavior.


269-354: LGTM - Comprehensive compression coverage.

Good coverage of gzip, zstd, and snappy compression modes. Each test follows the established pattern consistently.

Also applies to: 356-435, 437-515


517-601: LGTM - log_key extraction tests.

Good coverage of the log_key feature both standalone and combined with compression.

Also applies to: 603-687


851-910: LGTM - Negative test case.

Good practice to test invalid configuration handling. The test correctly expects flb_start() to return -1 for invalid compression type, and properly handles cleanup in both success and failure cases (lines 900-903).


912-926: LGTM - Complete test registration.

All 11 test functions are properly registered in TEST_LIST.

tests/internal/aws_util.c (3)

47-68: Good coverage of edge cases for file path handling.

The test macros comprehensively cover various edge cases including relative paths, triple dots, parent directory traversal, hidden files, and mixed dots/slashes. This helps ensure the path sanitization logic in flb_get_s3_key properly strips leading dots and slashes.


402-533: Well-structured test functions for file path features.

The new test functions follow the established pattern in this file and thoroughly exercise the $FILE_PATH and $FILE_NAME substitution logic. The edge case tests verify proper handling of potentially problematic inputs like parent traversal (../../../etc/passwd) and hidden files.


200-200: Function signature update looks correct.

All existing flb_get_s3_key calls have been updated to pass NULL for the new file_path parameter, maintaining backward compatibility while enabling the new functionality.

tests/runtime/out_s3_queue.c (4)

14-74: Test structure is clean and follows good practices.

The flb_test_queue_basic_operations test demonstrates a well-organized pattern: setup mock chain, create temp resources, configure and start Fluent Bit, push data, wait for completion, and clean up. This pattern is consistently applied across all tests in the file.


145-206: Good test coverage for retry mechanism.

The retry test properly simulates a transient 500 error on UploadPart followed by success, verifying that the queue correctly retries failed operations. The mock chain at lines 156-161 correctly models the failure-then-success scenario.


469-479: Test list registration is complete.

All seven test functions are properly registered in the TEST_LIST, ensuring they will be executed by the test framework.


7-8: This inclusion pattern is intentional and safe.

The aws_client_mock.c file is designed to be directly included in test files. The file itself documents this pattern (see lines 1-10) and notes that each test is compiled as a separate standalone executable, which prevents symbol conflicts. The file uses static declarations for internal functions and variables, making them local to each compiled test binary. There is no risk of multiple definition errors because the tests are not linked together into a single executable.

plugins/out_s3/s3_auth.h (1)

26-31: Well-designed enum for presigned URL types.

The s3_presigned_url_type_t enum provides type safety for the different multipart upload operations. This is cleaner than using magic numbers or string constants.

plugins/out_s3/s3_store.c (7)

33-49: Good cross-platform path length handling.

The FLB_PATH_MAX definition properly handles Windows (MAX_PATH/_MAX_PATH) and POSIX (PATH_MAX) with sensible fallback values. This addresses previous portability concerns.


116-140: Buffer space check with warning threshold is a good addition.

The check_buffer_space function proactively warns when the buffer is 80% full, giving operators time to react before data loss occurs. The error message at lines 128-129 provides useful debugging information.


142-159: Clean helper for failed file cleanup.

The cleanup_failed_file function properly handles partial cleanup scenarios by checking for NULL before freeing resources. This prevents memory leaks in error paths.


244-252: Proper cleanup on append failure.

The code now properly cleans up newly created files when append fails (lines 247-250), addressing the previous concern about leaving partially written buffer files.


278-288: Cross-platform stat handling is correct.

Using _stat on Windows and stat on POSIX with appropriate struct types ensures portability. The FLB_SYSTEM_WINDOWS guard is the correct macro for Fluent Bit's Windows support.


506-524: Memory leak fixed in s3_store_file_inactive.

The function now properly frees s3_file->stream_path before freeing the struct (lines 517-519), addressing the previously identified memory leak.


584-620: Robust meta_len validation before seeking.

The code now properly validates meta_len against the remaining file size (lines 613-620) before attempting to seek, preventing undefined behavior on corrupt chunk files. The error messages are informative and include the problematic values.

plugins/out_s3/s3_stream.h (3)

23-27: Proper includes with sys/types.h for off_t.

The header correctly includes <sys/types.h> for off_t, addressing the previous review concern about potential compilation issues.


29-43: Well-designed callback pattern for record processing.

The record_processor_fn typedef and stream_processor_context struct provide a clean, extensible framework for processing records in different formats (JSON, log_key, and potentially Parquet). The context tracks useful metrics like records_processed and bytes_written.


45-62: Good documentation for stream_compress_file.

The documentation clearly explains the offset parameters, which is important since -1 for offset_end means EOF. This streaming approach is memory-efficient for large files.

plugins/out_s3/s3_store.h (2)

51-52: Good API design for cross-module chunk file access.

The context-free chunk file opener with C++ compatibility is well-designed. The implementation (from the provided context in s3_store.c) shows thorough validation of chunk headers, magic bytes, and metadata length bounds.


26-34: LGTM - struct definition is clean and well-documented.

The s3_file struct with the renamed stream_path field is correctly implemented. The field comments clearly describe the purpose of each member. All usages of stream_path in s3_store.c and s3.c are consistent with this change.

plugins/out_s3/s3_blob.c (6)

41-122: Well-implemented abort helper with proper validation and cleanup.

The function correctly:

  • Validates file_remote_id before proceeding
  • Uses stored s3_key when available with appropriate fallback warning
  • Cleans up all resources on error paths including the pre_signed_url leak fix

124-161: LGTM - Proper allocation checks and cleanup.

The function correctly validates the cfl_sds_create return value and cleans up the notification on path allocation failure.


171-199: LGTM - Recovery orchestrator with proper lock handling.

The blob DB lock acquisition is correctly checked, and the unlock failure is appropriately logged as a warning.


201-253: Stale file recovery implementation looks correct.

The function properly:

  • Logs abort failures with detailed context (file_id, path, upload_id, parts, ret)
  • Cleans up allocated strings each iteration
  • Resets NULL pointers after cleanup to avoid double-free

Note: Passing NULL for s3_key at line 225 triggers the legacy fallback in abort_multipart_upload. The comment clarifies this is intentional for stale files without stored keys.


255-401: Comprehensive aborted file handling with proper validation.

The function correctly:

  • Uses stored s3_key for upload validation when available
  • Provides fallback with appropriate warning for legacy data
  • Logs abort failures with detailed context
  • Tracks statistics for observability (resume, fresh start, discarded counts)

The FLB_OUT_RETRY_UNLIMITED check at line 285 is consistent with the project's retry handling approach. Based on learnings, the S3 plugin intentionally uses its own retry logic pattern.


451-538: Well-structured event processing with proper type handling.

The function correctly:

  • Uses int64_t file_id (line 458) to match flb_blob_db_file_insert return type
  • Checks file_id < 0 (line 496) to catch all negative error codes
  • Defensively frees partial allocations on flb_input_blob_file_get_info failure
  • Documents the coroutine context limitations in the architecture comment
plugins/out_s3/s3_auth.c (6)

36-77: LGTM - URL parsing with proper cleanup.

The parse_url function correctly calls url_parts_destroy on all error paths, preventing potential leaks from partial allocations.


79-128: Clean endpoint initialization with proper TLS handling.

The function correctly:

  • Determines HTTPS usage and creates TLS context only when needed
  • Uses direct flb_output_upstream_set call (past upstream flags issue resolved)
  • Cleans up TLS context and URL parts on upstream creation failure

130-153: LGTM - Port validation with proper bounds checking.

The function correctly rejects port 0 by requiring port_val >= 1 (line 141), addressing the past review concern about invalid client connection ports.
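
An illustrative strtol()-based validation enforcing the same bounds:

#include <stdlib.h>
#include <errno.h>

static int parse_port(const char *s)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') {
        return -1;                    /* not a clean decimal number */
    }
    if (v < 1 || v > 65535) {
        return -1;                    /* rejects 0 and out-of-range ports */
    }
    return (int) v;
}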


173-260: Well-structured HTTP request handling with comprehensive cleanup.

The function properly:

  • Uses a single cleanup label for resource management
  • Handles both reusing existing result_url and creating new allocation
  • Provides detailed error logging including response payload when available

262-357: LGTM - URL path construction with proper encoding and cleanup.

The function correctly:

  • Strips leading / from encoded keys to avoid double slashes
  • Uses explicit / delimiter between bucket and key components
  • Uses distinct endpoint paths for each operation type (create, upload, complete, abort)
  • Cleans up all allocated resources on error

359-423: Comprehensive presigned URL fetching with all edge cases handled.

The function correctly:

  • Guards against uninitialized upstream (lines 376-378)
  • Prevents double-slashes in URL construction (lines 393-410)
  • Frees full_url on flb_sds_printf failure (lines 411-414)
  • Returns success with NULL result when no endpoint is configured
tests/runtime/out_s3_multipart.c (3)

14-498: Well-structured test suite with comprehensive coverage.

The tests correctly:

  • Use unique temporary paths via s3_test_create_temp_db_path and s3_test_create_temp_store_dir
  • Check flb_lib_push return values in all tests
  • Use S3_TEST_WAIT_MOCK_EXHAUSTED instead of bare sleep() calls
  • Clean up all resources including temporary files and mock state
  • Cover success paths, failure scenarios, and retry behavior

500-512: LGTM - Complete test registration.

All nine test functions are properly registered with matching names and NULL terminator.


7-8: This is an intentional, documented pattern in the test suite. The header file tests/include/aws_client_mock.h explicitly specifies: "Usage: #include both .h and .c files in test files." This approach is used consistently across 13 test files, including all S3-related tests. No action needed.

Likely an incorrect or invalid review comment.

plugins/out_s3/s3_multipart.h (4)

20-37: LGTM - Clean header setup with appropriate includes.

The header correctly includes sys/types.h for off_t and s3.h for the S3 context type. Helper function declarations are well-documented.


39-76: Well-designed multipart lifecycle API.

The API provides clean separation between:

  • Lifecycle management (new/destroy)
  • AWS API wrappers (initiate, upload_part, complete, abort)
  • Convenience helpers (abort_with_url)

Parameter types are consistent using const char* throughout.


77-89: Excellent documentation for recovery-critical function.

The IMPORTANT comment clearly warns developers about the key regeneration pitfall, which is critical for upload recovery correctness. The return value semantics (1/0/-1) are also well-documented.


90-107: LGTM - Streaming upload API with appropriate types.

The high-level file upload functions use:

  • off_t for byte offsets (correct type for large file support)
  • flb_sds_t for presigned URL (appropriate for dynamic string ownership)
plugins/out_s3/s3_queue.h (4)

1-24: LGTM!

License header, include guards, and dependencies are properly structured.


26-87: LGTM!

Function declarations are well-documented with clear parameter semantics. The unlocked variants properly indicate the caller must hold upload_queue_lock. Return value conventions are clearly documented.


89-101: LGTM!

The three-phase recovery architecture is clearly documented with consistent phase numbering (0, 1, 2). The documentation accurately describes each phase's responsibility: zombie cleanup, state transitions, and queue rebuild.


103-103: LGTM!

Include guard properly closed.

tests/runtime/out_s3_error_handling.c (4)

44-106: LGTM!

Test properly validates AccessDenied error handling with:

  • NULL checks for temp paths with early bailout
  • Polling-based wait instead of fixed sleep
  • Complete cleanup of mock instance and temp resources

108-519: LGTM!

All error handling tests follow a consistent and correct pattern:

  • Proper NULL checks for temp resources with early bailout
  • Polling-based mock exhaustion waits
  • Complete cleanup of all resources

Good coverage of AWS error codes: NoSuchBucket, NoSuchUpload, InvalidAccessKeyId, SignatureDoesNotMatch, SlowDown, and InternalError.


521-741: LGTM!

Auto-retry and failure cleanup tests are well-structured:

  • flb_test_auto_retry_enabled and flb_test_auto_retry_disabled properly test the auto_retry_requests configuration option
  • flb_test_failure_cleanup_and_abort validates the plugin's behavior when UploadPart fails

All tests maintain consistent cleanup patterns.


743-756: LGTM!

Test list properly registers all 10 test functions with NULL terminator.

tests/runtime/out_s3_blob.c (4)

1-82: LGTM!

Includes and first test (flb_test_blob_database_file_config) follow the established pattern with proper mock chain setup and cleanup.


84-164: LGTM!

The flb_test_blob_database_custom_path test properly handles the async DB creation race condition with a retry loop using the cross-platform S3_TEST_SLEEP_MS macro.


166-509: LGTM!

Delivery attempt limit and freshness tests properly verify that:

  • file_delivery_attempt_limit, part_delivery_attempt_limit, and upload_part_freshness_limit configurations are accepted
  • Unlimited delivery attempts (-1) is handled correctly
  • The upload flow completes successfully with these configurations

All tests maintain consistent setup and cleanup patterns.


511-595: LGTM!

The flb_test_upload_chunk_size_config test and TEST_LIST are properly structured. All 8 blob configuration tests are registered.

plugins/out_s3/s3_stream.c (8)

20-51: LGTM!

Platform-specific includes and macros are properly structured:

  • Windows uses windows.h, io.h, fcntl.h, share.h, process.h
  • POSIX uses unistd.h, fcntl.h
  • Abstraction macros (s3_unlink, s3_close, s3_fdopen) provide clean cross-platform interface

147-218: LGTM!

The stream_compress_file function properly:

  • Validates offset parameters with clear error messages
  • Handles the "unknown" compression type in debug logging
  • Cleans up the output file on failure

265-492: LGTM!

The stream_context_setup function properly handles:

  • Secure temp file creation using platform-appropriate methods (GetTempFileName + _sopen_s on Windows, mkstemp on POSIX)
  • Windows compatibility by closing the file before rename()
  • Resource cleanup on allocation failures via stream_context_destroy()
  • Dynamic buffer resizing for Windows temp paths

The TOCTOU race window on Windows between DeleteFile() and _sopen_s() is acknowledged in comments and mitigated by GetTempFileName's unique naming.


494-529: LGTM!

The process_unpacked_records function correctly handles all msgpack return codes:

  • MSGPACK_UNPACK_SUCCESS and MSGPACK_UNPACK_EXTRA_BYTES both process the unpacked record and continue the loop
  • MSGPACK_UNPACK_NOMEM_ERROR returns -1 with an error log
  • MSGPACK_UNPACK_PARSE_ERROR returns -1 with an error log

The loop properly continues processing remaining bytes when EXTRA_BYTES is returned.


531-591: LGTM!

The process_msgpack_stream and create_output_file_marker functions are straightforward and correct.


593-661: LGTM!

The stream_process_msgpack_file function properly:

  • Handles empty input as a no-op returning 0 (not an error)
  • Handles empty output from processor as a no-op
  • Maintains correct cleanup in all code paths

663-795: LGTM!

The JSON and log_key processors are well-implemented:

  • stream_json_processor properly returns -1 on conversion failure
  • find_log_key_in_map handles both STR and BIN key types
  • stream_log_key_processor correctly writes string/bin values directly and falls back to JSON for other types

52-145: The Snappy compression implementation is correct. flb_aws_compression_compress() uses flb_snappy_compress_wrapper(), which calls flb_snappy_compress_framed_data() internally—not raw Snappy compression. This produces output in Google's Snappy framing format with stream identifiers, length fields, and CRC32C checksums, making it safe to concatenate chunks. The code and its comment are accurate.

tests/runtime/out_s3_edge_cases.c (5)

18-98: LGTM!

The flb_test_empty_data_upload test correctly:

  • Uses fixed sleep (not mock exhaustion) since no data is pushed and mock may not be consumed
  • Verifies no crash by asserting ret == 0 after cleanup
  • Properly cleans up all resources

100-431: LGTM!

These tests properly exercise edge cases:

  • Large file chunking with multiple data pushes
  • Maximum concurrent uploads with 10 workers
  • Minimal timeout configuration
  • Timeout triggering before file size threshold

All follow the established pattern with polling-based mock exhaustion and complete cleanup.


433-682: LGTM!

Good coverage of S3 key format edge cases:

  • Special characters and tag segment extraction with $TAG[n]
  • Custom tag delimiter configuration
  • Custom store directory

All tests maintain consistent patterns.


684-933: LGTM!

Boundary value and compression tests are well-structured:

  • flb_test_config_boundary_minimum tests minimum configuration values
  • flb_test_config_boundary_maximum tests high values (100M, 50M, 100 workers)
  • flb_test_compression_small_data correctly uses sizeof() - 1 for payload length (per past review fix)

935-1096: LGTM!

The flb_test_rapid_start_stop test properly validates:

  • Complete cleanup between start/stop cycles
  • Fresh resource allocation with different configurations
  • No state leakage between runs

TEST_LIST correctly registers all 12 edge case tests.

tests/runtime/out_s3_config.c (2)

19-83: LGTM - Well-structured test with proper mock setup and cleanup.

The test correctly sets up the mock chain with METHOD expectations, configures the S3 plugin with s3_key_format, pushes data, and properly cleans up all resources including the temp database and store directory.


1282-1300: Test list is complete and well-organized.

All 15 test functions are properly registered in the TEST_LIST with descriptive names that match the test function purposes.

plugins/out_s3/s3_multipart.c (9)

36-65: LGTM - Cross-platform I/O abstractions are well-implemented.

The platform-specific macros properly address:

  • Binary mode (O_BINARY) on Windows to prevent text-mode corruption of binary data (Parquet files)
  • Cross-platform sleep wrapper using Sleep() on Windows and usleep() on POSIX

106-148: LGTM - ETag extraction correctly preserves quotes.

The function now correctly includes the surrounding quotes in the extracted ETag (line 141: start_quote to end_quote + 1), which is required by the AWS CompleteMultipartUpload API. The bounds checking at lines 130-138 prevents buffer overruns.
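
A minimal sketch of quote-preserving extraction (names are illustrative; the real function operates on the plugin's response buffer):

#include <stdlib.h>
#include <string.h>

static char *extract_quoted_etag(const char *buf, size_t len)
{
    const char *start = memchr(buf, '"', len);
    const char *end;
    char *etag;
    size_t out_len;

    if (start == NULL) {
        return NULL;
    }
    end = memchr(start + 1, '"', len - (size_t)(start - buf) - 1);
    if (end == NULL) {
        return NULL;
    }
    out_len = (size_t)(end - start) + 1;   /* include both quotes */
    etag = malloc(out_len + 1);
    if (etag == NULL) {
        return NULL;
    }
    memcpy(etag, start, out_len);
    etag[out_len] = '\0';
    return etag;
}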


164-216: LGTM - Part size calculation correctly enforces AWS constraints.

The function properly handles:

  • AWS minimum part size (5 MiB) at line 208
  • AWS maximum part size (5 GiB) at line 211
  • Maximum 10,000 parts limit with adaptive sizing at lines 187-204
  • Rounding to MiB/GiB boundaries for cleaner sizes (see the sketch after this list)
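
A hedged sketch of that sizing logic; the constants mirror the AWS limits named above, while the doubling/rounding strategy is illustrative rather than the exact implementation:

#include <stdint.h>

#define MIB               (1024ULL * 1024ULL)
#define GIB               (1024ULL * MIB)
#define AWS_MIN_PART_SIZE (5 * MIB)
#define AWS_MAX_PART_SIZE (5 * GIB)
#define AWS_MAX_PARTS     10000ULL

static uint64_t calc_part_size(uint64_t file_size, uint64_t preferred)
{
    uint64_t part = preferred;

    /* grow the part size until the file fits within 10,000 parts */
    while ((file_size + part - 1) / part > AWS_MAX_PARTS) {
        part *= 2;
    }

    /* round up to a whole MiB for cleaner sizes */
    part = ((part + MIB - 1) / MIB) * MIB;

    /* clamp to the AWS minimum and maximum */
    if (part < AWS_MIN_PART_SIZE) {
        part = AWS_MIN_PART_SIZE;
    }
    if (part > AWS_MAX_PART_SIZE) {
        part = AWS_MAX_PART_SIZE;
    }
    return part;
}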

218-248: LGTM - MD5 base64 encoding with proper NUL termination.

The function now correctly NUL-terminates the output buffer (lines 238-244), making it safe for callers to use strlen() on the result. The bounds check at line 239 prevents buffer overflows.


250-299: LGTM - URI building with proper encoding and error handling.

The function correctly:

  • Uses flb_aws_uri_encode_path for proper S3 key encoding
  • Handles presigned URLs as a fast path
  • Handles both /key and key formats
  • Cleans up encoded_key on all error paths

357-418: LGTM - Header creation with proper counter incrementing.

The function correctly increments n after each header assignment, including the storage_class header at line 413. The dynamic allocation matches the counted headers_len.


894-1016: LGTM - File part upload with proper overflow protection and error handling.

The function correctly:

  • Guards against 32-bit overflow with uint64_t cast comparison at line 933
  • Validates offset ranges before processing
  • Reads directly into part_buffer without redundant intermediate buffers
  • Handles short reads properly in the read loop
  • Cleans up resources on all error paths

1083-1201: LGTM - Part upload with safe exponential backoff and AWS limits enforcement.

The function correctly:

  • Ensures at least 1 retry attempt with max_attempts at line 1097
  • Enforces AWS S3 maximum of 10,000 parts at line 1101
  • Uses safe overflow-protected exponential backoff calculation at lines 1168-1184 (see the sketch after this list)
  • Properly cleans up presigned URLs after each use
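
A hedged sketch of overflow-protected backoff (base and cap values are illustrative, not the plugin's exact numbers):

#include <stdint.h>

#define BACKOFF_BASE_MS 200ULL
#define BACKOFF_CAP_MS  30000ULL

static uint64_t backoff_ms(int attempt)
{
    uint64_t delay = BACKOFF_BASE_MS;
    int i;

    for (i = 0; i < attempt; i++) {
        if (delay > BACKOFF_CAP_MS / 2) {
            return BACKOFF_CAP_MS;   /* clamp before doubling can overflow */
        }
        delay *= 2;
    }
    return delay;
}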

1018-1043: LGTM - Abort helper with presigned URL support and fallback.

The function correctly fetches the abort presigned URL before calling s3_multipart_abort(), and gracefully falls back to attempting abort without the presigned URL if the fetch fails. This ensures abort is attempted even when the auth endpoint is unavailable.

plugins/out_s3/s3.h (3)

29-46: LGTM - Well-defined constants and forward declaration.

The header correctly:

  • Forward declares struct flb_parquet_schema to avoid circular includes
  • Includes <pthread.h> and <sys/types.h> for pthread_mutex_t and off_t
  • Defines MAX_FILE_SIZE as 50 TB, the effective multipart ceiling (10,000 parts × 5 GiB); AWS's documented single-object maximum is 5 TB
  • Defines AWS multipart constraints (5 MiB min, 5 GiB max, 10,000 parts)

52-92: LGTM - Queue and multipart structures are well-designed.

The upload_queue struct supports both file-level and part-level tracking with appropriate fields. The multipart_upload struct's fixed etags[10000] array is appropriate given AWS S3's 10,000 parts maximum limit.


94-178: LGTM - Context structure with proper thread safety.

The flb_s3 structure correctly adds:

  • pthread_mutex_t upload_queue_lock for thread-safe queue access (line 157)
  • format field for JSON/Parquet distinction
  • cached_arrow_schema pointer using the forward-declared type
plugins/out_s3/s3_queue.c (7)

22-30: LGTM - Cross-platform file access wrapper.

The flb_access macro provides proper Windows/POSIX compatibility for file existence checks, addressing the previous portability concern.
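
The mapping is a one-liner per platform; a hedged sketch consistent with the description above:

#ifdef _WIN32
#include <io.h>
#define flb_access(path, mode) _access((path), (mode))
#else
#include <unistd.h>
#define flb_access(path, mode) access((path), (mode))
#endif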


68-212: LGTM - Queue entry addition with proper locking and cleanup.

The functions correctly:

  • Provide both locked and unlocked variants for flexibility
  • Use consistent lock/unlock patterns with pthread_mutex_lock/unlock
  • Clean up all allocated resources on error paths (tag_copy, path_copy, entry)

396-485: LGTM - Part upload with proper ETag memory management.

The function correctly:

  • Frees the ETag SDS after persisting to database (lines 452-454)
  • Cleans up the ETag on failure paths (lines 470-474)
  • Uses correct argument order for flb_blob_db_file_part_in_progress calls

541-713: LGTM - Multipart completion with thorough cleanup.

The function correctly:

  • Validates part_count bounds (0 < count ≤ 10000) before allocation at lines 599-603
  • Uses s3_key from the entry parameter for consistency
  • Cleans up all allocated resources (m_upload fields, fetched strings, ETags) in the cleanup section
  • Handles both success and error paths appropriately

867-922: LGTM - Well-organized three-phase recovery architecture.

The recovery process is clearly structured:

  • Phase 0: Zombie cleanup (reset in_progress parts from crashed runs)
  • Phase 1: State transitions (stale → pending, aborted handling)
  • Phase 2: Queue rebuild from persistent storage

The DB locking around zombie cleanup (lines 877-888) ensures safe concurrent access.


1015-1103: LGTM - Database recovery with proper state management.

The function correctly:

  • Resets in_progress flag before skipping on endpoint mismatch (line 1047)
  • Marks parts as in_progress to prevent re-query (line 1052)
  • Cleans up all allocated strings in the cleanup section
  • Handles both resume and new upload creation paths

1167-1228: LGTM - Fstore recovery with proper chunk handling.

The function correctly:

  • Uses mk_list_foreach_safe for safe iteration
  • Skips the metadata stream and locked chunks
  • Marks chunks exceeding retry limit as inactive
  • Locks chunks before adding to queue and unlocks on failure


Add Parquet format support to S3 output plugin with:
- Parquet writer integration
- Enhanced multipart upload handling
- Improved queue and store management
- Stream processing updates
- Configuration examples

Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph>
Signed-off-by: Arbin <arbin.cheng@coins.ph>
@kalavt force-pushed the feature/aws-s3-parquet branch from ff0643c to e7de562 on January 15, 2026 12:19

@coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@tests/runtime/out_s3_config.c`:
- Around line 1184-1280: flb_test_invalid_parameter_combinations may call
flb_destroy() without stopping the engine if flb_start() unexpectedly returns 0;
for both failing sub-tests (the "invalid compression" branch and the
"total_file_size > 5TB" branch) check the ret value after flb_start() and if ret
== 0 call flb_stop(ctx) before calling flb_destroy(ctx) (use the existing ret
variable and the flb_stop and flb_destroy functions referenced in
flb_test_invalid_parameter_combinations).
- Around line 42-43: Several tests call
flb_aws_client_mock_clear_generator_instance() in teardown which only nullifies
the pointer and leaks the mock wrapper and its request chain when the S3 plugin
later destroys its surrogate; replace every call to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() so the mock wrapper and associated
resources are freed correctly, updating occurrences around the teardown of tests
that used init_options.client_generator/ flb_aws_client_get_mock_generator()
(e.g., the calls near the teardown blocks currently using
clear_generator_instance).

In `@tests/runtime/out_s3_edge_cases.c`:
- Around line 56-78: Check return values of s3_test_create_temp_db_path("empty")
and s3_test_create_temp_store_dir("empty") for NULL before calling
flb_output_set; if either db_path or store_dir is NULL, perform the same
early-bail cleanup used in out_s3_error_handling.c (free any non-NULL resources,
destroy ctx with flb_destroy(ctx) if created, and mark the test as failed) and
return to avoid dereferencing NULL in flb_output_set. Locate the temp helpers
(s3_test_create_temp_db_path, s3_test_create_temp_store_dir), the ctx creation
(flb_create), and the flb_output_set call to insert the NULL checks and cleanup
path.

In `@tests/runtime/out_s3_error_handling.c`:
- Around line 521-669: Modify the mock chains in flb_test_auto_retry_enabled and
flb_test_auto_retry_disabled so they differ: for flb_test_auto_retry_enabled
build the FLB_AWS_CLIENT_MOCK sequence with CreateMultipartUpload (set STATUS
200), UploadPart returning a transient failure first (set STATUS 503 or 500)
followed by a successful UploadPart (set STATUS 200), then
CompleteMultipartUpload (set STATUS 200); for flb_test_auto_retry_disabled build
the chain with CreateMultipartUpload (200), a single failing UploadPart
(503/500) and then Complete (200) so the failing UploadPart is not retried. Keep
using flb_aws_client_mock_configure_generator / init_options.client_generator
and S3_TEST_WAIT_MOCK_EXHAUSTED, and update any expectations/assertions so the
enabled test consumes the extra retry response (i.e. mock chain exhausted) while
the disabled test does not retry and thus the observed request sequence differs.

In `@tests/runtime/out_s3_multipart.c`:
- Around line 69-109: The test flb_test_create_multipart_with_metadata currently
only covers canned_acl but the comment says “ACL, storage class”; update the
test to set a storage class and assert it on the CreateMultipartUpload request:
add the expected HEADER "x-amz-storage-class" in the mock chain response for the
first POST (the chain variable created by FLB_AWS_CLIENT_MOCK) and pass
"storage_class", "<SOME_CLASS>" into flb_output_set when creating the s3 output
(same call that sets "canned_acl"); ensure the header name and storage class
string match the value you set so the mock assertion triggers.
♻️ Duplicate comments (4)
tests/internal/parquet.c (1)

288-304: Tempfile strategy is collision-prone and not portable.

Using fixed /tmp/flb_test_${name}.* paths risks collisions if tests run concurrently (or reruns leave stale files), and will break on non-POSIX environments. Consider using mkstemp() or including PID/random suffix.

Proposed direction
 static int init_test_context(test_context *ctx, const char *test_name)
 {
     msgpack_sbuffer_init(&ctx->sbuf);
     msgpack_packer_init(&ctx->packer, &ctx->sbuf, msgpack_sbuffer_write);

-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_%ld.msgpack", test_name, (long)getpid());
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_%ld.parquet", test_name, (long)getpid());
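
An mkstemp()-based alternative (hedged sketch; keeping the created file on disk reserves the unique name, and the helper name is illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int make_unique_base(char *out, size_t out_size)
{
    char tmpl[] = "/tmp/flb_test_XXXXXX";
    int fd = mkstemp(tmpl);   /* atomically creates a uniquely named file */

    if (fd < 0) {
        return -1;
    }
    close(fd);                /* keep the name, drop the handle */
    snprintf(out, out_size, "%s", tmpl);
    return 0;
}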
tests/runtime/out_s3_config.c (1)

18-557: Key format + endpoint tests don’t validate URI/host, so they can’t catch config regressions

The s3_key_format_* and endpoint_* tests assert only METHOD (and occasionally a header), but never assert the generated request URI/host/port—so $TAG/$INDEX/$UUID expansion and endpoint parsing can regress while tests still pass. Consider making the format deterministic (e.g., remove %... formatters or freeze time) and adding expect(URI/HOST/PORT, ...) assertions via the mock.

tests/runtime/out_s3_edge_cases.c (1)

6-11: Stop including aws_client_mock.c directly (use header + link the object/library instead)

This file still directly includes ../include/aws_client_mock.c (Line 8). That’s non-idiomatic and risks duplicate-symbol issues depending on how runtime tests are linked.

tests/runtime/out_s3_format.c (1)

8-12: Stop including aws_client_mock.c directly

This file still includes ../include/aws_client_mock.c (Line 10). Prefer including only the header and linking aws_client_mock.c once via the test build system.

🧹 Nitpick comments (10)
plugins/out_s3/s3_store.h (1)

51-52: Consider adding extern "C" guards to align header with advertised C++ compatibility.

The comment claims the function "can be used from anywhere including C++", but the header lacks extern "C" guards. Without these guards, attempting to use this function from C++ code would result in linker errors due to C++ name mangling.

♻️ Proposed fix

Add extern "C" guards to the header:

 #ifndef FLB_S3_STORE_H
 #define FLB_S3_STORE_H

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #include <fluent-bit/flb_output_plugin.h>
 #include <fluent-bit/flb_fstore.h>

And at the end:

 void s3_store_file_lock(struct s3_file *s3_file);
 void s3_store_file_unlock(struct s3_file *s3_file);

+#ifdef __cplusplus
+}
+#endif
+
 #endif
plugins/out_s3/s3_multipart.h (1)

94-107: Consider consistency in string parameter types.

s3_multipart_upload_file_part uses flb_sds_t for pre_signed_url (line 100), while s3_multipart_upload_file uses const char * for all string parameters (lines 104-106). This appears intentional based on the implementation where presigned URLs are managed as flb_sds_t, but documenting this design choice would improve maintainability.

plugins/out_s3/s3.h (1)

192-198: Consider documenting the initiate function's relationship with multipart module.

The s3_initiate_multipart_upload function in s3.h appears to be the orchestration entry point that coordinates with s3_multipart_upload_new and related functions in s3_multipart.h. A brief comment clarifying this relationship would help maintainers understand the layered architecture.

tests/runtime/out_s3_config.c (2)

687-983: Option coverage gaps: tests set config but don’t assert resulting request headers

Examples:

  • storage_class variations (Lines 695-744) don’t assert x-amz-storage-class.
  • content_type tests (Lines 841-912) don’t assert Content-Type.
  • send_content_md5 flag tests (Lines 914-982) don’t assert Content-MD5 presence/absence.

Add expect(HEADER, ...) on the appropriate requests so these tests fail when behavior regresses.

Example patch (storage_class header assertion)
@@
-        chain = FLB_AWS_CLIENT_MOCK(
+        chain = FLB_AWS_CLIENT_MOCK(
             response(expect(METHOD, FLB_HTTP_POST),
+                     expect(HEADER, "x-amz-storage-class", classes[i]),
                      set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                      set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),

215-220: Avoid fixed S3_TEST_SLEEP_MS(2000); wait for a condition instead

The fixed delay (Line 218) makes the test slower and can be flaky across CI variance. Prefer waiting until the first upload has completed (e.g., poll mock state/store artifacts), then push the second batch.

tests/runtime/out_s3_multipart.c (1)

34-38: Consider explicit bailout if db_path/store_dir is NULL (don’t assume TEST_CHECK aborts)

If TEST_CHECK is non-fatal, later flb_output_set(..., db_path, store_dir, ...) can deref NULL. tests/runtime/out_s3_error_handling.c already uses an explicit if (...) return; pattern—worth standardizing here.

tests/runtime/out_s3_edge_cases.c (2)

83-92: empty_data_upload is “no crash” only; add an assertion that no S3 requests were made

Currently the test just sleeps (Line 85) and stops/destroys. If the plugin accidentally uploads on empty flush, the test might still pass. If the mock API exposes “unused request count”, assert it remains unchanged (or configure the mock to fail on any request for this test).


771-833: workers=100 in config_boundary_maximum may be unnecessarily heavy for CI

If this spawns 100 worker threads, it can slow the suite or cause resource issues. Consider using a smaller “high but reasonable” value (or guard/mark as expensive) while still validating boundary handling.

tests/runtime/out_s3_format.c (2)

17-687: Format/compression/log_key tests don’t assert the transformed upload payload

All tests only validate that the multipart request sequence happens (mock exhausted). That won’t catch regressions where:

  • format=json doesn’t serialize correctly,
  • compression=gzip|zstd|snappy isn’t applied,
  • log_key extraction doesn’t happen.

If the mock can assert on request body bytes, add minimal checks (e.g., gzip magic 1f 8b, zstd magic, snappy framed header) or assert on derived key suffix/headers where applicable.
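
Minimal magic-byte checks are cheap to add; a hedged sketch (the gzip and zstd magics are fixed by their specs):

#include <string.h>

static int has_gzip_magic(const unsigned char *p, size_t n)
{
    return n >= 2 && p[0] == 0x1f && p[1] == 0x8b;
}

static int has_zstd_magic(const unsigned char *p, size_t n)
{
    static const unsigned char m[4] = {0x28, 0xb5, 0x2f, 0xfd};
    return n >= 4 && memcmp(p, m, sizeof(m)) == 0;
}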


56-60: Add explicit bailout if db_path/store_dir is NULL

To avoid NULL deref if TEST_CHECK isn’t fatal (or temp creation fails), add an explicit if (db_path == NULL || store_dir == NULL) { ...; return; } cleanup like out_s3_error_handling.c.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff0643c and e7de562.

📒 Files selected for processing (33)
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3.c
  • plugins/out_s3/s3.h
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_blob.h
  • plugins/out_s3/s3_multipart.c
  • plugins/out_s3/s3_multipart.h
  • plugins/out_s3/s3_queue.c
  • plugins/out_s3/s3_queue.h
  • plugins/out_s3/s3_store.c
  • plugins/out_s3/s3_store.h
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_stream.h
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (7)
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • plugins/out_s3/s3_stream.c
  • plugins/out_s3/s3_auth.h
  • plugins/out_s3/s3_stream.h
  • plugins/out_s3/CMakeLists.txt
  • plugins/out_s3/s3_multipart.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef FLB_HAVE_ZSTD`) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef FLB_HAVE_ARROW` guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • plugins/out_s3/s3_store.c
  • tests/internal/aws_compress.c
  • tests/runtime/out_s3_error_handling.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.

Applied to files:

  • plugins/out_s3/s3_blob.c
  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3_queue.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/internal/aws_compress.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/internal/aws_compress.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.

Applied to files:

  • plugins/out_s3/s3_auth.c
  • plugins/out_s3/s3.h
🧬 Code graph analysis (15)
tests/include/aws_client_mock.h (1)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_clear_generator_instance (71-74)
tests/runtime/out_s3_multipart.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_blob.c (4)
tests/include/s3_test_helpers.h (3)
  • s3_test_next (449-457)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (6)
  • flb_create (143-225)
  • flb_input (266-276)
  • flb_start (983-994)
  • flb_lib_push (843-870)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_queue.c (4)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (10)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_input (266-276)
  • flb_input_set (305-335)
  • flb_output (279-289)
  • flb_output_set (520-551)
  • flb_start (983-994)
  • flb_lib_push (843-870)
  • flb_stop (1011-1055)
  • flb_destroy (228-263)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
plugins/out_s3/s3_store.c (2)
src/flb_fstore.c (4)
  • flb_fstore_file_create (153-190)
  • flb_fstore_file_meta_set (74-108)
  • flb_fstore_file_append (298-329)
  • flb_fstore_create (479-525)
include/fluent-bit/flb_compat.h (1)
  • gmtime_r (76-82)
tests/runtime/out_s3_config.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (3)
  • flb_create (143-225)
  • flb_start (983-994)
  • flb_stop (1011-1055)
plugins/out_s3/s3_blob.c (5)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_abort (738-800)
  • s3_multipart_check_upload_exists (806-880)
src/flb_input_blob.c (2)
  • flb_input_blob_delivery_notification_destroy (222-231)
  • flb_input_blob_file_get_info (31-112)
src/flb_blob_db.c (1)
  • flb_blob_file_update_remote_id (586-617)
plugins/out_s3/s3_queue.c (1)
  • s3_queue_add_pending_file (269-282)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
  • flb_chunk_file_open (553-632)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
  • s3_queue_remove (317-326)
  • s3_queue_buffer_chunk (366-388)
  • s3_queue_add_file (126-140)
  • s3_queue_add_part (142-212)
  • s3_queue_add_pending_file (269-282)
  • s3_queue_add_file_unlocked (68-121)
  • s3_queue_add_pending_file_unlocked (218-264)
  • s3_queue_process_entry (765-824)
  • s3_queue_recover_all (867-922)
  • s3_queue_recover_from_database (1015-1103)
  • s3_queue_recover_from_fstore (1168-1228)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
  • flb_s3_calculate_optimal_part_size (164-216)
  • s3_multipart_get_md5_base64 (218-248)
  • s3_multipart_create_headers (357-418)
  • s3_multipart_upload_new (451-500)
  • s3_multipart_upload_destroy (420-445)
  • s3_multipart_initiate (502-559)
  • s3_multipart_upload_part (561-653)
  • s3_multipart_complete (655-736)
  • s3_multipart_abort (738-800)
  • s3_multipart_abort_with_url (1022-1043)
  • s3_multipart_check_upload_exists (806-880)
  • s3_multipart_upload_file_part (894-1016)
  • s3_multipart_upload_file (1203-1294)
tests/runtime/out_s3_error_handling.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_lib.c (3)
  • flb_create (143-225)
  • flb_service_set (652-678)
  • flb_start (983-994)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
plugins/out_s3/s3_queue.c (7)
include/fluent-bit/flb_mem.h (1)
  • flb_calloc (84-96)
plugins/out_s3/s3_store.c (4)
  • s3_store_buffer_put (217-258)
  • s3_store_file_unlock (641-646)
  • s3_store_file_delete (526-551)
  • s3_store_file_inactive (506-524)
plugins/out_s3/s3_auth.c (1)
  • s3_auth_fetch_presigned_url (359-423)
plugins/out_s3/s3_multipart.c (2)
  • s3_multipart_upload_file_part (894-1016)
  • s3_multipart_complete (655-736)
src/flb_blob_db.c (3)
  • flb_blob_file_update_remote_id (586-617)
  • flb_blob_db_lock (423-428)
  • flb_blob_db_unlock (430-435)
plugins/out_s3/s3.c (1)
  • s3_upload_file (1015-1120)
lib/cfl/src/cfl_sds.c (1)
  • cfl_sds_len (88-91)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
  • s3_blob_register_parts (403-436)
  • s3_blob_process_events (451-538)
  • s3_blob_recover_state (171-199)
  • s3_blob_notify_delivery (124-161)
🔇 Additional comments (98)
plugins/out_s3/s3_store.c (19)

20-49: Cross-platform path handling looks good.

The cross-platform FLB_PATH_MAX definition correctly handles Windows (MAX_PATH/_MAX_PATH) and POSIX (PATH_MAX) with sensible fallbacks. The inclusion of flb_compat.h and chunkio headers aligns with the codebase patterns.


51-81: Filename hash generation is acceptable.

The djb2-based hash combined with time provides reasonable uniqueness. Note that if tm.tv_nsec is 0, hash2 becomes 0, but this is unlikely in practice and the primary hash from the tag still provides differentiation.
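
A hedged sketch of the scheme (djb2 is the standard algorithm; the exact tag/time combination in s3_store.c may differ):

#include <stdint.h>
#include <time.h>

static uint64_t djb2(const unsigned char *s)
{
    uint64_t hash = 5381;
    int c;

    while ((c = *s++) != 0) {
        hash = ((hash << 5) + hash) + c;   /* hash * 33 + c */
    }
    return hash;
}

/* combine the tag hash with wall-clock nanoseconds for uniqueness */
static uint64_t filename_hash(const char *tag, const struct timespec *tm)
{
    return djb2((const unsigned char *)tag) ^ (uint64_t)tm->tv_nsec;
}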


83-114: LGTM!

The function correctly handles the iteration with mk_list_foreach_safe to safely delete files with NULL data, and properly checks both meta size and tag content before returning a match.


116-140: LGTM!

Buffer space checking logic is correct with proper handling of unlimited buffers (size 0) and warning threshold calculations. The early return on line 121-123 correctly guards against division by zero.


142-159: LGTM!

The cleanup helper correctly handles resource release in the right order and safely handles NULL inputs.


161-215: LGTM!

The function has comprehensive error handling with proper cleanup at each failure point using cleanup_failed_file. Resource ownership is clear, and the filename is correctly destroyed after use.


217-258: LGTM!

The function correctly handles append failures by cleaning up newly created files (addressing previous review feedback). The file_created flag properly tracks ownership for cleanup decisions.


260-295: LGTM!

Cross-platform stat handling is properly implemented with _stat/stat guards. The function correctly uses the chunkio constant CIO_FILE_HEADER_MIN and validates the path length with snprintf.


297-335: LGTM!

The file context creation for restoration is well-structured. Note that first_log_time and create_time are set to the current time rather than preserved from the original file, but this is acceptable since the original timestamps aren't stored in the chunk metadata.


337-363: LGTM!

The restoration loop correctly handles existing contexts and gracefully continues on failures. Using mk_list_foreach (non-safe) is appropriate here since the list isn't modified during iteration.


365-381: LGTM!

The function correctly restores buffered files from all streams except the active and metadata streams, which are handled separately.


383-407: LGTM!

The stream name creation uses platform-appropriate time functions with correct error checking (Windows gmtime_s returns 0 on success, POSIX gmtime_r returns non-NULL on success). The Windows-safe format with hyphens ensures consistent naming across platforms.
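
A hedged sketch of the dual-convention handling (the format string is illustrative; note that gmtime_s and gmtime_r signal success differently):

#include <time.h>

static int format_stream_name(char *out, size_t out_size, time_t now)
{
    struct tm tm_utc;

#ifdef _WIN32
    if (gmtime_s(&tm_utc, &now) != 0) {      /* 0 means success */
        return -1;
    }
#else
    if (gmtime_r(&now, &tm_utc) == NULL) {   /* non-NULL means success */
        return -1;
    }
#endif
    /* hyphens instead of colons keep the name valid on Windows */
    strftime(out, out_size, "%Y-%m-%dT%H-%M-%S", &tm_utc);
    return 0;
}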


409-449: LGTM!

The initialization function properly creates the file store and stream, with correct error handling and cleanup on each failure path. The stream_name is correctly destroyed after use, and the CI detection logic has been removed as per previous feedback.


451-484: LGTM!

The exit function correctly frees all allocated resources in the proper order (stream_path before s3_file), nullifies pointers, and destroys the file store.


486-504: LGTM!

Simple and correct check for data presence across all streams.


506-524: LGTM!

The memory leak from the previous review has been addressed - stream_path is now properly freed before the s3_file struct.


526-551: LGTM!

The function has robust null checks and validates ownership consistency before deletion. The check at lines 536-538 prevents potential corruption if fsf->data doesn't match the expected s3_file.


553-632: LGTM!

The chunk file opener has comprehensive validation including magic byte checks, file size validation before seeking past metadata, and proper cleanup on all error paths. The meta_len validation against remaining file size (lines 613-620) addresses previous review concerns about corrupt headers.


634-646: LGTM!

Simple and correct lock/unlock helper functions with null checks.

plugins/out_s3/s3_store.h (1)

26-34: LGTM!

The s3_file struct is well-organized with clear member documentation. The field rename from file_path to stream_path has been applied.

tests/runtime/CMakeLists.txt (2)

238-246: LGTM - Well-organized modular test structure.

Splitting the monolithic S3 test into nine focused test files improves maintainability and allows targeted testing of specific functionality (config, multipart, format, error handling, edge cases, special features, queue, recovery, blob).


300-306: LGTM - Appropriate Windows-specific linking.

The conditional linking of ole32 and rpcrt4 for S3 tests on Windows is correct for resolving RPC/OLE functions used in the test helpers (CoCreateGuid, UuidToStringA, RpcStringFreeA). The regex pattern correctly matches all out_s3* test source files.

tests/internal/parquet.c (4)

1-37: LGTM - Well-structured test file with comprehensive coverage.

The test file properly includes all necessary standard headers and organizes tests into clear categories (functional, data quality, scale/boundary, compression, negative, real-world scenarios).


234-248: LGTM - Clean test context framework.

The test_context structure provides good encapsulation of test resources, and the init/cleanup pattern ensures proper resource management.


1111-1130: Unused-variable concern resolved.

The earlier review flagged an unused ret variable here, but the current version of this test no longer declares ret; the run_conversion result is checked directly via TEST_CHECK, so the past comment has been addressed.


2081-2156: LGTM - Comprehensive TEST_LIST with clear categorization.

The test list is well-organized into six categories with clear comments, covering functional tests, data quality, scale/boundary, compression, negative tests, and real-world scenarios. The NULL terminator is properly included.

plugins/out_s3/s3_multipart.h (2)

1-25: LGTM - Clean header structure with proper guards.

The header follows Fluent Bit conventions with appropriate include guards, copyright header, and necessary includes for dependencies.


77-89: LGTM - Excellent documentation for recovery-critical function.

The detailed comment explaining that s3_key must be the persisted key (not regenerated) is valuable for preventing subtle bugs during recovery scenarios where time-based or sequence-based keys could mismatch.

plugins/out_s3/s3.h (4)

29-46: LGTM - Correct AWS S3 multipart constraints.

The constants correctly reflect AWS S3 limits:

  • Part size: 5 MiB minimum (S3_AWS_MIN_PART_SIZE) to 5 GiB maximum (S3_AWS_MAX_PART_SIZE)
  • Maximum parts: 10,000 (S3_AWS_MAX_PARTS)
  • Maximum file size: 50 TB (MAX_FILE_SIZE), the 10,000-part × 5 GiB multipart ceiling
  • Default part size: 100 MiB is a reasonable default

52-74: LGTM - Well-structured upload queue.

The upload_queue structure properly tracks all necessary state for queued uploads including file IDs, offsets, S3 key, upload ID, retry counter, and timing information. The needs_upload_creation flag supports both new uploads and recovery scenarios.


150-158: LGTM - Good thread safety with upload_queue_lock.

Adding pthread_mutex_t upload_queue_lock to protect the upload queue access is essential for correct concurrent operation. The state flags (initial_upload_done, is_exiting, needs_recovery) provide clear lifecycle management.


175-178: LGTM - Parquet integration fields properly declared.

The format, schema_str, and cached_arrow_schema fields provide clean integration with the Parquet encoder. The forward declaration at line 33 properly resolves the pointer type without requiring the full Parquet header.

plugins/out_s3/s3_blob.h (1)

1-43: LGTM - Clean blob handling API.

The header provides a well-defined public API for S3 blob operations:

  • s3_blob_register_parts: Splits files into parts for multipart upload
  • s3_blob_process_events: Handles incoming blob events
  • s3_blob_recover_state: Recovery for stale/aborted uploads
  • s3_blob_notify_delivery: Delivery status notification

The documentation above each function clearly explains its purpose. The use of cfl_sds_t for source and file_path in s3_blob_notify_delivery is consistent with the implementation in s3_blob.c.

plugins/out_s3/s3_auth.c (5)

1-128: LGTM - Initialization and URL parsing implementation is well-structured.

The endpoint initialization properly handles TLS context creation for HTTPS, upstream setup, and cleanup on failure paths. The parse_url function now correctly calls url_parts_destroy on error paths.


130-171: LGTM - Port validation and header setup are correct.

The port validation at lines 140-141 properly rejects port 0 and ensures the valid range of 1-65535. Header setup handles both basic auth and bearer token authentication patterns cleanly.


173-260: LGTM - HTTP request handling with proper resource cleanup.

The presigned URL request function properly:

  • Validates and parses the URL
  • Handles connection and HTTP client creation with cleanup on failure
  • Checks response status and handles both payload and no-payload error cases
  • Manages the result URL allocation correctly

262-357: LGTM - URL path construction handles edge cases correctly.

The path builder properly:

  • URL-encodes both the S3 key and upload ID
  • Strips leading slash from encoded key to avoid double slashes
  • Validates required parameters for each operation type
  • Cleans up allocated resources on all error paths

359-423: LGTM - Presigned URL fetching with proper slash handling.

The implementation correctly handles the edge case where both the endpoint URL ends with / and the path starts with /, avoiding double slashes in the final URL. The cleanup paths properly free both url_path and full_url.

plugins/out_s3/s3_blob.c (7)

41-122: LGTM - Multipart abort implementation with proper validation.

The function correctly:

  • Validates file_remote_id before proceeding
  • Uses stored s3_key when available, falling back to regeneration with appropriate warnings
  • Properly cleans up all allocated resources including the presigned URL on error

124-161: LGTM - Delivery notification with proper error handling.

The notification function now properly:

  • Checks allocation of the notification structure
  • Validates cfl_sds_create return for the path (previously flagged)
  • Calls flb_notification_cleanup on enqueue failure

171-199: LGTM - Recovery state handling with proper locking.

The lock/unlock return values are now checked and logged appropriately. The function coordinates stale and aborted file handling under the DB lock.


201-253: LGTM - Stale file recovery with abort failure logging.

The implementation now logs abort failures with detailed context (file_id, path, upload_id, parts, return code) as previously requested.


255-401: LGTM - Aborted file handling with comprehensive retry logic.

The implementation properly:

  • Uses stored s3_key for validation when available
  • Falls back to key regeneration with warnings for legacy data
  • Distinguishes between retry scenarios (with/without upload_id)
  • Logs abort failures for discarded files with full context

451-538: LGTM - Event processing with proper cleanup and error handling.

The implementation correctly:

  • Uses int64_t file_id to avoid truncation from flb_blob_db_file_insert
  • Checks file_id < 0 for error detection (line 496)
  • Defensively frees partial allocations on flb_input_blob_file_get_info failure
  • Uses %zu format specifier for size_t (line 497)
  • Cleans up DB entries on registration or queue failures

403-436: Error check pattern is correct.

flb_blob_db_file_part_insert() returns 0 (FLB_BLOB_DB_SUCCESS) on success and negative error codes (e.g., FLB_BLOB_DB_ERROR_FILE_PART_INSERT = -209) on failure. The check if (ret < 0) at line 426 properly detects all error conditions.

plugins/out_s3/s3_queue.h (2)

20-27: LGTM - Header includes and return type are correct.

The s3_queue_remove declaration now returns int (line 26), matching the implementation in s3_queue.c (lines 317-326) which returns -1 on NULL entry and 0 on success.


89-101: LGTM - Phase documentation is now consistent.

The three-phase architecture comment (lines 91-93) now correctly documents Phase 0, 1, 2, matching the implementation in s3_queue_recover_all.

plugins/out_s3/s3_queue.c (10)

22-30: LGTM - Windows portability for file access check.

The flb_access macro provides cross-platform compatibility by mapping to _access on Windows and standard access on POSIX systems.


68-121: LGTM - Queue entry allocation with proper cleanup.

The unlocked add function properly handles allocation failures with cascading cleanup of previously allocated resources (tag_copy, path_copy, entry).


142-212: LGTM - Part queue entry with complete field initialization.

The s3_queue_add_part function:

  • Properly allocates and initializes all entry fields
  • Has cascading cleanup for all allocation failures
  • Acquires the lock only when adding to the list

284-326: LGTM - Entry destruction and removal are properly separated.

The design cleanly separates:

  • s3_queue_entry_destroy: Frees memory without list manipulation
  • s3_queue_remove: Removes from list and then destroys

This allows callers to handle list removal themselves when needed.


396-485: LGTM - DB-tracked upload with ETag memory management.

The ETag SDS string is now properly destroyed after being saved to the database (lines 452-454) and on failure paths (lines 470-474), fixing the memory leak identified in past reviews.


541-713: LGTM - Multipart completion with proper validation and cleanup.

The implementation now:

  • Validates part_count bounds (lines 599-603) before allocation
  • Properly cleans up all ETags in the 10000-element array (lines 691-695)
  • Handles abort scenarios correctly

783-798: LGTM - DB NULL check added for upload creation failure.

The flb_blob_db_file_set_aborted_state call is now guarded by ctx->blob_db.db != NULL (line 793), consistent with other DB operations in the file.


867-922: LGTM - Three-phase recovery with proper lock handling.

The recovery implementation correctly:

  • Acquires and releases locks around zombie cleanup
  • Logs lock failures appropriately
  • Coordinates the three phases in order

1031-1103: LGTM - Database recovery with endpoint mismatch handling.

The implementation now resets in_progress flags before skipping files with endpoint mismatches (line 1047), preventing files from being hidden until zombie cleanup.


1167-1228: LGTM - Fstore recovery for non-DB-tracked files.

The fstore recovery properly:

  • Skips the metadata stream
  • Checks lock status and failure counts
  • Locks files before adding to queue and unlocks on failure
tests/include/aws_client_mock.h (2)

6-8: LGTM - Improved documentation for mock usage limitations.

The updated comments clearly document important usage constraints: single-test-per-executable limitation and thread-safety concerns with static variables.


192-197: LGTM - Clear generator instance function for test cleanup.

This function addresses a common test pattern where flb_destroy() frees the mock client, and subsequent calls to configure_generator would otherwise cause use-after-free. The documentation clearly explains when to use this function.

tests/internal/aws_compress.c (4)

8-8: LGTM!

The include for flb_snappy.h is correctly added to support the new Snappy framed compression test.


77-245: Well-structured Snappy framed compression test.

The test comprehensively covers:

  1. Simple string round-trip validation
  2. Large data spanning multiple 64KB blocks
  3. Concatenated frame handling (streaming scenario)
  4. Graceful failure on NULL/empty input

Memory management is handled correctly with proper cleanup in all paths.


413-413: LGTM!

Test registration follows the established pattern for compression tests.


482-483: LGTM!

Minor formatting cleanup.

tests/include/aws_client_mock.c (6)

2-10: Helpful documentation for the mock pattern.

The comment clearly explains that this .c file is directly included (not separately compiled) and warns about symbol conflicts if multiple tests are linked together. This is important guidance for maintainers.


54-64: Good defensive pattern for double-free prevention.

Caching the instance locally and clearing the global before destruction prevents re-entrant double-free scenarios. This is a robust approach.
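
The pattern generalizes; a hedged sketch with illustrative names:

struct mock_generator;                               /* opaque, illustrative */
void mock_generator_free(struct mock_generator *m);  /* illustrative */

static struct mock_generator *mock_instance;

void mock_destroy(void)
{
    struct mock_generator *m = mock_instance;   /* cache locally */

    if (m == NULL) {
        return;
    }
    mock_instance = NULL;       /* clear the global before freeing ... */
    mock_generator_free(m);     /* ... so re-entrant calls see NULL */
}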


66-74: Necessary lifecycle management function.

This function correctly addresses the case where the S3 plugin has already freed the mock client during flb_destroy(). Clearing the pointer without freeing prevents use-after-free when configure_generator is called again.


229-233: LGTM!

Adding FLB_HTTP_DELETE completes the HTTP method coverage for mock expectations.


266-299: Correct dynamic allocation for response data.

The DATA field must be dynamically allocated since flb_http_client_destroy() will free it. The implementation correctly:

  • Searches for explicit size parameters first
  • Falls back to strlen for string data
  • Allocates with null-terminator space
  • Sets both data_len and data_size

Note: The strlen fallback assumes the config value is a null-terminated string, which is the expected usage pattern for mock responses.


306-311: LGTM!

The macro correctly excludes FLB_AWS_CLIENT_MOCK_SET_DATA since it's handled specially above with dynamic allocation.

tests/runtime/out_s3_queue.c (9)

1-12: LGTM!

Good test file structure with proper includes. Note that including aws_client_mock.c directly (line 8) follows the documented pattern where each test is built as a standalone executable.


14-74: Well-structured basic queue test.

The test correctly:

  • Sets up a mock request chain for multipart upload flow
  • Creates isolated temp paths for db and store
  • Configures the S3 output plugin
  • Exercises the queue with data push
  • Performs comprehensive cleanup including temp files

The use of flb_aws_client_mock_clear_generator_instance() after flb_destroy() correctly handles the case where the plugin may have already freed the mock client.


77-142: LGTM!

Multiple workers test correctly configures workers=5 and pushes multiple chunks to exercise worker distribution.


145-206: Good retry mechanism test coverage.

The mock chain correctly simulates a 500 error on the first UploadPart, followed by a successful retry. This validates the plugin's error handling and retry logic.


209-269: LGTM!

Timeout-triggered upload test correctly sets upload_timeout=2s with a large total_file_size=100M to ensure the upload is triggered by timeout rather than size.


272-335: LGTM!

Concurrent uploads test with 10 workers and 10 data pushes exercises the concurrent processing path.


338-400: LGTM!

Graceful shutdown test with grace=2 validates that pending queue entries are handled during shutdown.


403-467: LGTM!

High concurrency stress test with 16 workers and 20 rapid pushes (10ms intervals) provides good coverage for race condition detection.


469-479: LGTM!

Test list correctly registers all 7 queue tests.

tests/runtime/out_s3_blob.c (11)

1-12: LGTM!

Good test file structure with proper includes for blob database tests.


19-82: LGTM!

Basic blob_database_file configuration test with proper setup and cleanup.


137-151: Good async-safe stat() verification.

The retry loop correctly handles the race condition between flb_start() and async database creation. Using S3_TEST_SLEEP_MS(100) ensures Windows compatibility.


166-172: LGTM!

Good documentation explaining that these tests verify configuration acceptance rather than actual retry behavior, which is handled by the recovery mechanism.


175-239: LGTM!

File delivery attempt limit configuration test.


242-306: LGTM!

Part delivery attempt limit configuration test.


309-373: LGTM!

Unlimited delivery attempts (-1) configuration test.


380-443: LGTM!

Upload part freshness limit configuration test.


446-509: LGTM!

Short freshness limit configuration test.


516-579: LGTM!

Upload chunk size configuration test.


585-595: LGTM!

Test list correctly registers all 8 blob tests.

tests/internal/aws_util.c (8)

47-68: Comprehensive test coverage for new S3 key format variables.

The test macros cover:

  • $FILE_PATH: Full path substitution
  • $FILE_NAME: Filename-only extraction
  • Combined usage of both variables
  • Edge cases: relative paths, triple dots, parent traversal, hidden files, mixed dots/slashes

This provides thorough coverage of the path sanitization logic in flb_get_s3_key.


76-76: LGTM!

Good representative file path for testing with a realistic nginx log rotation example.


200-200: LGTM!

All existing flb_get_s3_key calls correctly updated with NULL file_path parameter for backward compatibility.


402-416: LGTM!

Test validates basic $FILE_PATH substitution with the full sanitized path.


418-432: LGTM!

Test validates $FILE_NAME extraction (just the filename portion).


434-448: LGTM!

Test validates using both $FILE_PATH and $FILE_NAME in the same format string.


450-533: Good edge case coverage for path sanitization.

The edge case tests verify that the path sanitization logic correctly handles:

  • Relative paths (./)
  • Triple dot prefixes (...)
  • Parent traversal attempts (../../../)
  • Hidden files (.hidden)
  • Mixed dots and slashes

This ensures S3 keys are safe from path traversal vulnerabilities.
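
A hedged sketch of segment-based sanitization in this spirit (flb_get_s3_key's actual rules may differ; this version simply drops empty, "." and ".." segments):

#include <stddef.h>
#include <string.h>

static void sanitize_path(const char *in, char *out, size_t out_size)
{
    size_t used = 0;
    const char *p = in;

    while (*p != '\0' && used + 1 < out_size) {
        const char *seg = p;
        size_t len = strcspn(p, "/");

        p += len;
        if (*p == '/') {
            p++;
        }
        if (len == 0 || (len <= 2 && strncmp(seg, "..", len) == 0)) {
            continue;   /* skip empty, "." and ".." segments */
        }
        if (used > 0) {
            out[used++] = '/';
        }
        if (len > out_size - used - 1) {
            len = out_size - used - 1;
        }
        memcpy(out + used, seg, len);
        used += len;
    }
    out[used] = '\0';
}

With this, "../../../etc/passwd" becomes "etc/passwd" and "./logs/app.log" becomes "logs/app.log", while hidden files such as ".hidden" are preserved.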


550-558: LGTM!

All 8 new tests correctly registered in TEST_LIST.

tests/runtime/out_s3_config.c (1)

6-10: The direct .c inclusion pattern is intentional and appropriate for this codebase's test architecture.

The runtime tests are built as separate standalone executables (one per test file), not linked into a single binary. The aws_client_mock.c file explicitly documents this pattern with a note stating: "Each test is built as a standalone executable, avoiding symbol conflicts." The static declarations in aws_client_mock.c (static flb_aws_client_mock_instance, static mock_generator, static mock_client_vtable) ensure each test gets its own isolated mock state with no duplicate symbol conflicts.

Separately compiling and linking aws_client_mock.c as suggested would actually break the per-test mock isolation that the current design provides. The current approach is idiomatic for the runtime test model used here.


Comment on lines 521 to 670
/* Test: auto_retry_requests enabled */
void flb_test_auto_retry_enabled(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;

    s3_test_next();

    /* With auto_retry enabled, transient failures should be retried */
    chain = FLB_AWS_CLIENT_MOCK(
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
            set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
        )
    );

    flb_aws_client_mock_configure_generator(chain);
    init_options.client_generator = flb_aws_client_get_mock_generator();

    db_path = s3_test_create_temp_db_path("err");
    store_dir = s3_test_create_temp_store_dir("err");
    TEST_CHECK(db_path != NULL);
    TEST_CHECK(store_dir != NULL);
    if (db_path == NULL || store_dir == NULL) {
        flb_free(db_path);
        flb_free(store_dir);
        return;
    }

    ctx = flb_create();
    TEST_CHECK(ctx != NULL);
    flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

    in_ffd = flb_input(ctx, (char *)"lib", NULL);
    TEST_CHECK(in_ffd >= 0);
    flb_input_set(ctx, in_ffd, "tag", "test", NULL);

    out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
    TEST_CHECK(out_ffd >= 0);
    flb_output_set(ctx, out_ffd, "match", "*", "region", S3_TEST_DEFAULT_REGION, "bucket", S3_TEST_DEFAULT_BUCKET,
                   "blob_database_file", db_path, "store_dir", store_dir,
                   "total_file_size", "10M", "upload_chunk_size", "5M",
                   "upload_timeout", "1s", "auto_retry_requests", "true", NULL);

    ret = flb_start(ctx);
    TEST_CHECK(ret == 0);
    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
    TEST_CHECK(ret >= 0);

    /* Poll until mock chain exhausted instead of fixed sleep */
    S3_TEST_WAIT_MOCK_EXHAUSTED(5, "auto_retry_enabled");

    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}

/* Test: auto_retry_requests disabled */
void flb_test_auto_retry_disabled(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;

    s3_test_next();

    chain = FLB_AWS_CLIENT_MOCK(
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
            set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
        )
    );

    flb_aws_client_mock_configure_generator(chain);
    init_options.client_generator = flb_aws_client_get_mock_generator();

    db_path = s3_test_create_temp_db_path("err");
    store_dir = s3_test_create_temp_store_dir("err");
    TEST_CHECK(db_path != NULL);
    TEST_CHECK(store_dir != NULL);
    if (db_path == NULL || store_dir == NULL) {
        flb_free(db_path);
        flb_free(store_dir);
        return;
    }

    ctx = flb_create();
    TEST_CHECK(ctx != NULL);
    flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

    in_ffd = flb_input(ctx, (char *)"lib", NULL);
    TEST_CHECK(in_ffd >= 0);
    flb_input_set(ctx, in_ffd, "tag", "test", NULL);

    out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
    TEST_CHECK(out_ffd >= 0);
    flb_output_set(ctx, out_ffd, "match", "*", "region", S3_TEST_DEFAULT_REGION, "bucket", S3_TEST_DEFAULT_BUCKET,
                   "blob_database_file", db_path, "store_dir", store_dir,
                   "total_file_size", "10M", "upload_chunk_size", "5M",
                   "upload_timeout", "1s", "auto_retry_requests", "false", NULL);

    ret = flb_start(ctx);
    TEST_CHECK(ret == 0);
    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
    TEST_CHECK(ret >= 0);

    /* Poll until mock chain exhausted instead of fixed sleep */
    S3_TEST_WAIT_MOCK_EXHAUSTED(5, "auto_retry_disabled");

    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}


⚠️ Potential issue | 🟠 Major

auto_retry_enabled/disabled don’t test retry behavior (mock chain is all-200)

Both tests currently run the same successful request chain, so they can’t catch regressions in auto_retry_requests. Inject a transient failure (e.g., UploadPart 503/500 then 200) and assert the expected sequence differs between enabled vs disabled.

Example direction (enabled: retry UploadPart once)
@@
-    /* With auto_retry enabled, transient failures should be retried */
-    chain = FLB_AWS_CLIENT_MOCK(
-        response(
-            set(STATUS, 200),
-            set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
-            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
-        ),
-        response(
-            set(STATUS, 200),
-            set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
-            set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
-        ),
-        response(
-            set(STATUS, 200),
-            set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
-            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
-        )
-    );
+    /* With auto_retry enabled, transient failures should be retried */
+    chain = FLB_AWS_CLIENT_MOCK(
+        response(expect(METHOD, FLB_HTTP_POST),
+                 set(STATUS, 200),
+                 set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
+                 set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
+        response(expect(METHOD, FLB_HTTP_PUT),
+                 set(STATUS, 503),
+                 set(PAYLOAD, AWS_ERROR_SLOW_DOWN),
+                 set(PAYLOAD_SIZE, sizeof(AWS_ERROR_SLOW_DOWN) - 1)),
+        response(expect(METHOD, FLB_HTTP_PUT),
+                 set(STATUS, 200),
+                 set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
+                 set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
+        response(expect(METHOD, FLB_HTTP_POST),
+                 set(STATUS, 200),
+                 set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
+                 set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
+    );
🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_error_handling.c` around lines 521 - 669, modify the
mock chains in flb_test_auto_retry_enabled and flb_test_auto_retry_disabled so
they differ: for flb_test_auto_retry_enabled build the FLB_AWS_CLIENT_MOCK
sequence with CreateMultipartUpload (set STATUS 200), UploadPart returning a
transient failure first (set STATUS 503 or 500) followed by a successful
UploadPart (set STATUS 200), then CompleteMultipartUpload (set STATUS 200); for
flb_test_auto_retry_disabled build the chain with CreateMultipartUpload (200), a
single failing UploadPart (503/500) and then Complete (200) so the failing
UploadPart is not retried. Keep using flb_aws_client_mock_configure_generator /
init_options.client_generator and S3_TEST_WAIT_MOCK_EXHAUSTED, and update any
expectations/assertions so the enabled test consumes the extra retry response
(i.e. mock chain exhausted) while the disabled test does not retry and thus the
observed request sequence differs.

Add test coverage for:
- AWS utilities
- Parquet implementation
- S3 output configuration
- Edge cases handling
- Error handling
- Format conversion
- Multipart uploads
- Queue management
- Recovery mechanisms
- Special features

Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph>
Signed-off-by: Arbin <arbin.cheng@coins.ph>
@kalavt kalavt force-pushed the feature/aws-s3-parquet branch from e7de562 to c1dd6a8 Compare January 15, 2026 13:35

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@tests/runtime/out_s3_blob.c`:
- Around lines 75-82: Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly free the mock wrapper and
request chain; update every teardown block (e.g., the instance at the shown
lines and the other occurrences around the test cleanup blocks) so that after
flb_stop(ctx) and flb_destroy(ctx) you call
flb_aws_client_mock_destroy_generator() instead of
flb_aws_client_mock_clear_generator_instance(), preserving the surrounding
cleanup calls (s3_test_cleanup_temp_db, s3_test_cleanup_temp_store_dir,
flb_free) in the same order.

In `@tests/runtime/out_s3_edge_cases.c`:
- Around lines 111-122: Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly tear down the AWS client
mock and avoid leaks; update every test teardown that currently calls
flb_aws_client_mock_clear_generator_instance() (e.g., in
flb_test_empty_data_upload and the other teardown blocks around the occurrences
listed) so they call flb_aws_client_mock_destroy_generator() instead, ensuring
the mock generator is destroyed during flb_stop()/flb_destroy() cleanup.
♻️ Duplicate comments (3)
tests/internal/parquet.c (1)

288-304: Avoid fixed /tmp/flb_test_*.{msgpack,parquet} paths (collisions + non-portable).
This is a known flake source if tests run concurrently or in non-POSIX environments.

Suggested direction (centralized temp-path helper + use it everywhere)
 static int init_test_context(test_context *ctx, const char *test_name)
 {
@@
-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    /* TODO: generate unique temp paths (mkstemp/GetTempFileName) instead of fixed /tmp names */
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_%ld.msgpack", test_name, (long) getpid());
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_%ld.parquet", test_name, (long) getpid());

Also applies to: 732-786, 1872-1939, 1941-2016
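
A possible shape for the centralized helper on POSIX, using mkstemp() for atomic unique-file creation (a Windows build would use GetTempFileNameA(), as the runtime helpers already do). The function name is illustrative:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: atomically create a unique temp file and return its
 * path in 'out'. Avoids collisions between concurrent test runs. */
static int make_unique_tmp_path(char *out, size_t out_size, const char *name)
{
    char tmpl[256];
    int fd;

    snprintf(tmpl, sizeof(tmpl), "/tmp/flb_test_%s_XXXXXX", name);
    fd = mkstemp(tmpl);   /* replaces XXXXXX and creates the file */
    if (fd == -1) {
        return -1;
    }
    close(fd);
    snprintf(out, out_size, "%s", tmpl);
    return 0;
}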

tests/runtime/out_s3_config.c (1)

75-83: Fix mock teardown: replace flb_aws_client_mock_clear_generator_instance() with flb_aws_client_mock_destroy_generator() to avoid leaks.

Proposed patch
-    flb_aws_client_mock_clear_generator_instance();
+    flb_aws_client_mock_destroy_generator();

Also applies to: 142-150, 222-230, 288-296, 354-362, 421-429, 492-500, 549-557, 616-624, 673-681, 751-759, 826-834, 903-911, 973-981, 1044-1052, 1112-1120, 1170-1178, 1233-1239, 1280-1286

tests/runtime/out_s3_edge_cases.c (1)

124-208: Add the same db_path/store_dir NULL-guard pattern used in flb_test_empty_data_upload() to the rest of the tests.
Otherwise a temp-path allocation failure can turn into a NULL deref inside flb_output_set().

Also applies to: 210-371, 373-455, 457-625, 627-706, 708-790, 792-874, 876-957, 959-1103

🧹 Nitpick comments (6)
tests/internal/aws_compress.c (1)

214-230: Potential memory leak if decompression fails after partial allocation.

If flb_snappy_uncompress_framed_data fails (returns non-zero) but has already partially allocated decompressed_data, the current code path skips the free at line 229. This could result in a memory leak depending on the implementation's error handling behavior.

Consider initializing decompressed_data = NULL before the call and adding a defensive free on failure:

♻️ Suggested defensive cleanup
+                decompressed_data = NULL;
                 /* Decompress concatenated data */
                 ret = flb_snappy_uncompress_framed_data(concatenated, concat_len,
                                                         &decompressed_data, &decompressed_len);
                 TEST_CHECK(ret == 0);
                 TEST_MSG("flb_snappy_uncompress_framed_data failed for concatenated chunks, ret=%d", ret);
 
                 if (ret == 0) {
                     TEST_CHECK(decompressed_len == total_input_len);
                     TEST_MSG("Length mismatch for concatenated: expected %zu, got %zu",
                              total_input_len, decompressed_len);
 
                     ret = memcmp(expected, decompressed_data, total_input_len);
                     TEST_CHECK(ret == 0);
                     TEST_MSG("Content mismatch for concatenated chunks");
 
                     flb_free(decompressed_data);
                 }
+                else if (decompressed_data != NULL) {
+                    flb_free(decompressed_data);
+                }
tests/runtime/out_s3_error_handling.c (1)

521-669: Consider adding transient failures to differentiate auto_retry tests.

Both flb_test_auto_retry_enabled and flb_test_auto_retry_disabled use identical all-success mock chains. They verify the configuration is accepted but don't actually test retry behavior differences.

To meaningfully test auto_retry_requests, inject a transient failure (e.g., 503 SlowDown on UploadPart) and verify that:

  • With auto_retry_requests=true: the mock chain includes retry response(s)
  • With auto_retry_requests=false: the request is not retried

This is a test coverage gap rather than a bug.

tests/runtime/out_s3_blob.c (1)

18-82: Consider a small setup/teardown helper to reduce copy/paste across tests.
Not blocking, but these tests repeat the same context/mocks/temp-dir lifecycle with only 1–2 knobs changed, which increases maintenance risk.

Also applies to: 174-239, 241-373, 379-579

tests/internal/parquet.c (1)

408-441: CI runtime risk: record-count tests are very large (70k–131k+) and duplicated across cases.
If this slows CI, consider reducing counts, merging overlaps, or gating “long” tests behind an env flag.

Also applies to: 1872-1939
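
One low-cost way to gate the long cases, sketched under the assumption of a new FLB_TESTS_LONG environment variable (both the variable and the test name below are hypothetical):

#include <stdlib.h>

/* Only run the 70k+ record cases when the environment opts in,
 * keeping default CI runs fast. */
static int long_tests_enabled(void)
{
    const char *v = getenv("FLB_TESTS_LONG");
    return v != NULL && v[0] == '1';
}

void flb_test_huge_record_count(void)
{
    if (!long_tests_enabled()) {
        return;   /* skipped unless FLB_TESTS_LONG=1 */
    }
    /* ... heavy conversion body ... */
}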

tests/runtime/out_s3_config.c (1)

215-221: Avoid fixed sleeps for sequencing if possible.
S3_TEST_SLEEP_MS(2000) can be slow/flaky. If there’s a deterministic signal (e.g., waiting for first upload completion before pushing again), prefer that.

tests/runtime/out_s3_edge_cases.c (1)

15-122: flb_test_empty_data_upload: consider asserting “no upload happened” (or explicitly allow both).
Right now it’s effectively “no crash” + a sleep. If the plugin starts uploading empty payloads due to a regression, this test won’t catch it.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e7de562 and c1dd6a8.

📒 Files selected for processing (18)
  • tests/include/aws_client_mock.c
  • tests/include/aws_client_mock.h
  • tests/include/s3_test_helpers.h
  • tests/internal/CMakeLists.txt
  • tests/internal/aws_compress.c
  • tests/internal/aws_util.c
  • tests/internal/parquet.c
  • tests/runtime/CMakeLists.txt
  • tests/runtime/out_s3.c
  • tests/runtime/out_s3_blob.c
  • tests/runtime/out_s3_config.c
  • tests/runtime/out_s3_edge_cases.c
  • tests/runtime/out_s3_error_handling.c
  • tests/runtime/out_s3_format.c
  • tests/runtime/out_s3_multipart.c
  • tests/runtime/out_s3_queue.c
  • tests/runtime/out_s3_recovery.c
  • tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
  • tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/internal/CMakeLists.txt
  • tests/include/aws_client_mock.h
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.

Applied to files:

  • tests/internal/parquet.c
  • tests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.

Applied to files:

  • tests/runtime/out_s3_error_handling.c
  • tests/internal/aws_compress.c
  • tests/include/s3_test_helpers.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.

Applied to files:

  • tests/internal/aws_compress.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.

Applied to files:

  • tests/internal/aws_compress.c
🧬 Code graph analysis (7)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
  • field_name (548-548)
src/flb_pack.c (1)
  • flb_pack_json (530-535)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
tests/runtime/out_s3_format.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/runtime/out_s3_edge_cases.c (3)
tests/include/s3_test_helpers.h (5)
  • s3_test_next (449-457)
  • s3_test_create_temp_db_path (138-195)
  • s3_test_create_temp_store_dir (205-268)
  • s3_test_cleanup_temp_db (428-441)
  • s3_test_cleanup_temp_store_dir (395-420)
tests/include/aws_client_mock.c (3)
  • flb_aws_client_mock_configure_generator (42-47)
  • flb_aws_client_get_mock_generator (137-140)
  • flb_aws_client_mock_clear_generator_instance (71-74)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
tests/internal/aws_compress.c (3)
src/flb_snappy.c (2)
  • flb_snappy_compress_framed_data (134-255)
  • flb_snappy_uncompress_framed_data (257-481)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
src/aws/flb_aws_compress.c (1)
  • flb_aws_compression_get_type (87-108)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
  • setenv (66-69)
tests/include/aws_client_mock.c (1)
  • flb_aws_client_mock_generator_count_unused_requests (143-147)
tests/internal/aws_util.c (3)
src/aws/flb_aws_util.c (1)
  • flb_get_s3_key (1311-1389)
src/flb_sds.c (1)
  • flb_sds_destroy (389-399)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
  • initialization_crutch (27-36)
🔇 Additional comments (28)
tests/internal/aws_compress.c (6)

77-124: LGTM!

The comment block correctly references Google's Snappy framing format specification. Test case 1 properly validates the round-trip compression for a simple string with correct memory management - variables are reset to NULL after freeing, and both success and failure paths handle cleanup appropriately.


126-165: LGTM!

Test case 2 properly exercises multi-block compression with a 100KB buffer that spans multiple 64KB Snappy blocks. Memory management is correct across all code paths, and the test validates both length and content integrity of the round-trip.
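
For reference, a self-contained round-trip in the shape these cases exercise; the prototypes are inferred from the calls quoted in this review, so treat them as assumptions rather than the exact headers:

#include <string.h>
#include <fluent-bit/flb_snappy.h>
#include <fluent-bit/flb_mem.h>

/* Compress then decompress, verifying length and content match. */
static int snappy_framed_round_trip(const char *input, size_t len)
{
    char *comp = NULL, *decomp = NULL;
    size_t comp_len = 0, decomp_len = 0;
    int ret;

    ret = flb_snappy_compress_framed_data((char *) input, len,
                                          &comp, &comp_len);
    if (ret != 0) {
        return -1;
    }

    ret = flb_snappy_uncompress_framed_data(comp, comp_len,
                                            &decomp, &decomp_len);
    flb_free(comp);
    if (ret != 0) {
        return -1;
    }

    /* the round trip must reproduce the exact input bytes */
    ret = (decomp_len == len && memcmp(decomp, input, len) == 0) ? 0 : -1;
    flb_free(decomp);
    return ret;
}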


241-245: LGTM!

Test case 4 correctly validates that the API returns -1 for NULL/empty input, matching the documented behavior of flb_snappy_compress_framed_data.


413-413: LGTM!

Test registration follows the existing pattern and correctly adds the new Snappy test to the test suite.


482-483: LGTM!

Minor formatting cleanup - removes extraneous whitespace after the function call.


8-8: No changes needed — Snappy is unconditionally available in Fluent Bit.

Snappy is bundled in lib/snappy-fef67ac/ and always compiled as part of the core build, similar to ZSTD. Unlike optional components like Arrow/Parquet that use #ifdef FLB_HAVE_ARROW guards, Snappy has no conditional compilation flag and is always available. The #include <fluent-bit/flb_snappy.h> directive at line 8 is correct and does not require guards.

Likely an incorrect or invalid review comment.

tests/include/aws_client_mock.c (3)

56-74: LGTM! Safe lifecycle management for mock instances.

The separation of clearing and destroying prevents double-free scenarios when the S3 plugin has already freed the mock client. The flb_aws_client_mock_clear_generator_instance() function enables clean test reuse.
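
Roughly, the described split looks like this; the struct name, bodies, and the destroy helper's internals are assumptions based on the comment above, not the repository code:

static struct flb_aws_client_mock *generator_instance;  /* file-scope mock */

/* "Clear": forget the instance without freeing it, for when the plugin
 * under test already owns (and frees) the mock client. */
void flb_aws_client_mock_clear_generator_instance(void)
{
    generator_instance = NULL;   /* drop the reference, free nothing */
}

/* "Destroy": free the wrapper and its request chain, then forget it,
 * for when the test itself still owns the mock. */
void flb_aws_client_mock_destroy_generator(void)
{
    if (generator_instance != NULL) {
        /* exact teardown helpers depend on the mock's internals */
        flb_aws_client_mock_destroy(generator_instance);
        generator_instance = NULL;
    }
}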


266-299: LGTM! Proper dynamic allocation for DATA field.

The DATA field handling correctly allocates memory that flb_http_client_destroy() can safely free, avoiding use-after-free issues with static test data.


306-313: LGTM! Correctly excludes DATA from generic expansion.

The macro condition prevents double-handling of the DATA field which is specially processed above.

tests/include/s3_test_helpers.h (4)

138-195: LGTM! Robust cross-platform temp file creation.

The implementation correctly uses platform-specific APIs (GetTempFileNameA / mkstemp) for atomic unique file creation, with proper cleanup on allocation failure.


395-420: LGTM! Safe directory cleanup with proper validation.

The implementation correctly validates paths against known temp directories before deletion, uses platform-specific APIs instead of system(), and properly handles recursive directory removal.


475-507: LGTM! Deterministic mock chain verification.

The polling approach replaces arbitrary sleep() calls with condition-based waiting, ensuring tests fail deterministically if retry logic regresses rather than passing based on timing luck.
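
A sketch of such a wait, built on the flb_aws_client_mock_generator_count_unused_requests() accessor listed later in this review; the macro name and timing constants are illustrative, not the repository's exact definition:

#define S3_TEST_POLL_MS 100

#define S3_TEST_WAIT_MOCK_EXHAUSTED_SKETCH(timeout_s, label)                \
    do {                                                                    \
        int elapsed = 0;                                                    \
        while (flb_aws_client_mock_generator_count_unused_requests() > 0    \
               && elapsed < (timeout_s) * 1000) {                           \
            S3_TEST_SLEEP_MS(S3_TEST_POLL_MS);                              \
            elapsed += S3_TEST_POLL_MS;                                     \
        }                                                                   \
        TEST_CHECK(flb_aws_client_mock_generator_count_unused_requests()    \
                   == 0);                                                   \
        TEST_MSG("[%s] mock chain not exhausted", (label));                 \
    } while (0)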


88-88: Static counter in header is appropriate for the build model.

g_s3_test_counter is declared static, giving each translation unit its own copy. Since each S3 test is compiled as a separate executable (confirmed: add_executable is called once per test source file in the foreach loop, with each test linked independently), this per-executable isolation is correct and prevents any counter collisions between tests.

tests/runtime/out_s3_error_handling.c (2)

1-13: LGTM! Clean test structure with proper includes.

The test file uses the cross-platform helpers from s3_test_helpers.h and avoids direct POSIX includes, addressing previous portability concerns.


44-106: LGTM! Robust test pattern with proper cleanup.

The test follows best practices with proper NULL checks (lines 71-75), early return on failure, and comprehensive cleanup including flb_aws_client_mock_clear_generator_instance().

tests/runtime/out_s3_format.c (3)

9-10: Direct .c include is intentional for test architecture.

The #include "../include/aws_client_mock.c" pattern is documented in the mock header: each test is built as a standalone executable, so this avoids the need for a separate test library while preventing symbol conflicts.


84-86: LGTM! Return value properly checked.

All flb_lib_push() calls now capture and validate the return value, addressing the previous review concern.


851-910: LGTM! Good negative test for configuration validation.

The test properly verifies that invalid compression types cause startup failure, with correct handling of the unexpected success path.

tests/runtime/out_s3_multipart.c (4)

69-126: LGTM! Metadata test now covers both ACL and storage class.

The test properly expects both x-amz-acl and x-amz-storage-class headers on the CreateMultipartUpload request, and configures both canned_acl and storage_class options, addressing the previous review concern.


336-388: LGTM! Proper abort flow testing.

The test correctly expects the DELETE request (AbortMultipartUpload) after CompleteMultipartUpload fails, validating the cleanup behavior.


487-491: LGTM! Push return values properly checked in loop.

Each flb_lib_push() call in the loop now validates the return value, addressing the previous review concern about silent failures.


503-515: LGTM! Complete test registration.

All test functions are properly registered in the TEST_LIST with correct null terminator.

tests/internal/aws_util.c (2)

47-67: Confirm whether $FILE_PATH should normalize internal ./ segments (current tests expect them preserved).
If the intent is “safe + normalized path,” consider collapsing ./ and ../ segments (after initial stripping) rather than emitting them into S3 keys. If the intent is “literal path minus dangerous prefix,” current tests are consistent.

Also applies to: 450-533


199-233: Nice coverage expansion for $FILE_PATH / $FILE_NAME and edge prefixes.
These cases line up with the current implementation’s prefix-stripping behavior and should help prevent regressions.

Also applies to: 237-267, 269-305

tests/internal/parquet.c (2)

25-35: Build gating for Parquet/Arrow dependencies is already properly configured.

The parquet test is correctly gated by FLB_PARQUET_ENCODER in the CMakeLists.txt: the test file is only added to compilation when the flag is enabled (lines 138-143), and linking with ARROW_LIBRARIES and PARQUET_LIBRARIES is also conditional on the same flag (lines 240-241). The test will not be built if the encoder is disabled, preventing any linker failures.


69-136: CRC fields are safe to hardcode to 0 in this test fixture.

The test writes a raw file that is read via flb_chunk_file_open, which validates only the magic bytes and metadata length—it does not validate CRC. Additionally, ChunkIO's CRC validation is optional (controlled by the CIO_CHECKSUM flag) and is only applied during write operations through the ChunkIO API, not when reading files with standard file I/O. Since the test fixture is not created through ChunkIO's normal initialization, CRC validation will not be triggered. This approach is consistent with how test fixtures safely bypass validation checks.

Likely an incorrect or invalid review comment.

tests/runtime/CMakeLists.txt (2)

238-246: LGTM! Well-organized modular test structure.

Splitting the monolithic S3 test file into nine focused test files improves test maintainability and makes it easier to identify failures in specific S3 subsystems (config, multipart, format, error handling, etc.).


300-306: LGTM! Correct Windows library linkage for UUID functions.

The ole32 and rpcrt4 libraries are appropriate for the Windows UUID APIs mentioned in the comment. These functions are actually used in s3_test_helpers.h in the s3_test_create_temp_store_dir() function for generating unique temporary directories on Windows. The MATCHES "out_s3" pattern correctly captures all nine S3 test files.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +75 to +82
    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}

⚠️ Potential issue | 🟠 Major

Fix mock teardown: flb_aws_client_mock_clear_generator_instance() likely leaks request chains; use flb_aws_client_mock_destroy_generator() instead.
clear_generator_instance() only NULLs the global pointer; it doesn’t free the mock wrapper / request chain. Given how often these tests run, this can accumulate and also undermines leak-checking (Valgrind).

Proposed patch
-    flb_aws_client_mock_clear_generator_instance();
+    flb_aws_client_mock_destroy_generator();

Also applies to: 156-164, 231-239, 298-306, 365-373, 435-443, 501-509, 571-579

🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_blob.c` around lines 75 - 82, replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly free the mock wrapper and
request chain; update every teardown block (e.g., the instance at the shown
lines and the other occurrences around the test cleanup blocks) so that after
flb_stop(ctx) and flb_destroy(ctx) you call
flb_aws_client_mock_destroy_generator() instead of
flb_aws_client_mock_clear_generator_instance(), preserving the surrounding
cleanup calls (s3_test_cleanup_temp_db, s3_test_cleanup_temp_store_dir,
flb_free) in the same order.

Comment on lines +687 to +835
/* Test: storage_class variations */
void flb_test_storage_class_variations(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;
    const char *classes[] = {
        "STANDARD",
        "STANDARD_IA",
        "GLACIER",
        "INTELLIGENT_TIERING"
    };
    int i;

    for (i = 0; i < 4; i++) {
        s3_test_next();

        chain = FLB_AWS_CLIENT_MOCK(
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_PUT),
                     set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
                     set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
        );

        flb_aws_client_mock_configure_generator(chain);
        init_options.client_generator = flb_aws_client_get_mock_generator();

        db_path = s3_test_create_temp_db_path("cfg");
        store_dir = s3_test_create_temp_store_dir("cfg");
        TEST_CHECK(db_path != NULL);
        TEST_CHECK(store_dir != NULL);

        ctx = flb_create();
        TEST_CHECK(ctx != NULL);
        flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

        in_ffd = flb_input(ctx, (char *)"lib", NULL);
        TEST_CHECK(in_ffd >= 0);
        flb_input_set(ctx, in_ffd, "tag", "test", NULL);

        out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
        TEST_CHECK(out_ffd >= 0);
        flb_output_set(ctx, out_ffd, "match", "*",
                       "region", S3_TEST_DEFAULT_REGION,
                       "bucket", S3_TEST_DEFAULT_BUCKET,
                       "blob_database_file", db_path,
                       "store_dir", store_dir,
                       "storage_class", classes[i],
                       "total_file_size", "1M",
                       "upload_timeout", "1s", NULL);

        ret = flb_start(ctx);
        TEST_CHECK(ret == 0);

        ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
        TEST_CHECK(ret >= 0);

        S3_TEST_WAIT_MOCK_EXHAUSTED(5, "storage_class_variations");
        flb_stop(ctx);
        flb_destroy(ctx);
        flb_aws_client_mock_clear_generator_instance();
        s3_test_cleanup_temp_db(db_path);
        s3_test_cleanup_temp_store_dir(store_dir);
        flb_free(db_path);
        flb_free(store_dir);
    }
}

/* Test: canned_acl options */
void flb_test_canned_acl_options(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;
    const char *acls[] = {
        "private",
        "public-read",
        "bucket-owner-full-control"
    };
    int i;

    for (i = 0; i < 3; i++) {
        s3_test_next();

        chain = FLB_AWS_CLIENT_MOCK(
            response(expect(METHOD, FLB_HTTP_POST),
                     expect(HEADER, "x-amz-acl", acls[i]),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_PUT),
                     set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
                     set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
        );

        flb_aws_client_mock_configure_generator(chain);
        init_options.client_generator = flb_aws_client_get_mock_generator();

        db_path = s3_test_create_temp_db_path("cfg");
        store_dir = s3_test_create_temp_store_dir("cfg");
        TEST_CHECK(db_path != NULL);
        TEST_CHECK(store_dir != NULL);

        ctx = flb_create();
        TEST_CHECK(ctx != NULL);
        flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

        in_ffd = flb_input(ctx, (char *)"lib", NULL);
        TEST_CHECK(in_ffd >= 0);
        flb_input_set(ctx, in_ffd, "tag", "test", NULL);

        out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
        TEST_CHECK(out_ffd >= 0);
        flb_output_set(ctx, out_ffd, "match", "*",
                       "region", S3_TEST_DEFAULT_REGION,
                       "bucket", S3_TEST_DEFAULT_BUCKET,
                       "blob_database_file", db_path,
                       "store_dir", store_dir,
                       "canned_acl", acls[i],
                       "total_file_size", "1M",
                       "upload_timeout", "1s", NULL);

        ret = flb_start(ctx);
        TEST_CHECK(ret == 0);

        ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
        TEST_CHECK(ret >= 0);

        S3_TEST_WAIT_MOCK_EXHAUSTED(5, "canned_acl_options");
        flb_stop(ctx);
        flb_destroy(ctx);
        flb_aws_client_mock_clear_generator_instance();
        s3_test_cleanup_temp_db(db_path);
        s3_test_cleanup_temp_store_dir(store_dir);
        flb_free(db_path);
        flb_free(store_dir);
    }
}

⚠️ Potential issue | 🟠 Major

Strengthen config tests with header expectations (low-flake, validates behavior).
Right now these mostly validate “plugin starts + uploads,” not that the option affected requests. Suggest:

  • storage_class: expect x-amz-storage-class: <class> on CreateMultipartUpload
  • content_type: expect Content-Type: <type> (where set)
  • send_content_md5: when true, expect Content-MD5 header is present (and absent when false), if that’s how the plugin implements it
Example (storage_class_variations)
-        chain = FLB_AWS_CLIENT_MOCK(
-            response(expect(METHOD, FLB_HTTP_POST),
+        chain = FLB_AWS_CLIENT_MOCK(
+            response(expect(METHOD, FLB_HTTP_POST),
+                     expect(HEADER, "x-amz-storage-class", classes[i]),
                      set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                      set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
             response(expect(METHOD, FLB_HTTP_PUT),
                      set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
                      set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
             response(expect(METHOD, FLB_HTTP_POST),
                      set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
                      set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
         );

Also applies to: 841-983

Comment on lines 111 to 122
    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();

    /* Verify no crash occurred - reaching here is the test */
    TEST_CHECK(ret == 0);

    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}

⚠️ Potential issue | 🟠 Major

Fix mock teardown: flb_aws_client_mock_clear_generator_instance() should be flb_aws_client_mock_destroy_generator() (leak prevention).
This is especially important in flb_test_empty_data_upload() where the mock chain may never be consumed.

Proposed patch
-    flb_aws_client_mock_clear_generator_instance();
+    flb_aws_client_mock_destroy_generator();

Also applies to: 200-208, 282-290, 363-371, 447-455, 530-538, 617-625, 698-706, 782-790, 866-874, 949-957, 1028-1036, 1092-1103

🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_edge_cases.c` around lines 111 - 122, replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly tear down the AWS client
mock and avoid leaks; update every test teardown that currently calls
flb_aws_client_mock_clear_generator_instance() (e.g., in
flb_test_empty_data_upload and the other teardown blocks around the occurrences
listed) so they call flb_aws_client_mock_destroy_generator() instead, ensuring
the mock generator is destroyed during flb_stop()/flb_destroy() cleanup.
