s3_output: add parquet format support #11312

base: master

Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a Parquet encoder and Arrow/Parquet discovery; refactors Out S3 into modular subsystems (multipart, auth/presigned, blob, queue, stream, store); extends blob DB and S3 key templating; adds Snappy framed compression and a C++ Parquet streaming implementation; and introduces extensive tests and CI/CMake updates.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Plugin as Out S3 plugin
participant DB as Blob DB
participant Store as Local Store (fstore)
participant Auth as Auth service
participant S3 as AWS S3
Plugin->>DB: register file metadata & parts
Plugin->>Store: write chunk / stream file
Plugin->>Auth: request presigned URL (CREATE)
Auth-->>Plugin: presigned create URL
Plugin->>S3: initiate multipart (presigned)
S3-->>Plugin: upload_id
loop per part
Plugin->>Auth: request presigned URL (UPLOAD_PART)
Auth-->>Plugin: presigned part URL
Plugin->>S3: upload part (PUT)
S3-->>Plugin: ETag
Plugin->>DB: mark part uploaded / store ETag
end
Plugin->>Auth: request presigned URL (COMPLETE)
Auth-->>Plugin: presigned complete URL
Plugin->>S3: complete multipart
S3-->>Plugin: success/failure
    Plugin->>DB: notify delivery or trigger recovery
```
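The per-part phase in the middle of this diagram carries the recovery story: each part's ETag is persisted before the next part is attempted. A minimal sketch of that loop follows; every name in it (request_presigned_url, put_part, db_mark_part) is hypothetical shorthand for the Auth, S3, and Blob-DB participants above, not the PR's actual API.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the Auth / S3 / Blob-DB calls in the diagram.
static std::string request_presigned_url(const char *op, int part) {
    return std::string("https://s3.example/presigned/") + op + "/" + std::to_string(part);
}
static std::string put_part(const std::string &url, const char *data, size_t len) {
    (void) url; (void) data; (void) len;
    return "\"etag-placeholder\"";   /* S3 returns the part's ETag */
}
static void db_mark_part(int part, const std::string &etag) {
    std::printf("part %d uploaded, etag=%s\n", part, etag.c_str());
}

// Per-part phase of the sequence diagram: presign, PUT, record the ETag
// so a crashed upload can be resumed from the blob DB.
static void upload_parts(const std::vector<std::pair<const char *, size_t>> &parts)
{
    int part_number = 1;
    for (const auto &p : parts) {
        std::string url  = request_presigned_url("UPLOAD_PART", part_number);
        std::string etag = put_part(url, p.first, p.second);
        db_mark_part(part_number, etag);
        part_number++;
    }
}
```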
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

✅ Passed checks (2 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
b4c0a26 to a821fec
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 7
🧹 Nitpick comments (10)
src/CMakeLists.txt (1)
525-528: Clarify SASL library linking scope. The comments indicate "KAFKA SPECIFIC" for SASL library linking, but the linking appears in both the shared library and binary targets. This suggests it is a general requirement when FLB_SASL_ENABLED is true, not specific to this Parquet PR. Consider updating the comment to be more accurate:
🔎 Suggested comment improvement
```diff
-  # KAFKA SPECIFIC - Link SASL library for shared library
+  # Link SASL library when SASL support is enabled (used by Kafka and other plugins)
   if(FLB_SASL_ENABLED)
     target_link_libraries(fluent-bit-shared sasl2)
   endif()

-  # KAFKA SPECIFIC - Link SASL library for binary
+  # Link SASL library when SASL support is enabled (used by Kafka and other plugins)
   if(FLB_SASL_ENABLED)
     target_link_libraries(fluent-bit-bin sasl2)
   endif()
```

Also applies to: 600-603
examples/s3_output/run-s3-parquet-test.sh (1)
38-39: Add error checking for fluent-bit binary and configuration. The script assumes the Fluent Bit binary and configuration file exist without verification, which could lead to confusing errors.
🔎 Proposed pre-flight checks
```diff
+# Verify fluent-bit binary exists
+if [ ! -f ./build/bin/fluent-bit ]; then
+  echo "Error: fluent-bit binary not found at ./build/bin/fluent-bit"
+  echo "Please build Fluent Bit first:"
+  echo "  cd ../../build && cmake .. && make"
+  exit 1
+fi
+
+# Verify configuration file exists
+if [ ! -f fluent-bit-s3-parquet.conf ]; then
+  echo "Error: Configuration file fluent-bit-s3-parquet.conf not found"
+  exit 1
+fi
+
+# Check AWS credentials
+if [ -z "$AWS_ACCESS_KEY_ID" ] || [ -z "$AWS_SECRET_ACCESS_KEY" ]; then
+  echo "Warning: AWS credentials not set. Fluent Bit may fail to upload to S3."
+  echo "Set credentials with: export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=..."
+  echo ""
+fi
+
 # Run Fluent Bit
 ./build/bin/fluent-bit -c fluent-bit-s3-parquet.conf
```

examples/s3_output/README.md (1)
112-114: Consider adding language identifiers to fenced code blocks. Static analysis detected some fenced code blocks without language specifiers; adding one improves syntax highlighting and readability.

🔎 Minor markdown improvements

````diff
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
 ```
````

````diff
-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/     # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/  # Case 2: Large files via Multipart Upload
 ```
````

````diff
-```
+```text
 [parquet] Data quality summary for 53206 records:
 [parquet] Missing non-nullable fields (defaults used):
 [parquet]   field='level' count=52506
 [parquet]   field='timestamp' count=52506
 ```
````

Also applies to: 177-179, 270-272

examples/s3_output/athena-queries.sql (2)

38-45: Use placeholder bucket names in example documentation. The hardcoded bucket name s3-bucket-kafka-sink appears throughout the file. Consider using a placeholder like <your-bucket-name> or my-fluent-bit-bucket to make it clearer that users should substitute their own bucket name.

99-99: Document the placeholder date format. The hardcoded date 20251224 is used in all example queries. Consider adding a note at the top of the query section (around line 83) explaining that users should replace this with their actual date, similar to the note at line 346.

cmake/parquet.cmake (2)

151-163: Consider allowing default paths as fallback. Using NO_DEFAULT_PATH exclusively may miss Arrow/Parquet installations in standard system locations not explicitly listed. Consider removing NO_DEFAULT_PATH or adding a second search without it as a fallback to improve detection success on systems with non-standard installations.

🔎 Suggested approach

```diff
 find_path(ARROW_INCLUDE_DIR
   NAMES arrow/api.h
   PATHS ${SEARCH_PATHS}
   PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
-  NO_DEFAULT_PATH
 )
```

Or add a fallback search without NO_DEFAULT_PATH if the first search fails.
223-226: Unicode characters in CMake messages may cause issues. The checkmark (✓) and cross (✗) Unicode characters in status messages may not render correctly on all terminals or CI systems. Consider using ASCII alternatives like [OK] and [MISSING].

🔎 Proposed fix

```diff
 if(ARROW_FOUND AND PARQUET_FOUND)
-  message(STATUS "✓ Arrow found: ${ARROW_LIBRARIES}")
+  message(STATUS "[OK] Arrow found: ${ARROW_LIBRARIES}")
   message(STATUS "  Include dirs: ${ARROW_INCLUDE_DIRS}")
-  message(STATUS "✓ Parquet found: ${PARQUET_LIBRARIES}")
+  message(STATUS "[OK] Parquet found: ${PARQUET_LIBRARIES}")
   message(STATUS "  Include dirs: ${PARQUET_INCLUDE_DIRS}")
 else()
-  message(STATUS "✗ Arrow/Parquet not found")
+  message(STATUS "[MISSING] Arrow/Parquet not found")
```

examples/s3_output/fluent-bit-s3-parquet.conf (1)
145-145: Very long schema_str line reduces readability. The schema_str value is a single long JSON line that is difficult to read and maintain. Consider adding a comment above it with a formatted version of the schema for documentation purposes, or noting in the README that multi-line values would need escaping.

src/flb_parquet_impl.cpp (1)
250-251: Appending nullptr to BinaryBuilder may cause undefined behavior. The append_default_value for the BINARY type passes nullptr with size 0. While this may work, it is safer to use an empty byte array explicitly.

🔎 Proposed fix

```diff
 case arrow::Type::BINARY:
-    return static_cast<arrow::BinaryBuilder*>(builder)->Append(static_cast<const uint8_t*>(nullptr), 0).ok();
+    return static_cast<arrow::BinaryBuilder*>(builder)->Append(reinterpret_cast<const uint8_t*>(""), 0).ok();
```

plugins/out_s3/s3.c (1)
49-56: Duplicate forward declarations. These forward declarations at lines 49-56 duplicate those at lines 86-91. Remove the duplicates to avoid confusion.

🔎 Proposed fix

```diff
-/* Forward declarations */
-static struct multipart_upload *get_upload(struct flb_s3 *ctx,
-                                           const char *tag, int tag_len);
-static struct multipart_upload *create_upload(struct flb_s3 *ctx,
-                                              const char *tag, int tag_len,
-                                              time_t file_first_log_time);
-static flb_sds_t flb_pack_msgpack_extract_log_key(void *out_context, const char *data,
-                                                  uint64_t bytes, struct flb_config *config);
-
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
.github/workflows/unit-tests.yaml, CMakeLists.txt, cmake/parquet.cmake, examples/s3_output/README.md, examples/s3_output/athena-queries.sql, examples/s3_output/fluent-bit-s3-parquet.conf, examples/s3_output/run-s3-parquet-test.sh, include/fluent-bit/aws/flb_aws_compress.h, include/fluent-bit/flb_parquet.h, plugins/out_s3/s3.c, plugins/out_s3/s3.h, src/CMakeLists.txt, src/aws/CMakeLists.txt, src/aws/compression/CMakeLists.txt, src/aws/compression/arrow/CMakeLists.txt, src/aws/compression/arrow/compress.c, src/aws/compression/arrow/compress.h, src/aws/flb_aws_compress.c, src/flb_parquet_impl.cpp
💤 Files with no reviewable changes (5)
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
examples/s3_output/README.md, include/fluent-bit/flb_parquet.h
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
.github/workflows/unit-tests.yaml, cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, cmake/parquet.cmake, CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, cmake/parquet.cmake, CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/CMakeLists.txt, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3.c
🧬 Code graph analysis (3)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
flb_gzip_compress (157-252)
src/flb_zstd.c (1)
flb_zstd_compress(33-57)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
s3_put_object(1663-1800)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c
[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter
(syntaxError)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
270-270: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (14)
include/fluent-bit/flb_parquet.h (1)
82-85: LGTM! Well-documented API with clear ownership semantics. The function signature and documentation are comprehensive and clear. The API design properly communicates:
- Memory ownership (caller must free with flb_free)
- Error handling (NULL on failure)
- Schema format expectations (Arrow JSON schema)
- Supported types and examples
include/fluent-bit/aws/flb_aws_compress.h (2)
33-46: Clear deprecation notice and migration guidance. The documentation clearly separates true compression algorithms from format converters and provides excellent guidance:
- ARROW marked as deprecated with rationale
- PARQUET (5) directs users to use format=parquet instead
- Clear distinction between compression and format conversion
This will help users migrate away from the deprecated patterns.
30-31: No backward compatibility issue—this renumbering is safe. The compression constants are internal implementation details used only at runtime. Configuration files use string values (compression=gzip, compression=snappy, compression=zstd) which are mapped to numeric constants via flb_aws_compression_get_type(). Since the numeric constant values are never persisted to storage and users never interact with them directly, renumbering from FLB_AWS_COMPRESS_ZSTD=4 to FLB_AWS_COMPRESS_ZSTD=3 does not break backward compatibility.

Likely an incorrect or invalid review comment.
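For illustration, here is a minimal sketch (not Fluent Bit's actual code) of the keyword-to-constant lookup the comment describes; only the string keyword ever appears in a configuration file, so the numeric values on the right are free to change:

```cpp
#include <strings.h>   /* strcasecmp */

/* Internal constants; the numeric values are never persisted or exposed. */
enum { COMPRESS_NONE = 0, COMPRESS_GZIP, COMPRESS_SNAPPY, COMPRESS_ZSTD };

/* Map the user-facing configuration keyword to the runtime constant. */
static int compression_get_type(const char *name)
{
    static const struct { const char *name; int type; } options[] = {
        { "gzip",   COMPRESS_GZIP   },
        { "snappy", COMPRESS_SNAPPY },
        { "zstd",   COMPRESS_ZSTD   },
        { nullptr,  COMPRESS_NONE   }   /* array footer */
    };

    for (int i = 0; options[i].name != nullptr; i++) {
        if (strcasecmp(name, options[i].name) == 0) {
            return options[i].type;
        }
    }
    return -1;   /* unknown keyword: caller reports a config error */
}
```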
CMakeLists.txt (1)
284-286: LGTM! Consistent encoder enablement pattern. When FLB_ALL is enabled, both Avro and Parquet encoders are automatically enabled, following the existing pattern for optional features. This provides a consistent user experience.

src/CMakeLists.txt (1)
157-166: LGTM! Proper C++ integration for Parquet implementation. The Parquet encoder support is correctly integrated:
- Source file added conditionally based on FLB_PARQUET_ENCODER
- Explicit CXX language property set for the C++ implementation file
- Follows the existing pattern for optional features like Avro encoder
src/aws/flb_aws_compress.c (2)
31-37: LGTM! Correct adapter pattern for Snappy compression. The wrapper function properly adapts flb_snappy_compress (which uses char* pointers) to the standard compression interface (which uses void* pointers). This follows the same pattern as the existing flb_gzip_compress and flb_zstd_compress functions shown in the relevant code snippets.
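A minimal sketch of that adapter pattern, assuming the char*-based flb_snappy_compress signature described above (the exact prototype should be checked against flb_snappy.h):

```cpp
#include <cstddef>

/* Assumed prototype, per the review comment; verify against flb_snappy.h. */
extern "C" int flb_snappy_compress(char *in_data, size_t in_len,
                                   char **out_data, size_t *out_len);

/* Adapt the char*-based Snappy call to the void*-based uniform
 * compression interface; only the pointer types differ. */
static int compress_snappy_adapter(void *in_data, size_t in_len,
                                   void **out_data, size_t *out_len)
{
    return flb_snappy_compress(static_cast<char *>(in_data), in_len,
                               reinterpret_cast<char **>(out_data), out_len);
}
```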
46-76: Excellent documentation and clean separation of concerns. The updated comments clearly distinguish:
- True compression algorithms (gzip, snappy, zstd)
- Deprecated format converters (ARROW, PARQUET)
- Guidance to use format=parquet instead of compression=parquet

The removal of Arrow/Parquet compression options from this module aligns with the PR's goal of decoupling format conversion from the compression subsystem.
examples/s3_output/README.md (1)
1-518: Exceptional documentation! Comprehensive and production-ready guidance. This README provides outstanding documentation that covers:
- Clear test scenarios with realistic use cases
- Detailed explanation of upload triggers (upload_timeout vs total_file_size)
- Schema design best practices with nullable/non-nullable guidance
- Data quality monitoring and troubleshooting
- Performance testing methodology
- Athena integration examples
The documentation will significantly help users understand Parquet encoder usage in production scenarios.
plugins/out_s3/s3.h (1)
234-235: LGTM - API signature updated for SDS-based buffers. The signature change from char* body to flb_sds_t body aligns with the SDS-based buffering approach used throughout the refactored code, ensuring consistent memory management.
104-117: LGTM - Well-documented PutObject configuration. The configuration clearly documents the upload strategy, timeout behavior, and use case. The comments explaining the difference between upload_timeout and total_file_size are helpful for users.
696-710: LGTM - Exception handling with proper cleanup. The exception handlers correctly destroy the msgpack unpacked result and free the output buffer if allocated, preventing memory leaks on error paths.
plugins/out_s3/s3.c (3)
752-774: LGTM - Deprecated compression values handled gracefully. The code properly warns users about deprecated arrow/parquet compression values while still supporting them by mapping to the new format=parquet approach. The build-time guard ensures helpful error messages when Parquet isn't available.

1583-1630: LGTM - Parquet conversion with proper error handling. The Parquet conversion path validates that schema_str is present, handles conversion failures gracefully, and properly manages memory by converting the Parquet buffer to flb_sds_t for consistent lifecycle management.
3951-3962: Storing raw msgpack improves efficiency for Parquet. Storing raw msgpack data and deferring format conversion to upload time is the right approach for columnar formats like Parquet, which benefit from batching records together for better compression and schema inference.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
plugins/out_s3/s3.c (1)
3701-3725: Leak of chunk SDS in unit_test_flush() path. In unit_test_flush() you now treat chunk as flb_sds_t, but never destroy it:

```c
s3_store_buffer_put(..., chunk, ...);
ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
...
ret = upload_data(..., buffer, ...);
flb_sds_destroy(buffer);
FLB_OUTPUT_RETURN(ret);
```

construct_request_buffer() only reads from chunk and does not take ownership, so unit_test_flush() should flb_sds_destroy(chunk) once that call returns. As written, every test-mode flush leaks the SDS buffer.

Suggested minimal fix

```diff
 static void unit_test_flush(void *out_context, struct s3_file *upload_file,
                             const char *tag, int tag_len, flb_sds_t chunk,
                             int chunk_size, struct multipart_upload *m_upload_file,
                             time_t file_first_log_time)
 {
     int ret;
     flb_sds_t buffer = NULL;
     size_t buffer_size;
     struct flb_s3 *ctx = out_context;

     s3_store_buffer_put(ctx, upload_file, tag, tag_len,
                         chunk, (size_t) chunk_size, file_first_log_time);
     ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
+    flb_sds_destroy(chunk);
     if (ret < 0) {
         flb_plg_error(ctx->ins, "Could not construct request buffer for %s",
                       upload_file->file_path);
         FLB_OUTPUT_RETURN(FLB_RETRY);
     }

     ret = upload_data(ctx, upload_file, m_upload_file, buffer, buffer_size,
                       tag, tag_len);
     flb_sds_destroy(buffer);
     FLB_OUTPUT_RETURN(ret);
 }
```
♻️ Duplicate comments (7)
examples/s3_output/run-s3-parquet-test.sh (1)
6-16: Logic issue: directories created then immediately cleaned. The script creates directories (lines 8-11) and then immediately removes their contents (line 15). This sequence is confusing and redundant. If the intent is to ensure clean directories exist, reverse the order or use mkdir -p alone (which is idempotent).

🔎 Proposed fix

```diff
 # Create temporary directories
 STORE_DIR="/tmp/fluent-bit-test"
+
+# Clean old data first (if exists)
+if [ -d "$STORE_DIR" ]; then
+  rm -rf "$STORE_DIR"
+fi
+
+# Create fresh directories
 mkdir -p "$STORE_DIR/parquet-gzip-simple"
 mkdir -p "$STORE_DIR/json-snappy-complex-low"
 mkdir -p "$STORE_DIR/json-gzip-complex-medium"
 mkdir -p "$STORE_DIR/json-zstd-complex-high"

 echo "✓ Created store directories"
-
-# Clean old data
-rm -rf "$STORE_DIR"/*/
-echo "✓ Cleaned old data"
```

This also addresses the safety concern from the previous review about rm -rf on a potentially unset variable, by checking directory existence first.

plugins/out_s3/s3.h (1)
123-124: Unused apply_outer_compression field in struct flb_s3. apply_outer_compression is not referenced anywhere in plugins/out_s3/s3.c (compression behavior is driven by ctx->format != FLB_S3_FORMAT_PARQUET instead), so this field is dead state and adds confusion. Consider removing it or wiring it into the compression path; otherwise future readers might assume it actually controls outer compression.

CMakeLists.txt (1)
1311-1316: Update Parquet C++ standard from C++11 to C++17. The Parquet encoder block enables C++ and forces CMAKE_CXX_STANDARD 11, but current Apache Arrow/Parquet C++ releases require at least C++17. Keeping this at 11 will either fail the build or subtly misconfigure Arrow compilation when FLB_PARQUET_ENCODER is ON. Please bump this to 17 (and consider aligning the earlier simdutf CMAKE_CXX_STANDARD as well).

You should double-check against the Arrow/Parquet version you target (their build docs list the minimum C++ standard) and confirm that setting CMAKE_CXX_STANDARD 17 is required when FLB_PARQUET_ENCODER is enabled.

Also applies to: 278-287, 1311-1316
plugins/out_s3/s3.c (1)
4210-4217: compression config map still writes a string into an int field. The compression entry is declared as FLB_CONFIG_MAP_STR but its offset points to struct flb_s3::compression (an int):

```c
{ FLB_CONFIG_MAP_STR, "compression", NULL, 0, FLB_TRUE,
  offsetof(struct flb_s3, compression), ... },
```

During flb_output_config_map_set(), this causes the parser to store a char * into an int slot, corrupting compression and adjacent fields. You now parse compression explicitly via flb_output_get_property() in cb_s3_init(), so this map binding is both unsafe and unnecessary.

Recommend either:
- Removing the struct offset (FLB_FALSE, 0) and treating compression as a pure property, or
- Introducing a char *compression_str field and mapping the string there.

src/flb_parquet_impl.cpp (3)
138-151: Add overflow handling for uint64 → int64 in convert_to_int64(). MSGPACK_OBJECT_POSITIVE_INTEGER values are cast directly from uint64_t to int64_t:

```cpp
if (obj->type == MSGPACK_OBJECT_POSITIVE_INTEGER) {
    return builder->Append(static_cast<int64_t>(obj->via.u64));
}
```

Values greater than INT64_MAX will silently wrap, corrupting data in the Parquet file. This mirrors the earlier int32 bug you already guarded against. You should add an overflow check similar to convert_to_int32() and either clamp, append a default, or return an error so the caller can account for it.

Please confirm how Arrow/Parquet is expected to handle out-of-range uint64 values (e.g., whether to clamp, treat as error, or promote to an unsigned logical type) based on the versions you target, and align this converter with that behavior.
213-226: Timestamp conversion ignores the schema time unit. convert_to_timestamp() always treats the input as seconds and multiplies by 1,000,000 to get microseconds:

```cpp
return builder->Append(static_cast<int64_t>(obj->via.u64) * 1000000);
```

However, when parsing the schema you support "unit": "s"|"ms"|"us"|"ns" and build arrow::timestamp(time_unit) accordingly. The converter never looks at that unit, so a schema declaring ms or ns will still receive values scaled as if they were seconds→µs.

You likely want to:
- Downcast type (or the builder's type) to arrow::TimestampType,
- Inspect its unit(),
- Apply the appropriate multiplier/divider so the stored values actually match the declared unit.

Double-check the Arrow TimestampType semantics for units (SECOND, MILLI, MICRO, NANO) and ensure the scaling here matches how your upstream logs encode timestamps (seconds vs ms vs ns).

Also applies to: 386-424, 505-555
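A hedged sketch of that downcast-and-scale step, assuming (as the converter currently does) that incoming values are epoch seconds; overflow checks on the multiplications are omitted for brevity:

```cpp
#include <arrow/builder.h>
#include <arrow/type.h>
#include <memory>

// Scale epoch seconds to the unit declared by the field's TimestampType.
static arrow::Status append_epoch_seconds(arrow::TimestampBuilder *builder,
                                          const std::shared_ptr<arrow::DataType> &type,
                                          int64_t seconds)
{
    auto ts = std::static_pointer_cast<arrow::TimestampType>(type);
    int64_t scaled = seconds;

    switch (ts->unit()) {
    case arrow::TimeUnit::SECOND:                                   break;
    case arrow::TimeUnit::MILLI:  scaled = seconds * 1000;          break;
    case arrow::TimeUnit::MICRO:  scaled = seconds * 1000000;       break;
    case arrow::TimeUnit::NANO:   scaled = seconds * 1000000000LL;  break;
    }
    return builder->Append(scaled);
}
```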
301-321: Storing msgpack_object instances across msgpack_unpack_next() iterations may rely on msgpack zone semantics. You collect msgpack_object instances into a std::vector and process them after the unpack loop:

```cpp
std::vector<msgpack_object> records;
...
while (msgpack_unpack_next(&result, ...) == MSGPACK_UNPACK_SUCCESS) {
    ...
    records.push_back(array.ptr[i]);  // or result.data
}
...
for (auto &record : records) { ... }
```

These msgpack_objects contain pointers into the msgpack_unpacked zone. This is fine only if msgpack_unpack_next() never invalidates existing zone allocations while reusing the same msgpack_unpacked across iterations; some msgpack documentation/patterns assume you consume each object immediately. If the zone is reused or compacted between iterations, records could end up with dangling pointers once more data is unpacked.

Please verify against the msgpack-c docs for msgpack_unpacked and msgpack_unpack_next() whether it is safe to keep msgpack_object copies after subsequent msgpack_unpack_next() calls, or whether you must either:
- Process each record inside the loop, or
- Deep-copy the record into your own storage (including map/array/string contents) before the next unpack.

Also applies to: 474-555
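For reference, the first option looks roughly like this with the msgpack-c streaming API; each result.data is consumed before the next unpack call, so no zone pointers outlive their iteration:

```cpp
#include <msgpack.h>

// Process each record inside the loop instead of collecting
// msgpack_object copies that point into the unpack zone.
static void process_records(const char *buf, size_t size)
{
    msgpack_unpacked result;
    size_t off = 0;

    msgpack_unpacked_init(&result);
    while (msgpack_unpack_next(&result, buf, size, &off) == MSGPACK_UNPACK_SUCCESS) {
        msgpack_object record = result.data;
        /* record's pointers are only guaranteed valid here, before the
         * next msgpack_unpack_next() call; append to builders now. */
        (void) record;
    }
    msgpack_unpacked_destroy(&result);
}
```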
🧹 Nitpick comments (9)
examples/s3_output/run-s3-parquet-test.sh (1)
38-39: Relative paths may cause execution failures. The script uses relative paths for both the binary (./build/bin/fluent-bit) and config file (fluent-bit-s3-parquet.conf). This will fail unless run from the repository root. Consider adding a working directory check or using $SCRIPT_DIR to make paths relative to the script location.

🔎 Proposed improvement

```diff
+# Get script directory for relative paths
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
+
 # Run Fluent Bit
-./build/bin/fluent-bit -c fluent-bit-s3-parquet.conf
+"$REPO_ROOT/build/bin/fluent-bit" -c "$SCRIPT_DIR/fluent-bit-s3-parquet.conf"
```

examples/s3_output/athena-queries.sql (2)
38-44: Hardcoded bucket name - consider adding placeholder instructions. The S3 bucket s3-bucket-kafka-sink is hardcoded throughout. While acceptable for examples, consider adding a comment at the top noting that users should replace this with their own bucket name.

99-101: Hardcoded date filter in example queries. All queries use day='20251224', which matches "today's date" per the context. For a reusable example, consider using a dynamic date or adding a note about updating this value.

examples/s3_output/README.md (3)
112-114: Add language specifier to fenced code block. Per markdownlint, fenced code blocks should have a language specified. This block shows a logical expression; specify text as the language.

177-181: Add language specifier to fenced code block. The directory-tree block (s3://s3-bucket-kafka-sink/fluent-bit-logs/ with its putobject/ and multipart/ subtrees) should likewise specify text as the language.

270-275: Add language specifier to fenced code block. The [parquet] data-quality log output block should also specify text as the language.

cmake/parquet.cmake (1)

150-163: Consider adding default path fallback for Arrow discovery. The NO_DEFAULT_PATH option prevents CMake from searching standard system paths. This might miss Arrow installations in non-standard but common locations not listed in SEARCH_PATHS. Consider removing NO_DEFAULT_PATH or adding a secondary search without it as a fallback.

🔎 Proposed improvement

```diff
 find_path(ARROW_INCLUDE_DIR
   NAMES arrow/api.h
   PATHS ${SEARCH_PATHS}
   PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
-  NO_DEFAULT_PATH
 )

 find_library(ARROW_LIBRARY
   NAMES ${ARROW_LIB_NAMES}
   PATHS ${SEARCH_PATHS}
   PATH_SUFFIXES ${LIB_SUFFIXES}
-  NO_DEFAULT_PATH
 )
```

Alternatively, keep NO_DEFAULT_PATH for the first search and add a fallback search without it if the first fails.

include/fluent-bit/flb_parquet.h (2)
23-25: Consider removing unused include. The header includes <fluent-bit/flb_sds.h> but the function signature doesn't use flb_sds_t. If it is not used elsewhere in the implementation exposed by this header, consider removing it to reduce compilation dependencies.

```bash
#!/bin/bash
# Check if flb_sds types are used in the parquet implementation or header
rg -n "flb_sds" src/flb_parquet_impl.cpp include/fluent-bit/flb_parquet.h
```

82-85: Missing extern "C" guard for C++ compatibility. Since this header will be included by C++ code (flb_parquet_impl.cpp), consider adding extern "C" guards to ensure proper C linkage when compiled as C++.

🔎 Proposed fix

```diff
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 void *flb_msgpack_raw_to_parquet(const void *in_buf, size_t in_size,
                                  const char *schema_str,
                                  int compression,
                                  size_t *out_size);

+#ifdef __cplusplus
+}
+#endif
+
 #endif
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
.github/workflows/unit-tests.yaml, CMakeLists.txt, cmake/parquet.cmake, examples/s3_output/README.md, examples/s3_output/athena-queries.sql, examples/s3_output/fluent-bit-s3-parquet.conf, examples/s3_output/run-s3-parquet-test.sh, include/fluent-bit/aws/flb_aws_compress.h, include/fluent-bit/flb_parquet.h, plugins/out_s3/s3.c, plugins/out_s3/s3.h, src/CMakeLists.txt, src/aws/CMakeLists.txt, src/aws/compression/CMakeLists.txt, src/aws/compression/arrow/CMakeLists.txt, src/aws/compression/arrow/compress.c, src/aws/compression/arrow/compress.h, src/aws/flb_aws_compress.c, src/flb_parquet_impl.cpp
💤 Files with no reviewable changes (5)
- src/aws/compression/arrow/compress.h
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/arrow/compress.c
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/unit-tests.yaml
- examples/s3_output/fluent-bit-s3-parquet.conf
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt, src/flb_parquet_impl.cpp, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt, plugins/out_s3/s3.c, include/fluent-bit/aws/flb_aws_compress.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
src/aws/flb_aws_compress.c, src/CMakeLists.txt, CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
src/CMakeLists.txt, CMakeLists.txt, cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
src/CMakeLists.txt, CMakeLists.txt, cmake/parquet.cmake
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
src/CMakeLists.txt, CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_parquet.h, examples/s3_output/README.md
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3.c
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
cmake/parquet.cmake
🧬 Code graph analysis (4)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
flb_gzip_compress (157-252)
src/flb_zstd.c (1)
flb_zstd_compress(33-57)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (2)
flb_msgpack_raw_to_parquet (265-711)
flb_msgpack_raw_to_parquet (265-268)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
s3_put_object(1663-1800)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c
[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter
(syntaxError)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
270-270: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (19)
src/aws/flb_aws_compress.c (2)
31-37: LGTM! Wrapper function correctly adapts the signature. The wrapper properly casts between void* and char* to adapt flb_snappy_compress's signature to the uniform compression interface. This follows the same pattern used for gzip and zstd.
45-76: Good refactoring to clarify compression algorithms vs format converters. The updated comments clearly distinguish true compression algorithms (gzip, snappy, zstd) from format converters (Arrow/Parquet). Removing ARROW/PARQUET from the runtime compression_options table while keeping deprecated macros in the header is a clean backward-compatible approach.

One minor observation: the comment on line 59 ("FLB_AWS_COMPRESS_NONE which is 0 is reserved for array footer") is now orphaned after the blank line. Consider moving it closer to the footer entry on line 77.
include/fluent-bit/aws/flb_aws_compress.h (2)
33-46: Good deprecation strategy with clear documentation. The comments clearly explain that ARROW is deprecated and PARQUET should use format=parquet instead. Keeping the macros prevents compilation errors for any downstream code that references them.

28-31: No backward compatibility issue from ZSTD renumbering. Compression type configuration uses string keywords ("gzip", "snappy", "zstd"), not numeric values. The numeric constants (0, 1, 2, 3, etc.) are internal implementation details used only at runtime for comparisons and branching. The flb_aws_compression_get_type() function converts configuration strings to these numeric types at startup. Since numeric values are never persisted, serialized, or exposed to external systems, renumbering internal constants has no impact on user configurations or backward compatibility.

cmake/parquet.cmake (1)
222-226: Good: Clear status reporting with emoji indicators. The status messages with ✓ and ✗ symbols provide clear visual feedback during CMake configuration. The inclusion of library paths and include directories aids debugging.
src/CMakeLists.txt (3)
157-166: LGTM! Parquet encoder integration follows existing patterns. The Parquet encoder source is conditionally added and marked as C++ correctly. This follows the same pattern as the AVRO encoder block above.
525-528: SASL linking additions appear correct for Kafka integration. The SASL library is conditionally linked when FLB_SASL_ENABLED is set, properly handling both the shared library and binary targets. This follows standard CMake patterns.

Also applies to: 600-605
398-405: The code is correctly guarded. When if(FLB_PARQUET_ENCODER) at line 399 is true, the parent CMakeLists.txt has verified that both ARROW_FOUND and PARQUET_FOUND were true (lines 1320-1328), and it would have set FLB_PARQUET_ENCODER OFF otherwise. The variables ARROW_LIBRARIES and PARQUET_LIBRARIES are set and exported with PARENT_SCOPE in cmake/parquet.cmake, ensuring they are available in the subdirectory scope.
27-85: Excellent API documentation with comprehensive schema examples. The documentation thoroughly covers:
- Parameter descriptions with format specifications
- Supported type mappings
- Usage example with proper memory management
- Links to Arrow documentation
This level of documentation is exemplary for a public API.
plugins/out_s3/s3.h (1)
197-205: Format fields, macros, and s3_put_object prototype look consistent. The format/schema_str fields, the FLB_S3_FORMAT_JSON/PARQUET macros, and the updated s3_put_object prototype (using flb_sds_t buffer) all match their usage in plugins/out_s3/s3.c and cleanly expose JSON vs Parquet behavior.

Also applies to: 234-235
plugins/out_s3/s3.c (8)
192-199: Correctly skipping Content-Encoding for Parquet payloads. Conditioning the Content-Encoding header on (compression == gzip|zstd) && format != FLB_S3_FORMAT_PARQUET ensures Parquet objects are not wrapped in an outer HTTP compression layer while still advertising compression for JSON payloads. This matches the "internal only" compression model for Parquet and looks good.

Also applies to: 227-239
803-823: Parquet format path: good separation of internal vs outer compression, but hard-fails without schema_str. The Parquet branch in construct_request_buffer() correctly:
- Avoids outer HTTP compression and uses ctx->compression only as the internal Parquet codec.
- Requires schema_str and bails out early with a clear error if it's missing.
- Converts the Arrow buffer to flb_sds_t and frees the Arrow-allocated memory.

This behavior is sound: format-specific requirements are enforced once at conversion time, and buffer ownership is cleanly transferred to Fluent Bit.
Also applies to: 1515-1631
1217-1277: SDS compression path in upload_data() is generally correct, but relies solely on format to skip outer compression. The new upload_data() signature and logic look good overall:
- body is now flb_sds_t, with body_to_send/size_to_send correctly switched to the compressed SDS when compression succeeds.
- The compression block is gated by ctx->compression != FLB_AWS_COMPRESS_NONE && ctx->format != FLB_S3_FORMAT_PARQUET, so Parquet payloads are never double-compressed.
- All error paths that allocate compressed_body also destroy it, and both PutObject and multipart paths clean up compressed_body after use.

No functional issues seen here; the behavior matches the intended "outer compression only for non-Parquet" design.
Also applies to: 1325-1392
1442-1513: SDS buffer lifecycle around construct_request_buffer()/upload_data() is consistent (except unit tests). put_all_chunks(), send_upload_request(), and cb_s3_upload() all:
- Call construct_request_buffer() to obtain an flb_sds_t buffer.
- Pass buffer into s3_put_object()/upload_data().
- Always flb_sds_destroy(buffer) afterward, regardless of success/failure.

This gives a single clear owner for each formatted buffer and avoids leaks in the main upload paths.
Also applies to: 1991-2015, 3440-3488, 3701-3724
1515-1661: Error handling and locking in construct_request_buffer() look solid.
- Handles the illegal (new_data == NULL && chunk == NULL) combination explicitly.
- For file-backed chunks, locks the s3_file only after a successful read, and reliably unlocks on every error path (realloc failure, Parquet schema error, conversion failure).
- Frees the intermediate msgpack buffer in all paths (success and failure).
- Ensures formatted_data/formatted_size are both valid before returning and destroys the SDS on conversion failure.

This is a good, consistent ownership pattern between the file store, msgpack buffer, and formatted SDS.
1663-1800: s3_put_object() SDS refactor is behavior-preserving. Switching the body parameter to flb_sds_t integrates cleanly:
- MD5 calculation and HTTP request still operate on a contiguous buffer (body, body_size).
- URI/key construction, seq_index updates, error paths, and logging remain unchanged.
- No new ownership expectations are imposed on callers (they still manage the SDS lifetime).
3953-3963: Raw msgpack buffering in cb_s3_flush() is a good fit for Parquet. Creating chunk as an SDS copy of the raw msgpack and deferring JSON/Parquet conversion to construct_request_buffer():
- Avoids double conversion when retries occur.
- Makes it easy to plug in additional formats later, since the on‑disk representation stays msgpack.
- Keeps buffer ownership straightforward via buffer_chunk()/send_upload_request().

No issues spotted here.
Also applies to: 4036-4085
4219-4231: New format, schema_str, upload_timeout, and bearer token options are wired correctly.
- format is exposed as a documented string option and interpreted via flb_output_get_property("format", ...).
- schema_str is mapped into struct flb_s3::schema_str and used by the Parquet conversion path.
- upload_timeout is now a proper FLB_CONFIG_MAP_TIME option and is referenced consistently in timeout checks and timer setup.
- authorization_endpoint_bearer_token follows the existing auth option patterns.

These mappings and docs align with how the fields are used later in the file.
Also applies to: 4381-4386, 4406-4414
src/flb_parquet_impl.cpp (1)
265-299: Overall Parquet conversion pipeline and memory ownership look good.
- Parameter validation, compression mapping, schema parsing (via cJSON), and Arrow schema construction are all checked with clear error logs.
- Builders are created per field, and all failure paths free cJSON resources and destroy the msgpack zone before returning.
- Record conversion tracks data‑quality stats (missing fields, type failures, int32 overflow, float→int clamps) and emits a concise summary only when issues exist.
- Parquet writing uses BufferOutputStream (with an auto-growing buffer), closes the writer, then copies the final Arrow buffer into flb_malloc-managed memory (output_buffer), which is freed on all exception paths.

The high-level control flow and ownership model are solid; the remaining issues are mainly around numeric edge cases and msgpack object lifetime.
Also applies to: 331-446, 449-611, 616-692
a821fec to 5e868fc
Actionable comments posted: 0
♻️ Duplicate comments (3)
CMakeLists.txt (1)
1311-1316: Parquet encoder should request C++17, not C++11. Arrow/Parquet C++ now require at least C++17; forcing CMAKE_CXX_STANDARD 11 in the Parquet block will fail with current Arrow toolchains or silently misconfigure builds. Please bump this to 17 (and keep it consistent with any other CMAKE_CXX_STANDARD overrides).

```cmake
if(FLB_PARQUET_ENCODER)
  enable_language(CXX)
  set(CMAKE_CXX_STANDARD 17)
  include(cmake/parquet.cmake)
  ...
endif()
```

```bash
#!/bin/bash
# Verify Arrow C++ minimum standard in your environment
rg -n "CMAKE_CXX_STANDARD" CMakeLists.txt cmake -S
```

cmake/parquet.cmake (1)
149-157: Fix Arrow include search (`include/arrow` suffix still produces wrong include dir)
In the manual search:

```cmake
find_path(ARROW_INCLUDE_DIR
    NAMES arrow/api.h
    PATHS ${SEARCH_PATHS}
    PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
    NO_DEFAULT_PATH
)
```

using `include/arrow` as a suffix causes `ARROW_INCLUDE_DIR` to resolve to `${prefix}/include/arrow`, so the later version check:

```cmake
if(EXISTS "${ARROW_INCLUDE_DIRS}/arrow/util/config.h")
```

looks for `${prefix}/include/arrow/arrow/util/config.h`, which is incorrect for standard Arrow layouts (`${prefix}/include/arrow/...`). Drop the `include/arrow` suffix and rely on `${INCLUDE_SUFFIXES}` (typically just `include`), e.g.:

```cmake
find_path(ARROW_INCLUDE_DIR
    NAMES arrow/api.h
    PATHS ${SEARCH_PATHS}
    PATH_SUFFIXES ${INCLUDE_SUFFIXES}
    NO_DEFAULT_PATH
)
```

The Parquet search's `include/parquet` suffix is fine and should remain.
Also applies to: 198-216
plugins/out_s3/s3.c (1)
744-775: Backward compatibility broken: `compression=arrow|parquet` now fails instead of mapping to `format=parquet`
The check at line 752 can never be true. When `flb_aws_compression_get_type(tmp)` is called with `"arrow"` or `"parquet"`, it returns `-1` because these strings are not in the `compression_options` table in `flb_aws_compress.c` (which only includes gzip, snappy, and zstd). This causes the code to hit the "Unknown compression type" error path instead of handling these deprecated aliases.
The constants `FLB_AWS_COMPRESS_ARROW` (4) and `FLB_AWS_COMPRESS_PARQUET` (5) are defined but unreachable, contradicting the documented deprecated behavior and the inline comments claiming support.
Handle these strings explicitly before calling `flb_aws_compression_get_type`:

```c
tmp = flb_output_get_property("compression", ins);
if (tmp) {
    if (strcasecmp(tmp, "arrow") == 0 || strcasecmp(tmp, "parquet") == 0) {
        flb_plg_warn(ctx->ins,
                     "DEPRECATED: compression=%s is deprecated. "
                     "Use format=parquet instead.", tmp);
#ifdef FLB_HAVE_PARQUET_ENCODER
        ctx->format = FLB_S3_FORMAT_PARQUET;
#else
        flb_plg_error(ctx->ins,
                      "Parquet format is not supported in this build. "
                      "Rebuild with -DFLB_PARQUET_ENCODER=On.");
        return -1;
#endif
    }
    else {
        ret = flb_aws_compression_get_type(tmp);
        if (ret == -1) {
            flb_plg_error(ctx->ins, "Unknown compression type: %s", tmp);
            return -1;
        }
        ctx->compression = ret;
    }
}
```
🧹 Nitpick comments (3)
plugins/out_s3/s3.c (2)
49-57: Remove duplicate static declarations for `get_upload`/`create_upload`
Both functions are declared twice (once near the top as forward declarations and again before their definitions). This is harmless but noisy and can confuse readers; you only need one forward declaration (or none if you reorder definitions).
Also applies to: 86-92
1218-1277: SDS-based outer compression flow is sound, but pre/post compression logging is now misleading
The refactor in `upload_data` to:
- use `body_to_send`/`size_to_send` (possibly compressed) while leaving the original `body`/`body_size` unchanged, and
- wrap the compressed payload in an `flb_sds_t` (`compressed_body`) with consistent cleanup,
looks correct and avoids raw `malloc`/`free` handling in callers.
However, the informational log in the "too small, using PutObject" branch still prints:

```c
flb_plg_info(ctx->ins,
             "Pre-compression upload_chunk_size= %zu, "
             "After compression, chunk is only %zu bytes, ...",
             preCompress_size, body_size);
```

while the actual payload on the wire is `size_to_send`, not `body_size`. For Parquet, this message can also trigger even though no outer compression runs. Consider:
- basing the "after compression" size on `size_to_send`, and
- gating this log on `ctx->format != FLB_S3_FORMAT_PARQUET` to avoid implying any outer compression there.
Also applies to: 1300-1314, 1330-1392
examples/s3_output/README.md (1)
63-72: Add language identifiers to fenced code blocks for better tooling support
The README content is solid and matches the new S3/Parquet behavior, but several fenced blocks (config snippets, shell commands, JSON schemas, SQL) are missing language tags. Adding them will quiet markdownlint and improve rendering/IDE support, e.g.:

```ini
[OUTPUT]
    Name s3
    ...
```

```sh
cmake ..
make -j8
```

```json
{ "fields": [ ... ] }
```

```sql
SELECT ...;
```

Also applies to: 85-95, 144-148, 152-156, 160-164, 178-181, 209-218, 241-251, 296-314, 341-349, 361-365, 384-426, 432-438, 442-456, 460-466, 484-495
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
- .github/workflows/unit-tests.yaml
- CMakeLists.txt
- cmake/parquet.cmake
- examples/s3_output/README.md
- examples/s3_output/athena-queries.sql
- examples/s3_output/fluent-bit-s3-parquet.conf
- examples/s3_output/run-s3-parquet-test.sh
- include/fluent-bit/aws/flb_aws_compress.h
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- src/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/flb_aws_compress.c
💤 Files with no reviewable changes (5)
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (4)
- examples/s3_output/athena-queries.sql
- examples/s3_output/fluent-bit-s3-parquet.conf
- examples/s3_output/run-s3-parquet-test.sh
- .github/workflows/unit-tests.yaml
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
src/CMakeLists.txt, cmake/parquet.cmake, CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
src/CMakeLists.txt, cmake/parquet.cmake, CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3.c, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3.c, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
src/CMakeLists.txt, CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3.c, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3.c, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3.c, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
src/CMakeLists.txt, include/fluent-bit/aws/flb_aws_compress.h, CMakeLists.txt, src/aws/flb_aws_compress.c
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
examples/s3_output/README.md
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
cmake/parquet.cmake
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3.c
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c
[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter
(syntaxError)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
270-270: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (11)
plugins/out_s3/s3.h (1)
123-199: New format/schema fields and S3 format macros look consistent with implementation
Appending `format`, `schema_str`, the JSON/PARQUET macros, and the updated `s3_put_object` prototype matches how `s3.c` now routes JSON vs Parquet and SDS bodies. No structural or API consistency issues here.
Also applies to: 201-204, 233-234
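A minimal picture of what those additions look like, paraphrased from the review notes rather than copied from the header (the `_PARQUET` value and the SDS type of `schema_str` are assumptions; JSON=0 as the default is confirmed below):

```c
/* plugins/out_s3/s3.h -- shape of the additions, per this review */
#define FLB_S3_FORMAT_JSON     0   /* default output format (confirmed) */
#define FLB_S3_FORMAT_PARQUET  1   /* value illustrative only */

struct flb_s3 {
    /* ... existing members ... */
    int format;                    /* FLB_S3_FORMAT_JSON or _PARQUET */
    flb_sds_t schema_str;          /* user-supplied Parquet schema (JSON) */
};
```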
plugins/out_s3/s3.c (7)
192-239: Header logic correctly skips outer Content-Encoding for Parquet
The updated `create_headers` now omits `Content-Encoding` when `ctx->format == FLB_S3_FORMAT_PARQUET`, matching the fact that Parquet uses its own internal compression. This aligns headers with the new upload semantics; a sketch of the guard follows.
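Schematically, the guard looks like this. The `append_header()` and `encoding_name()` helpers are hypothetical names for the example; the real `create_headers()` builds the full header list itself:

```c
/* Only outer-compressed JSON payloads advertise a Content-Encoding;
 * Parquet files carry their compression inside the file format. */
if (ctx->compression != FLB_AWS_COMPRESS_NONE &&
    ctx->format != FLB_S3_FORMAT_PARQUET) {
    append_header(headers, "Content-Encoding",
                  encoding_name(ctx->compression));
}
```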
777-823: Format selection and Parquet encoder guards are consistent with new design
The new `format` parsing and `#ifdef FLB_HAVE_PARQUET_ENCODER` checks correctly:
- default to JSON unless Parquet was set (including via legacy compression mapping once fixed);
- reject `format=parquet` builds that lack `FLB_PARQUET_ENCODER`;
- for Parquet builds, keep `ctx->compression` for internal Parquet compression only and skip outer compression.
This matches the intended separation of JSON vs Parquet and internal-vs-outer compression.
1663-1800: Updated `s3_put_object` signature and SDS handling are consistent
The new `s3_put_object` taking `flb_sds_t body, size_t body_size` is wired correctly:
- Callers pass SDS buffers and later destroy them (or the owning wrappers).
- MD5 computation and HTTP request use the SDS pointer as a regular `char *`.
- Error paths free `uri` and preserve `seq_index` semantics via the `decrement_index` label.
No memory lifetime issues are apparent around the new body type.
1991-2015: SDS buffer ownership in `send_upload_request` and `unit_test_flush` is now explicit
Both paths:
- create an SDS buffer via `construct_request_buffer`,
- pass it into `upload_data`,
- always `flb_sds_destroy(buffer)` after the call,
which removes the previous mixed ownership around raw buffers and avoids leaks in error paths.
Also applies to: 3706-3724
3953-4089: Flush path correctly preserves raw MsgPack and defers formatting
`cb_s3_flush` now:
- stores raw MsgPack into an SDS (`chunk`),
- uses `s3_store_file_get`/`buffer_chunk` to persist raw MsgPack,
- only converts to JSON/Parquet inside `construct_request_buffer` right before upload.
This is the right abstraction for supporting multiple output formats (JSON vs Parquet) without duplicating buffer logic.
4210-4231: Compression/format/schema/upload_timeout config docs largely match behavior, but note legacy compression caveat
The updated config map entries:
- document `compression` as `none|gzip|snappy|zstd` with legacy `arrow`/`parquet` values deprecated,
- introduce `format` and `schema_str` (required for `format=parquet`),
- add a documented `upload_timeout` parameter, and
- add `authorization_endpoint_bearer_token`,
which aligns with the new logic in `cb_s3_init` and the Parquet encoder path, once the legacy `compression=arrow|parquet` handling is fixed as noted above.
Also applies to: 4381-4413
1450-1660: Raw MsgPack buffering + deferred JSON/Parquet conversion is correctly implemented
The new `construct_request_buffer`/`put_all_chunks` flow:
- reads raw MsgPack from `s3_store_file_read`,
- optionally appends in-memory MsgPack (`new_data`) via `flb_realloc` + `memcpy` (sketched below),
- converts once per upload to either a `log_key` projection, JSON lines, or Parquet via `flb_msgpack_raw_to_parquet` (with `schema_str` and `ctx->compression`),
- properly cleans up all internal allocations (msgpack buffer, intermediate formatted data) on all error paths before returning,
- never sets `*out_buf` on error, ensuring callers receive only valid or NULL pointers, and
- returns an `flb_sds_t` which all four callers properly destroy on the success path.
Chunk locking/unlocking on error paths is balanced throughout. Memory management is sound.
src/CMakeLists.txt (1)
157-166: Parquet encoder and SASL library wiring in src/CMakeLists.txt look correct
- `flb_parquet_impl.cpp` is only built when `FLB_PARQUET_ENCODER` is enabled and is explicitly marked as C++, matching the Arrow/Parquet dependency.
- `ARROW_LIBRARIES`/`PARQUET_LIBRARIES` are only appended to `FLB_DEPS` under the same flag, consistent with `cmake/parquet.cmake`.
- SASL is linked into both `fluent-bit-shared` and `fluent-bit-bin` only when `FLB_SASL_ENABLED` is set, so non-Kafka or non-SASL builds remain unaffected.
Also applies to: 398-405, 525-528, 600-603
include/fluent-bit/aws/flb_aws_compress.h (1)
23-47: Compression enum layout is fine; ensure callers don't rely on ARROW/PARQUET in `flb_aws_compression_get_type`
The new assignments (NONE=0, GZIP=1, SNAPPY=2, ZSTD=3, ARROW=4 [deprecated], PARQUET=5 ["use format=parquet instead"]) are internally consistent and safe as long as all code uses the macros, not raw integers.
Given that `flb_aws_compress.c` now only recognizes `gzip|snappy|zstd`, ARROW/PARQUET are effectively compile-time constants for legacy conditionals; callers must not expect `flb_aws_compression_get_type()` to ever return these values anymore. The S3 plugin currently still does (and should be fixed as noted in `s3.c`).
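Spelled out as macros, the layout just described reads roughly as follows (reconstructed from this review's summary, not copied verbatim from the header):

```c
/* include/fluent-bit/aws/flb_aws_compress.h -- values per this review */
#define FLB_AWS_COMPRESS_NONE     0
#define FLB_AWS_COMPRESS_GZIP     1
#define FLB_AWS_COMPRESS_SNAPPY   2
#define FLB_AWS_COMPRESS_ZSTD     3
#define FLB_AWS_COMPRESS_ARROW    4   /* deprecated: do not use in new code */
#define FLB_AWS_COMPRESS_PARQUET  5   /* deprecated: use format=parquet */
```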
src/aws/flb_aws_compress.c (1)

24-38: SNAPPY support is wired correctly; note that ARROW/PARQUET are no longer valid compression types
- `flb_snappy_compress_wrapper` correctly adapts `flb_snappy_compress` to the common `(void **out_data, size_t *out_len)` signature and is registered under `"snappy"` in `compression_options`.
- The options table now only contains true compression algorithms (gzip/snappy/zstd); ARROW/PARQUET have been removed, so `flb_aws_compression_get_type("arrow")`/`"parquet"` will now return `-1`.
This matches the new separation between compression and format conversion, but requires callers (notably the S3 plugin) to treat `"arrow"`/`"parquet"` as legacy format aliases rather than compression types; see the earlier comment in `s3.c`.
Also applies to: 57-78
5e868fc to
2555471
Compare
Actionable comments posted: 2
♻️ Duplicate comments (3)
examples/s3_output/athena-queries.sql (1)
168-181: Replace `PERCENTILE` with `approx_percentile` for Athena compatibility.
Athena does not support the `PERCENTILE(duration_ms, 0.95)` syntax. Use `approx_percentile(duration_ms, 0.95)` instead.

🔎 Proposed fix

```diff
 SELECT
     path,
     method,
     COUNT(*) as request_count,
     AVG(duration_ms) as avg_duration_ms,
-    PERCENTILE(duration_ms, 0.95) as p95_duration_ms,
+    approx_percentile(duration_ms, 0.95) as p95_duration_ms,
     MAX(duration_ms) as max_duration_ms
 FROM access_logs_parquet
```

cmake/parquet.cmake (1)
149-170: Remove the `include/arrow` suffix from Arrow header search.
The `PATH_SUFFIXES include/arrow` in `find_path` for Arrow headers is incorrect. When searching for `arrow/api.h`, CMake would find it at `${prefix}/include/arrow/arrow/api.h`, which doesn't exist. The correct include directory should be `${prefix}/include` so that `#include <arrow/api.h>` resolves correctly.

🔎 Proposed fix

```diff
 # Search for Arrow
 if(NOT ARROW_FOUND)
     find_path(ARROW_INCLUDE_DIR
         NAMES arrow/api.h
         PATHS ${SEARCH_PATHS}
-        PATH_SUFFIXES ${INCLUDE_SUFFIXES} include/arrow
+        PATH_SUFFIXES ${INCLUDE_SUFFIXES}
         NO_DEFAULT_PATH
     )
```

plugins/out_s3/s3.c (1)
744-824: Legacy compression path should explicitly default to GZIP.
The comment at line 764 says "Keep default gzip compression" but `ctx->compression` remains `FLB_AWS_COMPRESS_NONE` (set at line 745). This contradicts user expectations for backward compatibility.

🔎 Suggested fix

```diff
 #ifndef FLB_HAVE_PARQUET_ENCODER
     flb_plg_error(ctx->ins,
                   "Parquet format is not supported in this build. "
                   "Rebuild with -DFLB_PARQUET_ENCODER=On.");
     return -1;
 #else
     ctx->format = FLB_S3_FORMAT_PARQUET;
-    /* Keep default gzip compression */
+    /* Default to gzip compression for backward compatibility */
+    if (ctx->compression == FLB_AWS_COMPRESS_NONE) {
+        ctx->compression = FLB_AWS_COMPRESS_GZIP;
+    }
 #endif
```
🧹 Nitpick comments (3)
examples/s3_output/README.md (2)
112-114: Add language specifier to fenced code block.
The code block at line 112 is missing a language specifier, which affects syntax highlighting and linting.

🔎 Proposed fix

````diff
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
````

177-181: Add language specifier to fenced code block.
The code block showing S3 upload paths is missing a language specifier.

🔎 Proposed fix

````diff
-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/    # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/ # Case 2: Large files via Multipart Upload
````

src/flb_parquet_impl.cpp (1)

708-736: Memory management is correct, but ownership should be documented.
The function allocates `output_buffer` with `flb_malloc` (line 710) and copies Arrow data into it (line 716). Exception handlers properly clean up on error (line 726).
Suggestion: add a comment in the function documentation (or in the header file) explicitly stating that the caller must free the returned buffer with `flb_free()`.

📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro

📥 Commits
Reviewing files that changed from the base of the PR and between 5e868fceaf3521954dee257936901981215314ea and 2555471660ff1c95ba1231130e46af6cf36d9017.

📒 Files selected for processing (19)
- .github/workflows/unit-tests.yaml
- CMakeLists.txt
- cmake/parquet.cmake
- examples/s3_output/README.md
- examples/s3_output/athena-queries.sql
- examples/s3_output/fluent-bit-s3-parquet.conf
- examples/s3_output/run-s3-parquet-test.sh
- include/fluent-bit/aws/flb_aws_compress.h
- include/fluent-bit/flb_parquet.h
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- src/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/flb_aws_compress.c
- src/flb_parquet_impl.cpp

💤 Files with no reviewable changes (5)
- src/aws/compression/arrow/compress.h
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c

🚧 Files skipped from review as they are similar to previous changes (4)
- include/fluent-bit/flb_parquet.h
- src/aws/flb_aws_compress.c
- examples/s3_output/run-s3-parquet-test.sh
- examples/s3_output/fluent-bit-s3-parquet.conf

🧰 Additional context used
🧠 Learnings (16): the same common learnings and 📚 entries listed in the previous review details above.
The enum constants are purely internal implementation details that never get serialized, persisted, or exposed to external code. Renumbering these internal values has no backward compatibility impact. </blockquote></details> <details> <summary>examples/s3_output/athena-queries.sql (1)</summary><blockquote> `1-80`: **Well-structured Athena table definitions and query examples.** The SQL file provides comprehensive examples covering: - Partition projection for automatic partition discovery - Both JSON and Parquet format table definitions - Realistic query patterns for log analysis This is valuable documentation for users implementing the new Parquet format support. </blockquote></details> <details> <summary>examples/s3_output/README.md (1)</summary><blockquote> `1-80`: **Excellent documentation for Parquet format support.** The README provides comprehensive coverage of: - Upload methods (PutObject vs Multipart) - Format and compression options - Schema definition and type handling with best practices - Troubleshooting guidance - Performance testing scenarios This is high-quality documentation that will help users adopt the new Parquet format feature. </blockquote></details> <details> <summary>plugins/out_s3/s3.h (2)</summary><blockquote> `195-203`: **Format and schema fields properly added for Parquet support.** The new struct members and constants are well-organized: - `format` field for output format selection - `schema_str` for user-defined Parquet schema - Clear format type constants with JSON=0 as default --- `233-234`: **Updated `s3_put_object` signature uses SDS buffer type.** The function signature change from raw buffer to `flb_sds_t` aligns with the SDS-based buffer handling throughout the S3 upload flow. </blockquote></details> <details> <summary>src/CMakeLists.txt (3)</summary><blockquote> `157-166`: **Parquet encoder C++ integration correctly configured.** The Parquet encoder integration: - Adds `flb_parquet_impl.cpp` under `FLB_PARQUET_ENCODER` guard - Properly marks the file as C++ using `set_source_files_properties` - Follows the existing pattern for conditional source inclusion --- `398-405`: **Parquet library dependencies properly linked.** The `ARROW_LIBRARIES` and `PARQUET_LIBRARIES` are correctly added to `FLB_DEPS` when `FLB_PARQUET_ENCODER` is enabled, ensuring proper linkage with the detected Arrow/Parquet libraries from `cmake/parquet.cmake`. --- `525-528`: **Verify SASL linking changes are intentionally part of this PR.** The SASL library linking for Kafka support (`sasl2`) is added here but appears unrelated to the Parquet encoder feature. Please confirm this change is intentional for this PR or if it should be in a separate commit. Also applies to: 600-605 </blockquote></details> <details> <summary>cmake/parquet.cmake (1)</summary><blockquote> `1-83`: **Well-designed cross-platform library detection module.** The parquet.cmake module provides robust library detection with: - Three fallback methods (pkg-config → CMake config → manual search) - Platform-specific path handling for Windows, macOS (including Apple Silicon), and Linux - Proper handling of both shared and static library targets - Clear status messaging </blockquote></details> <details> <summary>CMakeLists.txt (3)</summary><blockquote> `202-202`: **New `FLB_PARQUET_ENCODER` option properly defined.** The option follows the existing naming convention for encoders (`FLB_AVRO_ENCODER`) and is disabled by default, which is appropriate for an optional feature requiring external dependencies. 
--- `284-286`: **`FLB_ALL` properly enables both encoders.** The "Enable all features" mode now consistently enables both Avro and Parquet encoders. --- `1311-1329`: **Parquet encoder integration is well-structured.** The configuration block: - Enables C++ language support as required by Arrow/Parquet - Uses C++17 standard (addressing past review comment about Arrow 10.0+ requiring C++17) - Integrates with `cmake/parquet.cmake` for library detection - Gracefully disables the feature with warnings when dependencies are missing </blockquote></details> <details> <summary>src/flb_parquet_impl.cpp (6)</summary><blockquote> `1-40`: **LGTM: Headers and includes are properly organized.** The extern "C" wrapper correctly isolates C headers from C++ code, and all necessary dependencies are included. --- `41-94`: **LGTM: Converter class design is clean and comprehensive.** The statistics tracking for overflows and type conversions will help users identify data quality issues. --- `139-158`: **Overflow handling correctly implemented.** The uint64 to int64 overflow check at lines 142-146 properly addresses the previous review concern. Values exceeding INT64_MAX are clamped and tracked in statistics. --- `220-257`: **Timestamp unit handling correctly implemented.** The function now properly reads the target time unit from the builder's type (lines 221-223) and applies the appropriate multiplier. This addresses the previous review concern. **Note:** The code assumes input timestamps are in seconds (line 225). Ensure this is documented in the public API or consider accepting a parameter for input unit. --- `458-547`: **Excellent: Immediate record processing avoids dangling pointer issue.** The lambda processes each msgpack record immediately within the unpack loop (lines 469-546), which correctly addresses the past review comment about storing msgpack_object pointers that become invalid. The use of string_view for field lookup (line 493) also avoids unnecessary string copies. --- `332-443`: **Schema parsing is robust with good error messages.** The JSON schema parsing properly validates required fields (lines 365-377) and provides helpful error messages that include the problematic schema content (lines 337-338, 347-348). Timestamp unit parsing (lines 410-420) correctly handles all standard Arrow time units. </blockquote></details> <details> <summary>plugins/out_s3/s3.c (7)</summary><blockquote> `42-57`: **LGTM: Proper conditional compilation for Parquet support.** The FLB_HAVE_PARQUET_ENCODER guard ensures the header is only included when Parquet support is enabled. --- `195-229`: **Correct: Content-Encoding header skipped for Parquet.** Lines 196-197 and 228-229 properly skip the Content-Encoding header when `format=parquet`, since Parquet handles compression internally within the file format. This avoids double-compression overhead. --- `1244-1277`: **LGTM: Compression properly skipped for Parquet with correct memory management.** The check at line 1245 ensures Parquet format bypasses outer compression. The compressed data is properly converted to flb_sds_t (lines 1260-1272) for consistent memory management, and cleanup is handled in all code paths (lines 1333-1335, 1362-1364, 1377-1379, 1388-1391). 
--- `1582-1630`: **Parquet conversion properly integrated with good error handling.** The Parquet conversion block (lines 1582-1630) correctly: - Validates schema_str is provided (lines 1588-1596) - Calls the Parquet encoder with compression setting (lines 1598-1603) - Converts the result to flb_sds_t for consistent memory management (lines 1615-1627) - Frees the intermediate parquet_buffer (line 1617) - Handles all error paths with proper cleanup Verify that the schema_str requirement is documented in user-facing documentation and error messages are clear enough for users to understand the schema format. --- `1663-1664`: **LGTM: Signature change to flb_sds_t improves consistency.** The s3_put_object signature change to accept flb_sds_t (line 1664) aligns with the overall refactoring for consistent SDS-based buffer management. --- `3953-3962`: **Excellent architectural decision with clear documentation.** The comment at lines 3953-3956 clearly explains the design choice to store raw msgpack and defer format conversion until upload. This is particularly beneficial for Parquet, which can batch multiple records into optimized columnar storage. --- `4210-4231`: **Config map correctly defined with comprehensive documentation.** The configuration parameters are properly defined: - `compression` and `format` use offset 0 (lines 4211, 4220), indicating manual parsing in cb_s3_init, which is correct - `schema_str` is mapped to `ctx->schema_str` (line 4228) - Descriptions clearly document supported values, defaults, and usage (lines 4212-4217, 4221-4225, 4229-4231) **Note:** The past review comment about compression type mismatch is not applicable here since offset 0 means no automatic field mapping. </blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
664cb4e to
eff671c
Compare
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
plugins/out_s3/s3.c (2)
1217-1324: Use `size_to_send` (compressed size) for routing decisions and logging
Inside `upload_data()`, after optional compression you introduce:

```c
flb_sds_t body_to_send = body;
size_t size_to_send = body_size;
...
preCompress_size = body_size;
body_to_send = compressed_body;
size_to_send = temp_size;
```

but the subsequent logic still uses the original `body_size` for decisions and logging:
- timeout/size checks for whether to `goto put_object` vs `multipart`;
- the "Pre-compression ... After compression ..." log still uses `body_size` for the "after" value;
- the existing upload path compares `body_size` to `MIN_CHUNKED_UPLOAD_SIZE`.
With compression enabled, this means routing decisions and logs are based on the uncompressed size, which diverges from the actual payload size now stored in `size_to_send`. It also breaks the intent of the informational log (both values end up identical). Switching those uses of `body_size` to `size_to_send` preserves the old semantics (decisions based on the payload actually sent) and fixes the log.

Suggested diff

```diff
-    else if (body_size >= ctx->file_size) {
+    else if (size_to_send >= ctx->file_size) {
@@
-    else if(body_size > MIN_CHUNKED_UPLOAD_SIZE) {
+    else if(size_to_send > MIN_CHUNKED_UPLOAD_SIZE) {
@@
-    else {
-        if ((ctx->use_put_object == FLB_FALSE && (ctx->compression == FLB_AWS_COMPRESS_GZIP || ctx->compression == FLB_AWS_COMPRESS_ZSTD))) {
-            flb_plg_info(ctx->ins, "Pre-compression upload_chunk_size= %zu, After compression, chunk is only %zu bytes, "
-                         "the chunk was too small, using PutObject to upload", preCompress_size, body_size);
-        }
+    else {
+        if (ctx->use_put_object == FLB_FALSE &&
+            (ctx->compression == FLB_AWS_COMPRESS_GZIP ||
+             ctx->compression == FLB_AWS_COMPRESS_ZSTD)) {
+            flb_plg_info(ctx->ins,
+                         "Pre-compression upload_chunk_size= %zu, "
+                         "After compression, chunk is only %zu bytes, "
+                         "the chunk was too small, using PutObject to upload",
+                         preCompress_size, size_to_send);
+        }
@@
-    if (body_size < MIN_CHUNKED_UPLOAD_SIZE) {
+    if (size_to_send < MIN_CHUNKED_UPLOAD_SIZE) {
```
1450-1500: SDS lifetime is mostly correct; fix leak in `unit_test_flush`
The SDS refactor is generally consistent:
- `construct_request_buffer()` now always returns `flb_sds_t` and callers destroy buffers with `flb_sds_destroy()`.
- `send_upload_request()`, `put_all_chunks()`, and `cb_s3_upload()` correctly treat the returned buffer as an SDS and free it once `upload_data()` completes.
- `upload_data()` cleans up the optional `compressed_body` SDS in all exit paths.
One leak remains in `unit_test_flush()`:

```c
s3_store_buffer_put(ctx, upload_file, tag, tag_len,
                    chunk, (size_t) chunk_size, file_first_log_time);
ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
...
ret = upload_data(ctx, upload_file, m_upload_file, buffer, buffer_size, tag, tag_len);
/* buffer destroyed, but chunk is never freed */
```

Unlike `buffer_chunk()` and other call sites, `chunk` is never destroyed here, so every unit-test flush leaks one SDS buffer when `FLB_S3_PLUGIN_UNDER_TEST` is set. You can mirror `buffer_chunk()` and free `chunk` immediately after `construct_request_buffer()`:

Suggested diff

```diff
-    s3_store_buffer_put(ctx, upload_file, tag, tag_len,
-                        chunk, (size_t) chunk_size, file_first_log_time);
-    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
-    if (ret < 0) {
+    s3_store_buffer_put(ctx, upload_file, tag, tag_len,
+                        chunk, (size_t) chunk_size, file_first_log_time);
+    ret = construct_request_buffer(ctx, chunk, upload_file, &buffer, &buffer_size);
+    flb_sds_destroy(chunk);
+    if (ret < 0) {
         flb_plg_error(ctx->ins, "Could not construct request buffer for %s",
                       upload_file->file_path);
         FLB_OUTPUT_RETURN(FLB_RETRY);
     }
```

Also applies to: 1991-2015, 3701-3724
🧹 Nitpick comments (9)
examples/s3_output/run-s3-parquet-test.sh (2)
1-5: Consider adding shell script safety options.
For test scripts, adding safety options helps catch errors early:

🔎 Suggested improvement

```diff
 #!/bin/bash
+set -euo pipefail
 echo "=== Fluent Bit S3 Comprehensive Test ==="
 echo ""
```

37-39: Cleanup section appears incomplete.
The script echoes "Cleaning up..." but doesn't perform any actual cleanup. If cleanup is intentionally omitted (e.g., to allow inspection of test artifacts), consider adding a comment explaining this. Otherwise, add the cleanup logic:

🔎 Suggested completion

```diff
 # Cleanup
 echo ""
 echo "Cleaning up..."
+# Uncomment below to remove test artifacts after run
+# rm -rf "$STORE_DIR"
+echo "Test artifacts preserved in $STORE_DIR for inspection"
```

examples/s3_output/README.md (3)

112-114: Add language specifier to fenced code block.
Per markdownlint MD040, fenced code blocks should have a language specified for proper syntax highlighting.

🔎 Suggested fix

````diff
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
````

177-181: Add language specifier to fenced code block.

🔎 Suggested fix

````diff
-```
+```text
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/    # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/ # Case 2: Large files via Multipart Upload
````

270-275: Add language specifier to fenced code block.

🔎 Suggested fix

````diff
-```
+```text
 [parquet] Data quality summary for 53206 records:
 [parquet]   Missing non-nullable fields (defaults used):
 [parquet]     field='level' count=52506
 [parquet]     field='timestamp' count=52506
````

cmake/parquet.cmake (1)

149-171: Manual include-path search is fine but the extra `parquet` suffix is probably unnecessary
The manual search for Parquet uses:

```cmake
PATH_SUFFIXES ${INCLUDE_SUFFIXES} parquet
```

Given `INCLUDE_SUFFIXES` is already `include`, this will check both `${prefix}/include/parquet/...` (via `include` plus the `parquet/api/...` name) and `${prefix}/parquet/...`, which is a bit unusual. It's not harmful, but simplifying to just `${INCLUDE_SUFFIXES}` would avoid probing odd `${prefix}/parquet` trees and be more in line with the Arrow search just above.
Also applies to: 173-195

CMakeLists.txt (1)

679-687: Consider unifying the C++ standard to avoid a C++11/C++17 mix
The simdutf block sets `CMAKE_CXX_STANDARD 11`, while the Parquet block later sets it to 17. CMake will honor the last assignment (17 when Parquet is enabled), but this mix of settings is slightly confusing and could surprise future refactors.
You might want to:
- drop the explicit `set(CMAKE_CXX_STANDARD 11)` in the simdutf block, or
- set it to 17 there as well when Parquet support is enabled.
Functionally it works today; this is mainly about clarity and avoiding subtle build surprises.
Also applies to: 1311-1316

plugins/out_s3/s3.c (2)

42-57: Avoid duplicate forward declarations for `get_upload`/`create_upload`
`get_upload` and `create_upload` are forward-declared twice (once in the new "Forward declarations" block and again just before their definitions). This is harmless but unnecessary noise. You can keep the new declaration for `flb_pack_msgpack_extract_log_key` and drop the duplicated `get_upload`/`create_upload` prototypes to reduce clutter.
Also applies to: 86-91

744-823: Format/compression config flow is solid; consider early `schema_str` validation
The three-step flow in `cb_s3_init`:
1. Defaults `compression` to `NONE`, parses the `compression` property, and treats `arrow|parquet` as legacy values that map to `format=parquet` (with compile-time checks around `FLB_HAVE_PARQUET_ENCODER`).
2. Adds an explicit `format` option (json/parquet) that overrides the legacy compression mapping.
3. Validates that `format=parquet` is not allowed when Parquet support isn't compiled in.
This is a good cleanup and keeps legacy configs working.
However, `schema_str` is only validated later inside `construct_request_buffer()` and `flb_msgpack_raw_to_parquet`, which means a misconfigured `format=parquet` without `schema_str` will only surface on first flush, failing every chunk.
To fail fast and give clearer feedback, you could add an init-time check like:

```c
#ifdef FLB_HAVE_PARQUET_ENCODER
    if (ctx->format == FLB_S3_FORMAT_PARQUET && ctx->schema_str == NULL) {
        flb_plg_error(ctx->ins,
                      "format=parquet requires 'schema_str' to be set");
        return -1;
    }
#endif
```

right after the format/validation block.
Also applies to: 4210-4231
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- .github/workflows/unit-tests.yaml
- CMakeLists.txt
- cmake/parquet.cmake
- examples/s3_output/README.md
- examples/s3_output/athena-queries.sql
- examples/s3_output/fluent-bit-s3-parquet.conf
- examples/s3_output/run-s3-parquet-test.sh
- include/fluent-bit/aws/flb_aws_compress.h
- plugins/out_kinesis_firehose/firehose.c
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- src/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/flb_aws_compress.c
💤 Files with no reviewable changes (5)
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/s3_output/athena-queries.sql
🧰 Additional context used
🧠 Learnings (16)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
.github/workflows/unit-tests.yaml, cmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
.github/workflows/unit-tests.yaml, CMakeLists.txt, cmake/parquet.cmake, src/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
.github/workflows/unit-tests.yaml, CMakeLists.txt, cmake/parquet.cmake, src/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_kinesis_firehose/firehose.c, src/CMakeLists.txt, plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_kinesis_firehose/firehose.c, src/CMakeLists.txt, plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_kinesis_firehose/firehose.c, src/CMakeLists.txt, plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, CMakeLists.txt, plugins/out_kinesis_firehose/firehose.c, src/CMakeLists.txt, plugins/out_s3/s3.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
.github/workflows/unit-tests.yaml, src/aws/flb_aws_compress.c, CMakeLists.txt, src/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
.github/workflows/unit-tests.yaml, CMakeLists.txt, src/CMakeLists.txt
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
examples/s3_output/README.md
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
include/fluent-bit/aws/flb_aws_compress.h, src/aws/flb_aws_compress.c, CMakeLists.txt, src/CMakeLists.txt, plugins/out_s3/s3.c
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3.c
🧬 Code graph analysis (2)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
s3_put_object(1663-1800)
plugins/out_s3/s3.c (6)
- src/flb_output.c (1): flb_output_get_property (1108-1111)
- src/aws/flb_aws_compress.c (2): flb_aws_compression_get_type (80-97), flb_aws_compression_compress (99-116)
- src/flb_sds.c (2): flb_sds_create_len (58-76), flb_sds_destroy (389-399)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)
- plugins/out_s3/s3_multipart.c (1): upload_part (685-799)
- src/flb_pack.c (1): flb_pack_msgpack_to_json_format (1169-1450)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c
[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter
(syntaxError)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
270-270: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (18)
plugins/out_kinesis_firehose/firehose.c (1)
496-501: LGTM! Documentation updated to reflect new compression options.

The help text correctly lists the now-supported compression algorithms: `gzip`, `snappy`, and `zstd`. This aligns with the changes in `flb_aws_compress.c` that add Snappy support and remove Arrow/Parquet compression options.

src/aws/flb_aws_compress.c (2)
31-37: LGTM! Snappy wrapper correctly adapts the function signature.

The wrapper properly casts `void *` to `char *` as required by `flb_snappy_compress`, and the return value is passed through directly.
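For readers skimming the diff, a minimal sketch of the adapter pattern being praised here; the exact `flb_snappy_compress()` prototype is an assumption taken from the review text, not copied from the patch:

```c
#include <stddef.h>

/* Assumed prototype, per the review discussion */
int flb_snappy_compress(char *in_data, size_t in_len,
                        char **out_data, size_t *out_len);

static int flb_snappy_compress_wrapper(void *in_data, size_t in_len,
                                       void **out_data, size_t *out_len)
{
    /* Adapt the generic (void *) interface to the char *-based API;
     * the return code passes through unchanged. */
    return flb_snappy_compress((char *) in_data, in_len,
                               (char **) out_data, out_len);
}
```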
57-78: Well-structured separation of compression algorithms from format converters.

The updated `compression_options` table cleanly includes only true compression algorithms (gzip, snappy, zstd), with clear documentation explaining that Arrow/Parquet are format converters handled separately via the `format` parameter.

include/fluent-bit/aws/flb_aws_compress.h (1)
24-46: Well-documented deprecation and backward compatibility approach.

The header cleanly separates compression algorithms from format converters, retains deprecated values for backward compatibility, and provides clear guidance to use `format=parquet` instead of `compression=parquet`. Based on learnings, this follows the established pattern where Arrow/Parquet use conditional compilation guards, unlike ZSTD which is always available.

examples/s3_output/fluent-bit-s3-parquet.conf (2)
143-145: Parquet schema correctly matches the access log data structure.

The `schema_str` defines fields that align with the dummy access log inputs: `timestamp`, `level`, `message`, `method`, `path`, `status`, `duration_ms`, `client_ip`, and `user_agent`. Using `int64` for `timestamp` and `status`, and `float64` for `duration_ms`, are appropriate choices for avoiding overflow issues as noted in the README.
1-117: Comprehensive and well-documented test configuration.

The configuration provides excellent coverage of both upload scenarios (PutObject vs Multipart) with realistic traffic patterns. The inline comments clearly explain the rationale for each setting, making this a valuable reference for users.
.github/workflows/unit-tests.yaml (2)
56-57: LGTM! Build flag renamed to reflect Parquet encoder focus.

The change from `-DFLB_ARROW=On` to `-DFLB_PARQUET_ENCODER=On` correctly reflects the shift from Arrow-based compression to Parquet encoder support.
117-125: Apache Arrow C++ package names are correct.

The packages `libarrow-dev` and `libparquet-dev` are the correct C++ development packages for Apache Arrow and Parquet, providing the required headers and libraries for the `flb_parquet_impl.cpp` implementation.

plugins/out_s3/s3.h (3)
196-199: New Parquet format fields properly integrated into struct.

The `format` and `schema_str` fields provide clean support for runtime format selection and user-defined Parquet schemas. The placement at the end of the struct with clear section comments improves readability.
201-203: LGTM! Format type macros clearly defined.

The `FLB_S3_FORMAT_JSON` (0) and `FLB_S3_FORMAT_PARQUET` (1) macros provide a clean enum-like interface for format selection.
233-235: Function signature change verified as consistent across the codebase.

All `s3_put_object` call sites (s3.c:1330 and s3.c:1494) have been properly updated to use the new `flb_sds_t buffer` parameter. The declaration in s3.h and the implementation in s3.c both align with the new signature.

src/CMakeLists.txt (2)
157-166: Parquet encoder source and dependency wiring look consistent.

`flb_parquet_impl.cpp` is only added when `FLB_PARQUET_ENCODER` is enabled and is explicitly marked as C++ via `set_source_files_properties`, with Arrow/Parquet libs added to `FLB_DEPS` under the same flag. This keeps Parquet support fully optional and avoids impacting non-Parquet builds.

Also applies to: 398-405
525-528: SASL linkage for Kafka/MSK is correctly scoped to SASL-enabled builds.

Conditionally linking `sasl2` into `fluent-bit-shared` and `fluent-bit-bin` under `FLB_SASL_ENABLED` matches how Kafka/MSK SASL is detected and avoids dragging SASL into builds that don't need it.

Also applies to: 600-606
CMakeLists.txt (1)
200-202: Parquet encoder option and C++17 enabling are wired correctly.

`FLB_PARQUET_ENCODER` is now a dedicated option, enabled by `FLB_ALL`, and the Parquet block:

- Calls `enable_language(CXX)` and sets `CMAKE_CXX_STANDARD 17` when the encoder is requested.
- Delegates Arrow/Parquet discovery to `cmake/parquet.cmake`.
- Only defines `FLB_HAVE_PARQUET_ENCODER` and adds include directories when both Arrow and Parquet are found; otherwise it warns and flips `FLB_PARQUET_ENCODER` back to `OFF`.
FLB_HAVE_PARQUET_ENCODERand adds include directories when both Arrow and Parquet are found, otherwise it warns and flipsFLB_PARQUET_ENCODERback toOFF.This keeps Parquet strictly opt‑in and prevents partial configuration when dependencies are missing.
Also applies to: 278-287, 1311-1329
plugins/out_s3/s3.c (3)
97-121: Header handling for compression and Parquet looks correct.

The new `zstd` `Content-Encoding` header and the guards in `create_headers()` to skip `Content-Encoding` when `ctx->format == FLB_S3_FORMAT_PARQUET` align with the intended behavior:
- GZIP/ZSTD still get the appropriate header for JSON output.
- Parquet relies on its internal codec only, without an outer content‑encoding header.
Also applies to: 183-239
1516-1661: SDS-based request/flush pipeline is a nice cleanup.

The move to `flb_sds_t` for all request buffers (`construct_request_buffer()`, `upload_data()`, `send_upload_request()`, `put_all_chunks()`, `cb_s3_upload()`, `cb_s3_flush()`) significantly simplifies memory management:

- All conversions (JSON / log_key / Parquet) produce SDS and are freed in a single place per call chain.
- `construct_request_buffer()` now cleanly owns and frees the intermediate msgpack buffer and unlocks chunks on all error paths.
- Callers always destroy the returned SDS (`buffer`) after `upload_data()`/`s3_put_object()` regardless of success or failure, avoiding mixed `free()` vs `flb_sds_destroy()` semantics.

This should make future format additions (or tweaks to Parquet encoding) much less error-prone.
Also applies to: 1943-2015, 3440-3488, 3953-4068
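As a compact illustration of the ownership rule these bullets describe, a hedged sketch using the real `flb_sds_create_len`/`flb_sds_destroy` pair; `do_upload()` is a hypothetical consumer standing in for `upload_data()`:

```c
#include <fluent-bit/flb_sds.h>

int do_upload(flb_sds_t buf);  /* hypothetical consumer */

/* The producer creates the SDS buffer; the caller destroys it on
 * every path, success or failure, at a single point. */
static int send_buffer(const char *data, size_t len)
{
    int ret;
    flb_sds_t buf;

    buf = flb_sds_create_len(data, (int) len);
    if (buf == NULL) {
        return -1;
    }

    ret = do_upload(buf);

    flb_sds_destroy(buf);  /* single destruction point */
    return ret;
}
```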
4210-4231: Config map docs for compression/format/schema/upload_timeout/auth look coherent.

The updated config map entries:

- Clarify that `compression` supports `none|gzip|snappy|zstd` and that `arrow`/`parquet` are deprecated legacy values.
- Introduce an explicit `format` option (json/parquet) and `schema_str` for Parquet, with docs that match the runtime behavior.
- Document `upload_timeout` semantics more clearly and add `authorization_endpoint_bearer_token`.

These descriptions match the new logic in `cb_s3_init()` and `construct_request_buffer()`.
cb_s3_init()andconstruct_request_buffer().Also applies to: 4382-4413
cmake/parquet.cmake (1)
249-258: No action needed: the URLs in the code already match the official Apache Arrow documentation.

The URLs currently use `https://packages.apache.org/artifactory/arrow/…`, which is the correct format per official Apache Arrow installation guides. The `/artifactory/arrow/` segment is not obsolete; it is the official repository path for both APT and RPM packages. Removing it would break the URLs.

Likely an incorrect or invalid review comment.
Actionable comments posted: 2
🧹 Nitpick comments (4)
examples/s3_output/README.md (1)
112-114: Consider adding language specifiers to fenced code blocks.

Several code blocks in this documentation lack language specifiers, which can affect syntax highlighting. Consider adding appropriate language identifiers (e.g., `text`, `ini`, `json`) for better readability.

tests/internal/parquet.c (1)
27-31: Consider including the header instead of redefining compression constants.

These compression type constants are duplicated from `flb_aws_compress.h`. If the values change in the source header, these tests will silently use incorrect values.

🔎 Proposed fix

```diff
-/* Compression types from flb_aws_compress.h */
-#define FLB_AWS_COMPRESS_NONE 0
-#define FLB_AWS_COMPRESS_GZIP 1
-#define FLB_AWS_COMPRESS_SNAPPY 2
-#define FLB_AWS_COMPRESS_ZSTD 3
+#include <fluent-bit/aws/flb_aws_compress.h>
```

If the header cannot be included due to dependencies, consider adding a comment noting that these values must be kept in sync with the source file.
src/flb_parquet_impl.cpp (2)
178-197: Unnecessary try/catch block around strtol.

The `try`/`catch` block on lines 195-197 is unnecessary since `strtol` is a C function that does not throw exceptions. The `errno` and `endptr` checks already handle all error cases. The same pattern appears in other string parsing functions.

🔎 Proposed fix

```diff
 else if (obj->type == MSGPACK_OBJECT_STR) {
-    try {
         std::string str(obj->via.str.ptr, obj->via.str.size);
         char* endptr;
         errno = 0;
         long val = std::strtol(str.c_str(), &endptr, 10);
         if (endptr == str.c_str() || *endptr != '\0' || errno == ERANGE) {
             return arrow::Status::Invalid("Cannot parse string to int32");
         }
         if (val > INT32_MAX || val < INT32_MIN) {
             int32_overflow_by_field[field_name]++;
             return builder->Append(val > INT32_MAX ? INT32_MAX : INT32_MIN);
         }
         string_parsed_to_int_by_field[field_name]++;
         return builder->Append(static_cast<int32_t>(val));
-    } catch (...) {
-        return arrow::Status::Invalid("Cannot parse string to int32");
-    }
 }
```

Note: the `std::string` constructor can throw `std::bad_alloc`, so if you want to be defensive against memory allocation failures, you could keep a narrower catch or let it propagate to the outer try/catch in `flb_msgpack_raw_to_parquet`.
619-622: Prefer ValueUnsafe() or check status before ValueOrDie().
`ValueOrDie()` will terminate the program if the status is not OK, but here the status is already checked on line 618. Using `ValueUnsafe()` or `MoveValueUnsafe()` would be more semantically correct since you've already verified success.

🔎 Proposed fix

```diff
 if (!schema_result.ok()) {
     flb_error("[parquet] Failed to parse schema: %s\n",
               schema_result.status().ToString().c_str());
     return NULL;
 }
-auto schema = schema_result.ValueOrDie();
+auto schema = std::move(schema_result).ValueUnsafe();
```

This pattern should be applied to the other `ValueOrDie()` calls where the status is already checked (lines 635, 859, 877, 899, 922).
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- CMakeLists.txt
- cmake/kafka.cmake
- examples/s3_output/README.md
- lib/avro/CMakeLists.txt
- lib/jansson-e23f558/CMakeLists.txt
- src/flb_parquet_impl.cpp
- tests/internal/CMakeLists.txt
- tests/internal/parquet.c
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
examples/s3_output/README.md
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
CMakeLists.txt, cmake/kafka.cmake
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
CMakeLists.txt, cmake/kafka.cmake, src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
CMakeLists.txt, cmake/kafka.cmake
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
CMakeLists.txt, cmake/kafka.cmake, src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
CMakeLists.txt
🧬 Code graph analysis (1)
tests/internal/parquet.c (1)
src/flb_parquet_impl.cpp (2)
flb_msgpack_raw_to_parquet (579-952), flb_msgpack_raw_to_parquet (579-582)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~737-~737: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
286-286: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
294-294: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
305-305: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
315-315: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
328-328: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
344-344: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
362-362: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
436-436: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
447-447: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
542-542: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (10)
lib/avro/CMakeLists.txt (1)
19-19: CMake version bump aligns with project-wide requirements.

The minimum CMake version increase to 3.12 is consistent with modern CMake practices and aligns with the Parquet/Arrow integration requirements introduced in this PR.
lib/jansson-e23f558/CMakeLists.txt (1)
567-569: Test target disabled with documented rationale.

Commenting out the `check` target to avoid conflicts with other libraries is acceptable. The comment explains the rationale clearly.

cmake/kafka.cmake (1)
35-42: SASL configuration logic is well-structured.

The changes correctly enable built-in SASL features (PLAIN/SCRAM/OAUTHBEARER) unconditionally while making Cyrus-SASL optional for GSSAPI/Kerberos. The comments clearly explain the rationale for each setting.

tests/internal/parquet.c (1)
tests/internal/parquet.c (1)
74-103: Comprehensive test coverage with good structure.The test suite is well-organized into logical categories with good coverage of edge cases, type conversions, and error handling. The consistent pattern of setup/execute/verify/cleanup is followed throughout.
CMakeLists.txt (2)
284-287: Good documentation for FLB_PARQUET_ENCODER exclusion from FLB_ALL.The comment explaining that Parquet encoder requires explicit enabling due to external dependencies is helpful for users and maintainers.
1337-1355: Parquet encoder integration is well-structured.The Parquet encoder block properly:
- Enables C++ with C++17 standard (required for Arrow/Parquet)
- Delegates discovery to
cmake/parquet.cmake, which comprehensively handles detection across platforms with three fallback methods (pkg-config, CMake Config, manual search)- Gracefully disables the feature with clear warnings if dependencies are not found
The
cmake/parquet.cmakemodule correctly exports all required variables (ARROW_FOUND,PARQUET_FOUND,ARROW_INCLUDE_DIRS,PARQUET_INCLUDE_DIRS,ARROW_LIBRARIES,PARQUET_LIBRARIES) to the parent scope, with robust platform-specific library search paths, version validation, and helpful installation instructions.src/flb_parquet_impl.cpp (4)
39-42: Forward declaration approach is reasonable.

The forward declaration of `flb_msgpack_to_json_str` to avoid C++ compilation conflicts with mpack.h is a pragmatic solution. Consider adding a brief comment explaining why the full header cannot be included.
650-727: Record processing logic is well-designed.

The on-the-fly processing approach addresses the previous review comment about dangling pointers. The lambda properly handles:
- Non-map records by appending nulls
- Missing fields with nullable vs non-nullable handling
- Conversion failures with appropriate fallbacks
- Statistics tracking for data quality monitoring
939-951: Exception handling properly cleans up resources.

The catch blocks correctly free `output_buffer` if it was allocated before the exception. However, note that `output_buffer` is only assigned on line 926, so the null check is appropriate, but the variable could be initialized to `nullptr` for clarity (which it is on line 586).
863-865: RecordBatch creation uses correct row count.

The `record_count` variable correctly tracks the number of processed records and is used when creating the RecordBatch, ensuring consistency between the arrays and the batch metadata.
a409e9f to 37a602a (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (2)
examples/s3_output/README.md (1)
486-486: Typo already flagged in previous review.

This issue was previously identified: "2020-203c" should be "2020-2030".
tests/internal/CMakeLists.txt (1)
136-141: Missing Parquet library linkage will cause linker errors.

The Parquet test file is correctly added to `UNIT_TESTS_FILES`, but the `prepare_unit_tests` function (lines 234-236) only links Avro libraries when `FLB_AVRO_ENCODER` is enabled. There is no corresponding linkage for `ARROW_LIBRARIES` and `PARQUET_LIBRARIES` when `FLB_PARQUET_ENCODER` is enabled.

🔎 Proposed fix in prepare_unit_tests function

Add the following after line 236:

```diff
 if(FLB_AVRO_ENCODER)
   target_link_libraries(${source_file_we} avro-static jansson)
 endif()
+
+if(FLB_PARQUET_ENCODER)
+  target_link_libraries(${source_file_we} ${ARROW_LIBRARIES} ${PARQUET_LIBRARIES})
+endif()

 add_test(NAME ${source_file_we}
```
🧹 Nitpick comments (5)
examples/s3_output/README.md (2)
465-465: Fix spacing in time reference.

Line 465 uses "60min", which should be "60-minute" or "60 minutes" for consistency with similar references elsewhere in the document (e.g., "5 minutes" at line 127).
🔎 Proposed fix
```diff
-**Fallback**: `upload_timeout=60m` (acts as safety net)
+**Fallback**: `upload_timeout=60m` (acts as safety net for 60 minutes)
```
112-112: Add language identifiers to fenced code blocks.

Markdown linting expects language specifications on all code blocks for syntax highlighting and consistency. Add language identifiers to the three blocks without them:
- Line 112: Plain-text block showing a formula/logic
- Line 177: S3 file path tree structure
- Line 270: Log output example
🔎 Proposed fixes
At line 112, specify a language (or use plain):
````diff
-```
+```text
 Upload Trigger = upload_timeout reached OR total_file_size reached
-```
+```
````

At line 177, specify a language for the tree structure:
````diff
-```
+```
 s3://s3-bucket-kafka-sink/fluent-bit-logs/
 ├── putobject/app/day=YYYYMMDD/     # Case 1: Small files via PutObject
 └── multipart/access/day=YYYYMMDD/  # Case 2: Large files via Multipart Upload
-```
+```
````

At line 270, specify a language for log output:
````diff
-```
+```log
 [parquet] Data quality summary for 53206 records:
 [parquet]   Missing non-nullable fields (defaults used):
 [parquet]     field='level' count=52506
 [parquet]     field='timestamp' count=52506
-```
+```
````
include/fluent-bit/flb_parquet.h (1)
23-25: Consider removing unused includes.

The `flb_sds.h` include appears unnecessary since the API uses `void *` and `size_t` rather than `flb_sds_t`. The `msgpack.h` include is also not required in the public header since input is typed as `const void *`. Only `stdlib.h` (for `size_t`) is strictly needed. Removing unused includes reduces header dependencies and compile times.

🔎 Proposed fix
🔎 Proposed fix
-#include <fluent-bit/flb_sds.h> -#include <msgpack.h> #include <stdlib.h>cmake/parquet.cmake (1)
222-228: Unicode symbols may not render on all terminals.

The checkmarks (`✓`, `✗`) in status messages are visually helpful but may not display correctly on terminals without UTF-8 support or in CI logs. Consider using ASCII alternatives like `[OK]`/`[FAIL]` for broader compatibility.

🔎 Alternative using ASCII
```diff
 if(ARROW_FOUND AND PARQUET_FOUND)
-    message(STATUS "✓ Arrow found: ${ARROW_LIBRARIES}")
+    message(STATUS "[OK] Arrow found: ${ARROW_LIBRARIES}")
     message(STATUS "  Include dirs: ${ARROW_INCLUDE_DIRS}")
-    message(STATUS "✓ Parquet found: ${PARQUET_LIBRARIES}")
+    message(STATUS "[OK] Parquet found: ${PARQUET_LIBRARIES}")
     message(STATUS "  Include dirs: ${PARQUET_INCLUDE_DIRS}")
 else()
-    message(STATUS "✗ Arrow/Parquet not found")
+    message(STATUS "[NOT FOUND] Arrow/Parquet not found")
```

src/flb_parquet_impl.cpp (1)
51-61: Consider documenting the buffer size parameter.

The `256` parameter in `flb_msgpack_to_json_str` appears to be an initial buffer size hint. Adding a brief comment or using a named constant would improve code clarity.

🔎 Suggested improvement
```diff
 std::string msgpack_object_to_json_string(const msgpack_object* obj) {
     /* Use Fluent Bit's built-in msgpack to JSON converter */
-    char *json_str = flb_msgpack_to_json_str(256, obj, FLB_FALSE);
+    /* Initial buffer size hint for JSON conversion */
+    char *json_str = flb_msgpack_to_json_str(256, obj, FLB_FALSE);
     if (!json_str) {
         return "{}";
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (25)
- .github/workflows/unit-tests.yaml
- CMakeLists.txt
- cmake/kafka.cmake
- cmake/parquet.cmake
- examples/s3_output/README.md
- examples/s3_output/athena-queries.sql
- examples/s3_output/fluent-bit-s3-parquet.conf
- examples/s3_output/run-s3-parquet-test.sh
- include/fluent-bit/aws/flb_aws_compress.h
- include/fluent-bit/flb_parquet.h
- lib/avro/CMakeLists.txt
- lib/jansson-e23f558/CMakeLists.txt
- plugins/out_kinesis_firehose/firehose.c
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- src/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/flb_aws_compress.c
- src/flb_parquet_impl.cpp
- tests/internal/CMakeLists.txt
- tests/internal/parquet.c
💤 Files with no reviewable changes (5)
- src/aws/CMakeLists.txt
- src/aws/compression/arrow/CMakeLists.txt
- src/aws/compression/arrow/compress.c
- src/aws/compression/arrow/compress.h
- src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (6)
- examples/s3_output/fluent-bit-s3-parquet.conf
- .github/workflows/unit-tests.yaml
- examples/s3_output/athena-queries.sql
- plugins/out_kinesis_firehose/firehose.c
- src/CMakeLists.txt
- examples/s3_output/run-s3-parquet-test.sh
🧰 Additional context used
🧠 Learnings (13)
📓 Common learnings
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_parquet.hexamples/s3_output/README.md
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hcmake/kafka.cmakeplugins/out_s3/s3.cCMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hsrc/flb_parquet_impl.cppplugins/out_s3/s3.cCMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hplugins/out_s3/s3.cCMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hplugins/out_s3/s3.cCMakeLists.txt
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hplugins/out_s3/s3.cCMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
src/aws/flb_aws_compress.cCMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
cmake/kafka.cmakeCMakeLists.txtcmake/parquet.cmake
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
cmake/kafka.cmakeCMakeLists.txtcmake/parquet.cmake
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
CMakeLists.txt
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
cmake/parquet.cmake
🧬 Code graph analysis (6)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (1)
s3_put_object(1663-1800)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (2)
flb_msgpack_raw_to_parquet(579-952)flb_msgpack_raw_to_parquet(579-582)
src/aws/flb_aws_compress.c (2)
src/flb_gzip.c (1)
flb_gzip_compress(157-252)src/flb_zstd.c (1)
flb_zstd_compress(33-57)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3.c (5)
src/aws/flb_aws_compress.c (2)
- src/aws/flb_aws_compress.c (2): flb_aws_compression_get_type (80-97), flb_aws_compression_compress (99-116)
- src/flb_sds.c (2): flb_sds_create_len (58-76), flb_sds_destroy (389-399)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)
- plugins/out_s3/s3_multipart.c (1): upload_part (685-799)
- src/flb_pack.c (1): flb_pack_msgpack_to_json_format (1169-1450)
flb_pack_msgpack_to_json_format(1169-1450)
tests/internal/parquet.c (2)
src/flb_parquet_impl.cpp (2)
flb_msgpack_raw_to_parquet(579-952)flb_msgpack_raw_to_parquet(579-582)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
🪛 Cppcheck (2.19.0)
plugins/out_s3/s3.c
[error] 791-791: failed to expand 'flb_plg_error', it is invalid to use a preprocessor directive as macro parameter
(syntaxError)
🪛 LanguageTool
examples/s3_output/README.md
[grammar] ~56-~56: Ensure spelling is correct
Context: ..., 201, 404, 500, 401) - Response times (2ms to 5000ms) - Client information (IP, us...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~465-~465: Ensure spelling is correct
Context: ...les uploaded when reaching 50MB (before 60min timeout) Verification Steps: 1. R...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.18.1)
examples/s3_output/README.md
112-112: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
270-270: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (23)
examples/s3_output/README.md (1)
1-512: Excellent comprehensive documentation for S3 Parquet testing.This new README provides a thorough, well-structured guide covering realistic test scenarios (two input sources, two output configurations), upload trigger mechanics, schema design best practices, data quality considerations, and Athena integration. The pedagogical tone helps users understand why decisions matter (e.g., nullable vs non-nullable fields, PutObject vs Multipart). The inclusion of expected behavior verification steps and troubleshooting guidance adds practical value. This fulfills the
docs-requiredlabel requirement.src/aws/flb_aws_compress.c (3)
31-37: LGTM!The wrapper function correctly adapts the
flb_snappy_compresssignature to match the generic compression interface. The pointer casts are safe and the return value is properly propagated.
66-70: The FLB_AWS_COMPRESS_SNAPPY constant is properly defined.The constant is defined in
include/fluent-bit/aws/flb_aws_compress.h:30with value2, and theflb_snappy_compress_wrapperfunction is correctly defined insrc/aws/flb_aws_compress.c:32. The array entry is correctly integrated.
26-26: No action needed. The include statement is correct as written.Snappy is a core bundled dependency (located at
lib/snappy-fef67ac/) built unconditionally in CMakeLists.txt. Unlike Arrow/Parquet which required#ifdef FLB_HAVE_ARROWguards as optional components, Snappy has noFLB_HAVE_SNAPPYconditional guards anywhere in the codebase and is included unconditionally across 12 existing files (http_server, flb_snappy.c, multiple plugins, etc.). The include at line 26 follows the established pattern.lib/avro/CMakeLists.txt (1)
19-19: CMake version bump looks appropriate.Bumping to CMake 3.12 aligns with the broader build system changes in this PR and enables modern CMake features. This is a reasonable minimum version for 2024+ projects.
lib/jansson-e23f558/CMakeLists.txt (2)
1-1: CMake version bump consistent with the PR.Aligns with the 3.12 minimum version used across other CMake files in this PR.
567-569: Reasonable approach to avoid target name collisions.Disabling the
checktarget prevents conflicts with other libraries (e.g., nghttp2) that may define the same target name. The comment adequately explains the rationale.plugins/out_s3/s3.h (2)
195-203: Clean addition of format support.The new
formatfield andschema_strpointer cleanly separate output format concerns from compression. TheFLB_S3_FORMAT_JSON/FLB_S3_FORMAT_PARQUETconstants follow a clear naming convention consistent with Fluent Bit patterns.
233-234: Function prototype aligns with SDS-based buffer handling.The use of
flb_sds_t bufferis consistent with Fluent Bit's string handling patterns. Note that the parameter names differ slightly from the implementation in s3.c (create_timevsfile_first_log_time,buffervsbody), but this is cosmetic and doesn't affect functionality.include/fluent-bit/flb_parquet.h (1)
27-85: Excellent documentation and clean API design.The header provides comprehensive documentation including:
- Clear parameter descriptions
- Schema format specification with supported types
- Usage examples
- Memory ownership semantics (caller frees with
flb_free)This level of documentation is valuable for a public API.
cmake/parquet.cmake (1)
269-278: Variable exports look correct.All discovered variables are properly exported to the parent scope, including conditional export of
ARROW_VERSION. This ensures the discovery results are available to the main CMakeLists.txt.cmake/kafka.cmake (2)
9-48: Clean simplification of SASL configuration.The refactoring correctly recognizes that librdkafka has built-in support for SASL/PLAIN, SASL/SCRAM, and SASL/OAUTHBEARER without external dependencies. Only SASL/GSSAPI (Kerberos) requires cyrus-sasl.
Key improvements:
FLB_SASL_ENABLEDis always ON (correct for built-in mechanisms)FLB_SASL_OAUTHBEARER_ENABLEDis always ON (built into librdkafka)FLB_KAFKA_MSK_IAM_ENABLEDis always ON (depends on OAuth Bearer)WITH_SASL_CYRUSis conditional on finding cyrus-sasl (for Kerberos only)This removes unnecessary complexity while maintaining full functionality.
77-81: Helpful feature summary for build diagnostics.The summary block clearly shows which Kafka features are enabled, aiding troubleshooting of build configurations.
include/fluent-bit/aws/flb_aws_compress.h (1)
24-46: Well-documented enum reorganization with clear deprecation guidance.The separation of compression algorithms (NONE=0, GZIP=1, SNAPPY=2, ZSTD=3) from format conversions (ARROW=4, PARQUET=5) improves semantic clarity. The deprecation comments provide clear migration paths.
The codebase exclusively uses symbolic constants (
FLB_AWS_COMPRESS_*) rather than hardcoded numeric values, so the enum value changes are transparent to all callers and pose no breaking change risk.Likely an incorrect or invalid review comment.
CMakeLists.txt (2)
1091-1111: Well-designed Homebrew libyaml detection for macOS.The automatic detection of libyaml via Homebrew (
brew --prefix libyaml) is a helpful developer experience improvement for macOS users. The graceful fallback sequence (pkg-config → Homebrew → compile test) ensures broad compatibility.
1337-1355: Parquet encoder integration looks correct.The Parquet encoder setup properly:
- Enables C++ and sets C++17 standard (addressing past review feedback)
- Includes cmake/parquet.cmake for library detection
- Guards all configuration behind
FLB_PARQUET_ENCODER- Provides clear warnings when dependencies are missing
tests/internal/parquet.c (2)
33-66: Excellent test organization and coverage.The test suite provides comprehensive coverage across 38 tests in 9 well-documented categories. The structured organization with clear category headers makes the test suite maintainable and easy to navigate.
74-103: Consistent and correct test pattern.All tests follow a clean pattern: construct msgpack input, invoke the conversion function, assert success, and cleanup resources. The consistent use of
flb_freefor output buffers ensures no memory leaks.src/flb_parquet_impl.cpp (3)
649-727: Immediate record processing correctly avoids dangling pointers.Processing each record immediately during the msgpack unpack loop (rather than storing
msgpack_object*pointers for later) correctly addresses the lifetime issue flagged in previous reviews. Themsgpack_unpackedzone is reused on each iteration, so this approach ensures data integrity.
579-952: Robust implementation with comprehensive error handling.The main conversion function demonstrates solid engineering:
- Input validation and compression mapping
- Proper Arrow/Parquet API usage with status checking
- Immediate record processing to avoid lifetime issues
- Statistics tracking for data quality monitoring
- Clean exception handling with resource cleanup
- Correct memory management between Arrow and Fluent Bit allocators
775-847: Fix missing newlines in debug log statements.All
flb_debugcalls in the statistics output section are missing trailing\ncharacters, which will cause log messages to run together incorrectly. Eachflb_debugcall should end with a newline.🔎 Proposed fix for debug log formatting
if (has_issues) { - flb_debug("[parquet] Data quality summary for %zu records:", record_count); + flb_debug("[parquet] Data quality summary for %zu records:\n", record_count); if (!converter.complex_to_string_by_field.empty()) { - flb_debug("[parquet] Complex types serialized to JSON string:"); + flb_debug("[parquet] Complex types serialized to JSON string:\n"); for (const auto& pair : converter.complex_to_string_by_field) { - flb_debug("[parquet] field='%s' count=%zu", - pair.first.c_str(), pair.second); + flb_debug("[parquet] field='%s' count=%zu\n", + pair.first.c_str(), pair.second); } } if (!converter.string_parsed_to_int_by_field.empty()) { - flb_debug("[parquet] Strings parsed to integers:"); + flb_debug("[parquet] Strings parsed to integers:\n"); for (const auto& pair : converter.string_parsed_to_int_by_field) { - flb_debug("[parquet] field='%s' count=%zu", - pair.first.c_str(), pair.second); + flb_debug("[parquet] field='%s' count=%zu\n", + pair.first.c_str(), pair.second); } } /* Apply similar fixes to remaining flb_debug calls in this section */Apply this pattern to all remaining
flb_debugcalls through line 847.Likely an incorrect or invalid review comment.
plugins/out_s3/s3.c (2)
744-824: Verify legacy Parquet compression behavior.The legacy
compression=arrow/parquetpath setsformat=parquetbut leavesctx->compression = FLB_AWS_COMPRESS_NONE(initialized at line 745). A previous review comment suggested this should default to GZIP for backward compatibility. The comment at line 764 says "Keep default no compression," but if the old Arrow/Parquet path applied GZIP by default, this represents a breaking change.Please verify whether the legacy compression path should default to GZIP when
format=parquetis set via the deprecatedcompression=arrow/parquetvalues.Based on the previous review comment at lines 744-824.
42-44: LGTM: Parquet integration is well-implemented.The Parquet format support is thoughtfully designed:
Clean separation: Parquet uses internal compression (passed to
flb_msgpack_raw_to_parquet), while outer compression is correctly skipped for Parquet format throughout the codebase.Efficient buffering: The architectural change to store raw msgpack data (lines 3953-3962) instead of pre-converting to JSON is a smart optimization that:
- Avoids repeated conversions when buffering
- Enables efficient batching for columnar formats
- Defers format conversion until upload time (line 1598)
Proper memory management: All SDS buffers are consistently managed with
flb_sds_create_lenandflb_sds_destroy. Arrow-allocated Parquet buffers are correctly freed withflb_free(line 1617) after copying to SDS.Good error handling: Schema validation (lines 1588-1596), conversion error checking (lines 1605-1613), and cleanup on all error paths are properly implemented.
Also applies to: 1520-1661, 3953-3962
37a602a to 7f32dd4 (Compare)
f52b618 to b5a7522 (Compare)
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In @plugins/out_s3/s3_auth.c:
- Around line 380-385: The allocated buffer full_url created via
flb_sds_create_size can leak if flb_sds_printf fails; update the failure path
after calling flb_sds_printf to destroy full_url (and url_path) before returning
-1. Specifically, in the block handling flb_sds_printf returning NULL, call
flb_sds_destroy(full_url) and flb_sds_destroy(url_path) and then return -1 so
the original allocation is freed; keep the rest of the logic around
flb_sds_printf, full_url, and url_path unchanged.
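A minimal sketch of the corrected failure path, as a fragment; `full_url` and `url_path` come from the review text, while `url_len`, `base_url`, and the format string are placeholder assumptions:

```c
full_url = flb_sds_create_size(url_len);
if (full_url == NULL) {
    flb_sds_destroy(url_path);
    return -1;
}

tmp = flb_sds_printf(&full_url, "%s%s", base_url, url_path);
if (tmp == NULL) {
    /* previously leaked: destroy both buffers before bailing out */
    flb_sds_destroy(full_url);
    flb_sds_destroy(url_path);
    return -1;
}
full_url = tmp;
```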
In @plugins/out_s3/s3_queue.h:
- Around line 89-97: The comment above s3_queue_recover_all lists phases as
"Phase 1, Phase 2, Phase 3" but the implementation uses phases numbered 0, 1, 2;
update the documentation in the block comment to use the same zero-based
numbering as the function (Phase 0: Cleanup..., Phase 1: State transitions...,
Phase 2: Queue rebuild...) or alternatively change the implementation's phase
constants to 1..3 to match the prose — ensure the numbering in the comment and
in s3_queue_recover_all are consistent and reference the same phase identifiers.
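One possible zero-based wording for the block comment, sketched here with phase summaries paraphrased from the review rather than copied from the real header:

```c
/*
 * s3_queue_recover_all() runs in three phases:
 *
 *   Phase 0: Cleanup of stale state left by a previous run
 *   Phase 1: State transitions for files found on disk
 *   Phase 2: Queue rebuild so pending uploads are retried
 */
```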
In @plugins/out_s3/s3_store.c:
- Around line 360-383: The stream name formats differ between branches; make
them consistent by using the Windows-safe format with hyphens for time
separators ("%Y-%m-%dT%H-%M-%S") in both the FLB_SYSTEM_WINDOWS and POSIX code
paths inside create_stream_name, updating the strftime call in the POSIX branch
(and leave the existing Windows branch format) so generated stream names are
identical across platforms while still avoiding colons on Windows.
In @plugins/out_s3/s3_stream.c:
- Around line 443-458: When allocation or initialization fails after partially
setting up the stream context (e.g., failure of flb_malloc for ctx->read_buffer
or msgpack_unpacker_init), ensure you clean up any already-created resources
(free ctx->read_buffer, unlink/free temp_path, close temp_fp and msgpack_fp if
opened) and deinitialize msgpack state; call or emulate stream_context_destroy
behavior before returning error and correctly set ctx->unpacker_initialized and
ctx->result_initialized only when those inits succeed so the cleanup path knows
what to tear down.
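A minimal sketch of the single-exit cleanup this instruction asks for, using plain stdio/stdlib stand-ins (`read_buffer`, `temp_fp`) rather than the real stream-context fields:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hedged sketch: on the first failure, release everything acquired so
 * far, including the partially written temp file. */
static int stream_context_setup(char **read_buffer, FILE **temp_fp,
                                const char *temp_path, size_t buf_size)
{
    *read_buffer = NULL;
    *temp_fp = NULL;

    *read_buffer = malloc(buf_size);
    if (*read_buffer == NULL) {
        goto error;
    }

    *temp_fp = fopen(temp_path, "wb");
    if (*temp_fp == NULL) {
        goto error;
    }

    return 0;

error:
    if (*temp_fp != NULL) {
        fclose(*temp_fp);
        remove(temp_path);   /* drop the partially written temp file */
    }
    free(*read_buffer);      /* free(NULL) is a no-op */
    *read_buffer = NULL;
    *temp_fp = NULL;
    return -1;
}
```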
In @src/aws/flb_aws_compress.c:
- Around line 32-43: The comment above flb_snappy_compress_wrapper incorrectly
cites "RFC 7093"; update the comment to remove the RFC reference and instead
reference the correct Snappy framing specification used by
flb_snappy_compress_framed_data (e.g., "Snappy framing format as documented by
Google / framing_format.txt"). Ensure the comment still explains that the framed
format supports streaming/concatenation and is safe for chunked compression, but
replace "RFC 7093" with the correct Snappy framing spec reference and wording.
In @src/flb_snappy.c:
- Around line 119-131: The comment block above FLB_SNAPPY_MAX_BLOCK_SIZE
incorrectly cites "RFC 7093" as the Snappy framing format; update the header
comment in flb_snappy.c to reference the correct spec (Google's Snappy framing
format / Snappy framing format documentation) and remove or replace the RFC 7093
mention, keeping the rest of the description (stream identifier, frame layout,
max block size) intact so the comment accurately documents the source and
format.
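For reference, the framed format's stream identifier (from Google's framing_format.txt, not an RFC) is a fixed 10-byte chunk; a small sketch that prints it:

```c
#include <stdio.h>

/* Snappy framing format stream identifier: chunk type 0xff,
 * 3-byte little-endian length (6), then the magic "sNaPpY". */
static const unsigned char snappy_stream_id[10] = {
    0xff,               /* chunk type: stream identifier */
    0x06, 0x00, 0x00,   /* chunk length: 6, little-endian */
    's', 'N', 'a', 'P', 'p', 'Y'
};

int main(void)
{
    size_t i;

    for (i = 0; i < sizeof(snappy_stream_id); i++) {
        printf("%02x ", snappy_stream_id[i]);
    }
    printf("\n");
    return 0;
}
```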
In @tests/include/s3_test_helpers.h:
- Around line 340-365: The s3_test_is_safe_temp_path function fails to accept
Windows-style paths because it only checks for '/' after the temp dir prefix and
in fallback prefixes; update the checks in s3_test_is_safe_temp_path to also
accept backslash ('\\') as a separator (and consider the case where the prefix
is the entire path without a trailing separator) and extend the fallback
comparisons for "/tmp/" and "/var/tmp/" to also match "\\tmp\\" and
"\\var\\tmp\\" (and their variants without trailing separators) so Windows temp
paths are treated as safe.
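A minimal sketch of a separator-agnostic prefix check along the lines suggested above (helper names are illustrative, not the test helper's actual API):

```c
#include <string.h>

static int is_path_separator(char c)
{
    return c == '/' || c == '\\';
}

/* A path is "under" a prefix if the prefix matches exactly or is
 * followed by either kind of separator. */
static int path_has_prefix(const char *path, const char *prefix)
{
    size_t len = strlen(prefix);

    if (strncmp(path, prefix, len) != 0) {
        return 0;
    }
    return path[len] == '\0' || is_path_separator(path[len]);
}
```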
In @tests/internal/parquet.c:
- Line 1119: Remove the unused local variable `ret` from the test function
`test_destructive_invalid_compression`; locate the declaration `int ret;` in
that function and delete it (or if it was intended to capture a return value,
replace its usage with direct assertions or use the appropriate variable),
ensuring the function compiles without unused variable warnings.
- Around line 54-57: The local duplicate defines for FLB_AWS_COMPRESS_NONE,
FLB_AWS_COMPRESS_GZIP, FLB_AWS_COMPRESS_SNAPPY, and FLB_AWS_COMPRESS_ZSTD should
be removed and the shared definitions used instead: add an #include
<fluent-bit/aws/flb_aws_compress.h> to the top of the file and delete the local
#define lines so the code uses the canonical constants from flb_aws_compress.h.
🧹 Nitpick comments (18)
src/flb_blob_db.c (1)
308-341: Schema migration handles the common case well but could be more robust.

The migration logic correctly detects and ignores the "duplicate column name" error for existing databases. However, the error message string comparison is locale-dependent and could fail on SQLite builds with different language settings.

Consider using a more robust approach that queries `PRAGMA table_info(blob_files)` first to check if the column exists, avoiding reliance on error message text.

♻️ Alternative approach using PRAGMA table_info

```diff
 /*
  * Schema upgrade for existing databases: add s3_key column if not exists.
- * This ensures backward compatibility with databases created before the s3_key
- * column was added. We only ignore the "duplicate column name" error;
- * other errors (permissions, disk space, etc.) are propagated.
  */
 {
     char *errmsg = NULL;
+    int column_exists = 0;
+    sqlite3_stmt *stmt = NULL;
+
+    /* Check if column already exists */
+    result = sqlite3_prepare_v2(db->handler,
+                                "PRAGMA table_info(blob_files);",
+                                -1, &stmt, NULL);
+    if (result == SQLITE_OK) {
+        while (sqlite3_step(stmt) == SQLITE_ROW) {
+            const char *col_name = (const char *)sqlite3_column_text(stmt, 1);
+            if (col_name && strcmp(col_name, "s3_key") == 0) {
+                column_exists = 1;
+                break;
+            }
+        }
+        sqlite3_finalize(stmt);
+    }
+
+    if (!column_exists) {
     result = sqlite3_exec(db->handler,
                           "ALTER TABLE blob_files ADD COLUMN s3_key TEXT DEFAULT '';",
                           NULL, NULL, &errmsg);
     if (result != SQLITE_OK) {
-        /*
-         * SQLITE_ERROR with "duplicate column name" is expected when
-         * the column already exists - this is the success case for upgrades.
-         * Any other error indicates a real problem.
-         */
-        if (result == SQLITE_ERROR && errmsg != NULL &&
-            strstr(errmsg, "duplicate column name") != NULL) {
-            /* Column already exists - this is expected, ignore */
-        }
-        else {
-            /* Real error - clean up and return */
         if (errmsg != NULL) {
             sqlite3_free(errmsg);
         }
         flb_sqldb_close(db);
         return FLB_BLOB_DB_ERROR_FILE_TABLE_CREATION;
     }
     if (errmsg != NULL) {
         sqlite3_free(errmsg);
     }
+    }
 }
```

tests/include/aws_client_mock.c (1)
296-301: Redundant condition in macro expansion.

The condition `&& response_config->config_parameter != FLB_AWS_CLIENT_MOCK_SET_DATA` on line 298 is redundant because this `else if` block will only be reached if none of the previous conditions matched, and `FLB_AWS_CLIENT_MOCK_SET_DATA` is already handled in the preceding `else if` block (lines 260-289). While this doesn't cause incorrect behavior, it adds unnecessary runtime overhead.

Suggested fix

```diff
 #define EXPAND_CLIENT_RESPONSE_PARAMETER(lower, UPPER, type) \
-    else if (response_config->config_parameter == FLB_AWS_CLIENT_MOCK_SET_##UPPER \
-             && response_config->config_parameter != FLB_AWS_CLIENT_MOCK_SET_DATA) \
+    else if (response_config->config_parameter == FLB_AWS_CLIENT_MOCK_SET_##UPPER) \
     { \
         c->resp.lower = CONVERT_##type((char *)val1); \
     }
```

Note: If the X-macro definition file includes a `DATA` entry that would generate `FLB_AWS_CLIENT_MOCK_SET_DATA`, the condition may be intentional to prevent the macro from applying to DATA. In that case, keep the condition but add a comment explaining why.

tests/include/s3_test_helpers.h (1)
MAX_PATHon Windows.Suggested fix
static inline void s3_test_cleanup_temp_db(const char *db_path) { - char buf[300]; + char buf[512]; /* Match template_path size in other functions */ if (!db_path) { return; }tests/runtime/out_s3_edge_cases.c (1)
118-123: Redundant assertion after teardown.

The `TEST_CHECK(ret == 0)` at line 123 re-checks `ret`, which was already verified at line 112. Since `flb_stop()` and `flb_destroy()` don't modify `ret`, this check is misleading: it doesn't verify that teardown succeeded.

This pattern repeats throughout the file (lines 210, 294, 377, 463, 548, 636, 719, 805, 891, 976, 1057, 1123). While not a bug, these could be removed or replaced with meaningful assertions about cleanup success.
tests/internal/fuzzers/aws_util_fuzzer.c (1)
72-76: API call updated correctly for new signature.

Passing `NULL` for the new `file_path` parameter is appropriate since this fuzzer focuses on format string, tag, and delimiter parsing. The implementation correctly handles `NULL` by skipping file path substitutions.

Consider extending fuzzer coverage to also test the `file_path` parameter with fuzzer-generated paths to ensure robustness of `$FILE_PATH` and `$FILE_NAME` substitutions.

tests/internal/parquet.c (1)
916-927: Use `flb_malloc`/`flb_free` for consistency.

Fluent Bit codebase conventions use `flb_malloc` and `flb_free` wrappers for memory allocation. This allows for consistent memory tracking and potential future instrumentation.

♻️ Suggested fix

```diff
-    char *long_str = (char *)malloc(long_str_size);
+    char *long_str = (char *)flb_malloc(long_str_size);
```

And at line 944:

```diff
-    free(long_str);
+    flb_free(long_str);
```

plugins/out_s3/s3_auth.c (1)
230-230: Consider using DEBUG level for successful URL retrieval.

Logging at INFO level for every presigned URL retrieval could be verbose in production environments with high throughput. Consider using DEBUG level for success messages.

♻️ Suggested change

```diff
-    flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully");
+    flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");
```

include/fluent-bit/flb_parquet.h (1)
23-25: Minor: `stdlib.h` may be redundant.

The `size_t` type is already available from `msgpack.h` or `flb_sds.h`. Consider removing `stdlib.h` unless it's needed for other definitions.

tests/runtime/out_s3_queue.c (1)
133-135: Consider checking `flb_lib_push` return values for consistency.

These calls don't check the return value, unlike other tests in the same file (e.g., lines 69-70, 198-199). While test failures would eventually be caught by mock verification, consistent error checking improves debuggability.

Suggested fix

```diff
     /* Push multiple chunks to test worker distribution */
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
-    flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
+    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
+    TEST_CHECK(ret >= 0);
```

plugins/out_s3/s3_blob.c (1)
429-435: Minor indentation issue.

The `flb_plg_error` on line 430 has inconsistent indentation.

Suggested fix

```diff
     ret = flb_blob_db_file_insert(&ctx->blob_db, event_chunk->tag,
                                   source, ctx->endpoint, file_path, file_size);
     if (ret == -1) {
-        flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
-                     file_path, file_size);
+        flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)",
+                      file_path, file_size);
         cfl_sds_destroy(file_path);
```

src/aws/flb_aws_util.c (1)
1063-1071: Minor optimization opportunity.

The `strlen(tag_delimiter)` is called on each loop iteration. While not a performance concern for typical use, hoisting it outside the loop would be cleaner.

Suggested optimization

```diff
+    size_t delimiter_len = strlen(tag_delimiter);
     /* Check if delimiter(s) specified exists in the tag. */
-    for (i = 0; i < strlen(tag_delimiter); i++) {
+    for (i = 0; i < delimiter_len; i++) {
         if (strchr(tag, tag_delimiter[i])) {
             ret = 1;
             break;
         }
     }
```

src/flb_parquet_impl.cpp (1)
913-921: Consider documenting why `total_file_size` is unused.

The parameter is silenced with `(void)total_file_size`, but it's unclear why it's part of the API signature if unused. Consider adding a brief comment explaining the intended future use or removing the parameter if not needed.

plugins/out_s3/s3_stream.c (1)
81-123: Variable `current_offset` is unused.

The variable `current_offset` is incremented but never actually used for any logic. The `remaining` variable already tracks the bytes left to process.

🔧 Suggested fix

```diff
-    current_offset = offset_start;
     remaining = (offset_end > 0) ? (offset_end - offset_start) : -1;

     while (1) {
@@ -113,9 +112,7 @@
         compressed_chunk = NULL;

         if (remaining > 0) {
             remaining -= bytes_read;
-            current_offset += bytes_read;
-            if (remaining <= 0 || current_offset >= offset_end) {
+            if (remaining <= 0) {
                 break;
             }
         }
```
244-272: Potential integer underflow in size calculation.In
calculate_chunk_data_size, the subtraction on line 271 could theoretically underflow ifst.st_sizeis less thanCIO_FILE_HEADER_MIN + meta_sizebut greater thanCIO_FILE_HEADER_MIN + meta_sizeindividually (unlikely given the check on line 267, but the types involved are mixed signed/unsigned). The check on line 267 guards against this, but the comparison mixesoff_t(signed) withsize_t(unsigned).🔧 Suggested fix for type safety
- if (st.st_size <= CIO_FILE_HEADER_MIN + meta_size) { + if (st.st_size < 0 || (size_t)st.st_size <= CIO_FILE_HEADER_MIN + meta_size) { return 0; } - return st.st_size - CIO_FILE_HEADER_MIN - meta_size; + return (size_t)st.st_size - CIO_FILE_HEADER_MIN - meta_size;plugins/out_s3/s3_queue.c (1)
343-349: Zero-size file check could indicate real issues.The
is_queue_entry_validfunction rejects entries whereupload_file->size <= 0(line 343), but this check uses<=which would also reject size 0. A zero-size file might be a valid edge case (empty log file), or it might indicate a bug. Consider whether this should log at error level rather than warn.plugins/out_s3/s3_multipart.c (2)
1166-1187: Exponential backoff has overly complex overflow handling.The backoff calculation logic is hard to follow. The check
if (shift_exp > 30)on line 1174 after computingshift_exp = 1ULL << (part_attempt - 1)is confusing becauseshift_expwill already be a large value, not> 30. The intent seems to be capping at 30 seconds, but the logic is convoluted.🔧 Simplified backoff calculation
/* Apply exponential backoff before retry */ if (part_attempt < max_attempts) { - uint64_t shift_exp; - /* Prevent overflow in shift operation */ - if (part_attempt - 1 >= 30) { - backoff_ms = 30000; /* Max out immediately if exponent too large */ - } - else { - shift_exp = 1ULL << (part_attempt - 1); - if (shift_exp > 30) { - backoff_ms = 30000; - } - else { - backoff_ms = 1000 * (int)shift_exp; - if (backoff_ms > 30000) { - backoff_ms = 30000; /* Cap at 30 seconds */ - } - } - } + /* Exponential backoff: 1s, 2s, 4s, 8s, ... capped at 30s */ + int exponent = (part_attempt - 1 < 5) ? part_attempt - 1 : 5; + backoff_ms = 1000 * (1 << exponent); + if (backoff_ms > 30000) { + backoff_ms = 30000; + } flb_plg_info(ctx->ins, "Retrying part upload after %d ms", backoff_ms); sleep_ms(backoff_ms); }
1256-1262: Size mismatch check uses mixed types.The comparison
m_upload->bytes != (size_t)file_stat.st_sizecomparessize_twith a cast ofoff_t. On platforms whereoff_tis 64-bit andsize_tis 32-bit, this could truncate large file sizes. However, since this runs after successfully uploading all parts, the practical impact is minimal.plugins/out_s3/s3.h (1)
180-181: Format constants should be an enum for type safety.Using
#definefor format constants works but anenumwould provide better type safety and IDE support.🔧 Suggested improvement
-#define FLB_S3_FORMAT_JSON 0 -#define FLB_S3_FORMAT_PARQUET 1 +enum flb_s3_format { + FLB_S3_FORMAT_JSON = 0, + FLB_S3_FORMAT_PARQUET = 1 +};
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (51)
- .github/workflows/unit-tests.yaml
- CMakeLists.txt
- include/fluent-bit/aws/flb_aws_compress.h
- include/fluent-bit/flb_aws_util.h
- include/fluent-bit/flb_blob_db.h
- include/fluent-bit/flb_parquet.h
- include/fluent-bit/flb_snappy.h
- plugins/in_blob/blob_file.c
- plugins/out_kinesis_firehose/firehose.c
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_blob.h
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_multipart.h
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_queue.h
- plugins/out_s3/s3_store.c
- plugins/out_s3/s3_store.h
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_stream.h
- src/CMakeLists.txt
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
- src/aws/flb_aws_compress.c
- src/aws/flb_aws_util.c
- src/flb_blob_db.c
- src/flb_parquet_impl.cpp
- src/flb_signv4.c
- src/flb_snappy.c
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/internal/aws_util.c
- tests/internal/fuzzers/aws_util_fuzzer.c
- tests/internal/parquet.c
- tests/runtime/CMakeLists.txt
- tests/runtime/out_s3.c
- tests/runtime/out_s3_config.c
- tests/runtime/out_s3_edge_cases.c
- tests/runtime/out_s3_error_handling.c
- tests/runtime/out_s3_format.c
- tests/runtime/out_s3_multipart.c
- tests/runtime/out_s3_queue.c
- tests/runtime/out_s3_recovery.c
- tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (3)
- src/aws/compression/CMakeLists.txt
- tests/runtime/out_s3.c
- src/aws/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (7)
- src/flb_signv4.c
- include/fluent-bit/flb_snappy.h
- include/fluent-bit/aws/flb_aws_compress.h
- plugins/in_blob/blob_file.c
- tests/runtime/out_s3_config.c
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3_blob.h
🧰 Additional context used
🧠 Learnings (18)
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
.github/workflows/unit-tests.yamlsrc/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txttests/internal/aws_compress.csrc/flb_parquet_impl.cppsrc/CMakeLists.txttests/internal/CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/aws/flb_aws_compress.csrc/flb_snappy.cplugins/out_kinesis_firehose/firehose.ctests/internal/aws_compress.cplugins/out_s3/s3_stream.csrc/flb_parquet_impl.cppsrc/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/aws/flb_aws_compress.cplugins/out_kinesis_firehose/firehose.ctests/internal/aws_compress.csrc/flb_parquet_impl.cppsrc/CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/aws/flb_aws_compress.csrc/flb_snappy.cplugins/out_kinesis_firehose/firehose.ctests/internal/aws_compress.cplugins/out_s3/s3_stream.csrc/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/aws/flb_aws_compress.csrc/flb_snappy.cplugins/out_kinesis_firehose/firehose.ctests/internal/aws_compress.cplugins/out_s3/s3_stream.csrc/CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txtsrc/aws/flb_aws_compress.ctests/internal/aws_compress.csrc/flb_parquet_impl.cppsrc/CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
.github/workflows/unit-tests.yamlCMakeLists.txttests/internal/aws_compress.cplugins/out_s3/s3_store.csrc/flb_parquet_impl.cppsrc/CMakeLists.txttests/include/s3_test_helpers.h
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3_auth.cplugins/out_s3/s3_multipart.cplugins/out_s3/s3_queue.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_aws_util.hinclude/fluent-bit/flb_parquet.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
CMakeLists.txtsrc/aws/flb_aws_compress.csrc/flb_snappy.ctests/internal/aws_compress.csrc/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
plugins/out_s3/s3_multipart.cplugins/out_s3/s3_queue.cplugins/out_s3/s3_stream.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/out_s3/s3_stream.csrc/aws/flb_aws_util.c
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/flb_parquet_impl.cppsrc/CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
plugins/out_s3/s3.h
🧬 Code graph analysis (23)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
stream_compress_file(142-213)stream_process_msgpack_file(561-629)stream_json_processor(631-659)stream_log_key_processor(734-763)
include/fluent-bit/flb_aws_util.h (1)
src/aws/flb_aws_util.c (2)
flb_get_s3_key(1262-1344)flb_aws_uri_encode_path(1445-1481)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
field_name(548-548)src/flb_pack.c (1)
flb_pack_json(530-535)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
tests/runtime/out_s3_multipart.c (1)
tests/include/aws_client_mock.c (1)
flb_aws_client_mock_clear_generator_instance(61-64)
include/fluent-bit/flb_parquet.h (1)
src/flb_parquet_impl.cpp (8)
flb_parquet_validate_schema(853-876)flb_parquet_validate_schema(853-855)flb_parquet_schema_create(878-903)flb_parquet_schema_create(878-880)flb_parquet_schema_destroy(905-911)flb_parquet_schema_destroy(905-905)flb_msgpack_to_parquet_streaming(913-1239)flb_msgpack_to_parquet_streaming(913-918)
tests/runtime/out_s3_format.c (2)
tests/include/s3_test_helpers.h (5)
s3_test_next(428-436)s3_test_create_temp_db_path(137-194)s3_test_create_temp_store_dir(204-257)s3_test_cleanup_temp_db(407-420)s3_test_cleanup_temp_store_dir(374-399)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(32-37)flb_aws_client_get_mock_generator(127-130)flb_aws_client_mock_clear_generator_instance(61-64)
plugins/out_s3/s3_auth.h (1)
plugins/out_s3/s3_auth.c (3)
s3_auth_init_endpoint(79-128)s3_auth_request_presigned_url(173-260)s3_auth_fetch_presigned_url(359-400)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(533-569)
src/flb_snappy.c (1)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_blob.c (7)
plugins/out_s3/s3_multipart.c (2)
s3_multipart_abort(737-799)s3_multipart_check_upload_exists(805-879)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-400)src/flb_input_blob.c (1)
flb_input_blob_delivery_notification_destroy(222-231)src/flb_blob_db.c (1)
flb_blob_file_update_remote_id(586-617)src/aws/flb_aws_util.c (1)
flb_get_s3_key(1262-1344)src/flb_log_event_decoder.c (1)
flb_log_event_decoder_next(309-456)plugins/out_s3/s3_queue.c (1)
s3_queue_add_pending_file(260-273)
plugins/out_s3/s3_multipart.c (3)
include/fluent-bit/flb_compat.h (1)
usleep(129-133)plugins/out_s3/s3.c (1)
write_seq_index(127-147)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-400)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
s3_queue_remove(308-316)s3_queue_buffer_chunk(356-378)s3_queue_add_file(117-131)s3_queue_add_part(133-203)s3_queue_add_pending_file(260-273)s3_queue_add_file_unlocked(59-112)s3_queue_add_pending_file_unlocked(209-255)s3_queue_process_entry(748-807)s3_queue_recover_all(850-895)s3_queue_recover_from_database(988-1076)s3_queue_recover_from_fstore(1141-1201)
plugins/out_s3/s3_stream.c (3)
src/aws/flb_aws_compress.c (1)
flb_aws_compression_compress(109-126)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(533-569)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
flb_sds_create_size(92-95)flb_sds_printf(336-387)flb_sds_destroy(389-399)flb_sds_create(78-90)src/flb_fstore.c (8)
flb_fstore_file_delete(259-277)flb_fstore_file_create(153-190)flb_fstore_file_meta_set(74-108)flb_fstore_file_append(298-329)flb_fstore_create(479-525)flb_fstore_destroy(527-564)flb_fstore_stream_create(335-402)flb_fstore_file_inactive(238-256)include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)include/fluent-bit/flb_compat.h (1)
gmtime_r(76-82)
src/aws/flb_aws_util.c (4)
src/flb_http_client.c (1)
flb_http_client_destroy(1821-1828)src/flb_sds.c (3)
flb_sds_create_size(92-95)flb_sds_cat(120-141)flb_sds_destroy(389-399)include/fluent-bit/flb_compat.h (1)
gmtime_r(76-82)include/fluent-bit/flb_mem.h (1)
flb_calloc(84-96)
src/flb_parquet_impl.cpp (1)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (4)
s3_format_chunk(1196-1337)s3_upload_file(1015-1120)s3_initiate_multipart_upload(898-1013)s3_plugin_under_test(73-79)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1262-1344)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
flb_s3_calculate_optimal_part_size(164-216)s3_multipart_get_md5_base64(218-248)s3_multipart_create_headers(357-417)s3_multipart_upload_new(450-499)s3_multipart_upload_destroy(419-444)s3_multipart_initiate(501-558)s3_multipart_upload_part(560-652)s3_multipart_complete(654-735)s3_multipart_abort(737-799)s3_multipart_abort_with_url(1021-1042)s3_multipart_check_upload_exists(805-879)s3_multipart_upload_file_part(893-1015)s3_multipart_upload_file(1202-1293)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1262-1344)
include/fluent-bit/flb_blob_db.h (1)
src/flb_blob_db.c (18)
flb_blob_file_update_s3_key(619-650)flb_blob_file_update_s3_key(2053-2058)flb_blob_db_file_get_next_aborted(822-953)flb_blob_db_file_get_next_aborted(1934-1945)flb_blob_db_file_fetch_part_ids(1479-1553)flb_blob_db_file_fetch_part_ids(2026-2033)flb_blob_db_file_fetch_part_count(1555-1584)flb_blob_db_file_fetch_part_count(2035-2039)flb_blob_db_file_fetch_all_parts(1586-1672)flb_blob_db_file_fetch_all_parts(2060-2069)flb_blob_db_file_get_next_pending(1678-1806)flb_blob_db_file_get_next_pending(2041-2051)flb_blob_db_file_part_check_uploaded(1812-1847)flb_blob_db_file_part_check_uploaded(2071-2076)flb_blob_db_file_parts_in_progress(1098-1133)flb_blob_db_file_parts_in_progress(2078-2083)flb_blob_db_reset_zombie_parts(1850-1874)flb_blob_db_reset_zombie_parts(2085-2088)
src/flb_blob_db.c (3)
src/flb_sqldb.c (2)
flb_sqldb_lock(173-178)flb_sqldb_unlock(180-185)lib/cfl/src/cfl_sds.c (2)
cfl_sds_create(113-125)cfl_sds_destroy(127-137)include/fluent-bit/flb_mem.h (2)
flb_calloc(84-96)flb_free(126-128)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
setenv(66-69)tests/include/aws_client_mock.c (1)
flb_aws_client_mock_generator_count_unused_requests(133-137)
b5a7522 to 56d51f6 Compare
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/aws/flb_aws_util.c (1)
1011-1340: `$FILE_NAME`/`$FILE_PATH` handling should be Windows-aware (`\` separator + leading `\`).

Right now stripping and filename extraction only consider `/`. On Windows log paths, `$FILE_NAME` will likely not work, and `$FILE_PATH` may keep leading backslashes.

Suggested direction

```diff
@@
-    while (*valid_file_path == '.' || *valid_file_path == '/') {
+    while (*valid_file_path == '.' || *valid_file_path == '/' || *valid_file_path == '\\') {
         valid_file_path++;
     }
@@
-    file_name = strrchr(valid_file_path, '/');
+    file_name = strrchr(valid_file_path, '/');
+    if (!file_name) {
+        file_name = strrchr(valid_file_path, '\\');
+    }
```
🤖 Fix all issues with AI agents
In @plugins/out_s3/s3_queue.h:
- Line 26: The header declares s3_queue_remove as void but the implementation in
s3_queue.c defines int s3_queue_remove, causing a type mismatch; update the
declaration in s3_queue.h to match the implementation's return type (change void
to int for s3_queue_remove(struct flb_s3 *ctx, struct upload_queue *entry)) or
alternatively change the implementation to return void—ensure both the prototype
in s3_queue.h and the function signature in s3_queue.c for s3_queue_remove use
the same return type.
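A minimal sketch of the aligned prototype and definition (forward declarations keep it self-contained; the body is elided):

```c
struct flb_s3;
struct upload_queue;

/* s3_queue.h: declaration now matches the int return of the definition */
int s3_queue_remove(struct flb_s3 *ctx, struct upload_queue *entry);

/* s3_queue.c */
int s3_queue_remove(struct flb_s3 *ctx, struct upload_queue *entry)
{
    (void) ctx;
    (void) entry;
    /* ... unlink the entry and release its resources ... */
    return 0;
}
```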
In @src/aws/flb_aws_util.c:
- Around line 851-1009: The offset parsing in process_timezone_patterns uses
atoi and then multiplies offset_hours by 3600 which can overflow; replace this
with strtol parsing of bracket_start..bracket_end, verify the entire offset
substring was consumed, check for errors/ERANGE, then clamp the parsed long to a
safe range (e.g., -48..+48 or -24..+24 per policy) before converting to an
integer and computing seconds; update variables offset_str/offset_hours to use
the validated value and on parse error fall back to the existing invalid-pattern
behavior (logging and appending the raw pattern) to avoid UB.
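A minimal sketch of the strtol-based parsing described above; the ±24-hour clamp is an assumed policy, not taken from the implementation:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a bracketed UTC offset in hours; reject overflow, trailing
 * garbage, and out-of-range values before converting to seconds. */
static int parse_utc_offset_seconds(const char *s, size_t len, long *out)
{
    char buf[16];
    char *end;
    long hours;

    if (len == 0 || len >= sizeof(buf)) {
        return -1;
    }
    memcpy(buf, s, len);
    buf[len] = '\0';

    errno = 0;
    hours = strtol(buf, &end, 10);
    if (errno == ERANGE || end != buf + len) {
        return -1;           /* overflow or not fully consumed */
    }
    if (hours < -24 || hours > 24) {
        return -1;           /* assumed clamp: reject implausible offsets */
    }
    *out = hours * 3600;     /* safe: |hours| <= 24 */
    return 0;
}
```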
In @src/flb_parquet_impl.cpp:
- Around line 511-610: parse_schema_from_json currently silently coerces unknown
type names to arrow::utf8() via TYPE_FACTORY_MAP fallback, allowing typos like
"int" to produce incorrect schemas; change this to treat unknown types as errors
by checking TYPE_FACTORY_MAP.find(type_name) and if not found return
arrow::Status::Invalid with a clear message including the unknown type name
instead of assigning arrow::utf8(), and apply the same change to the analogous
parsing logic located in the other block (lines ~853-903) so both parsing paths
fail on unrecognized types rather than defaulting to utf8.
- Around line 913-1239: The error cleanup path currently unlinks output_file
without first closing the parquet::arrow::FileWriter (writer) and the
arrow::io::OutputStream (output_stream); update the catch/cleanup section to
check if writer and output_stream are non-null and call writer->Close() and
output_stream->Close() (logging any errors) before calling unlink(output_file),
mirroring the successful-path close logic (use close_status/stream_close_status
checks and flb_error messages) so Windows can release file handles prior to
unlinking.
In @tests/runtime/out_s3_blob.c:
- Around line 136-155: The immediate stat(db_path) after flb_start(ctx) can race
with async DB creation; wrap the stat check in a short retry loop (e.g., up to a
few attempts with small sleeps/usleep) before invoking TEST_CHECK to avoid
flakes: after calling flb_start(ctx) repeatedly call stat(db_path, &st) until it
returns 0 or the retry limit is reached, then use TEST_CHECK(ret == 0) and keep
the existing TEST_MSG; reference the existing flb_start, stat, db_path, and
TEST_CHECK symbols when implementing the loop.
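A minimal POSIX-only sketch of that retry loop (names are illustrative; the real test would keep its existing `TEST_CHECK`/`TEST_MSG` calls):

```c
#include <sys/stat.h>
#include <unistd.h>

/* Poll for a file a few times before asserting, to absorb async creation. */
static int wait_for_file(const char *path, int attempts, unsigned int delay_us)
{
    struct stat st;
    int i;

    for (i = 0; i < attempts; i++) {
        if (stat(path, &st) == 0) {
            return 0;
        }
        usleep(delay_us);
    }
    return -1;
}

/* usage: TEST_CHECK(wait_for_file(db_path, 10, 100000) == 0); */
```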
In @tests/runtime/out_s3_error_handling.c:
- Around line 6-7: The file currently includes POSIX-only headers `<sys/stat.h>`
and `<unistd.h>` which break Windows builds; replace these direct includes by
either using the cross-platform helpers/macros provided in `s3_test_helpers.h`
(import that header and use its macros/functions instead of direct POSIX calls)
or wrap the existing includes and any usages in `#ifndef _WIN32` / `#endif`
guards so Windows compiles without them; update any code that relied on
POSIX-only functions to use the helper abstractions or guarded alternatives
(look for the include lines and usages in
tests/runtime/out_s3_error_handling.c).
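A minimal sketch of the guarded-include option (whether `s3_test_helpers.h` covers the Windows fallback is an assumption):

```c
#ifndef _WIN32
#include <sys/stat.h>
#include <unistd.h>
#endif

#include "../include/s3_test_helpers.h"  /* cross-platform helpers */
```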
In @tests/runtime/out_s3_multipart.c:
- Around line 438-493: The loop calling flb_lib_push in
flb_test_multiple_data_pushes currently ignores its return value; update the
loop to capture the result of flb_lib_push and assert/validate it (e.g., using
TEST_CHECK or equivalent) each iteration, so any failed push is detected
immediately (use the existing JSON_TD and S3_TEST_SLEEP_MS calls unchanged); if
a push fails, fail the test or break the loop to avoid confusing "mock not
exhausted" outcomes.
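A minimal sketch of the checked loop, assuming the existing test variables (`ctx`, `in_ffd`, `JSON_TD`) and the acutest `TEST_CHECK`/`TEST_MSG` macros are in scope:

```c
int i;
int ret;

for (i = 0; i < 5; i++) {
    ret = flb_lib_push(ctx, in_ffd, (char *) JSON_TD, sizeof(JSON_TD) - 1);
    if (!TEST_CHECK(ret >= 0)) {
        TEST_MSG("flb_lib_push failed at iteration %d", i);
        break;   /* stop instead of waiting on a mock that cannot be exhausted */
    }
    S3_TEST_SLEEP_MS(500);
}
```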
🧹 Nitpick comments (25)
tests/runtime/out_s3_edge_cases.c (6)
8-11: Including a `.c` file directly may cause linker issues.

Line 9 includes `aws_client_mock.c` directly. This is unconventional and can cause duplicate symbol errors if this translation unit is linked with others that also include or compile that `.c` file. Consider compiling `aws_client_mock.c` separately and linking it, or ensure the included `.c` uses `static` linkage for all symbols.

```diff
 #include "../include/aws_client_mock.h"
-#include "../include/aws_client_mock.c"
 #include "../include/s3_test_helpers.h"
```

Then ensure `aws_client_mock.c` is added to the test build target in CMakeLists.
88-93: Redundant assertion — `ret` is unchanged since the earlier check.

`ret` was assigned at line 81 and verified at line 82. By line 93, `ret` hasn't been modified, making this second `TEST_CHECK(ret == 0)` redundant. The same pattern appears in nearly all other test functions.

If the intent is to confirm the test reached the cleanup phase without crashing, a simple comment suffices — or assert something meaningful like verifying the mock chain was fully consumed.

♻️ Suggested improvement

```diff
-    /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
+    /* Test passed: no crash occurred during empty flush */
```
170-180: Consider asserting that the mock chain was actually consumed.

The test pushes data and waits via `S3_TEST_WAIT_MOCK_EXHAUSTED`, but the only assertion is the reused `ret == 0` from startup. Adding an explicit assertion that the expected HTTP calls occurred would strengthen the test.

For example, if the mock framework exposes a call count or exhaustion status, assert on it:

```c
/* After S3_TEST_WAIT_MOCK_EXHAUSTED */
TEST_CHECK(flb_aws_client_mock_is_exhausted() == FLB_TRUE);
```
246-251: Test claims "max concurrent uploads" but doesn't stress concurrency.

The test sets `"workers", "10"` and pushes a single `JSON_TD` record. This configuration verifies the plugin accepts the setting, but doesn't exercise concurrent upload behavior. Consider pushing multiple distinct data batches rapidly or adding commentary that this is a configuration-acceptance test only.
836-848: "Maximum" boundary test uses modest values, not actual S3 maximums.

S3 supports up to 10,000 parts per multipart upload and 5 GB per part. Using `"total_file_size", "100M"` and `"upload_chunk_size", "50M"` tests acceptance of larger-than-default values but doesn't approach true limits. Consider renaming to "config_boundary_large" or documenting that actual S3 maximums aren't tested here.
957-1099: `flb_test_rapid_start_stop` effectively tests two sequential configurations, not rapid cycling.

The test runs two complete start/stop cycles with full cleanup between them. A true rapid start/stop stress test would loop many times (e.g., 10–50 iterations) with minimal delay. Consider renaming to `flb_test_sequential_config_change` or adding a loop for actual rapid cycling if that's the intent.

plugins/out_s3/s3_auth.h (1)
42-47: Consider using `s3_presigned_url_type_t` for type safety.

The `url_type` parameter is declared as `int` but accepts values from the `s3_presigned_url_type_t` enum defined above. Using the enum type directly would improve type safety and self-documentation.

Suggested fix

```diff
 /* Unified presigned URL fetcher - works for both standard and blob uploads */
 int s3_auth_fetch_presigned_url(struct flb_s3 *ctx,
                                 flb_sds_t *result_url,
-                                int url_type,
+                                s3_presigned_url_type_t url_type,
                                 const char *s3_key,
                                 const char *upload_id,
                                 int part_number);
```

tests/include/s3_test_helpers.h (3)
5-6: Copyright year inconsistency.

The copyright year here is `2015-2024`, but other files in this PR (e.g., `s3_auth.h`) use `2015-2026`. Consider aligning for consistency.
417-430: Potential buffer overflow with long paths.

The buffer size of 300 bytes may be insufficient when `db_path` is already close to `PATH_MAX` (typically 4096). If `db_path` is near the limit, appending `-journal` (8 chars) could cause truncation.

Suggested fix

```diff
 static inline void s3_test_cleanup_temp_db(const char *db_path)
 {
-    char buf[300];
+    char buf[PATH_MAX];

     if (!db_path) {
         return;
     }
```
464-480: External function dependency not declared in this header.

`flb_aws_client_mock_generator_count_unused_requests()` is called but not declared in this header. This relies on the including translation unit to also include `aws_client_mock.h` or similar.

Consider adding a forward declaration or documenting this dependency in the header comment.
tests/runtime/out_s3_error_handling.c (1)
9-10: Including a .c file directly is unconventional.

Including `aws_client_mock.c` directly compiles the mock implementation into this translation unit. While this pattern works, it's unusual and can cause issues with multiple definitions if the same .c file is included elsewhere. Consider linking the mock as a separate compilation unit if not already handled by the build system.
511-519: Silent early return may mask data corruption.When
fsf->data != s3_file, the function returns 0 (success) without any logging. This could hide bugs where thefsf->datapointer was corrupted or incorrectly reassigned. Consider adding a warning log.Suggested fix
if (fsf->data != s3_file) { + flb_plg_warn(ctx->ins, "s3_file mismatch: fsf->data=%p, s3_file=%p", + fsf->data, s3_file); return 0; }plugins/out_s3/s3.h (1)
82-82: Consider documenting the etags array sizing.The
etags[10000]array matchesS3_AWS_MAX_PARTS(10000), which aligns with AWS S3's maximum parts per multipart upload. This consumes ~80KB per active upload (10000 × 8 bytes). Consider adding a brief comment linking this to the constraint.Suggested documentation
- flb_sds_t etags[10000]; + flb_sds_t etags[10000]; /* Sized to match S3_AWS_MAX_PARTS */plugins/out_s3/s3_auth.c (2)
155-171: Consider checking return values from header functions.The function ignores return values from
flb_http_add_header,flb_http_basic_auth, andflb_http_bearer_auth. While failures are rare, propagating errors could aid debugging.
230-230: Consider reducing log verbosity for routine operations.Logging at
infolevel for every successful presigned URL retrieval may be noisy in production with frequent uploads. Consider usingdebuglevel instead.Suggested change
- flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully"); + flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");plugins/out_s3/s3_blob.c (1)
429-435: Minor indentation inconsistency.The
flb_plg_errorcall at line 430 appears to have inconsistent indentation with the surrounding code block.Suggested fix
if (ret == -1) { - flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)", - file_path, file_size); + flb_plg_error(ctx->ins, "cannot insert blob file: %s (size=%zu)", + file_path, file_size); cfl_sds_destroy(file_path); cfl_sds_destroy(source); continue;tests/runtime/out_s3_multipart.c (1)
17-507: Tests are useful, but many only assert “mock exhausted”; add at least one state assertion for failure paths.
E.g., for create/upload/complete failure tests, consider asserting expected blob-db state (aborted flag / delivery_attempts / part_count) or store_dir cleanup, otherwise regressions could still “exhaust the mock” but leave the system in a bad state.tests/runtime/out_s3_blob.c (1)
157-570: Delivery-attempt / freshness / chunk-size tests currently only validate parsing; consider adding one behavioral assertion.
A small follow-up could force a specific failure (e.g., UploadPart 500) and then assert delivery_attempts increments / aborted transitions / retry selection from DB. That would make these tests regression-resistant.src/aws/flb_aws_util.c (2)
176-231: Credential-refresh retry is good, but ensure no retry-storm interactions withretry_requests.
Now a request can be attempted: initial → auto-retry (ifretry_requests) → refresh+retry (for auth error). That might be desired, but please confirm it won’t amplify load in high-error scenarios.
1435-1477:flb_aws_uri_encode_path()is correct in spirit, but per-charflb_sds_printf()is expensive.
Consider a small hex-lookup + directflb_sds_cat()of 3 bytes for encoded chars to avoid repeatedvsnprintfcalls.src/flb_parquet_impl.cpp (1)
src/flb_parquet_impl.cpp (1)

644-851: `validate_parquet_file()` float/double comparisons via `std::to_string` can be flaky.

If tests compare textual float values, minor representation differences can fail across platforms/libstdc++ versions. Consider numeric comparison with tolerance when expected_type is float/double.

tests/internal/aws_util.c (1)
47-77: Add a Windows-path test case (`C:\...` or `\var\log\...`) for `$FILE_NAME` extraction.

Given that the current implementation only searches for `/`, this is likely to regress on Windows; having an internal unit test will lock in expected behavior once fixed.

Also applies to: 402-533
tests/runtime/out_s3_format.c (1)
19-902: Consider asserting at least one content property (compressed magic / extracted key-only payload) instead of only "upload happened".

Right now, most cases would pass as long as the multipart sequence occurs, even if the payload is wrong.

include/fluent-bit/flb_blob_db.h (1)
208-219: `SQL_GET_NEXT_PENDING_FILE` uses both `DISTINCT` and `GROUP BY`; one is likely redundant.

Not a correctness issue here (blob_files columns are functionally dependent on `f.id`), but simplifying would reduce confusion.

src/flb_blob_db.c (1)
1849-1874: Consider using a prepared statement for consistency.

The function uses `sqlite3_exec` with a raw SQL string, which differs from the prepared statement pattern used elsewhere in this file. While functionally correct and acceptable for a startup/recovery operation, a prepared statement would provide:

- Consistency with the rest of the codebase
- Marginally better performance if called multiple times

This is a minor suggestion and not blocking.

♻️ Optional: Use prepared statement

Add to `prepare_stmts()`:

```c
/* reset zombie parts */
result = sqlite3_prepare_v2(context->db->handler,
                            "UPDATE blob_parts SET in_progress = 0 "
                            "WHERE uploaded = 0 AND in_progress = 1",
                            -1, &context->stmt_reset_zombie_parts, NULL);
if (result != SQLITE_OK) {
    return FLB_BLOB_DB_ERROR_PREPARING_STATEMENT_RESET_ZOMBIE_PARTS;
}
```

Then update the function:

```diff
 int flb_blob_db_reset_zombie_parts(struct flb_blob_db *context)
 {
-    const char *sql;
+    sqlite3_stmt *statement;
     int result;

     if (!context || !context->db) {
         return FLB_BLOB_DB_ERROR_INVALID_BLOB_DB_CONTEXT;
     }

-    sql = "UPDATE blob_parts SET in_progress = 0 "
-          "WHERE uploaded = 0 AND in_progress = 1";
+    statement = context->stmt_reset_zombie_parts;

     flb_sqldb_lock(context->db);
-    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, NULL);
+    result = sqlite3_step(statement);
+    sqlite3_reset(statement);
     flb_sqldb_unlock(context->db);

-    if (result != SQLITE_OK) {
+    if (result != SQLITE_DONE) {
         context->last_error = result;
         return -1;
     }

     return FLB_BLOB_DB_SUCCESS;
 }
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
- include/fluent-bit/aws/flb_aws_compress.h
- include/fluent-bit/flb_aws_util.h
- include/fluent-bit/flb_blob_db.h
- include/fluent-bit/flb_parquet.h
- include/fluent-bit/flb_snappy.h
- plugins/in_blob/blob_file.c
- plugins/out_kinesis_firehose/firehose.c
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_blob.h
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_multipart.h
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_queue.h
- plugins/out_s3/s3_store.c
- plugins/out_s3/s3_store.h
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_stream.h
- src/aws/flb_aws_compress.c
- src/aws/flb_aws_util.c
- src/flb_blob_db.c
- src/flb_parquet_impl.cpp
- src/flb_signv4.c
- src/flb_snappy.c
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/internal/aws_util.c
- tests/internal/fuzzers/aws_util_fuzzer.c
- tests/internal/parquet.c
- tests/runtime/CMakeLists.txt
- tests/runtime/out_s3.c
- tests/runtime/out_s3_blob.c
- tests/runtime/out_s3_config.c
- tests/runtime/out_s3_edge_cases.c
- tests/runtime/out_s3_error_handling.c
- tests/runtime/out_s3_format.c
- tests/runtime/out_s3_multipart.c
- tests/runtime/out_s3_queue.c
- tests/runtime/out_s3_recovery.c
- tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
✅ Files skipped from review due to trivial changes (1)
- tests/runtime/out_s3_config.c
🚧 Files skipped from review as they are similar to previous changes (9)
- tests/internal/CMakeLists.txt
- plugins/in_blob/blob_file.c
- tests/internal/fuzzers/aws_util_fuzzer.c
- plugins/out_s3/s3_stream.h
- tests/include/aws_client_mock.h
- src/flb_signv4.c
- tests/runtime/CMakeLists.txt
- plugins/out_s3/CMakeLists.txt
- plugins/out_kinesis_firehose/firehose.c
👮 Files not reviewed due to content moderation or server errors (7)
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_queue.c
- tests/runtime/out_s3_queue.c
- plugins/out_s3/s3_multipart.h
- plugins/out_s3/s3_multipart.c
- include/fluent-bit/flb_parquet.h
- tests/internal/parquet.c
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
src/aws/flb_aws_compress.csrc/flb_parquet_impl.cppinclude/fluent-bit/aws/flb_aws_compress.hsrc/flb_snappy.ctests/internal/aws_compress.ctests/internal/parquet.cinclude/fluent-bit/flb_snappy.hplugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like #ifdef FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hsrc/flb_snappy.ctests/internal/aws_compress.ctests/internal/parquet.cinclude/fluent-bit/flb_snappy.hplugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use #ifdef FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
src/aws/flb_aws_compress.csrc/flb_parquet_impl.cppinclude/fluent-bit/aws/flb_aws_compress.htests/internal/aws_compress.ctests/internal/parquet.cinclude/fluent-bit/flb_snappy.hplugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hsrc/flb_snappy.ctests/internal/aws_compress.ctests/internal/parquet.cinclude/fluent-bit/flb_snappy.hplugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
src/aws/flb_aws_compress.cinclude/fluent-bit/aws/flb_aws_compress.hsrc/flb_snappy.ctests/internal/aws_compress.ctests/internal/parquet.cinclude/fluent-bit/flb_snappy.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
src/aws/flb_aws_compress.csrc/flb_parquet_impl.cpptests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
src/flb_parquet_impl.cppinclude/fluent-bit/flb_parquet.htests/internal/aws_compress.ctests/include/s3_test_helpers.hplugins/out_s3/s3_store.c
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
src/flb_parquet_impl.cpp
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
src/flb_parquet_impl.cpp, tests/internal/aws_compress.c
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_aws_util.h, include/fluent-bit/flb_snappy.h
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3_auth.c, plugins/out_s3/s3_queue.c, plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
plugins/out_s3/s3_auth.c, plugins/out_s3/s3.h
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
src/aws/flb_aws_util.c, plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
plugins/out_s3/s3_queue.c, plugins/out_s3/s3_multipart.c, plugins/out_s3/s3_stream.c
🧬 Code graph analysis (20)
plugins/out_s3/s3_auth.h (1)
  plugins/out_s3/s3_auth.c (3): s3_auth_init_endpoint (79-128), s3_auth_request_presigned_url (173-260), s3_auth_fetch_presigned_url (359-402)
plugins/out_s3/s3_store.h (1)
  plugins/out_s3/s3_store.c (1): flb_chunk_file_open (534-570)
src/aws/flb_aws_compress.c (3)
  src/flb_snappy.c (1): flb_snappy_compress_framed_data (134-255)
  src/flb_gzip.c (1): flb_gzip_compress (157-252)
  src/flb_zstd.c (1): flb_zstd_compress (33-57)
src/flb_parquet_impl.cpp (1)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
tests/internal/aws_util.c (1)
  src/aws/flb_aws_util.c (1): flb_get_s3_key (1262-1340)
include/fluent-bit/flb_parquet.h (1)
  src/flb_parquet_impl.cpp (8): flb_parquet_validate_schema (853-876, 853-855), flb_parquet_schema_create (878-903, 878-880), flb_parquet_schema_destroy (905-911, 905-905), flb_msgpack_to_parquet_streaming (913-1239, 913-918)
plugins/out_s3/s3_auth.c (7)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
  src/flb_utils.c (1): flb_utils_url_split (1531-1667)
  src/tls/flb_tls.c (1): flb_tls_create (183-232)
  src/flb_upstream.c (2): flb_upstream_create_url (416-481), flb_upstream_conn_get (711-844)
  src/flb_http_client.c (5): flb_http_add_header (1015-1047), flb_http_basic_auth (1394-1398), flb_http_bearer_auth (1406-1441), flb_http_client (866-911), flb_http_do (1645-1705)
  src/flb_sds.c (5): flb_sds_copy (260-281), flb_sds_create_len (58-76), flb_sds_destroy (389-399), flb_sds_create_size (92-95), flb_sds_printf (336-387)
  src/aws/flb_aws_util.c (1): flb_aws_uri_encode_path (1441-1477)
src/aws/flb_aws_util.c (3)
  src/flb_http_client.c (1): flb_http_client_destroy (1821-1828)
  src/flb_sds.c (4): flb_sds_create_size (92-95), flb_sds_cat (120-141), flb_sds_create_len (58-76), flb_sds_printf (336-387)
  include/fluent-bit/flb_compat.h (1): gmtime_r (76-82)
tests/internal/aws_compress.c (3)
  src/flb_snappy.c (2): flb_snappy_compress_framed_data (134-255), flb_snappy_uncompress_framed_data (257-481)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
  src/aws/flb_aws_compress.c (1): flb_aws_compression_get_type (87-108)
plugins/out_s3/s3_queue.c (9)
  include/fluent-bit/flb_mem.h (2): flb_calloc (84-96), flb_free (126-128)
  plugins/out_s3/s3_store.c (3): s3_store_buffer_put (207-242), s3_store_file_unlock (579-584), s3_store_file_lock (572-577)
  plugins/out_s3/s3_auth.c (1): s3_auth_fetch_presigned_url (359-402)
  plugins/out_s3/s3_multipart.c (1): s3_multipart_upload_file_part (893-1015)
  src/flb_blob_db.c (2): flb_blob_db_file_part_update_remote_id (1258-1289), flb_blob_file_update_remote_id (586-617)
  plugins/out_s3/s3.c (1): s3_format_chunk (1196-1337)
  lib/cfl/src/cfl_sds.c (2): cfl_sds_len (88-91), cfl_sds_destroy (127-137)
  plugins/out_s3/s3_blob.c (1): s3_blob_notify_delivery (77-114)
  src/aws/flb_aws_util.c (1): flb_get_s3_key (1262-1340)
include/fluent-bit/flb_blob_db.h (1)
  src/flb_blob_db.c (12): flb_blob_file_update_s3_key (619-650, 2053-2058), flb_blob_db_file_get_next_aborted (822-953, 1934-1945), flb_blob_db_file_fetch_part_count (1555-1584, 2035-2039), flb_blob_db_file_fetch_all_parts (1586-1672, 2060-2069), flb_blob_db_file_get_next_pending (1678-1806, 2041-2051), flb_blob_db_reset_zombie_parts (1850-1874, 2085-2088)
tests/internal/parquet.c (3)
  src/flb_parquet_impl.cpp (1): field_name (548-548)
  src/flb_pack.c (1): flb_pack_json (530-535)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
plugins/out_s3/s3.h (1)
  plugins/out_s3/s3.c (5): cb_s3_upload (1340-1607), s3_format_chunk (1196-1337), s3_upload_file (1015-1120), s3_initiate_multipart_upload (898-1013), s3_plugin_under_test (73-79)
tests/include/s3_test_helpers.h (2)
  plugins/out_s3/s3.c (1): setenv (66-69)
  tests/include/aws_client_mock.c (1): flb_aws_client_mock_generator_count_unused_requests (143-147)
include/fluent-bit/flb_snappy.h (1)
  src/flb_snappy.c (1): flb_snappy_compress_framed_data (134-255)
plugins/out_s3/s3_multipart.h (1)
  plugins/out_s3/s3_multipart.c (13): flb_s3_calculate_optimal_part_size (164-216), s3_multipart_get_md5_base64 (218-248), s3_multipart_create_headers (357-417), s3_multipart_upload_new (450-499), s3_multipart_upload_destroy (419-444), s3_multipart_initiate (501-558), s3_multipart_upload_part (560-652), s3_multipart_complete (654-735), s3_multipart_abort (737-799), s3_multipart_abort_with_url (1021-1042), s3_multipart_check_upload_exists (805-879), s3_multipart_upload_file_part (893-1015), s3_multipart_upload_file (1202-1293)
tests/runtime/out_s3_queue.c (4)
  tests/include/s3_test_helpers.h (4): s3_test_next (438-446), s3_test_create_temp_store_dir (204-257), s3_test_cleanup_temp_db (417-430), s3_test_cleanup_temp_store_dir (384-409)
  tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)
  src/flb_lib.c (6): flb_create (143-225), flb_service_set (652-678), flb_input (266-276), flb_lib_push (843-870), flb_stop (1011-1055), flb_destroy (228-263)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
plugins/out_s3/s3_store.c (4)
  src/flb_sds.c (4): flb_sds_create_size (92-95), flb_sds_printf (336-387), flb_sds_destroy (389-399), flb_sds_create (78-90)
  src/flb_fstore.c (8): flb_fstore_file_delete (259-277), flb_fstore_file_create (153-190), flb_fstore_file_meta_set (74-108), flb_fstore_file_append (298-329), flb_fstore_create (479-525), flb_fstore_destroy (527-564), flb_fstore_stream_create (335-402), flb_fstore_file_inactive (238-256)
  include/fluent-bit/flb_mem.h (2): flb_free (126-128), flb_calloc (84-96)
  include/fluent-bit/flb_compat.h (1): gmtime_r (76-82)
src/flb_blob_db.c (3)
  src/flb_sqldb.c (3): flb_sqldb_close (129-149), flb_sqldb_lock (173-178), flb_sqldb_unlock (180-185)
  lib/cfl/src/cfl_sds.c (2): cfl_sds_create (113-125), cfl_sds_destroy (127-137)
  include/fluent-bit/flb_mem.h (2): flb_calloc (84-96), flb_free (126-128)
plugins/out_s3/s3_queue.h (1)
  plugins/out_s3/s3_queue.c (11): s3_queue_remove (307-315), s3_queue_buffer_chunk (355-377), s3_queue_add_file (116-130), s3_queue_add_part (132-202), s3_queue_add_pending_file (259-272), s3_queue_add_file_unlocked (58-111), s3_queue_add_pending_file_unlocked (208-254), s3_queue_process_entry (747-806), s3_queue_recover_all (849-894), s3_queue_recover_from_database (987-1075), s3_queue_recover_from_fstore (1140-1200)
Force-pushed from 56d51f6 to 0e0ad38 (Compare)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmake/kafka.cmake (1)
37-55: Non-Windows OAuth Bearer is forced ON without checking TLS; this violates Kafka security best practices and contradicts the misleading in-file comment.
Kafka's OAuth Bearer should be used over TLS (SASL_SSL) in production—tokens must be protected in transit. The current code forces FLB_SASL_OAUTHBEARER_ENABLED ON on non-Windows without checking FLB_TLS, and the comment at line 40–42 (claiming "needs both SSL and Cyrus SASL") contradicts the actual code behavior. Additionally, FLB_KAFKA_MSK_IAM_ENABLED is unconditionally set to ON, even if OAUTHBEARER is disabled. Gate OAUTHBEARER and MSK IAM on TLS availability to align with security guidance:
Proposed patch

 else()
-  # Non-Windows platforms: OAuth Bearer is built-in, always enabled
-  set(FLB_SASL_OAUTHBEARER_ENABLED ON)
+  # Non-Windows platforms: OAuth Bearer requires TLS for secure token transport
+  if(FLB_TLS)
+    set(FLB_SASL_OAUTHBEARER_ENABLED ON)
+  else()
+    set(FLB_SASL_OAUTHBEARER_ENABLED OFF)
+  endif()
 endif()

-# MSK IAM requires OAuth Bearer support (which is always available now)
-set(FLB_KAFKA_MSK_IAM_ENABLED ON)
+# MSK IAM requires OAuth Bearer support
+set(FLB_KAFKA_MSK_IAM_ENABLED ${FLB_SASL_OAUTHBEARER_ENABLED})
🤖 Fix all issues with AI agents
In `@include/fluent-bit/aws/flb_aws_compress.h`:
- Around line 29-30: The comment incorrectly cites "RFC 7093 Framing Format";
update the comment near flb_snappy_compress_framed_data() to reference the
correct Snappy framing specification (the Google Snappy "framing_format.txt" /
Snappy framing format) instead of RFC 7093, e.g., change the phrase to "Snappy
framing format (Google Snappy framing_format.txt)" and keep the note that
streaming/concatenation is handled via flb_snappy_compress_framed_data().
In `@plugins/out_s3/s3_store.c`:
- Around line 198-233: On append failure in s3_store_buffer_put (when
flb_fstore_file_append returns non-zero), delete/cleanup the partially created
buffer file to avoid leaving a stuck entry: call the existing cleanup function
(e.g., s3_store_file_delete(ctx, s3_file) or cleanup_failed_file(ctx, s3_file))
before returning -1, and ensure any associated resources are released; do not
adjust ctx->current_buffer_size (since size was not yet incremented), but ensure
s3_file->fsf and s3_file references are properly removed so the file is not
retried.
In `@src/flb_parquet_impl.cpp`:
- Around line 959-960: The outer variables output_stream
(std::shared_ptr<arrow::io::FileOutputStream>) and writer
(std::unique_ptr<parquet::arrow::FileWriter>) are being shadowed by local auto
declarations when assigning the results of output_stream_result.ValueOrDie() and
writer_result.ValueOrDie(), so the cleanup path still sees null outer variables
and never closes the handles; fix by removing the `auto` shadowing and assign
directly to the existing outer variables (e.g., output_stream =
std::move(output_stream_result).ValueOrDie(); and writer =
std::move(writer_result).ValueOrDie();) ensuring the types match, so the error
cleanup that calls writer->Close() and output_stream->Close() operates on the
actual objects.
In `@tests/runtime/out_s3_blob.c`:
- Around line 138-151: The test uses the POSIX-only usleep(100000) inside the
retry loop in out_s3_blob.c; replace that call with the cross-platform
S3_TEST_SLEEP_MS(100) macro (from s3_test_helpers.h) to sleep 100 ms, and ensure
s3_test_helpers.h is included at the top of the file if not already so the macro
is available.
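For illustration, a minimal sketch of the intended replacement, assuming S3_TEST_SLEEP_MS takes a millisecond count (wait_for_upload and its condition callback are hypothetical names, not part of the helpers):

#include "../include/s3_test_helpers.h"

/* Hypothetical polling helper: replaces usleep(100000) with the
 * cross-platform macro; done() is whatever condition the test polls. */
static int wait_for_upload(int (*done)(void), int max_retries)
{
    int retries = 0;

    while (retries < max_retries && !done()) {
        S3_TEST_SLEEP_MS(100);  /* 100 ms between polls */
        retries++;
    }
    return done();
}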
In `@tests/runtime/out_s3_format.c`:
- Around line 17-264: Several tests call flb_lib_push() and ignore its return
value; capture the return from each flb_lib_push(...) call in
flb_test_json_format_basic, flb_test_json_with_date_key, and
flb_test_json_streaming_conversion and assert success using the existing test
helper (e.g., TEST_CHECK(push_ret == 0) or TEST_CHECK(push_ret >= 0) as
appropriate). Locate all flb_lib_push invocations in those functions and add a
local int (e.g., push_ret) to store the result, then add a corresponding
TEST_CHECK immediately after each call to fail the test on push errors.
In `@tests/runtime/out_s3_multipart.c`:
- Around line 125-174: The tests call flb_lib_push(...) but do not check its
return value, which can hide failures and lead to misleading "mock not
exhausted" errors; update each test (e.g., flb_test_upload_part_success and the
other test functions referenced) to capture the return from flb_lib_push into a
local int (e.g., ret or rc) and assert it succeeded via TEST_CHECK(ret == 0) (or
appropriate check) immediately after the call, ensuring every flb_lib_push
invocation is validated.
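A minimal sketch of the pattern both comments ask for, assuming the usual runtime-test setup (flb_ctx_t and TEST_CHECK from the test harness); push_and_check and its arguments are illustrative:

/* Sketch: capture and assert every push result; flb_lib_push()
 * returns -1 on error, so any non-negative value counts as success. */
static void push_and_check(flb_ctx_t *ctx, int in_ffd,
                           const char *record, size_t len)
{
    int push_ret;

    push_ret = flb_lib_push(ctx, in_ffd, (char *) record, len);
    TEST_CHECK(push_ret != -1);
}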
♻️ Duplicate comments (5)
tests/internal/parquet.c (1)
289-333: Temp file paths are collision-prone (/tmp, predictable names). This can break parallel test runs and leave stale artifacts.
Consider using mkstemp() (or a shared Fluent Bit test temp helper, if available) instead of hard-coded /tmp/flb_test_*.{msgpack,parquet}.
Proposed direction (make filenames unique):
@@ static int init_test_context(test_context *ctx, const char *test_name) {
@@
-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_%ld.msgpack", test_name, (long) getpid());
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_%ld.parquet", test_name, (long) getpid());
@@ }

Also applies to: 751-786, 1178-1222, 1424-1506, 1508-1603, 1605-1647, 1649-1704, 1706-1774, 1776-1870, 1872-1939, 1941-2016
tests/runtime/out_s3_multipart.c (1)
6-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).
If these runtime tests are linked into a single executable, including the .c file in multiple test translation units will cause duplicate definitions. Prefer compiling aws_client_mock.c once and including only the header.

#!/bin/bash
# Verify whether aws_client_mock.c is being compiled as a separate TU and/or included by multiple tests.
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "aws_client_mock\.c" -S tests/ CMakeLists.txt cmake/ || true

tests/runtime/out_s3_config.c (2)
6-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).

#!/bin/bash
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "add_executable|add_library|target_sources" tests/ CMakeLists.txt cmake/ -S | rg -n "out_s3_config|aws_client_mock" || true
18-83: These tests don't validate s3_key_format/endpoint output: there are no URI/path assertions.
Right now they mostly prove "multipart happened", not that the configured key/endpoint was used. Add expect(URI, ...) / host assertions (as supported by the mock framework) so regressions in expansion don't slip through.
Also applies to: 85-150, 152-230, 364-429
tests/runtime/out_s3_format.c (1)
8-12: Avoid including aws_client_mock.c directly (possible duplicate symbols depending on test linking).

#!/bin/bash
rg -n --fixed-strings '#include "../include/aws_client_mock.c"' tests/ || true
rg -n "aws_client_mock\.c" -S tests/ CMakeLists.txt cmake/ || true
🧹 Nitpick comments (10)
plugins/out_s3/s3_multipart.c (1)
36-65: Make sleep_ms() overflow-safe and type-appropriate across platforms.
On POSIX, usleep(milliseconds * 1000) can overflow int math; on Windows Sleep() takes DWORD. Since callers can grow milliseconds with retry settings, prefer uint32_t plus a capped conversion (or use nanosleep on POSIX).
Proposed diff

-static inline void sleep_ms(int milliseconds) {
-    usleep(milliseconds * 1000);
-}
+static inline void sleep_ms(uint32_t milliseconds) {
+    /* Cap to avoid overflow and overly long sleeps */
+    if (milliseconds > 30000) {
+        milliseconds = 30000;
+    }
+    usleep((useconds_t) milliseconds * 1000u);
+}
155-171: Check return codes from header/auth helpers (silent auth/header failures).
flb_http_add_header,flb_http_basic_auth, andflb_http_bearer_authreturn status; currently ignored. If any fail, you’ll send requests without expected headers/auth but still log “retrieved successfully” on 200s.Consider making
setup_http_client_headers()fail fast if any call returns-1.Also applies to: 173-260
plugins/out_s3/s3_blob.c (1)
124-143: Avoid network calls while holding the blob DB lock.
s3_blob_recover_state() holds flb_blob_db_lock() (Line 132) while recover_stale_files()/handle_aborted_files() can call abort_multipart_upload() (which does HTTP). This can block DB operations for extended periods and risks lock contention during startup. Prefer: fetch N rows plus the needed fields → unlock → perform the abort/validation calls → re-lock to update state.
Also applies to: 145-343
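One possible shape for that fetch-then-release pattern (struct stale_file, fetch_stale_rows, and mark_file_aborted are illustrative names, not existing APIs):

/* Sketch: copy rows out under the lock, do HTTP work unlocked,
 * then re-lock only to persist the new state. */
static int recover_stale_files_unlocked_io(struct flb_s3 *ctx)
{
    struct stale_file *files = NULL;
    int count;
    int i;

    flb_blob_db_lock(&ctx->blob_db);
    count = fetch_stale_rows(&ctx->blob_db, &files);    /* copies rows */
    flb_blob_db_unlock(&ctx->blob_db);

    for (i = 0; i < count; i++) {
        abort_multipart_upload(ctx, &files[i]);         /* HTTP, unlocked */
    }

    flb_blob_db_lock(&ctx->blob_db);
    for (i = 0; i < count; i++) {
        mark_file_aborted(&ctx->blob_db, files[i].id);  /* persist result */
    }
    flb_blob_db_unlock(&ctx->blob_db);

    flb_free(files);
    return 0;
}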
tests/runtime/out_s3_queue.c (1)
7-8: Including a .c file directly is unusual but acceptable for tests.
While including .c files directly can cause symbol conflicts in larger builds, this pattern is acceptable for isolated test files where the mock implementation needs to be compiled together with the test. Consider verifying that this doesn't cause issues if multiple test files are linked together. If you encounter linker errors about duplicate symbols when running multiple S3 tests together, consider compiling aws_client_mock.c as a separate object file and linking it.
tests/internal/parquet.c (3)
69-136: ChunkIO header CRC fields are always zero; tests may get brittle if validation is enabled.
If ChunkIO validation/CRC checks get turned on for tests in the future, this helper could start failing unexpectedly. At minimum, consider documenting that CRC checks are disabled, or compute/write CRC32 for the "content section" to match the layout described in the comment.
251-286: Surface error_msg on schema parse failures to make test failures actionable.
Right now schema parse failures just return -1 with no context; printing/logging error_msg (or returning it to the caller) would significantly reduce debugging time.
350-2079: Cleanup isn't guaranteed if TEST_CHECK aborts the test function.
If TEST_CHECK() short-circuits (common in test frameworks), failures may leak temp files and buffers. Consider a goto cleanup; pattern per test (or a helper macro) so cleanup runs on all paths.
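A sketch of that pattern (paths and buffer sizes are placeholders; this assumes TEST_CHECK() evaluates to the checked condition, as in acutest):

static void flb_test_example(void)
{
    FILE *fp = NULL;
    char *out_buf = NULL;

    fp = fopen("/tmp/flb_test_example.msgpack", "wb");
    if (!TEST_CHECK(fp != NULL)) {
        goto cleanup;
    }

    out_buf = flb_malloc(1024);
    if (!TEST_CHECK(out_buf != NULL)) {
        goto cleanup;
    }

    /* ... assertions on the converted output ... */

cleanup:
    if (fp) {
        fclose(fp);
    }
    flb_free(out_buf);
    unlink("/tmp/flb_test_example.msgpack");
}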
14-67: Make cleanup robust on failures (temp db/dir + mock generator).
Consider a cleanup: label that always clears the mock generator and deletes temp artifacts even if a TEST_CHECK aborts early.
Also applies to: 69-123, 227-327, 329-380, 382-434, 436-492
tests/runtime/out_s3_config.c (1)
152-230: Avoid fixed sleeps (S3_TEST_SLEEP_MS(2000))—slow and can still be flaky.
If the goal is to force a second upload for $INDEX, prefer forcing rotation deterministically (e.g., a smaller total_file_size, or explicitly waiting for the first multipart cycle to complete before pushing again).
432-509: Make format explicit where the test intent is "JSON + compression/log_key".
E.g. flb_test_compression_snappy, flb_test_log_key_extraction, and flb_test_log_key_with_compression don't set "format", "json". If the plugin default ever changes, these tests may start validating the wrong behavior.
Also applies to: 511-679
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (54)
.github/workflows/unit-tests.yaml, CMakeLists.txt, cmake/kafka.cmake, cmake/parquet.cmake, include/fluent-bit/aws/flb_aws_compress.h, include/fluent-bit/flb_aws_util.h, include/fluent-bit/flb_blob_db.h, include/fluent-bit/flb_parquet.h, include/fluent-bit/flb_snappy.h, plugins/in_blob/blob_file.c, plugins/out_kinesis_firehose/firehose.c, plugins/out_s3/CMakeLists.txt, plugins/out_s3/s3.c, plugins/out_s3/s3.h, plugins/out_s3/s3_auth.c, plugins/out_s3/s3_auth.h, plugins/out_s3/s3_blob.c, plugins/out_s3/s3_blob.h, plugins/out_s3/s3_multipart.c, plugins/out_s3/s3_multipart.h, plugins/out_s3/s3_queue.c, plugins/out_s3/s3_queue.h, plugins/out_s3/s3_store.c, plugins/out_s3/s3_store.h, plugins/out_s3/s3_stream.c, plugins/out_s3/s3_stream.h, src/CMakeLists.txt, src/aws/CMakeLists.txt, src/aws/compression/CMakeLists.txt, src/aws/flb_aws_compress.c, src/aws/flb_aws_util.c, src/flb_blob_db.c, src/flb_parquet_impl.cpp, src/flb_signv4.c, src/flb_snappy.c, tests/include/aws_client_mock.c, tests/include/aws_client_mock.h, tests/include/s3_test_helpers.h, tests/internal/CMakeLists.txt, tests/internal/aws_compress.c, tests/internal/aws_util.c, tests/internal/fuzzers/aws_util_fuzzer.c, tests/internal/parquet.c, tests/runtime/CMakeLists.txt, tests/runtime/out_s3.c, tests/runtime/out_s3_blob.c, tests/runtime/out_s3_config.c, tests/runtime/out_s3_edge_cases.c, tests/runtime/out_s3_error_handling.c, tests/runtime/out_s3_format.c, tests/runtime/out_s3_multipart.c, tests/runtime/out_s3_queue.c, tests/runtime/out_s3_recovery.c, tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (3)
- tests/runtime/out_s3.c
- src/aws/CMakeLists.txt
- src/aws/compression/CMakeLists.txt
🚧 Files skipped from review as they are similar to previous changes (11)
- plugins/in_blob/blob_file.c
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- plugins/out_kinesis_firehose/firehose.c
- cmake/parquet.cmake
- tests/internal/fuzzers/aws_util_fuzzer.c
- tests/include/s3_test_helpers.h
- plugins/out_s3/s3_blob.h
- plugins/out_s3/s3_auth.h
- tests/runtime/out_s3_error_handling.c
- include/fluent-bit/flb_parquet.h
🧰 Additional context used
🧠 Learnings (18)
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
.github/workflows/unit-tests.yaml, src/flb_parquet_impl.cpp
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, tests/internal/CMakeLists.txt, src/flb_parquet_impl.cpp, cmake/kafka.cmake, tests/internal/aws_compress.c, CMakeLists.txt
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit, the correct CMake flag for using system librdkafka is `FLB_PREFER_SYSTEM_LIB_KAFKA=ON`.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, cmake/kafka.cmake, CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/flb_snappy.h, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_snappy.c, src/flb_parquet_impl.cpp, cmake/kafka.cmake, tests/internal/parquet.c, tests/internal/aws_compress.c, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3_stream.c, CMakeLists.txt
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/flb_snappy.h, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_parquet_impl.cpp, cmake/kafka.cmake, tests/internal/parquet.c, tests/internal/aws_compress.c, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3_stream.c, CMakeLists.txt
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/flb_snappy.h, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_snappy.c, cmake/kafka.cmake, tests/internal/parquet.c, tests/internal/aws_compress.c, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3_stream.c, CMakeLists.txt
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
.github/workflows/unit-tests.yaml, include/fluent-bit/flb_snappy.h, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_snappy.c, cmake/kafka.cmake, tests/internal/parquet.c, tests/internal/aws_compress.c, include/fluent-bit/aws/flb_aws_compress.h, plugins/out_s3/s3_stream.c, CMakeLists.txt
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_parquet_impl.cpp, cmake/kafka.cmake, tests/internal/aws_compress.c, CMakeLists.txt
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
.github/workflows/unit-tests.yaml, src/CMakeLists.txt, src/flb_parquet_impl.cpp, tests/internal/aws_compress.c, plugins/out_s3/s3_store.c, CMakeLists.txt
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3_auth.c, plugins/out_s3/s3_queue.c, plugins/out_s3/s3_multipart.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
plugins/out_s3/s3_auth.c, plugins/out_s3/s3.h
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_snappy.h, include/fluent-bit/flb_aws_util.h
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
include/fluent-bit/flb_snappy.h, src/CMakeLists.txt, src/aws/flb_aws_compress.c, src/flb_snappy.c, tests/internal/parquet.c, tests/internal/aws_compress.c, include/fluent-bit/aws/flb_aws_compress.h, CMakeLists.txt
📚 Learning: 2025-12-22T05:39:02.291Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11250
File: lib/ripser-1.2.1/Makefile:1-18
Timestamp: 2025-12-22T05:39:02.291Z
Learning: In fluent/fluent-bit, the Makefile at lib/ripser-1.2.1/Makefile is imported from upstream Ripser and is not used in the actual build process. The project uses CMake for building (lib/ripser-1.2.1/CMakeLists.txt), so changes to the imported Makefile are not necessary.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/CMakeLists.txt, src/flb_parquet_impl.cpp
📚 Learning: 2025-09-14T09:46:09.531Z
Learnt from: aminvakil
Repo: fluent/fluent-bit PR: 10844
File: conf/fluent-bit:13-15
Timestamp: 2025-09-14T09:46:09.531Z
Learning: For fluent-bit Debian packaging, /opt/fluent-bit/bin/ is the appropriate installation path since the package may be installed from non-official Debian sources, making /opt compliant with FHS for optional software packages.
Applied to files:
src/CMakeLists.txt
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
plugins/out_s3/s3_stream.c, plugins/out_s3/s3_queue.c, plugins/out_s3/s3_multipart.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/out_s3/s3_stream.c, src/aws/flb_aws_util.c
🧬 Code graph analysis (21)
plugins/out_s3/s3_auth.c (6)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
  src/flb_utils.c (1): flb_utils_url_split (1531-1667)
  src/flb_upstream.c (1): flb_upstream_create_url (416-481)
  src/flb_output.c (1): flb_output_upstream_set (1634-1725)
  src/flb_http_client.c (4): flb_http_add_header (1015-1047), flb_http_basic_auth (1394-1398), flb_http_bearer_auth (1406-1441), flb_http_client (866-911)
  src/aws/flb_aws_util.c (1): flb_aws_uri_encode_path (1490-1526)
include/fluent-bit/flb_snappy.h (1)
  src/flb_snappy.c (1): flb_snappy_compress_framed_data (134-255)
plugins/out_s3/s3_stream.h (1)
  plugins/out_s3/s3_stream.c (4): stream_compress_file (142-213), stream_process_msgpack_file (563-631), stream_json_processor (633-661), stream_log_key_processor (736-765)
src/flb_signv4.c (1)
  src/flb_sds.c (2): flb_sds_destroy (389-399), flb_sds_create_len (58-76)
src/aws/flb_aws_compress.c (1)
  src/flb_snappy.c (1): flb_snappy_compress_framed_data (134-255)
tests/runtime/out_s3_blob.c (2)
  tests/include/s3_test_helpers.h (5): s3_test_next (438-446), s3_test_create_temp_db_path (137-194), s3_test_create_temp_store_dir (204-257), s3_test_cleanup_temp_db (417-430), s3_test_cleanup_temp_store_dir (384-409)
  tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)
src/flb_snappy.c (1)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
tests/runtime/out_s3_edge_cases.c (3)
  tests/include/s3_test_helpers.h (5): s3_test_next (438-446), s3_test_create_temp_db_path (137-194), s3_test_create_temp_store_dir (204-257), s3_test_cleanup_temp_db (417-430), s3_test_cleanup_temp_store_dir (384-409)
  tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
src/flb_parquet_impl.cpp (1)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
tests/internal/parquet.c (3)
  src/flb_parquet_impl.cpp (1): field_name (548-548)
  src/flb_pack.c (1): flb_pack_json (530-535)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
include/fluent-bit/flb_aws_util.h (1)
  src/aws/flb_aws_util.c (2): flb_get_s3_key (1311-1389), flb_aws_uri_encode_path (1490-1526)
tests/internal/aws_compress.c (2)
  include/fluent-bit/flb_mem.h (1): flb_free (126-128)
  src/aws/flb_aws_compress.c (1): flb_aws_compression_get_type (87-108)
plugins/out_s3/s3_multipart.h (1)
  plugins/out_s3/s3_multipart.c (13): flb_s3_calculate_optimal_part_size (164-216), s3_multipart_get_md5_base64 (218-248), s3_multipart_create_headers (357-417), s3_multipart_upload_new (450-499), s3_multipart_upload_destroy (419-444), s3_multipart_initiate (501-558), s3_multipart_upload_part (560-652), s3_multipart_complete (654-735), s3_multipart_abort (737-799), s3_multipart_abort_with_url (1021-1042), s3_multipart_check_upload_exists (805-879), s3_multipart_upload_file_part (893-1015), s3_multipart_upload_file (1202-1293)
src/aws/flb_aws_util.c (2)
  src/flb_sds.c (5): flb_sds_create_size (92-95), flb_sds_cat (120-141), flb_sds_destroy (389-399), flb_sds_create_len (58-76), flb_sds_printf (336-387)
  include/fluent-bit/flb_compat.h (1): gmtime_r (76-82)
plugins/out_s3/s3_store.h (1)
  plugins/out_s3/s3_store.c (1): flb_chunk_file_open (521-557)
plugins/out_s3/s3_store.c (3)
  src/flb_sds.c (4): flb_sds_create_size (92-95), flb_sds_printf (336-387), flb_sds_destroy (389-399), flb_sds_create (78-90)
  src/flb_fstore.c (4): flb_fstore_file_create (153-190), flb_fstore_file_append (298-329), flb_fstore_create (479-525), flb_fstore_file_inactive (238-256)
  include/fluent-bit/flb_compat.h (1): gmtime_r (76-82)
include/fluent-bit/flb_blob_db.h (1)
  src/flb_blob_db.c (18): flb_blob_file_update_s3_key (619-650, 2053-2058), flb_blob_db_file_get_next_aborted (822-953, 1934-1945), flb_blob_db_file_fetch_part_ids (1479-1553, 2026-2033), flb_blob_db_file_fetch_part_count (1555-1584, 2035-2039), flb_blob_db_file_fetch_all_parts (1586-1672, 2060-2069), flb_blob_db_file_get_next_pending (1678-1806, 2041-2051), flb_blob_db_file_part_check_uploaded (1812-1847, 2071-2076), flb_blob_db_file_parts_in_progress (1098-1133, 2078-2083), flb_blob_db_reset_zombie_parts (1850-1874, 2085-2088)
plugins/out_s3/s3_blob.c (4)
  plugins/out_s3/s3_multipart.c (2): s3_multipart_upload_new (450-499), s3_multipart_abort (737-799)
  plugins/out_s3/s3_auth.c (1): s3_auth_fetch_presigned_url (359-402)
  src/flb_input_blob.c (1): flb_input_blob_file_get_info (31-112)
  src/aws/flb_aws_util.c (1): flb_get_s3_key (1311-1389)
plugins/out_s3/s3_multipart.c (5)
  plugins/out_s3/s3.c (1): write_seq_index (127-147)
  src/flb_sds.c (6): flb_sds_create_len (58-76), flb_sds_create (78-90), flb_sds_create_size (92-95), flb_sds_destroy (389-399), flb_sds_printf (336-387), flb_sds_cat (120-141)
  src/aws/flb_aws_util.c (4): flb_aws_uri_encode_path (1490-1526), flb_get_s3_key (1311-1389), flb_aws_xml_get_val (623-655), flb_aws_print_xml_error (586-610)
  src/flb_http_client.c (1): flb_http_client_destroy (1821-1828)
  plugins/out_s3/s3_auth.c (1): s3_auth_fetch_presigned_url (359-402)
plugins/out_s3/s3_queue.h (1)
  plugins/out_s3/s3_queue.c (11): s3_queue_remove (307-316), s3_queue_buffer_chunk (356-378), s3_queue_add_file (116-130), s3_queue_add_part (132-202), s3_queue_add_pending_file (259-272), s3_queue_add_file_unlocked (58-111), s3_queue_add_pending_file_unlocked (208-254), s3_queue_process_entry (748-807), s3_queue_recover_all (850-895), s3_queue_recover_from_database (988-1076), s3_queue_recover_from_fstore (1141-1201)
src/flb_blob_db.c (3)
  src/flb_sqldb.c (2): flb_sqldb_lock (173-178), flb_sqldb_unlock (180-185)
  lib/cfl/src/cfl_sds.c (2): cfl_sds_create (113-125), cfl_sds_destroy (127-137)
  include/fluent-bit/flb_mem.h (2): flb_calloc (84-96), flb_free (126-128)
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph> Signed-off-by: Arbin <arbin.cheng@coins.ph>
Force-pushed from 0e0ad38 to 2aa4b4b (Compare)
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/aws/flb_aws_util.c (1)
796-832: BUG: replace_uri_tokens() returns an SDS with an incorrect length header (likely 0).
replace_uri_tokens() allocates with flb_sds_create_size() but then writes bytes directly and returns without calling flb_sds_len_set(). Any consumer using flb_sds_len(result) (a common pattern) will see the wrong length and can truncate/omit content.
Proposed fix

 static char* replace_uri_tokens(const char* original_string, const char* current_word, const char* new_word)
 {
     char *result;
     int i = 0;
 @@
     result[i] = '\0';
+    /* Keep SDS header in sync with the actual string length */
+    flb_sds_len_set(result, i);
     return result;
 }
🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_auth.c`:
- Around line 393-394: The concatenation using flb_sds_printf into full_url with
ctx->authorization_endpoint_url and url_path can produce a double slash when
authorization_endpoint_url ends with '/' because url_path (from
build_presigned_url_path) always begins with '/'; update the logic in the code
around the flb_sds_printf call (referencing tmp, full_url,
ctx->authorization_endpoint_url, url_path, and build_presigned_url_path) to
ensure there is exactly one slash between the endpoint and path — either by
trimming a trailing '/' from ctx->authorization_endpoint_url before formatting
or by skipping the leading '/' on url_path when concatenating so flb_sds_printf
never forms "//".
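For illustration, a minimal join helper under those assumptions (join_url is a hypothetical name; the variable names follow the comment above):

/* Sketch: concatenate endpoint and path with exactly one '/' between them */
static flb_sds_t join_url(const char *endpoint, const char *url_path)
{
    size_t len = strlen(endpoint);
    flb_sds_t full_url;
    flb_sds_t tmp;

    if (len > 0 && endpoint[len - 1] == '/' && url_path[0] == '/') {
        url_path++;  /* drop the duplicate slash from the path side */
    }

    full_url = flb_sds_create_size(len + strlen(url_path) + 1);
    if (!full_url) {
        return NULL;
    }
    tmp = flb_sds_printf(&full_url, "%s%s", endpoint, url_path);
    if (!tmp) {
        flb_sds_destroy(full_url);
        return NULL;
    }
    return tmp;
}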
In `@plugins/out_s3/s3_blob.c`:
- Around line 433-504: s3_blob_process_events can leak allocated cfl_sds_t
values when flb_input_blob_file_get_info() returns -1 but has already allocated
source or file_path; update the error branch after the
flb_input_blob_file_get_info call to defensively free any non-NULL source and
file_path using cfl_sds_destroy(source) and cfl_sds_destroy(file_path) (and then
set them to NULL) before continuing, so partial allocations are not leaked.
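A small helper along those lines (release_blob_info is a hypothetical name; it only frees what the failed call may have allocated):

#include <cfl/cfl_sds.h>

/* Sketch: destroy partially-allocated outputs and reset the pointers
 * so the caller can safely continue with the next record. */
static void release_blob_info(cfl_sds_t *source, cfl_sds_t *file_path)
{
    if (*source) {
        cfl_sds_destroy(*source);
        *source = NULL;
    }
    if (*file_path) {
        cfl_sds_destroy(*file_path);
        *file_path = NULL;
    }
}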
In `@plugins/out_s3/s3_multipart.c`:
- Around line 408-412: The storage_class header block leaves the header count
variable n unchanged, which will leave s3_headers/insertion state inconsistent;
in the conditional where ctx->storage_class is assigned to s3_headers[n] (using
storage_class_header and setting val/val_len), increment n (e.g., n++)
immediately after that assignment so the header index is advanced for subsequent
headers and keeps s3_headers/n consistent.
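The corrected branch would look roughly like the sketch below (s3_headers, storage_class_header, and n come from the surrounding function; this is illustrative, not the exact plugin code):

/* Sketch: fill the slot, then advance the index */
if (ctx->storage_class != NULL) {
    s3_headers[n] = storage_class_header;
    s3_headers[n].val = (char *) ctx->storage_class;
    s3_headers[n].val_len = strlen(ctx->storage_class);
    n++;  /* without this, a later header would overwrite the slot */
}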
In `@plugins/out_s3/s3_queue.c`:
- Around line 911-917: The code uses access(file_path, F_OK) in s3_queue.c
(checking file existence) which lacks Windows portability; add a compatibility
wrapper in flb_compat.h (similar to existing realpath/basename adapters) or wrap
the call with `#ifdef` FLB_SYSTEM_WINDOWS to call _access() on Windows toolchains,
then replace the direct access(...) usage in the block that logs
file_path/file_id and calls flb_blob_db_file_delete(&ctx->blob_db, file_id) to
use that wrapper (e.g., flb_compat_access or conditional _access) so older
MSVC/UCRT toolchains work correctly.
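A sketch of such a wrapper, following the style of the existing adapters in flb_compat.h (flb_access is a hypothetical name):

#ifdef FLB_SYSTEM_WINDOWS
#include <io.h>
#ifndef F_OK
#define F_OK 0
#endif
#define flb_access(path, mode) _access((path), (mode))
#else
#include <unistd.h>
#define flb_access(path, mode) access((path), (mode))
#endif

/* the call site in s3_queue.c would then read:
 *     if (flb_access(file_path, F_OK) != 0) { ... }
 */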
In `@plugins/out_s3/s3_store.h`:
- Line 32: Replace all accesses to the renamed struct field file_path with the
new name stream_path in s3_queue.c (update expressions like store->file_path,
ctx->file_path, etc. to store->stream_path or ctx->stream_path). Ensure you
update every occurrence that references the field (all member accesses and any
string operations using that member), keep the flb_sds_t usage intact, and
rebuild to confirm there are no remaining references to file_path.
In `@plugins/out_s3/s3_stream.c`:
- Around line 379-440: The rename() call moves temp_template to final_path while
ctx->temp_fp remains open, which breaks on Windows; close the FILE*
(ctx->temp_fp) before calling rename(), set ctx->temp_fp to NULL, then perform
the rename and only after a successful rename set ctx->temp_path to final_path
(and reopen the file for further writes only if later logic requires keeping it
open); ensure error-handling paths still close the stream and clean up
temp_template/final_path (use the same flb_plg_error / flb_errno cleanup
pattern) and reference the existing symbols: ctx->temp_fp, temp_template,
final_path, rename(), ctx->temp_path, s3_unlink, flb_free, flb_sds_destroy.
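Under those assumptions, the close-then-rename sequence could look like the sketch below (struct streaming_ctx and finalize_temp_file are illustrative names; error handling is abbreviated):

/* Sketch: Windows cannot rename a file that is still open, so close first */
static int finalize_temp_file(struct streaming_ctx *ctx,
                              char *temp_template, flb_sds_t final_path)
{
    if (ctx->temp_fp) {
        fclose(ctx->temp_fp);
        ctx->temp_fp = NULL;
    }

    if (rename(temp_template, final_path) != 0) {
        flb_errno();
        unlink(temp_template);        /* best-effort cleanup */
        flb_sds_destroy(final_path);
        return -1;
    }

    ctx->temp_path = final_path;      /* adopt the final location */
    return 0;
}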
In `@tests/include/s3_test_helpers.h`:
- Around line 137-194: The comment incorrectly calls mkstemps() "POSIX"; update
the comment near the mkstemps usage to remove the POSIX label and instead
describe mkstemps as a BSD/Glibc extension (available on Linux glibc and macOS),
and make the same change in the comment inside s3_test_create_temp_db_path
(tests/include/s3_test_helpers.h) so both references accurately state mkstemps
is a BSD/Glibc extension rather than POSIX.
In `@tests/internal/aws_compress.c`:
- Around line 77-85: The test comment incorrectly cites "RFC 7093" as the Snappy
framing spec; update the comment block in tests/internal/aws_compress.c to
reference the correct specification consistent with src/flb_snappy.c (e.g., the
Snappy framing format spec or the correct RFC/URL used there) so that the
description for flb_snappy_compress_framed_data and
flb_snappy_uncompress_framed_data matches the corrected wording in
src/flb_snappy.c.
In `@tests/runtime/out_s3_edge_cases.c`:
- Around line 7-11: Remove the direct inclusion of aws_client_mock.c from the
test and replace it with a proper header-based interface: create
aws_client_mock.h that contains declarations for every public function and any
needed types/extern variables that are defined in aws_client_mock.c (the same
symbols the tests call), add include guards, and have tests include
aws_client_mock.h (not the .c); then update the test build rules so
aws_client_mock.c is compiled once and linked into the test binary alongside the
test object (keep the existing includes of s3_test_helpers.h and s3.h
unchanged). Ensure function names and types declared in aws_client_mock.h
exactly match the definitions in aws_client_mock.c so symbols like the mock
init/teardown and any mocked S3 call functions resolve at link time.
♻️ Duplicate comments (7)
src/flb_parquet_impl.cpp (1)
981-991: CRITICAL: arrow::MakeBuilder() argument order is likely wrong (compile failure on Arrow 11+).
You call arrow::MakeBuilder(field->type(), arrow::default_memory_pool()) (Line 984, Line 1114). For Arrow 11+, the common/expected order is pool first (or a different overload).
Proposed fix

-    auto builder_result = arrow::MakeBuilder(field->type(), arrow::default_memory_pool());
+    auto builder_result = arrow::MakeBuilder(arrow::default_memory_pool(), field->type());
@@
-    auto builder_result = arrow::MakeBuilder(field->type(),
-                                             arrow::default_memory_pool());
+    auto builder_result = arrow::MakeBuilder(arrow::default_memory_pool(),
+                                             field->type());

Web query: Apache Arrow C++: what is the signature / overload set for arrow::MakeBuilder in Arrow 11+ and Arrow 12+?
Also applies to: 1110-1120
tests/runtime/out_s3_multipart.c (1)
7-8: Stop including aws_client_mock.c directly. Same issue as other test files: include only the header to avoid duplicate symbol definitions.
tests/runtime/out_s3_format.c (1)
9-10: Stop including aws_client_mock.c directly. Same issue as other test files.
tests/runtime/out_s3_blob.c (1)
7-8: Stop including aws_client_mock.c directly. Same issue as other test files.
tests/runtime/out_s3_config.c (1)
7-8: Stop including aws_client_mock.c directly. Same issue as other test files.
tests/runtime/out_s3_error_handling.c (1)
44-101: Environment variable cleanup may still leak across tests (FLB_S3_PLUGIN_UNDER_TEST).
If tests share a process, leaving it set can affect subsequent suites. Consider adding a helper to unset it in teardown.
Also applies to: 104-161, 164-234, 237-294, 297-354, 357-422, 425-484, 487-623, 625-691
tests/internal/parquet.c (1)
289-333: Temp paths are collision-prone (/tmp/flb_test_<name>.*) — switch to mkstemp() or a PID suffix.
Current naming will collide under parallel test runs and is not portable.
Proposed direction (mkstemp-based unique files):
static int init_test_context(test_context *ctx, const char *test_name) { @@ - snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file), - "/tmp/flb_test_%s.msgpack", test_name); - snprintf(ctx->parquet_file, sizeof(ctx->parquet_file), - "/tmp/flb_test_%s.parquet", test_name); + /* Prefer mkstemp to avoid collisions (or add getpid() at minimum). */ + snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file), + "/tmp/flb_test_%s_msgpack_XXXXXX", test_name); + snprintf(ctx->parquet_file, sizeof(ctx->parquet_file), + "/tmp/flb_test_%s_parquet_XXXXXX", test_name); + int fd1 = mkstemp(ctx->msgpack_file); + if (fd1 != -1) { close(fd1); } + int fd2 = mkstemp(ctx->parquet_file); + if (fd2 != -1) { close(fd2); }Also applies to: 752-786, 1872-1939
🧹 Nitpick comments (12)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
73-73: Consider fuzzing the new file_path parameter for better coverage.
The API update is correct. However, passing only NULL for the new file_path parameter limits fuzz coverage. The flb_get_s3_key function has significant logic for $FILE_PATH and $FILE_NAME template substitution that won't be exercised.
♻️ Suggested enhancement

+    char *file_path = NULL;
+    if (size > 250) {
+        file_path = get_null_terminated(50, &data, &size);
+    }
+
     if (format && tag && tag_delimiter) {
         if (!initialization_crutch()) {
             flb_sds_t s3_key_format = NULL;
-            s3_key_format = flb_get_s3_key(format, t, tag, tag_delimiter, 0, NULL);
+            s3_key_format = flb_get_s3_key(format, t, tag, tag_delimiter, 0, file_path);
             if (s3_key_format) {
                 flb_sds_destroy(s3_key_format);
             }
file_pathin the cleanup section.plugins/out_s3/s3_auth.c (1)
230-230: Consider using debug log level for routine success messages.
flb_plg_infoon every successful presigned URL retrieval may flood logs in high-throughput scenarios. Considerflb_plg_debugfor routine operations.♻️ Suggested change
- flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully"); + flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");plugins/out_s3/s3_multipart.h (1)
52-107: Prefer const-correctness for read-only buffers/URLs.Several APIs accept
char *pre_signed_url/char *bodybut appear read-only; consider switching toconst char *(andconst char *body) to prevent accidental mutation and clarify ownership.plugins/out_s3/s3_multipart.c (2)
62-64: Integer overflow risk insleep_msfor POSIX.The expression
milliseconds * 1000can overflow a 32-bitintwhenmilliseconds > 2147483(about 35 minutes). While current usage with backoff_ms capped at 30000ms is safe, this is a latent bug if the function is reused elsewhere.🔧 Proposed fix to use safer types
/* Cross-platform sleep wrapper */ static inline void sleep_ms(int milliseconds) { - usleep(milliseconds * 1000); + usleep((useconds_t)milliseconds * 1000); }Alternatively, since
flb_compat.halready provides a cross-platformusleep()for Windows (lines 128-132 in the relevant snippet), consider using that directly instead of defining a new wrapper.
117-138:strchrdoes not respect size bounds if response is not NUL-terminated.The function receives
sizebutstrchrat lines 118, 124, and 135 will scan until NUL regardless ofsize. Ifresponseis not NUL-terminated, this could read out of bounds. The bounds checks at lines 130-132 and 136-138 help but only afterstrchrhas already scanned pastsize.Consider using
memchrinstead ofstrchrto respect the size parameter:🔧 Proposed fix using memchr
/* Find "ETag:" header in response */ - etag_header = strstr(response, "ETag:"); + etag_header = flb_strnstr(response, "ETag:", size); if (!etag_header) { return NULL; } + size_t remaining = size - (etag_header - response); + /* Find the opening quote after "ETag:" */ - start_quote = strchr(etag_header, '\"'); + start_quote = memchr(etag_header, '\"', remaining); if (!start_quote) { return NULL; }tests/runtime/out_s3_queue.c (1)
7-8: Stop includingaws_client_mock.cdirectly.Including a
.cfile creates duplicate symbol definitions if these tests are ever compiled into a single binary or linked together. This pattern is repeated across all S3 test files in this PR.Include only the header and ensure the mock implementation is compiled/linked separately.
♻️ Proposed fix
`#include` "../include/aws_client_mock.h" -#include "../include/aws_client_mock.c"tests/runtime/out_s3_multipart.c (1)
387-440: Note: flb_test_abort_multipart_success and flb_test_complete_multipart_failure have identical mock chains.
Both tests use the same sequence: Create → UploadPart → Complete(500) → Abort(DELETE 204). While this correctly tests abort behavior, consider adding a comment to flb_test_complete_multipart_failure clarifying it also validates the abort path, or differentiate the tests more clearly.
69-136: ChunkIO header CRC32 fields are hardcoded to 0; consider computing them or using ChunkIO helpers.
If ChunkIO validation is enabled (now or later), these tests may become brittle.
407-441: Very large record-count tests may cause CI timeouts; consider gating or trimming.
The 70k/131k loops are valuable, but they're expensive. Consider an env-gated "long tests" mode, or reducing counts while still covering the boundaries.
Also applies to: 1872-1939
plugins/out_s3/s3_blob.c (1)
183-235: Consider skipping MPU abort for stale files when s3_key is unknown.
Regenerating the key for time-based formats is explicitly warned as potentially wrong; abort requests will likely fail and create noise. Prefer "best effort" only when s3_key is present (or rely on S3 MPU expiration).
18-98: Make flb_test_empty_data_upload deterministic: assert mock chain state instead of sleeping.
Right now it's a "no crash" check only and timing-based. Consider asserting that no requests were consumed (or don't install a chain at all).
1849-1874: Consider capturing error details from sqlite3_exec.
The function uses sqlite3_exec but ignores the error message parameter. While functionally correct, capturing the error message could aid debugging.
Optional enhancement for better error diagnostics

 int flb_blob_db_reset_zombie_parts(struct flb_blob_db *context)
 {
     const char *sql;
+    char *errmsg = NULL;
     int result;

     if (!context || !context->db) {
         return FLB_BLOB_DB_ERROR_INVALID_BLOB_DB_CONTEXT;
     }

     sql = "UPDATE blob_parts SET in_progress = 0 "
           "WHERE uploaded = 0 AND in_progress = 1";

     flb_sqldb_lock(context->db);
-    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, NULL);
+    result = sqlite3_exec(context->db->handler, sql, NULL, NULL, &errmsg);
     flb_sqldb_unlock(context->db);

     if (result != SQLITE_OK) {
         context->last_error = result;
+        if (errmsg != NULL) {
+            sqlite3_free(errmsg);
+        }
         return -1;
     }

     return FLB_BLOB_DB_SUCCESS;
 }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (47)
include/fluent-bit/aws/flb_aws_compress.h, include/fluent-bit/flb_aws_util.h, include/fluent-bit/flb_blob_db.h, include/fluent-bit/flb_parquet.h, include/fluent-bit/flb_snappy.h, plugins/in_blob/blob_file.c, plugins/out_kinesis_firehose/firehose.c, plugins/out_s3/CMakeLists.txt, plugins/out_s3/s3.c, plugins/out_s3/s3.h, plugins/out_s3/s3_auth.c, plugins/out_s3/s3_auth.h, plugins/out_s3/s3_blob.c, plugins/out_s3/s3_blob.h, plugins/out_s3/s3_multipart.c, plugins/out_s3/s3_multipart.h, plugins/out_s3/s3_queue.c, plugins/out_s3/s3_queue.h, plugins/out_s3/s3_store.c, plugins/out_s3/s3_store.h, plugins/out_s3/s3_stream.c, plugins/out_s3/s3_stream.h, src/aws/flb_aws_compress.c, src/aws/flb_aws_util.c, src/flb_blob_db.c, src/flb_parquet_impl.cpp, src/flb_signv4.c, src/flb_snappy.c, tests/include/aws_client_mock.c, tests/include/aws_client_mock.h, tests/include/s3_test_helpers.h, tests/internal/CMakeLists.txt, tests/internal/aws_compress.c, tests/internal/aws_util.c, tests/internal/fuzzers/aws_util_fuzzer.c, tests/internal/parquet.c, tests/runtime/CMakeLists.txt, tests/runtime/out_s3.c, tests/runtime/out_s3_blob.c, tests/runtime/out_s3_config.c, tests/runtime/out_s3_edge_cases.c, tests/runtime/out_s3_error_handling.c, tests/runtime/out_s3_format.c, tests/runtime/out_s3_multipart.c, tests/runtime/out_s3_queue.c, tests/runtime/out_s3_recovery.c, tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
✅ Files skipped from review due to trivial changes (1)
- plugins/out_s3/s3.h
🚧 Files skipped from review as they are similar to previous changes (5)
- include/fluent-bit/flb_snappy.h
- tests/include/aws_client_mock.h
- plugins/out_kinesis_firehose/firehose.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_blob.h
🧰 Additional context used
🧠 Learnings (15)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
`tests/internal/parquet.c`, `src/flb_snappy.c`, `src/flb_parquet_impl.cpp`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`, `include/fluent-bit/aws/flb_aws_compress.h`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
`tests/internal/parquet.c`, `src/flb_snappy.c`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`, `include/fluent-bit/aws/flb_aws_compress.h`
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
`tests/internal/parquet.c`, `src/flb_snappy.c`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`, `include/fluent-bit/aws/flb_aws_compress.h`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
`tests/internal/parquet.c`, `src/flb_snappy.c`, `src/flb_parquet_impl.cpp`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`, `include/fluent-bit/aws/flb_aws_compress.h`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
`tests/internal/parquet.c`, `src/flb_snappy.c`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`, `include/fluent-bit/aws/flb_aws_compress.h`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
`src/aws/flb_aws_util.c`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
`tests/runtime/out_s3_error_handling.c`, `src/flb_parquet_impl.cpp`, `plugins/out_s3/s3_store.c`, `tests/include/s3_test_helpers.h`, `tests/internal/aws_compress.c`
📚 Learning: 2025-08-07T10:15:46.187Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10691
File: .github/workflows/unit-tests.yaml:94-101
Timestamp: 2025-08-07T10:15:46.187Z
Learning: Apache Arrow latest packages have changing checksums, making hardcoded checksum verification impractical for CI workflows that download the latest version.
Applied to files:
src/flb_parquet_impl.cpp
📚 Learning: 2025-09-04T07:28:37.083Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10832
File: src/wasm/CMakeLists.txt:112-131
Timestamp: 2025-09-04T07:28:37.083Z
Learning: In fluent-bit CMake files, the user cosmo0920 prefers treating Git as a command rather than a package, emphasizing that Git is not a pkg-config retrievable package but just a command.
Applied to files:
src/flb_parquet_impl.cpp
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
`src/flb_parquet_impl.cpp`, `src/aws/flb_aws_compress.c`, `tests/internal/aws_compress.c`
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
`src/flb_parquet_impl.cpp`, `tests/internal/CMakeLists.txt`, `tests/internal/aws_compress.c`
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
`plugins/out_s3/s3_auth.c`, `plugins/out_s3/s3_queue.c`, `plugins/out_s3/s3_multipart.c`
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
plugins/out_s3/s3_auth.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
`plugins/out_s3/s3_queue.c`, `plugins/out_s3/s3_stream.c`, `plugins/out_s3/s3_multipart.c`
📚 Learning: 2025-11-21T06:23:29.770Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11171
File: include/fluent-bit/flb_lib.h:52-53
Timestamp: 2025-11-21T06:23:29.770Z
Learning: In Fluent Bit core (fluent/fluent-bit repository), function descriptions/documentation are not required for newly added functions in header files.
Applied to files:
include/fluent-bit/flb_aws_util.h
🧬 Code graph analysis (23)
tests/internal/parquet.c (3)
- src/flb_parquet_impl.cpp (1): `field_name` (548-548)
- src/flb_pack.c (1): `flb_pack_json` (530-535)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
tests/runtime/out_s3_multipart.c (2)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (438-446), `s3_test_create_temp_db_path` (137-194), `s3_test_create_temp_store_dir` (204-257), `s3_test_cleanup_temp_db` (417-430), `s3_test_cleanup_temp_store_dir` (384-409)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
src/flb_signv4.c (1)
- src/flb_sds.c (2): `flb_sds_destroy` (389-399), `flb_sds_create_len` (58-76)
plugins/in_blob/blob_file.c (1)
- plugins/in_blob/blob_db.c (1): `blob_db_file_insert` (164-196)
include/fluent-bit/flb_parquet.h (1)
- src/flb_parquet_impl.cpp (8): `flb_parquet_validate_schema` (860-883, 860-862), `flb_parquet_schema_create` (885-910, 885-887), `flb_parquet_schema_destroy` (912-918, 912-912), `flb_msgpack_to_parquet_streaming` (920-1274, 920-925)
tests/runtime/out_s3_error_handling.c (2)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (438-446), `s3_test_create_temp_db_path` (137-194), `s3_test_create_temp_store_dir` (204-257), `s3_test_cleanup_temp_db` (417-430), `s3_test_cleanup_temp_store_dir` (384-409)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
src/flb_snappy.c (1)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
src/flb_parquet_impl.cpp (1)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
plugins/out_s3/s3_stream.h (1)
- plugins/out_s3/s3_stream.c (4): `stream_compress_file` (142-213), `stream_process_msgpack_file` (563-631), `stream_json_processor` (633-661), `stream_log_key_processor` (736-765)
plugins/out_s3/s3_auth.c (5)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
- src/flb_utils.c (1): `flb_utils_url_split` (1531-1667)
- src/flb_upstream.c (1): `flb_upstream_create_url` (416-481)
- src/flb_http_client.c (4): `flb_http_add_header` (1015-1047), `flb_http_basic_auth` (1394-1398), `flb_http_bearer_auth` (1406-1441), `flb_http_client` (866-911)
- src/aws/flb_aws_util.c (1): `flb_aws_uri_encode_path` (1490-1526)
tests/runtime/out_s3_edge_cases.c (4)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (438-446), `s3_test_create_temp_db_path` (137-194), `s3_test_create_temp_store_dir` (204-257), `s3_test_cleanup_temp_db` (417-430), `s3_test_cleanup_temp_store_dir` (384-409)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- src/flb_lib.c (9): `flb_create` (143-225), `flb_service_set` (652-678), `flb_input` (266-276), `flb_input_set` (305-335), `flb_output` (279-289), `flb_output_set` (520-551), `flb_start` (983-994), `flb_stop` (1011-1055), `flb_destroy` (228-263)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
plugins/out_s3/s3_queue.c (5)
- include/fluent-bit/flb_mem.h (2): `flb_calloc` (84-96), `flb_free` (126-128)
- src/flb_sds.c (3): `flb_sds_create_len` (58-76), `flb_sds_create` (78-90), `flb_sds_destroy` (389-399)
- plugins/out_s3/s3_store.c (2): `s3_store_buffer_put` (199-240), `s3_store_file_delete` (508-533)
- plugins/out_s3/s3_auth.c (1): `s3_auth_fetch_presigned_url` (359-407)
- lib/cfl/src/cfl_sds.c (2): `cfl_sds_len` (88-91), `cfl_sds_destroy` (127-137)
tests/runtime/out_s3_blob.c (2)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (438-446), `s3_test_create_temp_db_path` (137-194), `s3_test_create_temp_store_dir` (204-257), `s3_test_cleanup_temp_db` (417-430), `s3_test_cleanup_temp_store_dir` (384-409)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
src/aws/flb_aws_compress.c (1)
- src/flb_snappy.c (1): `flb_snappy_compress_framed_data` (134-255)
tests/include/s3_test_helpers.h (2)
- plugins/out_s3/s3.c (1): `setenv` (66-69)
- tests/include/aws_client_mock.c (1): `flb_aws_client_mock_generator_count_unused_requests` (143-147)
tests/internal/fuzzers/aws_util_fuzzer.c (1)
- src/aws/flb_aws_util.c (1): `flb_get_s3_key` (1311-1389)
plugins/out_s3/s3_queue.h (1)
- plugins/out_s3/s3_queue.c (11): `s3_queue_remove` (307-316), `s3_queue_buffer_chunk` (356-378), `s3_queue_add_file` (116-130), `s3_queue_add_part` (132-202), `s3_queue_add_pending_file` (259-272), `s3_queue_add_file_unlocked` (58-111), `s3_queue_add_pending_file_unlocked` (208-254), `s3_queue_process_entry` (748-807), `s3_queue_recover_all` (850-895), `s3_queue_recover_from_database` (988-1076), `s3_queue_recover_from_fstore` (1141-1201)
tests/internal/aws_util.c (1)
- src/aws/flb_aws_util.c (1): `flb_get_s3_key` (1311-1389)
include/fluent-bit/flb_aws_util.h (1)
- src/aws/flb_aws_util.c (2): `flb_get_s3_key` (1311-1389), `flb_aws_uri_encode_path` (1490-1526)
tests/internal/aws_compress.c (1)
- src/flb_snappy.c (2): `flb_snappy_compress_framed_data` (134-255), `flb_snappy_uncompress_framed_data` (257-481)
plugins/out_s3/s3_stream.c (4)
- src/aws/flb_aws_compress.c (1): `flb_aws_compression_compress` (110-127)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
- src/flb_sds.c (4): `flb_sds_create_size` (92-95), `flb_sds_cat` (120-141), `flb_sds_create` (78-90), `flb_sds_printf` (336-387)
- plugins/out_s3/s3_store.c (1): `flb_chunk_file_open` (535-571)
src/flb_blob_db.c (3)
- src/flb_sqldb.c (3): `flb_sqldb_close` (129-149), `flb_sqldb_lock` (173-178), `flb_sqldb_unlock` (180-185)
- lib/cfl/src/cfl_sds.c (2): `cfl_sds_create` (113-125), `cfl_sds_destroy` (127-137)
- include/fluent-bit/flb_mem.h (1): `flb_calloc` (84-96)
plugins/out_s3/s3_multipart.c (4)
- include/fluent-bit/flb_compat.h (1): `usleep` (129-133)
- plugins/out_s3/s3.c (1): `write_seq_index` (127-147)
- include/fluent-bit/flb_mem.h (2): `flb_calloc` (84-96), `flb_free` (126-128)
- plugins/out_s3/s3_auth.c (1): `s3_auth_fetch_presigned_url` (359-407)
2aa4b4b to 01929d6 — Compare
Actionable comments posted: 7
🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_blob.c`:
- Around line 442-531: flb_blob_db_file_insert() returns int64_t so avoid
truncation and incorrect error checks: replace the current int ret =
flb_blob_db_file_insert(...) usage with assigning its return directly to the
existing int64_t file_id (i.e. file_id = flb_blob_db_file_insert(...)) and
change the error check from (ret == -1) to (file_id < 0); keep using the int
variable ret for subsequent calls that return int (e.g., s3_blob_register_parts
and s3_queue_add_pending_file). Ensure all error-handling branches use file_id
for deletion/logging when appropriate.
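For illustration, a minimal standalone sketch of the corrected pattern; the stub below merely stands in for flb_blob_db_file_insert(), and the point is assigning the 64-bit return directly and checking `< 0`:

```c
#include <stdint.h>
#include <stdio.h>

/* Stub standing in for flb_blob_db_file_insert(): returns the new
 * row id (>= 0) on success, or a negative error code on failure. */
static int64_t blob_db_file_insert_stub(int should_fail)
{
    return should_fail ? -1 : INT64_C(5000000000); /* wider than int */
}

int main(void)
{
    int64_t file_id;

    /* Assign the 64-bit return value directly; funneling it through
     * an int would truncate large row ids. */
    file_id = blob_db_file_insert_stub(0);
    if (file_id < 0) {
        fprintf(stderr, "insert failed\n");
        return 1;
    }
    printf("registered file id %lld\n", (long long) file_id);
    return 0;
}
```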
In `@plugins/out_s3/s3_store.c`:
- Around line 553-589: In flb_chunk_file_open, fix the meta_len logging and make
the fseek safe: log meta_len using the correct unsigned 16-bit specifier (e.g.,
cast to unsigned and use %hu or use PRIu16) instead of %d, and before fseek
validate meta_len against the actual remaining file size (use fseek/ftell or
fstat to get file length and compute remaining bytes) to ensure meta_len is not
larger than available bytes; if validation fails, log an error and close/return
NULL. Also ensure the fseek call uses a signed/long offset type appropriate for
fseek and handle any overflow when converting meta_len to the seek offset.
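A hedged sketch of that validation using only standard C I/O; this is not the plugin's code (flb_chunk_file_open also parses the chunk header first), just the shape of the bounds check:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Validate a 16-bit metadata length against the bytes actually left
 * in the file before seeking past it. 'meta_len' is assumed to have
 * been parsed from the chunk header already. */
static int skip_metadata(FILE *fp, uint16_t meta_len)
{
    long data_start;
    long file_size;

    data_start = ftell(fp);
    if (data_start < 0) {
        return -1;
    }
    if (fseek(fp, 0, SEEK_END) != 0) {
        return -1;
    }
    file_size = ftell(fp);
    if (file_size < 0) {
        return -1;
    }
    if ((long) meta_len > file_size - data_start) {
        fprintf(stderr, "metadata length %" PRIu16 " exceeds remaining "
                "%ld bytes\n", meta_len, file_size - data_start);
        return -1;
    }
    /* seek past the metadata; the (long) cast is safe for uint16_t */
    return fseek(fp, data_start + (long) meta_len, SEEK_SET);
}
```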
In `@plugins/out_s3/s3_stream.c`:
- Around line 60-140: The cleanup path in stream_compress_file_chunked may leak
compressed_chunk if an upstream compressor (e.g., flb_gzip_compress) returns -1
without freeing its output; update the cleanup block in
stream_compress_file_chunked to defensively free compressed_chunk (check if
compressed_chunk is non-NULL then call flb_free) in addition to freeing
read_buffer so any partially-allocated compressed_chunk is released on error or
early return.
In `@tests/include/s3_test_helpers.h`:
- Around line 206-270: The Windows build is missing linkage for the RPC/OLE
functions used in s3_test_create_temp_store_dir() (CoCreateGuid, UuidToStringA,
RpcStringFreeA); update the test target's target_link_libraries invocation to
add ole32 and rpcrt4 when WIN32 is true (or conditionally for the S3 test
target) so the linker can resolve those symbols; keep the existing
fluent-bit-static, ${CMAKE_THREAD_LIBS_INIT} and ${SYSTEMD_LIB} entries and only
add ole32 and rpcrt4 inside the WIN32 branch.
♻️ Duplicate comments (2)
plugins/out_s3/s3_stream.c (1)
319-337: Delete/recreate on Windows reintroduces a TOCTOU race
GetTempFileName()creates the file, thenDeleteFile()+_O_EXCLrecreates it. Another process can potentially win that window. Better: open the already-created file (no delete), or switch toCreateFile(..., CREATE_NEW, ...)on a generated unique name without creating it first.tests/runtime/out_s3_config.c (1)
18-430: Tests don’t currently assert the generated S3 key/URI, so they won’t catch key_format regressions
Mosts3_key_format_*tests only assert HTTP method. To validate$TAG/$INDEX/$UUID/%Y...expansion, add mock expectations for request URI/path (and ideally host/port for endpoint tests).
🧹 Nitpick comments (11)
plugins/out_s3/s3_auth.c (1)
230-230: Consider lowering log level frominfotodebugfor routine operations.Logging at
infolevel on every successful presigned URL retrieval may produce excessive log volume in production, especially with multipart uploads that fetch multiple URLs per upload. Consider usingflb_plg_debugfor routine success messages and reservinginfofor significant state changes.Suggested change
- flb_plg_info(ctx->ins, "Pre-signed URL retrieved successfully"); + flb_plg_debug(ctx->ins, "Pre-signed URL retrieved successfully");tests/internal/parquet.c (1)
906-941: Useflb_malloc/flb_freefor consistency with Fluent Bit memory management.The test uses standard
malloc()andfree()while the rest of the codebase usesflb_malloc()andflb_free(). For consistency with Fluent Bit's memory allocation patterns, consider using the Fluent Bit wrappers.Suggested change
const size_t long_str_size = 100000; /* 100KB string */ - char *long_str = (char *)malloc(long_str_size); + char *long_str = (char *)flb_malloc(long_str_size); if (!long_str) { TEST_CHECK(0); /* Memory allocation failed */ return; }- free(long_str); + flb_free(long_str); cleanup_test_context(&ctx);plugins/out_s3/s3_store.c (5)
116-140: Buffer warning threshold logic looks inverted at first glance; consider simplifying
space_remaining/limit < (1.0 - 0.8)is equivalent to “used > 80%”. It’s correct, but harder to read thannew_total > limit * 0.8.
383-407: You can likely drop the platform split by usinggmtime_r()fromflb_compat.h
This file already includesflb_compat.h, which provides agmtime_rwrapper on Windows. Callinggmtime_runconditionally would reduce platform-specific code here.
260-295: Use PATH_SEPARATOR macro instead of hardcoded "/" for Windows compatibilityThe
snprintf(chunk_path, sizeof(chunk_path), "%s/%s", ...)call uses a hardcoded "/" separator. Fluent Bit already providesPATH_SEPARATORmacro (defined ininclude/fluent-bit/calyptia/calyptia_constants.h) for platform-aware path construction. Follow the pattern used inplugins/out_file/file.cand use:snprintf(chunk_path, sizeof(chunk_path), "%s" PATH_SEPARATOR "%s", stream_path, chunk_name)This ensures consistency across the codebase and prevents edge cases on Windows.
33-49: UseFLB_SYSTEM_WINDOWSconsistently for Windows detection across the fileLine 34 uses
_WIN32to defineFLB_PATH_MAX, but line 278 usesFLB_SYSTEM_WINDOWSfor the same purpose (statvs_stat). Fluent Bit definesFLB_SYSTEM_WINDOWSvia CMake whenCMAKE_SYSTEM_NAMEmatches "Windows" and uses it as the standard platform macro throughout the codebase. PreferFLB_SYSTEM_WINDOWSin both places to maintain consistency with the project convention and avoid potential divergence if code is built outside the standard CMake flow.
142-215: Append failure handling for existing files needs safeguardsWhen
flb_fstore_file_append()fails on an existing buffer file (one that was passed in rather than newly created), the file is left in a partial/corrupted state with no cleanup or status flag. Callers only receive a warning. Consider either: (a) locking the file to prevent reuse, or (b) deleting and allowing the pipeline to recreate a fresh buffer. The existinglockedfield and lock/unlock functions provide infrastructure for option (a).plugins/out_s3/s3_queue.h (1)
23-102: Header should directly include its type dependencies (stdint.h, time.h)
This header uses uint64_t and time_t but only includes <sys/types.h>. Relying on s3.h transitive includes is brittle.
Proposed fix:
```diff
 #include "s3.h"
 #include <sys/types.h>  /* for off_t */
+#include <stdint.h>     /* for uint64_t */
+#include <time.h>       /* for time_t */
```
tests/runtime/out_s3_config.c (1)
687-835: Looped scenario tests: consider including scenario name in WAIT label
S3_TEST_WAIT_MOCK_EXHAUSTED(5, "storage_class_variations") inside a loop makes failures harder to pinpoint. Consider suffixing the label with classes[i]/acls[i].
Also applies to: 762-835
plugins/out_s3/s3_blob.c (1)
507-509: Prefer a provided tag length over strlen(tag)
You pass strlen(event_chunk->tag) into s3_queue_add_pending_file(). If event_chunk already provides a tag length (common in Fluent Bit event APIs), use it to avoid surprises with embedded NULs or non-string tags.

```bash
#!/bin/bash
# Check struct flb_event_chunk definition for tag length fields.
rg -n --type=h 'struct\s+flb_event_chunk\b' include -n -C 3
```

tests/runtime/out_s3_edge_cases.c (1)
80-92: Minor: redundant assertion at line 92. The assertion
TEST_CHECK(ret == 0) at line 92 checks the same ret value from flb_start() that was already verified at line 82. This is redundant but not harmful. The comment "Verify no crash occurred - reaching here is the test" suggests the intent is to verify the test didn't crash, which is implicitly true by reaching that point.
Suggested simplification:
```diff
     /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
+    TEST_MSG("Empty flush completed without crash");
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
- `plugins/out_s3/CMakeLists.txt`
- `plugins/out_s3/s3.c`
- `plugins/out_s3/s3.h`
- `plugins/out_s3/s3_auth.c`
- `plugins/out_s3/s3_auth.h`
- `plugins/out_s3/s3_blob.c`
- `plugins/out_s3/s3_blob.h`
- `plugins/out_s3/s3_multipart.c`
- `plugins/out_s3/s3_multipart.h`
- `plugins/out_s3/s3_queue.c`
- `plugins/out_s3/s3_queue.h`
- `plugins/out_s3/s3_store.c`
- `plugins/out_s3/s3_store.h`
- `plugins/out_s3/s3_stream.c`
- `plugins/out_s3/s3_stream.h`
- `tests/include/aws_client_mock.c`
- `tests/include/aws_client_mock.h`
- `tests/include/s3_test_helpers.h`
- `tests/internal/CMakeLists.txt`
- `tests/internal/aws_compress.c`
- `tests/internal/aws_util.c`
- `tests/internal/parquet.c`
- `tests/runtime/CMakeLists.txt`
- `tests/runtime/out_s3.c`
- `tests/runtime/out_s3_blob.c`
- `tests/runtime/out_s3_config.c`
- `tests/runtime/out_s3_edge_cases.c`
- `tests/runtime/out_s3_error_handling.c`
- `tests/runtime/out_s3_format.c`
- `tests/runtime/out_s3_multipart.c`
- `tests/runtime/out_s3_queue.c`
- `tests/runtime/out_s3_recovery.c`
- `tests/runtime/out_s3_special_features.c`
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/internal/CMakeLists.txt
- tests/include/aws_client_mock.h
- plugins/out_s3/s3_stream.h
- plugins/out_s3/s3_auth.h
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3_multipart.h
- tests/internal/aws_compress.c
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
`tests/internal/parquet.c`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
`tests/internal/parquet.c`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
`tests/internal/parquet.c`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
`tests/internal/parquet.c`, `plugins/out_s3/s3_stream.c`
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
`plugins/out_s3/s3_stream.c`, `plugins/out_s3/s3_queue.c`, `plugins/out_s3/s3_multipart.c`
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/out_s3/s3_stream.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
`tests/include/s3_test_helpers.h`, `plugins/out_s3/s3_store.c`, `tests/runtime/out_s3_error_handling.c`, `plugins/out_s3/s3_queue.c`
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
`plugins/out_s3/s3_auth.c`, `plugins/out_s3/s3_queue.c`, `plugins/out_s3/s3_multipart.c`
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
`plugins/out_s3/s3_auth.c`, `plugins/out_s3/s3.h`
🧬 Code graph analysis (13)
tests/runtime/out_s3_multipart.c (3)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (451-459), `s3_test_create_temp_db_path` (138-196), `s3_test_create_temp_store_dir` (206-270), `s3_test_cleanup_temp_db` (430-443), `s3_test_cleanup_temp_store_dir` (397-422)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
tests/internal/parquet.c (3)
- src/flb_parquet_impl.cpp (1): `field_name` (548-548)
- src/flb_pack.c (1): `flb_pack_json` (530-535)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
plugins/out_s3/s3_stream.c (4)
- src/aws/flb_aws_compress.c (1): `flb_aws_compression_compress` (110-127)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
- src/flb_sds.c (4): `flb_sds_create_size` (92-95), `flb_sds_cat` (120-141), `flb_sds_create` (78-90), `flb_sds_printf` (336-387)
- plugins/out_s3/s3_store.c (1): `flb_chunk_file_open` (553-589)
tests/runtime/out_s3_format.c (3)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (451-459), `s3_test_create_temp_db_path` (138-196), `s3_test_create_temp_store_dir` (206-270), `s3_test_cleanup_temp_db` (430-443), `s3_test_cleanup_temp_store_dir` (397-422)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
tests/runtime/out_s3_blob.c (3)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (451-459), `s3_test_create_temp_db_path` (138-196), `s3_test_create_temp_store_dir` (206-270), `s3_test_cleanup_temp_db` (430-443), `s3_test_cleanup_temp_store_dir` (397-422)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- src/flb_lib.c (3): `flb_create` (143-225), `flb_start` (983-994), `flb_stop` (1011-1055)
tests/include/s3_test_helpers.h (2)
- plugins/out_s3/s3.c (1): `setenv` (66-69)
- tests/include/aws_client_mock.c (1): `flb_aws_client_mock_generator_count_unused_requests` (143-147)
plugins/out_s3/s3_blob.c (6)
- src/aws/flb_aws_util.c (1): `flb_get_s3_key` (1311-1389)
- plugins/out_s3/s3_auth.c (1): `s3_auth_fetch_presigned_url` (359-423)
- src/flb_input_blob.c (1): `flb_input_blob_delivery_notification_destroy` (222-231)
- src/flb_blob_db.c (2): `flb_blob_db_lock` (423-428), `flb_blob_db_unlock` (430-435)
- src/flb_log_event_decoder.c (2): `flb_log_event_decoder_init` (103-120), `flb_log_event_decoder_next` (309-456)
- plugins/out_s3/s3_queue.c (1): `s3_queue_add_pending_file` (269-282)
tests/runtime/out_s3_config.c (3)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (451-459), `s3_test_create_temp_db_path` (138-196), `s3_test_create_temp_store_dir` (206-270), `s3_test_cleanup_temp_db` (430-443), `s3_test_cleanup_temp_store_dir` (397-422)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- include/fluent-bit/flb_mem.h (1): `flb_free` (126-128)
plugins/out_s3/s3_store.c (4)
- src/flb_sds.c (4): `flb_sds_create_size` (92-95), `flb_sds_printf` (336-387), `flb_sds_destroy` (389-399), `flb_sds_create` (78-90)
- src/flb_fstore.c (7): `flb_fstore_file_delete` (259-277), `flb_fstore_file_create` (153-190), `flb_fstore_file_append` (298-329), `flb_fstore_create` (479-525), `flb_fstore_destroy` (527-564), `flb_fstore_stream_create` (335-402), `flb_fstore_file_inactive` (238-256)
- include/fluent-bit/flb_mem.h (2): `flb_free` (126-128), `flb_calloc` (84-96)
- include/fluent-bit/flb_compat.h (1): `gmtime_r` (76-82)
plugins/out_s3/s3_blob.h (1)
- plugins/out_s3/s3_blob.c (4): `s3_blob_register_parts` (394-427), `s3_blob_process_events` (442-531), `s3_blob_recover_state` (171-190), `s3_blob_notify_delivery` (124-161)
tests/runtime/out_s3_edge_cases.c (3)
- tests/include/s3_test_helpers.h (5): `s3_test_next` (451-459), `s3_test_create_temp_db_path` (138-196), `s3_test_create_temp_store_dir` (206-270), `s3_test_cleanup_temp_db` (430-443), `s3_test_cleanup_temp_store_dir` (397-422)
- tests/include/aws_client_mock.c (3): `flb_aws_client_mock_configure_generator` (42-47), `flb_aws_client_get_mock_generator` (137-140), `flb_aws_client_mock_clear_generator_instance` (71-74)
- src/flb_lib.c (7): `flb_create` (143-225), `flb_service_set` (652-678), `flb_input` (266-276), `flb_start` (983-994), `flb_stop` (1011-1055), `flb_destroy` (228-263), `flb_lib_push` (843-870)
plugins/out_s3/s3_store.h (1)
- plugins/out_s3/s3_store.c (1): `flb_chunk_file_open` (553-589)
tests/internal/aws_util.c (3)
- src/aws/flb_aws_util.c (1): `flb_get_s3_key` (1311-1389)
- src/flb_sds.c (1): `flb_sds_destroy` (389-399)
- tests/internal/fuzzers/aws_util_fuzzer.c (1): `initialization_crutch` (27-36)
🔇 Additional comments (76)
tests/runtime/CMakeLists.txt (1)
238-246: LGTM! Good modular test organization. The expansion from a single test file to nine focused test files improves test maintainability and provides better isolation for different S3 plugin aspects (config, multipart, format, error handling, edge cases, special features, queue, recovery, blob).
tests/include/aws_client_mock.c (3)
2-10: Good documentation for the inclusion pattern. The comment clearly explains that this
.c file is directly included in test files rather than compiled separately, which prevents symbol conflicts when tests are built as standalone executables.
56-64: Good double-free protection pattern.Clearing the instance pointer before calling destroy prevents double-free scenarios when the generator is destroyed multiple times or when the S3 plugin cleanup has already freed the mock.
266-299: DATA field handling correctly allocates memory forflb_http_client_destroycleanup.The implementation properly:
- Searches for DATA_SIZE or DATA_LEN to determine allocation size
- Falls back to
strlen()if size not specified- Allocates, copies, and null-terminates the data
- Sets all three response fields (
data,data_len,data_size)One minor observation: if
flb_mallocfails at line 291,c->resp.dataremains NULL, but this is acceptable since the code doesn't set the other fields in that case, and the caller will receive a response with no data which is a reasonable failure mode.plugins/out_s3/s3_auth.c (3)
1-128: Well-structured authorization endpoint initialization.The implementation correctly:
- Parses URL and handles failures with proper cleanup
- Creates TLS context only for HTTPS endpoints
- Properly cleans up TLS context if upstream creation fails
- Uses
flb_output_upstream_setto wire the upstream to the plugin instance
262-357: Solid URL path builder with proper encoding and cleanup.The implementation:
- URL-encodes both S3 key and upload ID
- Strips leading slash from key to avoid double slashes
- Has comprehensive error cleanup paths
- Returns the formatted path via
flb_sds_printfresult
359-423: Complete presigned URL fetch flow with proper resource management.The implementation correctly handles all edge cases:
- Returns success (0) with NULL result when endpoint is not configured
- Validates upstream initialization before use
- Handles double-slash prevention for endpoint/path concatenation
- Properly destroys both
url_pathandfull_urlon all pathstests/internal/parquet.c (4)
1-53: Good header organization with proper includes and extern "C" linkage.The file correctly includes all necessary standard headers (stdio, stdlib, string, stdint, math.h) and establishes C++ interoperability for the
validate_parquet_filefunction.
69-136: ChunkIO header construction follows the specification correctly.The implementation properly sets:
- Magic bytes (0xC1, 0x00) at bytes 0-1
- Content length (big-endian) at bytes 10-13
- Metadata length (big-endian) at bytes 22-23
The CRC32 fields are left zeroed, which is acceptable for test scenarios where validation is typically disabled.
1941-2016: Proper use of standardNANandINFINITYmacros.Using macros from
<math.h>instead of division-by-zero expressions avoids potential SIGFPE traps when floating-point exceptions are enabled.
2081-2156: Excellent test coverage across multiple categories.The TEST_LIST provides comprehensive coverage:
- 9 functional tests for basic operations and type conversions
- 12 data quality tests for NULL handling and schema mismatches
- 9 scale/boundary tests including batch boundaries
- 1 optimized compression test covering all 4 types
- 10 negative/error tests
- 3 real-world scenario tests
The organization into categories with comments aids maintainability.
tests/include/s3_test_helpers.h (6)
14-43: Good cross-platform abstraction layer.The header provides clean platform-specific macros for common operations (sleep, unlink, rmdir, mkdir, close) and appropriate includes for both Windows and POSIX environments.
138-196: Correct implementation of unique temp file creation.The Windows path uses
GetTempFileNameAfor atomic file creation, while POSIX usesmkstemps(correctly documented as a BSD/Glibc extension). Both approaches ensure uniqueness and avoid race conditions.
272-298: Safe nftw callback implementation.The callback correctly handles different file types (FTW_F, FTW_SL, FTW_SLN for files/symlinks, FTW_D/FTW_DP for directories) and uses FTW_DEPTH to ensure children are visited before parents.
397-422: Secure directory cleanup without command injection.Replacing
system("rm -rf ...")with platform-specific APIs (nftwon POSIX,FindFirstFile/RemoveDirectoryAon Windows) eliminates the command injection vulnerability while maintaining functionality.
451-459: Correct Windows/POSIX environment variable handling.Uses
_putenv_son Windows andsetenvon POSIX, avoiding the portability issue from the previous review.
477-509: Good polling-based wait pattern for mock exhaustion.The
s3_test_poll_mock_exhaustedfunction andS3_TEST_WAIT_MOCK_EXHAUSTEDmacro provide a deterministic alternative to sleep-based waiting, ensuring tests fail explicitly if mock chains aren't consumed rather than timing out silently.tests/internal/aws_util.c (4)
47-68: Well-structured test coverage for new $FILE_PATH and $FILE_NAME variables.The new macros comprehensively test the S3 key generation with file path/name substitution, including important edge cases:
- Relative paths (
./logs/file.log)- Triple-dot prefixes (
...hidden/file.log)- Parent traversal (
../../../etc/passwd)- Hidden files (
.hidden)- Mixed dots and slashes (
.///logs/./file.log)This provides good coverage for the path sanitization logic that strips leading dots and slashes.
200-201: Correct signature update for existing tests.The additional
NULLparameter correctly indicates that these tests don't exercise the new$FILE_PATH/$FILE_NAMEsubstitution feature, maintaining backward compatibility while extending the API.
402-533: LGTM - New file path tests follow established patterns.All eight new test functions correctly:
- Initialize configuration with
initialization_crutch()- Set up UTC time context
- Call
flb_get_s3_keywith appropriate parameters- Verify expected output
- Clean up with
flb_sds_destroy
550-558: Test registration looks correct.All new test functions are properly registered in the test list.
plugins/out_s3/s3_queue.c (9)
22-30: Good Windows portability wrapper for file access checks.The
flb_accessmacro provides cross-platform compatibility for file existence checks. This addresses the portability concern for theaccess()function.
68-121: Clean memory management in queue entry creation.The function properly handles allocation failures by cleaning up previously allocated resources before returning -1. The documentation clearly states the caller must hold the lock.
142-212: LGTM - Proper resource cleanup on allocation failures.The cascading cleanup pattern correctly frees previously allocated resources when any allocation fails, preventing memory leaks.
444-474: ETag memory management correctly implemented.The code properly:
- Destroys the ETag SDS after saving to database (lines 452-454)
- Cleans up any allocated ETag on upload failure (lines 470-474)
This addresses the previously identified memory leak.
598-603: Proper validation of part_count before allocation.The bounds check (1–10,000) correctly validates
part_countbefore allocating and using it, preventing issues with corrupt or excessive values. This aligns with AWS S3's maximum parts limit.
1021-1045: Recovery loop properly advances through pending files.The loop correctly avoids infinite iteration by:
- Marking parts as
in_progress(line 1042) to exclude them from subsequent queries- Resetting
in_progresswhen skipping due to endpoint mismatch (line 1037)- Processing or cleaning up each file before the next iteration
792-797: Proper NULL check before database operation.The check
if (ctx->blob_db.db != NULL)at line 793 ensures the database call is only made when the database is configured, preventing potential null pointer issues.
721-728: Correct NULL check before database operation in failure path.The
if (ctx->blob_db.db != NULL)check ensures the database operation is only performed when the database is configured.
1158-1218: LGTM - Fstore recovery properly handles chunk states.The function correctly:
- Skips the metadata stream
- Respects chunk lock state
- Handles retry limit exceeded by marking inactive
- Locks files before queue addition and unlocks on failure
plugins/out_s3/s3.h (5)
32-33: Forward declaration correctly added for Parquet schema.The forward declaration allows the header to reference
struct flb_parquet_schemawithout including the full Parquet header, keeping dependencies minimal.
35-36: MAX_FILE_SIZE correctly reflects AWS S3 50TB limit.The value
54975581388800ULL equals 50 TiB (50 × 1024⁴), which is in line with the AWS S3 multipart ceiling of 5 GiB maximum part size × 10,000 maximum parts.
52-74: Well-designed upload queue entry structure.The structure cleanly supports both:
- DB-tracked uploads (via
file_id,part_db_id,part_id)- Non-DB uploads (via
upload_filepointer)The
needs_upload_creationflag enables deferred multipart upload initialization during recovery.
150-157: Good state management and concurrency additions.The new fields properly support:
initial_upload_done: Tracks first upload completionis_exiting: Graceful shutdown signalingneeds_recovery: Recovery state trackingupload_queue_lock: Thread-safe queue access
175-177: Parquet integration fields properly added.The
format,schema_str, andcached_arrow_schemafields provide the necessary hooks for Parquet output support.plugins/out_s3/s3_blob.h (1)
1-43: Clean header with well-documented public API.The header properly declares the blob management API with:
- Appropriate include guards
- Minimal necessary includes
- Clear documentation comments for each function
- Signatures matching the implementations in
s3_blob.cplugins/out_s3/s3_store.h (2)
51-52: Useful context-free chunk file opener.The function provides a clean interface for opening and validating chunk files, with proper header validation and metadata skipping. The "context-free" design allows usage from C++ code (important for Parquet integration).
32-32: Field rename fromfile_pathtostream_pathin struct s3_file.The rename better reflects the field's purpose as a stream directory path. This is distinct from
upload_queue.file_path(the actual file path for blob uploads). All references throughout the codebase have been properly updated.plugins/out_s3/s3_store.c (1)
83-114: Safe list deletion during iteration; good hardening around NULLfsf->data
Deleting the file entry undermk_list_foreach_safe()is the right pattern, and returning only unlocked matches keeps semantics clear.plugins/out_s3/s3_stream.c (1)
480-515: MsgPack unpack loop handling is much safer now
HandlingEXTRA_BYTESas “success with remaining data” and explicitly erroring onMSGPACK_UNPACK_NOMEM_ERRORavoids the common infinite-loop footgun.plugins/out_s3/s3_blob.c (1)
192-244: This review comment is incorrect; the DB helper functions already clean up on error returns.Both
flb_blob_db_file_get_next_staleandflb_blob_db_file_get_next_abortedreturn -1 on error and properly clean up all dynamically allocated output parameters (sds strings) before returning:
flb_blob_db_file_get_next_stale: On error (exists == -1), cleans up and NULLs*path,*remote_id, and*tag.flb_blob_db_file_get_next_aborted: On error (exists == -1), cleans up and NULLs all five output parameters (*path,*source,*remote_id,*file_tag,*s3_key). Additionally, this function initializes all output pointers to NULL upfront for added safety.The nested allocation pattern in both functions ensures that if any
cfl_sds_create()fails, all previously allocated strings are cleaned up before the error path returns control to the caller. When the caller's loop breaks onret != 1(including error case -1), the output pointers are either NULL or have already been properly managed by the helper function.No memory leak exists in these call sites.
Likely an incorrect or invalid review comment.
tests/runtime/out_s3_queue.c (1)
1-12: Test file structure looks good.The test module follows the established pattern for S3 plugin tests with proper includes, mock chain configuration, and resource cleanup. The seven test cases cover key queue functionality including basic operations, multi-worker scenarios, retry mechanisms, timeouts, concurrency, and shutdown behavior.
Note: Line 8 directly includes
aws_client_mock.crather than linking it separately. This pattern is consistent across all S3 test files and works because each test is compiled as a separate executable, avoiding symbol collision. While not ideal from an architectural standpoint, this is an existing pattern in the codebase.tests/runtime/out_s3_blob.c (2)
1-12: Test module for blob database configuration looks well-structured.The test suite covers key blob database configuration scenarios including custom paths, delivery attempt limits (file-level, part-level, and unlimited), freshness thresholds, and chunk sizing. Each test validates that the configuration is accepted and a successful upload completes.
137-151: Good implementation of retry loop for async DB creation.The retry loop correctly handles the race condition where the database file may not exist immediately after
flb_start(). UsingS3_TEST_SLEEP_MSensures cross-platform compatibility.tests/runtime/out_s3_format.c (2)
1-15: Comprehensive format and compression test suite.The test module provides good coverage of JSON formatting options, various compression algorithms (gzip, zstd, snappy), log_key extraction, and invalid configuration handling. All
flb_lib_pushreturn values are properly checked.
851-910: Good negative test case for invalid configuration.The test correctly expects
flb_start()to fail with an invalid compression type and properly handles cleanup regardless of the outcome. The defensive check at lines 900-902 ensuresflb_stop()is only called if start unexpectedly succeeded.tests/runtime/out_s3_error_handling.c (3)
14-41: Well-defined AWS error response templates.The error response templates are properly structured XML matching real AWS S3 error formats. This enables realistic testing of error handling paths for various failure scenarios.
67-76: Good defensive NULL handling with early bailout.The pattern of checking for NULL after
s3_test_create_temp_*functions and returning early prevents potential NULL pointer dereferences. This pattern is consistently applied across all test functions in this file.
743-756: Test list is complete and properly structured.All 10 error handling tests are registered with appropriate names. The list is properly NULL-terminated for the test framework.
tests/runtime/out_s3_edge_cases.c (3)
1-13: Good inclusion of flb_compat.h for cross-platform compatibility.The inclusion of
flb_compat.hat line 4 helps ensure cross-platform compatibility for system calls and types. This addresses previous portability concerns.
938-1079: Good lifecycle stability test with proper isolation.The rapid start/stop test correctly creates separate mock chains, temp paths, and contexts for each cycle. The call to
s3_test_next()at line 1014 ensures unique temporary paths for the second cycle. This validates that the plugin handles repeated initialization/shutdown cleanly.
1081-1096: Test list is complete.All 12 edge case tests are properly registered with descriptive names.
plugins/out_s3/s3_multipart.c (19)
36-65: LGTM! Cross-platform abstractions are well implemented. The Windows-specific
O_BINARY flag addresses binary data corruption concerns (e.g., Parquet files), and the sleep_ms() wrapper properly handles the platform difference between Sleep() (Windows, milliseconds) and usleep() (POSIX, microseconds).
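Such a wrapper boils down to something like this sketch (the POSIX branch multiplies to microseconds; note that strict POSIX caps usleep() arguments below one second, so longer waits would need a loop or nanosleep):

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <unistd.h>
#endif

/* Portable millisecond sleep: Sleep() takes milliseconds on Windows,
 * usleep() takes microseconds on POSIX. */
static void sleep_ms(unsigned int ms)
{
#ifdef _WIN32
    Sleep(ms);
#else
    usleep(ms * 1000);
#endif
}
```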
106-148: LGTM! ETag extraction now correctly preserves quotes. The implementation properly includes the surrounding double-quotes in the extracted ETag value, which is required by the S3 CompleteMultipartUpload API. Bounds checking is also correct.
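A sketch of quote-preserving extraction using plain strstr/strchr; this mirrors the described behavior, not the plugin's exact parser:

```c
#include <string.h>

/* Extract an ETag value from a raw header block, keeping the
 * surrounding double-quotes, since CompleteMultipartUpload echoes
 * the quoted form back to S3. Returns bytes written or -1. */
static int extract_etag(const char *headers, char *out, size_t out_size)
{
    const char *p;
    const char *start;
    const char *end;

    p = strstr(headers, "ETag:");
    if (p == NULL) {
        return -1;
    }
    start = strchr(p, '"');          /* opening quote */
    if (start == NULL) {
        return -1;
    }
    end = strchr(start + 1, '"');    /* closing quote */
    if (end == NULL || (size_t) (end - start + 2) > out_size) {
        return -1;
    }
    memcpy(out, start, (size_t) (end - start + 1)); /* include quotes */
    out[end - start + 1] = '\0';
    return (int) (end - start + 1);
}
```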
164-216: LGTM! Part size calculation correctly enforces AWS S3 constraints. The function properly handles the tradeoffs between user configuration, file size, and AWS hard limits (5 MiB minimum, 5 GiB maximum, 10000 parts maximum).
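A hedged sketch of that sizing logic under the stated limits; the names are illustrative and the plugin's actual function signature differs:

```c
#include <stdint.h>

#define PART_MIN  (5ULL * 1024 * 1024)          /* 5 MiB */
#define PART_MAX  (5ULL * 1024 * 1024 * 1024)   /* 5 GiB */
#define MAX_PARTS 10000ULL

/* Pick a part size respecting the AWS limits above: start from the
 * configured size, raise it until the file fits in 10000 parts, and
 * clamp into [5 MiB, 5 GiB]. */
static uint64_t choose_part_size(uint64_t file_size, uint64_t configured)
{
    uint64_t part = configured;

    if (part < PART_MIN) {
        part = PART_MIN;
    }
    /* smallest part size that keeps the part count <= MAX_PARTS */
    if (file_size / part + (file_size % part != 0) > MAX_PARTS) {
        part = (file_size + MAX_PARTS - 1) / MAX_PARTS;
    }
    if (part > PART_MAX) {
        /* caller must reject files larger than PART_MAX * MAX_PARTS */
        part = PART_MAX;
    }
    return part;
}
```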
218-248: LGTM! MD5 base64 encoding now properly NUL-terminates output.The function correctly ensures the output buffer is NUL-terminated (Lines 238-241) with a bounds check to prevent buffer overflow.
250-299: LGTM! URI construction handles both presigned URLs and standard paths correctly.The function properly URL-encodes the S3 key and handles memory cleanup on error paths.
301-355: LGTM! CompleteMultipartUpload XML payload construction is robust.The function properly validates all ETags before building the XML and handles memory cleanup on all error paths.
408-413: LGTM! Storage class header now correctly increments the header count.The
n++at line 412 properly advances the header index after setting the storage_class header, addressing the previous review concern.
420-445: LGTM! Resource cleanup is thorough and handles NULL gracefully.
451-500: LGTM! Multipart upload structure initialization with proper error handling.The function correctly manages the sequential index persistence and provides appropriate cleanup on all error paths.
502-559: LGTM! CreateMultipartUpload initiation handles the API flow correctly.
561-653: LGTM! UploadPart implementation correctly handles part uploads with optional MD5 verification.
655-736: LGTM! CompleteMultipartUpload handles success and NoSuchUpload error appropriately.The function properly returns distinct error codes for different failure modes, enabling callers to handle upload expiration gracefully.
738-800: LGTM! AbortMultipartUpload correctly handles the abort flow.
802-880: LGTM! ListParts validation correctly distinguishes between transient and definitive failures.The function now properly returns
-1for transient errors (network failures, unexpected status codes) and0for definitive NoSuchUpload responses, allowing callers to implement appropriate retry logic.
894-1016: LGTM! File part upload handles 32-bit platform safety and efficient I/O.The overflow guards using
uint64_tcasting and the direct-to-buffer read pattern address previous review concerns effectively.
1018-1043: LGTM! Abort helper provides consistent presigned URL handling with fallback.The function gracefully falls back to aborting without a presigned URL if the URL fetch fails, ensuring cleanup attempts aren't blocked by transient auth endpoint issues.
1045-1081: LGTM! Initialization helper properly sets up the multipart upload structure.
1083-1201: LGTM! Part upload orchestration with proper AWS limits and overflow-safe backoff. The function correctly enforces the 10000-part AWS limit, ensures at least one retry attempt, and uses an overflow-safe exponential backoff calculation.
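The overflow-safe idea can be sketched as capping the shift amount before it is applied; the constants here are illustrative, not the plugin's actual tunables:

```c
#include <stdint.h>

/* Exponential backoff without overflow: cap the shift before
 * computing base << attempt, then clamp to a ceiling. */
static uint64_t backoff_ms(unsigned int attempt)
{
    const uint64_t base_ms = 100;
    const uint64_t max_ms  = 30000;
    unsigned int shift = attempt;

    if (shift > 16) {        /* 100ms << 16 already exceeds max_ms */
        shift = 16;
    }
    if ((base_ms << shift) > max_ms) {
        return max_ms;
    }
    return base_ms << shift;
}
```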
1203-1294: LGTM! Complete multipart upload orchestration with proper error handling.The function correctly validates uploaded byte count against file size and ensures resources are cleaned up on all paths.
tests/runtime/out_s3_multipart.c (5)
1-13: LGTM! Test includes provide necessary mock and helper functionality.Including
aws_client_mock.cdirectly (line 8) is acceptable in test code to simplify mock linkage.
14-67: LGTM! Test follows best practices for isolation and cleanup.The test properly uses unique temporary paths, condition-based waiting (
S3_TEST_WAIT_MOCK_EXHAUSTED), and thorough cleanup. Theflb_lib_push()return value is correctly validated.
69-123: LGTM! Metadata test validates ACL header propagation.
125-498: LGTM! All test functions follow consistent patterns with proper validation and cleanup.The
flb_test_multiple_data_pushestest correctly validates eachflb_lib_push()return value in the loop (lines 485-486), addressing the previous review concern.
500-512: LGTM! Test list is complete and well-organized.
01929d6 to bb051a2 — Compare
Actionable comments posted: 1
♻️ Duplicate comments (2)
plugins/out_s3/s3_store.h (1)
32-32: Field rename from file_path to stream_path. This semantic rename better reflects the purpose (stream directory path). Ensure all references in
s3_queue.c have been updated to use stream_path, as flagged in the previous review.
tests/runtime/out_s3_edge_cases.c (1)
7-8: Including an implementation file directly is non-standard. Including
aws_client_mock.c instead of a header violates best practices. Create aws_client_mock.h with function declarations and link the implementation at build time.
🧹 Nitpick comments (8)
tests/include/aws_client_mock.c (1)
266-299: Consider extracting the data length lookup to avoid repeated iteration. The inner loop at lines 275-286 scans all config parameters each time
FLB_AWS_CLIENT_MOCK_SET_DATA is encountered. While functionally correct, this could be extracted into a helper, or the loop could break earlier to improve clarity.
♻️ Optional: extract a helper function:
```diff
+static size_t get_response_data_len(struct flb_aws_client_mock_response *response)
+{
+    int j;
+    for (j = 0; j < response->length; ++j) {
+        if (response->config_parameters[j].config_parameter ==
+            FLB_AWS_CLIENT_MOCK_SET_DATA_SIZE ||
+            response->config_parameters[j].config_parameter ==
+            FLB_AWS_CLIENT_MOCK_SET_DATA_LEN) {
+            return (size_t)(uintptr_t)response->config_parameters[j].config_value;
+        }
+    }
+    return 0;
+}
```
94-100: Inconsistent parameter type:flb_sds_tvsconst char *.
s3_multipart_upload_file_partusesflb_sds_t pre_signed_urlwhile other functions usechar *pre_signed_url. Consider usingconst char *for consistency with the rest of the API, unlessflb_sds_tis specifically required here.♻️ Suggested fix for consistency
int s3_multipart_upload_file_part(struct flb_s3 *ctx, const char *file_path, off_t offset_start, off_t offset_end, struct multipart_upload *m_upload, - flb_sds_t pre_signed_url); + char *pre_signed_url);plugins/out_s3/s3.h (1)
192-197: Consider addingconstqualifier to immutable parameters.The
file_pathandtagparameters ins3_initiate_multipart_uploadappear to be read-only based on the implementation ins3.c. Addingconstwould clarify intent.♻️ Optional: Add const qualifiers
int s3_initiate_multipart_upload(struct flb_s3 *ctx, - uint64_t file_id, - const char *file_path, - const char *tag, - int tag_len); + uint64_t file_id, + const char *file_path, + const char *tag, + int tag_len);Note: Parameters already have
const, but the indentation is inconsistent.plugins/out_s3/s3_auth.c (2)
155-171: Consider checking return values from the HTTP header functions. The return values from `flb_http_add_header()`, `flb_http_basic_auth()`, and `flb_http_bearer_auth()` are not checked. While these functions rarely fail, authentication header failures could cause silent authorization issues.

♻️ Suggested improvement

```diff
 static int setup_http_client_headers(struct flb_s3 *ctx,
                                      struct flb_http_client *client)
 {
-    flb_http_add_header(client, "Accept", 6, "text/plain", 10);
-    flb_http_add_header(client, "User-Agent", 10, "Fluent-Bit", 10);
+    int ret;
+
+    ret = flb_http_add_header(client, "Accept", 6, "text/plain", 10);
+    if (ret < 0) {
+        return -1;
+    }
+    ret = flb_http_add_header(client, "User-Agent", 10, "Fluent-Bit", 10);
+    if (ret < 0) {
+        return -1;
+    }
     if (ctx->authorization_endpoint_username &&
         ctx->authorization_endpoint_password) {
-        flb_http_basic_auth(client, ctx->authorization_endpoint_username,
-                            ctx->authorization_endpoint_password);
+        ret = flb_http_basic_auth(client, ctx->authorization_endpoint_username,
+                                  ctx->authorization_endpoint_password);
+        if (ret < 0) {
+            return -1;
+        }
     }
     else if (ctx->authorization_endpoint_bearer_token) {
-        flb_http_bearer_auth(client, ctx->authorization_endpoint_bearer_token);
+        ret = flb_http_bearer_auth(client, ctx->authorization_endpoint_bearer_token);
+        if (ret < 0) {
+            return -1;
+        }
     }
     return 0;
 }
```
207-207: Check the return value of `setup_http_client_headers()`. If header setup fails (especially for authentication headers), the request should not proceed.

♻️ Suggested fix

```diff
-    setup_http_client_headers(ctx, http_client);
+    ret = setup_http_client_headers(ctx, http_client);
+    if (ret < 0) {
+        flb_plg_error(ctx->ins, "Failed to setup HTTP client headers");
+        goto cleanup;
+    }
```
288-304: Tempfile paths are collision-prone and not portable. The current implementation uses fixed `/tmp/flb_test_*.{msgpack,parquet}` paths based only on `test_name`. This can cause issues with:

- Parallel test execution: tests with the same name running concurrently will collide
- Stale files: failed tests may leave files that affect subsequent runs
- Windows portability: `/tmp` doesn't exist on Windows

Consider using a PID + random suffix or `mkstemp()`-style unique paths:

```c
snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
         "/tmp/flb_test_%s_%ld_%d.msgpack",
         test_name, (long)getpid(), rand());
```

Or better, use a cross-platform temp directory helper similar to `s3_test_create_temp_db_path()` from `s3_test_helpers.h`.
tests/runtime/out_s3_edge_cases.c (2)
91-92: Redundant assertion. `ret` was already checked at line 81 and hasn't been modified since, so this check provides no additional value.

Suggested fix

```diff
-    /* Verify no crash occurred - reaching here is the test */
-    TEST_CHECK(ret == 0);
-
     s3_test_cleanup_temp_db(db_path);
```
1072-1073: Redundant assertion. `ret` from `flb_lib_push` was already checked at line 1065. This duplicate check adds no value.

Suggested fix

```diff
-    /* Verify second configuration worked - flb_lib_push succeeded */
-    TEST_CHECK(ret >= 0);
-
     s3_test_cleanup_temp_db(db_path);
```
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_blob.h
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_multipart.h
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_queue.h
- plugins/out_s3/s3_store.c
- plugins/out_s3/s3_store.h
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_stream.h
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/internal/aws_util.c
- tests/internal/parquet.c
- tests/runtime/CMakeLists.txt
- tests/runtime/out_s3.c
- tests/runtime/out_s3_blob.c
- tests/runtime/out_s3_config.c
- tests/runtime/out_s3_edge_cases.c
- tests/runtime/out_s3_error_handling.c
- tests/runtime/out_s3_format.c
- tests/runtime/out_s3_multipart.c
- tests/runtime/out_s3_queue.c
- tests/runtime/out_s3_recovery.c
- tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (8)
- tests/internal/aws_compress.c
- tests/internal/CMakeLists.txt
- tests/runtime/out_s3_queue.c
- plugins/out_s3/s3_auth.h
- tests/include/aws_client_mock.h
- tests/runtime/out_s3_format.c
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3_queue.h
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef FLB_HAVE_ZSTD`) around ZSTD-related code declarations and implementations.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef FLB_HAVE_ARROW` guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/internal/parquet.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/internal/parquet.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
- plugins/out_s3/s3_store.c
- tests/runtime/out_s3_error_handling.c
- tests/include/s3_test_helpers.h
- plugins/out_s3/s3_queue.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3.h
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_multipart.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_multipart.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/out_s3/s3_stream.c
🧬 Code graph analysis (14)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
field_name(548-548)src/flb_pack.c (1)
flb_pack_json(530-535)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_store.c (2)
src/flb_fstore.c (7)
flb_fstore_file_create(153-190)flb_fstore_file_meta_set(74-108)flb_fstore_file_append(298-329)flb_fstore_create(479-525)flb_fstore_destroy(527-564)flb_fstore_stream_create(335-402)flb_fstore_file_inactive(238-256)include/fluent-bit/flb_compat.h (1)
gmtime_r(76-82)
tests/runtime/out_s3_multipart.c (4)
tests/include/s3_test_helpers.h (5)
s3_test_next(451-459)s3_test_create_temp_db_path(138-196)s3_test_create_temp_store_dir(206-270)s3_test_cleanup_temp_db(430-443)s3_test_cleanup_temp_store_dir(397-422)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (1)
flb_lib_push(843-870)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(553-632)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
s3_test_next(451-459)s3_test_create_temp_db_path(138-196)s3_test_create_temp_store_dir(206-270)s3_test_cleanup_temp_db(430-443)s3_test_cleanup_temp_store_dir(397-422)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
s3_blob_register_parts(403-436)s3_blob_process_events(451-538)s3_blob_recover_state(171-199)s3_blob_notify_delivery(124-161)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
flb_s3_calculate_optimal_part_size(164-216)s3_multipart_get_md5_base64(218-248)s3_multipart_create_headers(357-418)s3_multipart_upload_new(451-500)s3_multipart_upload_destroy(420-445)s3_multipart_initiate(502-559)s3_multipart_upload_part(561-653)s3_multipart_complete(655-736)s3_multipart_abort(738-800)s3_multipart_abort_with_url(1022-1043)s3_multipart_check_upload_exists(806-880)s3_multipart_upload_file_part(894-1016)s3_multipart_upload_file(1203-1294)
plugins/out_s3/s3_auth.c (5)
include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_utils.c (1)
flb_utils_url_split(1531-1667)src/flb_upstream.c (2)
flb_upstream_create_url(416-481)flb_upstream_conn_get(711-844)src/flb_http_client.c (6)
flb_http_add_header(1015-1047)flb_http_basic_auth(1394-1398)flb_http_bearer_auth(1406-1441)flb_http_client(866-911)flb_http_do(1645-1705)flb_http_client_destroy(1821-1828)src/aws/flb_aws_util.c (1)
flb_aws_uri_encode_path(1490-1526)
tests/runtime/out_s3_edge_cases.c (3)
tests/include/s3_test_helpers.h (5)
s3_test_next(451-459)s3_test_create_temp_db_path(138-196)s3_test_create_temp_store_dir(206-270)s3_test_cleanup_temp_db(430-443)s3_test_cleanup_temp_store_dir(397-422)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (6)
flb_create(143-225)flb_input(266-276)flb_start(983-994)flb_stop(1011-1055)flb_destroy(228-263)flb_lib_push(843-870)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1311-1389)
plugins/out_s3/s3.h (1)
plugins/out_s3/s3.c (5)
cb_s3_upload(1340-1607)s3_format_chunk(1196-1337)s3_upload_file(1015-1120)s3_initiate_multipart_upload(898-1013)s3_plugin_under_test(73-79)
plugins/out_s3/s3_stream.c (4)
src/aws/flb_aws_compress.c (1)
flb_aws_compression_compress(110-127)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_sds.c (4)
flb_sds_create_size(92-95)flb_sds_cat(120-141)flb_sds_create(78-90)flb_sds_printf(336-387)plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(553-632)
plugins/out_s3/s3_queue.c (8)
include/fluent-bit/flb_mem.h (2)
flb_calloc(84-96)flb_free(126-128)plugins/out_s3/s3_store.c (3)
s3_store_buffer_put(217-258)s3_store_file_unlock(641-646)s3_store_file_lock(634-639)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-423)plugins/out_s3/s3_multipart.c (2)
s3_multipart_upload_file_part(894-1016)s3_multipart_complete(655-736)src/flb_blob_db.c (2)
flb_blob_db_lock(423-428)flb_blob_db_unlock(430-435)plugins/out_s3/s3.c (1)
s3_format_chunk(1196-1337)lib/cfl/src/cfl_sds.c (2)
cfl_sds_len(88-91)cfl_sds_destroy(127-137)plugins/out_s3/s3_blob.c (1)
s3_blob_notify_delivery(124-161)
plugins/out_s3/s3_multipart.c (5)
include/fluent-bit/flb_compat.h (1)
usleep(129-133)plugins/out_s3/s3.c (1)
write_seq_index(127-147)include/fluent-bit/flb_mem.h (2)
flb_calloc(84-96)flb_free(126-128)src/flb_http_client.c (1)
flb_http_client_destroy(1821-1828)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-423)
🔇 Additional comments (84)
tests/include/aws_client_mock.c (4)
2-10: Good documentation for the test mock usage pattern. The added note clearly explains why this `.c` file is directly included rather than compiled separately, helping prevent future build issues.
54-64: LGTM: Proper double-free prevention. Clearing `flb_aws_client_mock_instance` before calling `flb_aws_client_mock_destroy` prevents double-free scenarios when the mock was already freed by S3 plugin cleanup.
66-74: Appropriate addition for test isolation. This function enables safe reconfiguration between tests when the S3 plugin has already freed the mock client via `flb_destroy()`.
306-308: Macro condition correctly excludes DATA from standard expansion. The additional check ensures DATA is handled by the special case above rather than by the generic macro expansion.
plugins/out_s3/s3_store.h (1)
51-52: Context-free chunk file opener enables C++ Parquet integration. The implementation in `s3_store.c` properly validates the chunk header and metadata length bounds, and handles all error paths with appropriate cleanup. This design allows safe usage from the Parquet encoder without requiring the S3 context.
plugins/out_s3/s3_multipart.h (2)
1-24: Well-structured header with appropriate includes. The header establishes a clean API surface for multipart uploads, with proper include guards and a dependency on `s3.h` and `<sys/types.h>` for `off_t`.
77-88: Critical documentation for recovery correctness. The warning about using the persisted `s3_key` rather than regenerating it is essential: key formats with timestamps or sequence numbers would produce different keys, causing recovery failures. This documentation prevents a subtle but serious bug.
tests/internal/aws_util.c (3)
47-68: Comprehensive test macros for file path templating. Good coverage of the new `$FILE_PATH` and `$FILE_NAME` placeholders, with edge cases for path sanitization (relative paths, parent traversal, hidden files, mixed dots/slashes).
402-533: Well-structured new tests for file path key formatting. The eight new test functions follow the established pattern and cover:
- Basic `$FILE_PATH` substitution
- Basic `$FILE_NAME` substitution
- Combined `$FILE_PATH`-`$FILE_NAME` keys
- Edge cases with various dot/slash prefixes
62-63: Verify the sanitization behavior prevents directory traversal. The test confirms that `../../../etc/passwd` is sanitized by stripping the leading `../` components, resulting in `etc/passwd`, which is then safely inserted into the `logs/$FILE_PATH` format to produce `logs/etc/passwd`. This prevents attackers from using relative paths to escape the intended S3 object key namespace.
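A rough illustration of the stripping rule this test exercises; the helper below is hypothetical and only mirrors the behavior described above, not the actual `flb_get_s3_key()` logic:

```c
#include <string.h>

/* Strip leading '/', './' and '../' segments before splicing a file
 * path into an S3 key, so relative paths cannot escape the key prefix. */
static const char *sanitize_file_path(const char *path)
{
    const char *p = path;

    for (;;) {
        if (*p == '/') {
            p++;                           /* drop leading slash */
        }
        else if (strncmp(p, "./", 2) == 0) {
            p += 2;                        /* drop current-dir prefix */
        }
        else if (strncmp(p, "../", 3) == 0) {
            p += 3;                        /* drop parent-dir traversal */
        }
        else {
            break;
        }
    }
    return p;    /* "../../../etc/passwd" -> "etc/passwd" */
}
```
plugins/out_s3/s3.h (6)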
29-33: Appropriate includes and forward declaration. The `pthread.h` inclusion supports the new `upload_queue_lock` mutex, and the forward declaration for `flb_parquet_schema` correctly enables the pointer field without requiring the full Parquet header.
41-46: S3 multipart constants align with AWS limits. The constants correctly reflect AWS S3 constraints: 5 MiB minimum and 5 GiB maximum per part, and 10,000 parts maximum. The default of 100 MiB is a reasonable balance.
52-74: Well-designed upload queue structure for DB-tracked multipart uploads. The structure cleanly separates file identification (`file_id`, `part_db_id`, `part_id`), file location (`file_path`, offsets), S3 metadata (`s3_key`, `upload_id`), and retry state. The `needs_upload_creation` flag enables lazy multipart initiation.
82-82: Fixed-size etags array matches S3_AWS_MAX_PARTS. The `etags[10000]` array size matches the `S3_AWS_MAX_PARTS` constant, ensuring sufficient storage for all possible parts in a multipart upload.
150-157: Thread-safety additions for concurrent workers. The `upload_queue_lock` mutex and `is_exiting` flag enable safe operation when multiple workers are configured. The `needs_recovery` flag supports atomic recovery coordination, as shown in `cb_s3_upload`.
175-178: Parquet integration fields properly added. The `format` field enables runtime format selection, `schema_str` holds the user-provided schema configuration, and `cached_arrow_schema` caches the parsed schema to avoid re-parsing in coroutines (preventing the yyjson stack overflow noted in `s3_format_chunk`).
1-43: LGTM - Well-structured header file. The header provides a clean public API for blob handling, with proper include guards and documentation comments. The function declarations align with their implementations in `s3_blob.c`.
359-423: Well-structured presigned URL fetch function. The function properly handles:
- Early return when authorization endpoint is not configured
- Validation that upstream is initialized before use
- URL path construction with correct slash handling for all edge cases
- Proper cleanup on all error paths
plugins/out_s3/s3_multipart.c (5)
36-65: Good cross-platform I/O abstraction. The platform-specific macros correctly handle:
- Binary mode file I/O on Windows (O_BINARY flag prevents text mode corruption for Parquet files)
- Cross-platform sleep wrapper using Sleep() on Windows and usleep() on POSIX
106-148: ETag extraction correctly preserves quotes. The function now extracts the ETag including the surrounding quotes (required by the AWS CompleteMultipartUpload API) and includes proper bounds checking.
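A minimal sketch of quote-preserving ETag extraction from raw response headers; the helper name is hypothetical and the parsing is simplified relative to the plugin's actual code:

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the ETag value including both quotes, since
 * CompleteMultipartUpload expects the quoted form. NULL on failure. */
static char *extract_quoted_etag(const char *headers)
{
    const char *p = strstr(headers, "ETag: \"");
    const char *end;
    char *etag;
    size_t len;

    if (!p) {
        return NULL;
    }
    p += strlen("ETag: ");       /* p now points at the opening quote */
    end = strchr(p + 1, '"');    /* locate the closing quote */
    if (!end) {
        return NULL;
    }
    len = (size_t)(end - p) + 1; /* include both quotes */
    etag = malloc(len + 1);
    if (!etag) {
        return NULL;
    }
    memcpy(etag, p, len);
    etag[len] = '\0';
    return etag;
}
```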
922-946: Thorough overflow protection for 32-bit platforms. The code properly guards against:
- Negative offsets
- Invalid offset ranges (end <= start)
- 32-bit size_t truncation when casting from off_t
- Defensive truncation verification
This addresses a critical correctness concern for 32-bit systems.
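A sketch of the `off_t` to `size_t` range checks described above, assuming a part bounded by `[offset_start, offset_end)`; this is illustrative, not the plugin's exact implementation:

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

static int part_len_from_offsets(off_t offset_start, off_t offset_end,
                                 size_t *out_len)
{
    uint64_t span;

    if (offset_start < 0 || offset_end <= offset_start) {
        return -1;                          /* negative or invalid range */
    }
    span = (uint64_t)(offset_end - offset_start);
    if (span > (uint64_t)SIZE_MAX) {
        return -1;                          /* would truncate on 32-bit size_t */
    }
    *out_len = (size_t)span;
    if ((uint64_t)*out_len != span) {
        return -1;                          /* defensive truncation check */
    }
    return 0;
}
```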
1099-1105: AWS 10000-part limit properly enforced. The check at line 1101 prevents exceeding the AWS S3 maximum of 10000 parts per multipart upload.
1203-1294: Complete multipart upload workflow with proper error handling. The function:
- Uses platform-specific stat struct declaration
- Calls `s3_multipart_abort_with_url()` on all error paths after initiation
- Properly cleans up resources in all cases
plugins/out_s3/s3_blob.c (3)
41-122: Well-structured abort operation with proper S3 key handling.The function:
- Validates
file_remote_idbefore proceeding- Prefers stored
s3_keyover regeneration (critical for time-based key formats)- Properly cleans up
pre_signed_urlin error paths- Logs warnings when falling back to key regeneration
285-286: Good handling of FLB_OUT_RETRY_UNLIMITED. The check at line 285 correctly handles the unlimited retry case by comparing against `FLB_OUT_RETRY_UNLIMITED` before the numeric comparison.
451-538: Proper error handling for blob event processing. The function correctly:
- Uses `int64_t file_id` to receive the `flb_blob_db_file_insert()` return value without truncation
- Checks `file_id < 0` to catch all negative error codes (not just -1)
- Defensively frees partial allocations when `flb_input_blob_file_get_info()` fails
- Cleans up database entries on registration/queue failures
plugins/out_s3/s3_queue.c (6)
22-30: Good cross-platform file existence check. The `flb_access` wrapper properly handles the portability difference between `_access()` on Windows and `access()` on POSIX systems.
444-474: Proper ETag memory management. The code correctly:
- Frees the ETag SDS after persisting to database on success (lines 452-454)
- Frees any allocated ETag on failure path (lines 470-474)
- Uses bounds checking before array access
598-603: AWS part count validation properly enforced. The validation ensures `part_count` is within the valid range (1-10000) before allocating and copying ETags, preventing out-of-bounds access and excessive allocation.
1042-1055: Endpoint mismatch handling corrected. The code now properly resets the `in_progress` flag before skipping files with an endpoint mismatch (line 1047), preventing files from being hidden until zombie cleanup.

The loop correctly terminates because:
- Each file's parts are marked `in_progress=1` (line 1052) immediately after retrieval
- `flb_blob_db_file_get_next_pending` excludes files with in-progress parts
- This prevents the same file from being returned on subsequent iterations
1167-1228: Correct fstore recovery for legacy buffered chunks. The function properly:
- Skips metadata streams
- Respects file lock state
- Enforces retry limits
- Adds files without DB tracking (`file_id=0`) for backward compatibility
719-740: Retry limit handling consistent with plugin convention. The retry limit check at line 721 uses a direct numeric comparison (`entry->retry_counter >= ctx->ins->retry_limit`), which is consistent with the S3 plugin's established pattern. Based on learnings, this is intentional for consistency within the plugin.
238-246: LGTM! Expanded S3 test coverage. The replacement of a single S3 test with nine specialized test targets provides comprehensive coverage across configuration, multipart, format, error handling, edge cases, queue, recovery, and blob functionality.
300-306: LGTM! Correct Windows library linkage. The conditional linkage of the `ole32` and `rpcrt4` libraries for S3 tests on Windows is correct. These are required for the GUID generation (`CoCreateGuid`, `UuidToStringA`, `RpcStringFreeA`) used in `s3_test_helpers.h` for creating unique temp paths.
23-49: LGTM! Well-designed cross-platform path length handling. The `FLB_PATH_MAX` constant correctly handles:
- Windows: `MAX_PATH`/`_MAX_PATH` with a 260 fallback
- POSIX: `PATH_MAX` from `limits.h` with a 4096 fallback

The inclusion of `<chunkio/cio_file_st.h>` properly sources the chunk format constants from the canonical location.
116-140: LGTM! Proactive buffer monitoring. The `check_buffer_space` function provides good observability:
- Early return for an unlimited buffer (`store_dir_limit_size == 0`)
- Warning at 80% threshold to allow for proactive intervention
142-159: LGTM! Centralized cleanup reduces code duplication.The
cleanup_failed_filehelper properly handles cleanup of both thes3_filecontext (includingstream_pathSDS) and the underlyingflb_fstore_file, with clear logging of the failure reason.
260-295: LGTM! Correct cross-platform stat implementation.The
calculate_chunk_data_sizefunction properly uses:
_stat/struct _staton Windowsstat/struct staton POSIXThe path truncation check at line 273 correctly guards against buffer overflow.
383-407: LGTM! Correct cross-platform `gmtime` implementation. The `create_stream_name` function properly handles the different signatures:
- Windows: `gmtime_s(struct tm*, const time_t*)` returns 0 on success
- POSIX: `gmtime_r(const time_t*, struct tm*)` returns a pointer on success

The consistent hyphen-based time format (`%Y-%m-%dT%H-%M-%S`) ensures cross-platform compatibility.
553-632: LGTM! Robust chunk file validation. The `flb_chunk_file_open` function implements thorough validation:
- Magic byte verification (`CIO_FILE_ID_00`, `CIO_FILE_ID_01`)
- Proper error handling with file closure on all failure paths
- Safe metadata length validation against remaining file bytes (lines 613-619)
The use of canonical chunkio constants ensures compatibility with the file format specification.
tests/runtime/out_s3_multipart.c (5)
1-13: LGTM! Well-structured test file with proper dependencies. The includes provide all necessary components:
- AWS mock framework for simulating S3 responses
- Test helpers for temp path management
- S3 plugin internals for multipart testing
15-67: LGTM! Comprehensive basic multipart test. The test properly:
- Creates unique temp paths using
s3_test_create_temp_db_pathands3_test_create_temp_store_dir- Configures mock responses for complete multipart flow (POST create → PUT upload → POST complete)
- Validates
flb_lib_pushreturn value- Uses
S3_TEST_WAIT_MOCK_EXHAUSTEDfor deterministic waiting- Performs full cleanup including mock generator, temp DB, and store directory
229-277: LGTM! Good failure scenario coverage.The
flb_test_create_multipart_failuretest correctly configures only the initial POST with a 500 error to verify error handling for CreateMultipartUpload failures.
442-498: LGTM! Multiple data pushes with return value validation.The loop at lines 484-488 correctly:
- Captures
flb_lib_pushreturn value- Validates each push with
TEST_CHECK(ret >= 0)- Uses
S3_TEST_SLEEP_MS(100)between pushes for timingThis addresses the previous review concern about ignoring
flb_lib_pushresults.
500-512: LGTM! Comprehensive TEST_LIST coverage.The test list covers the full multipart lifecycle: creation, metadata, upload, completion, failure handling, retry behavior, and abort scenarios.
tests/internal/parquet.c (4)
20-34: LGTM! Complete include set.All required standard headers are explicitly included (
stdio.h,stdlib.h,string.h,stdint.h,math.h), and the compression constants are properly sourced fromflb_aws_compress.h.
69-136: LGTM! Complete ChunkIO header initialization. The `write_msgpack_to_chunk_file` function now properly initializes all header fields:
- Magic bytes (0xC1, 0x00)
- Content length in big-endian (bytes 10-13)
- Metadata length in big-endian (bytes 22-23)
- Reserved fields zeroed
This matches the ChunkIO specification in
lib/chunkio/include/chunkio/cio_file_st.h.
1941-2016: LGTM! Safe NaN/Infinity generation. Using the `NAN`, `INFINITY`, and `-INFINITY` macros from `math.h` is the correct approach, avoiding a potential SIGFPE from division by zero if FP exceptions are enabled.
2081-2156: LGTM! Well-organized test registry.The
TEST_LISTis clearly categorized with 44 tests covering:
- Functional tests (9)
- Data quality tests (12)
- Scale & boundary tests (9)
- Compression tests (1 consolidated)
- Negative tests (10)
- Real-world scenarios (3)
This provides comprehensive coverage of the Parquet encoder functionality.
tests/runtime/out_s3_blob.c (4)
1-13: LGTM! Consistent test file structure.The includes match the pattern from
out_s3_multipart.c, providing all necessary components for mock-based S3 blob testing.
84-164: LGTM! Robust database creation verification. The `flb_test_blob_database_custom_path` test properly handles the async nature of database creation:
- Retry loop with
max_retries = 10(up to ~1s total wait)- Uses
S3_TEST_SLEEP_MS(100)for cross-platform sleep- Clear
TEST_MSGon assertion for debuggingThis addresses the previous review concern about flaky stat() checks.
166-373: LGTM! Comprehensive delivery attempt configuration tests.The tests verify that various delivery attempt configurations are accepted:
file_delivery_attempt_limitwith finite value (3)part_delivery_attempt_limitwith finite value (5)- Unlimited attempts with
-1The comment block (lines 166-172) clearly documents that these tests verify configuration acceptance, not actual retry behavior.
585-595: LGTM! Complete test coverage for blob functionality.The TEST_LIST covers all key blob database features:
- Basic and custom path configuration
- File and part delivery attempt limits
- Unlimited retry configuration
- Upload freshness thresholds
- Chunk size configuration
plugins/out_s3/s3_stream.h (1)
1-112: Well-structured streaming interface header. The header defines a clean callback-based API for per-record streaming processing. The documentation comments are thorough, the include dependencies are appropriate (including the previously-flagged `<sys/types.h>` for `off_t`), and the `stream_processor_context` struct provides useful tracking for processed records and bytes written.
tests/include/s3_test_helpers.h (2)
1-511: Comprehensive cross-platform test helper utilities. The implementation addresses all previously flagged concerns:
- Safe directory cleanup using platform APIs instead of `system()`
- Windows path separator handling in safe path validation
- Atomic temp file creation using `GetTempFileNameA` (Windows) and `mkstemps`/`mkdtemp` (POSIX)
- Deterministic polling-based mock exhaustion via `S3_TEST_WAIT_MOCK_EXHAUSTED`

The helper macros and functions provide good test isolation and cross-platform compatibility.
206-270: Windows library linking for GUID functions is properly configured. The implementation correctly uses `CoCreateGuid()`, `UuidToStringA()`, and `RpcStringFreeA()` for unique directory naming on Windows. The required libraries `ole32` and `rpcrt4` are properly linked in `tests/runtime/CMakeLists.txt` (line 304) with appropriate Windows-only and S3-test-only conditionals.
19-83: Well-structured test with proper mock chain and cleanup.The test correctly:
- Sets up mock chain with METHOD expectations
- Creates isolated temp paths via
s3_test_create_temp_db_pathands3_test_create_temp_store_dir- Uses deterministic
S3_TEST_WAIT_MOCK_EXHAUSTEDinstead of fixed sleep- Cleans up all resources including mock generator, temp files, and allocated memory
777-835: Good use of HEADER expectation for ACL validation.The
flb_test_canned_acl_optionstest properly validates that thex-amz-aclheader is set correctly for each ACL option usingexpect(HEADER, "x-amz-acl", acls[i]). This is a good pattern that other configuration tests could adopt to verify their respective headers.
1282-1300: Complete TEST_LIST registration.All 15 test functions are properly registered in the TEST_LIST for execution.
7-8: This pattern is intentional and documented. Theaws_client_mock.cfile is explicitly designed to be directly included in test files, with each test built as a standalone executable. The header file clearly states: "Usage:#includeboth .h and .c files in test files" and notes the limitation that multiple tests should not be compiled into a single executable. The static functions inaws_client_mock.cprovide file-scope isolation, preventing symbol conflicts. This same pattern is consistently used across 10+ test files in the codebase (out_s3_*.c,filter_aws.c, etc.), confirming it is an established approach rather than a mistake.tests/runtime/out_s3_error_handling.c (4)
14-42: Well-organized AWS error response templates. The error response macros provide realistic S3 error XML payloads for testing various error scenarios. These are properly formatted and cover key error codes that the S3 plugin should handle.
44-106: Proper NULL handling with early bailout. The test correctly handles potential NULL returns from `s3_test_create_temp_db_path` and `s3_test_create_temp_store_dir` with an explicit early return (lines 71-75), preventing NULL pointer dereferences in subsequent code. This pattern is consistently applied across all tests in this file.
381-452: SlowDown throttling test documents plugin behavior. The comment on lines 392-393 clarifies important plugin behavior: "S3 plugin removes file from queue on first failure without retry". This documentation is valuable for understanding the expected test outcome and plugin semantics.
743-756: Complete TEST_LIST for error handling scenarios.All 10 error handling test functions are properly registered, covering authentication, resource, throttling, server errors, retry behavior, and cleanup scenarios.
plugins/out_s3/s3_stream.c (9)
20-45: Clean platform abstraction for file operations.The platform-specific macros (
s3_unlink,s3_close,s3_fdopen) provide a clean abstraction layer. The Windows implementation correctly usesDeleteFilefor unlink and handles the different file descriptor semantics.
60-143: Chunked compression with proper resource cleanup.The
stream_compress_file_chunkedfunction:
- Allocates read buffer with proper NULL check
- Handles remaining bytes calculation correctly
- Frees
compressed_chunkafter each successful write- Includes defensive cleanup in the
cleanupsection for both buffers- Checks for read errors via
ferror()The cleanup block at lines 134-140 now properly frees
compressed_chunkif it's non-NULL, addressing the past review concern about potential leaks fromflb_gzip_compress.
145-216: Robust offset validation and error handling. The function properly validates `offset_start < 0` (line 169) and `offset_end > 0 && offset_end <= offset_start` (line 174), and cleans up the output file on failure (lines 211-213). The compression type logging (lines 155-158) now correctly handles the "unknown" case.
424-453: Windows-compatible rename handling. The code correctly closes the file handle before calling `rename()` (lines 428-429) to support Windows, where renaming an open file fails. The file is then reopened in append mode after a successful rename. This addresses the past review concern about Windows compatibility.
471-490: Proper resource cleanup on initialization failure. The code now correctly calls `stream_context_destroy()` when buffer allocation (line 475) or unpacker initialization (line 481) fails, preventing resource leaks. The `unpacker_initialized` and `result_initialized` flags ensure the destroy function only cleans up what was actually allocated.
492-527: Correct msgpack unpacker return code handling. The function now properly handles all `msgpack_unpacker_next()` return codes:
- `MSGPACK_UNPACK_SUCCESS` and `MSGPACK_UNPACK_EXTRA_BYTES`: process the unpacked record
- `MSGPACK_UNPACK_NOMEM_ERROR`: log an error and return -1
- `MSGPACK_UNPACK_PARSE_ERROR`: log an error and return -1
- `MSGPACK_UNPACK_CONTINUE`: the loop condition handles this naturally

The comment at lines 503-507 correctly explains that both SUCCESS and EXTRA_BYTES indicate a complete message was unpacked.
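A sketch of this dispatch using the msgpack-c API; the `process_record()` callback is a stand-in for the plugin's per-record work, so treat this as illustrative rather than the function's actual body:

```c
#include <msgpack.h>

static int drain_unpacker(msgpack_unpacker *unp, msgpack_unpacked *result,
                          int (*process_record)(msgpack_object *obj))
{
    msgpack_unpack_return rc;

    while ((rc = msgpack_unpacker_next(unp, result)) !=
           MSGPACK_UNPACK_CONTINUE) {
        switch (rc) {
        case MSGPACK_UNPACK_SUCCESS:
        case MSGPACK_UNPACK_EXTRA_BYTES:
            /* a complete message was unpacked in both cases */
            if (process_record(&result->data) != 0) {
                return -1;
            }
            break;
        case MSGPACK_UNPACK_NOMEM_ERROR:
        case MSGPACK_UNPACK_PARSE_ERROR:
        default:
            return -1;             /* unrecoverable decode error */
        }
    }
    return 0;                      /* CONTINUE: need more input */
}
```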
591-659: Clean entry point with proper success handling for empty input. The function correctly returns success (0) with NULL output for empty input (lines 610-615) and empty processor output (lines 639-645), rather than treating these as errors. This matches the expected "no-op" behavior noted in past reviews.
661-689: JSON processor with proper error propagation. The `stream_json_processor` now returns -1 on JSON conversion failure (lines 673-676), properly propagating the error to the caller. This addresses the past review concern about silently dropping records.
691-793: Well-structured log_key processor. The log_key extraction logic is well organized:
- `find_log_key_in_map`: searches for the configured log key in the map
- `write_string_value`: writes string/binary values directly
- `write_json_value`: converts non-string values to JSON
- `stream_log_key_processor`: orchestrates the flow

The function correctly handles missing keys by returning 0 (skip record) rather than failing.
tests/runtime/out_s3_edge_cases.c (11)
104-184: LGTM! Test structure is correct with proper mock setup, data push, wait, and cleanup.
189-266: LGTM! Concurrent upload test with 10 workers is properly configured.
271-347: LGTM! Minimal timeout test follows the established pattern correctly.
353-431: LGTM! Good edge case coverage verifying timeout triggers before reaching the file size threshold.
436-514: LGTM! Good coverage for S3 key format with tag segment placeholders.
519-601: LGTM! Tag delimiter configuration test is well structured.
606-682: LGTM! Custom store directory test follows established patterns.
687-766: LGTM! Minimum boundary configuration test is correctly implemented.
771-850: LGTM! Maximum boundary configuration test properly validates large config values while keeping the timeout practical.
855-933: LGTM! Compression with small data test correctly uses `sizeof(...) - 1` for the payload length.
1082-1096: LGTM! Test list correctly registers all 12 test functions with proper NULL termination.
Force-pushed from bb051a2 to ff0643c
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@plugins/out_s3/s3_blob.c`:
- Around line 424-429: The error check after calling
flb_blob_db_file_part_insert currently compares ret == -1 which won't catch the
DB error constant; change the check to detect negative/error returns (e.g. if
(ret < 0) or if (ret != FLB_BLOB_DB_SUCCESS)) and keep the existing
flb_plg_error(ctx->ins, ...) and return -1 path so database insertion failures
from flb_blob_db_file_part_insert (which returns
FLB_BLOB_DB_ERROR_FILE_PART_INSERT on error) are properly handled, matching the
flb_blob_db_file_insert pattern used elsewhere.
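A minimal sketch of the corrected check, assuming the constants named above; the argument list is elided since only the comparison changes:

```c
/* ret comes from flb_blob_db_file_part_insert(...); comparing against the
 * success constant also catches FLB_BLOB_DB_ERROR_FILE_PART_INSERT. */
if (ret != FLB_BLOB_DB_SUCCESS) {
    flb_plg_error(ctx->ins, "cannot insert blob file part into database");
    return -1;
}
```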
♻️ Duplicate comments (4)
tests/internal/parquet.c (1)
234-348: Test context framework reduces boilerplate effectively. The `test_context` structure and associated helpers (`init_test_context`, `run_conversion`, `cleanup_test_context`, `ctx_pack_json`, `ctx_validate`) provide a clean abstraction for test setup and teardown. The `flb_msgpack_raw_to_parquet_file_streaming` wrapper correctly manages the schema lifecycle. However, the tempfile paths remain collision-prone (as noted in a previous review); consider adding a PID or using `mkstemp()` for concurrent test safety.
tests/runtime/out_s3_format.c (1)
mkstemp()for concurrent test safety.tests/runtime/out_s3_format.c (1)
9-12: Including a `.c` file directly is non-standard but works for this test structure. Line 10 includes `aws_client_mock.c` directly. While unconventional, and although it would cause linker errors if multiple test files were compiled into a single executable, this approach is documented in `aws_client_mock.h` (lines 6-8) as a known limitation. Since each S3 test file is compiled as a separate executable (per `CMakeLists.txt`), this works correctly. For future maintainability, consider compiling `aws_client_mock.c` as a separate object and linking it, but this would require build system changes.
tests/runtime/out_s3_error_handling.c (1)
7-8: Direct inclusion of an implementation file is non-standard. Including `aws_client_mock.c` directly rather than linking against it is an architectural smell, though consistent with the other test files in this PR. This works because each test is compiled separately, avoiding linker conflicts. Consider creating a proper header (`aws_client_mock.h`) with declarations and linking the implementation at build time for better separation of concerns.
7-8: Direct inclusion of an implementation file. Same pattern as the other test files in this PR - including `aws_client_mock.c` directly. Consistent but non-standard.
🧹 Nitpick comments (8)
tests/internal/parquet.c (1)
905-941: Consider using `flb_malloc`/`flb_free` for consistency. The test uses standard `malloc`/`free` for the 100KB string allocation while other parts of Fluent Bit use `flb_malloc`/`flb_free`. For consistency with the codebase memory management patterns, consider using the Fluent Bit wrappers.

♻️ Suggested change

```diff
-    char *long_str = (char *)malloc(long_str_size);
+    char *long_str = flb_malloc(long_str_size);
...
-    free(long_str);
+    flb_free(long_str);
```
plugins/out_s3/s3_blob.h (1)
1-43: LGTM - Clean API header. The header correctly declares the blob handling API with proper include guards and minimal dependencies. Function signatures match their implementations in `s3_blob.c`. Minor: the parameter alignment in `s3_blob_notify_delivery` has inconsistent indentation (extra leading spaces on lines 37-41). Consider aligning parameters consistently:

♻️ Optional formatting fix

```diff
 /* Send delivery notification to input plugin */
 int s3_blob_notify_delivery(struct flb_s3 *ctx,
-                             struct flb_config *config,
-                             cfl_sds_t source,
-                             cfl_sds_t file_path,
-                             uint64_t file_id,
-                             int success);
+                            struct flb_config *config,
+                            cfl_sds_t source,
+                            cfl_sds_t file_path,
+                            uint64_t file_id,
+                            int success);
```
tests/include/s3_test_helpers.h (1)
138-195: LGTM - Atomic temp file creation. Good use of platform-specific atomic file creation (`GetTempFileNameA` on Windows, `mkstemp` on POSIX) to avoid race conditions in temp path generation. Consider using `strncpy` or `snprintf` instead of `strcpy` at line 187 for defense-in-depth, even though the buffer size is correctly controlled:

♻️ Optional defensive change

```diff
 path = flb_malloc(strlen(template_path) + 1);
 if (path) {
-    strcpy(path, template_path);
+    snprintf(path, strlen(template_path) + 1, "%s", template_path);
 }
```
path = flb_malloc(strlen(template_path) + 1); if (path) { - strcpy(path, template_path); + snprintf(path, strlen(template_path) + 1, "%s", template_path); }plugins/out_s3/s3_auth.h (1)
41-47: Consider using the enum type instead of `int` for the `url_type` parameter. The `s3_auth_fetch_presigned_url` function takes `int url_type` but should use `s3_presigned_url_type_t` for better type safety and self-documentation.

♻️ Suggested change

```diff
 /* Unified presigned URL fetcher - works for both standard and blob uploads */
 int s3_auth_fetch_presigned_url(struct flb_s3 *ctx,
                                 flb_sds_t *result_url,
-                                int url_type,
+                                s3_presigned_url_type_t url_type,
                                 const char *s3_key,
                                 const char *upload_id,
                                 int part_number);
```
plugins/out_s3/s3_stream.h (1)
80-88: Consider clarifying the `out_buf` parameter documentation. The comment "Output buffer (FILE: marker for temp file)" on line 76 is unclear. It would help to explain what the "FILE:" marker means and how callers should use this parameter.
plugins/out_s3/s3_store.c (1)
391-401: Consider using the `gmtime_r` wrapper for consistency. A `gmtime_r` wrapper is available in `flb_compat.h` that handles the parameter order conversion for Windows (`gmtime_s`). Using it would eliminate the platform-specific preprocessor block and provide a single consistent code path across platforms.
1-13: Including a `.c` file directly is unconventional but acceptable for test scaffolding. The direct inclusion of `aws_client_mock.c` at line 8 is unusual (typically `.c` files are compiled separately). This pattern works for isolated test files but can cause issues with multiple translation units. If this test file is part of a larger test suite compiled together, consider moving to proper compilation unit separation.
762-835: Good use of HEADER expectation for ACL verification. Unlike the `storage_class_variations` test, which only checks METHOD, this test includes `expect(HEADER, "x-amz-acl", acls[i])` at line 782, which actually validates that the configured `canned_acl` value is sent in the request headers. This is a more thorough test. Consider adding similar HEADER expectations to `flb_test_storage_class_variations` to verify the `x-amz-storage-class` header is actually sent.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3.c
- plugins/out_s3/s3.h
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_blob.h
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_multipart.h
- plugins/out_s3/s3_queue.c
- plugins/out_s3/s3_queue.h
- plugins/out_s3/s3_store.c
- plugins/out_s3/s3_store.h
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_stream.h
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/internal/aws_util.c
- tests/internal/parquet.c
- tests/runtime/CMakeLists.txt
- tests/runtime/out_s3.c
- tests/runtime/out_s3_blob.c
- tests/runtime/out_s3_config.c
- tests/runtime/out_s3_edge_cases.c
- tests/runtime/out_s3_error_handling.c
- tests/runtime/out_s3_format.c
- tests/runtime/out_s3_multipart.c
- tests/runtime/out_s3_queue.c
- tests/runtime/out_s3_recovery.c
- tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/include/aws_client_mock.c
- plugins/out_s3/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
- plugins/out_s3/s3_blob.c
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3_multipart.c
- plugins/out_s3/s3_queue.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
- tests/runtime/out_s3_error_handling.c
- plugins/out_s3/s3_store.c
- tests/include/s3_test_helpers.h
- plugins/out_s3/s3_queue.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
- plugins/out_s3/s3_auth.c
- plugins/out_s3/s3.h
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/internal/parquet.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
- tests/internal/parquet.c
- plugins/out_s3/s3_stream.c
📚 Learning: 2025-10-23T07:43:16.216Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 11059
File: plugins/in_tail/tail_file.c:1618-1640
Timestamp: 2025-10-23T07:43:16.216Z
Learning: In plugins/in_tail/tail_file.c, when truncate_long_lines is enabled and the buffer is full, the early truncation path uses `lines > 0` as the validation pattern to confirm whether process_content successfully processed content. This is intentional to track occurrences of line processing rather than byte consumption, and consuming bytes based on `processed_bytes > 0` would be overkill for this validation purpose.
Applied to files:
plugins/out_s3/s3_stream.c
🧬 Code graph analysis (17)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
s3_blob_register_parts(403-436)s3_blob_process_events(451-538)s3_blob_recover_state(171-199)s3_blob_notify_delivery(124-161)
tests/include/aws_client_mock.h (1)
tests/include/aws_client_mock.c (1)
flb_aws_client_mock_clear_generator_instance(71-74)
plugins/out_s3/s3_blob.c (6)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1311-1389)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-423)plugins/out_s3/s3_multipart.c (2)
s3_multipart_abort(738-800)s3_multipart_check_upload_exists(806-880)src/flb_input_blob.c (1)
flb_input_blob_delivery_notification_destroy(222-231)src/flb_blob_db.c (2)
flb_blob_db_lock(423-428)flb_blob_file_update_remote_id(586-617)plugins/out_s3/s3_queue.c (1)
s3_queue_add_pending_file(269-282)
tests/runtime/out_s3_edge_cases.c (2)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(553-632)
plugins/out_s3/s3_auth.h (1)
plugins/out_s3/s3_auth.c (3)
s3_auth_init_endpoint(79-128)s3_auth_request_presigned_url(173-260)s3_auth_fetch_presigned_url(359-423)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
s3_queue_remove(317-326)s3_queue_buffer_chunk(366-388)s3_queue_add_file(126-140)s3_queue_add_part(142-212)s3_queue_add_pending_file(269-282)s3_queue_add_file_unlocked(68-121)s3_queue_add_pending_file_unlocked(218-264)s3_queue_process_entry(765-824)s3_queue_recover_all(867-922)s3_queue_recover_from_database(1015-1103)s3_queue_recover_from_fstore(1168-1228)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
field_name(548-548)src/flb_pack.c (1)
flb_pack_json(530-535)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
tests/runtime/out_s3_format.c (3)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
tests/runtime/out_s3_blob.c (2)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
flb_s3_calculate_optimal_part_size(164-216)s3_multipart_get_md5_base64(218-248)s3_multipart_create_headers(357-418)s3_multipart_upload_new(451-500)s3_multipart_upload_destroy(420-445)s3_multipart_initiate(502-559)s3_multipart_upload_part(561-653)s3_multipart_complete(655-736)s3_multipart_abort(738-800)s3_multipart_abort_with_url(1022-1043)s3_multipart_check_upload_exists(806-880)s3_multipart_upload_file_part(894-1016)s3_multipart_upload_file(1203-1294)
plugins/out_s3/s3_store.c (4)
src/flb_sds.c (4)
flb_sds_create_size(92-95)flb_sds_printf(336-387)flb_sds_destroy(389-399)flb_sds_create(78-90)src/flb_fstore.c (7)
flb_fstore_file_create(153-190)flb_fstore_file_meta_set(74-108)flb_fstore_file_append(298-329)flb_fstore_create(479-525)flb_fstore_destroy(527-564)flb_fstore_stream_create(335-402)flb_fstore_file_inactive(238-256)include/fluent-bit/flb_mem.h (2)
flb_free(126-128)flb_calloc(84-96)include/fluent-bit/flb_compat.h (1)
gmtime_r(76-82)
plugins/out_s3/s3_stream.c (5)
src/aws/flb_aws_compress.c (1)
flb_aws_compression_compress(110-127)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)src/flb_sds.c (5)
flb_sds_destroy(389-399)flb_sds_create_size(92-95)flb_sds_cat(120-141)flb_sds_create(78-90)flb_sds_printf(336-387)plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(553-632)tests/internal/processor.c (1)
processor(66-126)
tests/runtime/out_s3_config.c (4)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (7)
flb_create(143-225)flb_service_set(652-678)flb_input(266-276)flb_start(983-994)flb_lib_push(843-870)flb_stop(1011-1055)flb_destroy(228-263)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_stream.h (1)
plugins/out_s3/s3_stream.c (4)
stream_compress_file(147-218)stream_process_msgpack_file(593-661)stream_json_processor(663-691)stream_log_key_processor(766-795)
tests/include/s3_test_helpers.h (2)
plugins/out_s3/s3.c (1)
setenv(66-69)tests/include/aws_client_mock.c (1)
flb_aws_client_mock_generator_count_unused_requests(143-147)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1311-1389)
🔇 Additional comments (111)
tests/internal/parquet.c (9)
1-52: Well-structured test infrastructure with clear organization.The file header, includes, and external function declaration are well-organized. The
#ifdef __cplusplusguards aroundvalidate_parquet_fileare appropriate for C++ linkage.
54-136: Chunk file header implementation correctly follows ChunkIO format.The
write_msgpack_to_chunk_filefunction properly implements the 24-byte ChunkIO header with:
- Magic bytes (0xC1, 0x00)
- Content length (big-endian)
- Metadata length (big-endian)
The detailed comments documenting the format are helpful for future maintenance.
350-440: Functional tests cover core conversion paths thoroughly.
test_basic_conversion,test_multiple_records, andtest_large_record_countprovide good coverage of:
- Single record conversion with validation
- Multi-record batches (100 records)
- Large datasets (70,000 records) to trigger batch boundaries
The use of
field_expectationarrays with strategic validation points (first, middle, last records) is a good testing pattern.
442-594: Type conversion tests comprehensively cover conversion paths.The tests validate all major type conversions:
- Boolean from int/float/string
- Integer from float/string/bool
- Float from int/string/bool
- String from all types including objects/arrays (JSON serialization)
The validation of both type and value for each conversion path ensures the encoder handles coercion correctly.
1006-1086: Negative tests properly validate error handling.The destructive tests cover important error cases:
- Truncated/corrupted msgpack data
- Invalid JSON schema syntax
- Empty schema (no fields)
Each test correctly validates that the conversion returns
-1on failure.
1267-1422: Schema mismatch tests are critical for crash prevention.These tests (
test_edge_schema_more_fields,test_edge_schema_less_fields,test_edge_field_name_mismatch,test_edge_all_fields_missing) validate important edge cases where schema doesn't match data. The comment "Critical for crash fix validation" indicates these tests guard against potential segfaults or buffer overruns in the encoder.
1872-1939: Batch boundary tests are thorough.Testing at 65535, 65536, 65537, and 131072 records ensures the Parquet writer handles row group boundaries correctly. The validation of boundary records (first of batch, last of batch, cross-batch) is appropriate.
1941-2016: Special float value handling correctly uses standard macros.Using
NAN,INFINITY, and-INFINITYfrom<math.h>is the correct approach, as division-by-zero can trigger SIGFPE when floating-point exceptions are enabled.
2081-2156: Test registry is well-organized with clear categorization.The
TEST_LISTprovides excellent organization with 6 categories:
- Functional tests (9)
- Data quality tests (12)
- Scale & boundary tests (9)
- Compression tests (1, consolidated)
- Negative tests (10)
- Real-world scenarios (3)
The comments explaining optimization choices (e.g., consolidating compression tests) are helpful.
tests/include/aws_client_mock.h (2)
6-8: LGTM - Improved documentation clarity.The updated header comment clearly documents the usage pattern, limitations (no multi-test executables), and thread safety concerns. This is helpful for developers using the mock infrastructure.
192-197: LGTM - Essential for proper mock lifecycle management.This function correctly addresses the use-after-free scenario where
flb_destroy()frees the mock client internally, and a subsequent call toconfigure_generator()would otherwise attempt to destroy an already-freed instance. The documentation clearly explains when to use this.tests/runtime/CMakeLists.txt (2)
300-306: LGTM - Addresses Windows build requirements.The conditional linking of
ole32andrpcrt4for S3 tests on Windows correctly resolves the linker dependencies for GUID generation functions (CoCreateGuid,UuidToStringA,RpcStringFreeA) used ins3_test_helpers.h.
238-246: LGTM - Good test modularization.Splitting the monolithic S3 test into focused test suites (config, multipart, format, error_handling, edge_cases, special_features, queue, recovery, blob) improves maintainability and allows parallel test execution. Windows-specific linking for ole32 and rpcrt4 is correctly scoped to S3 tests only.
tests/include/s3_test_helpers.h (7)
1-43: LGTM - Well-designed cross-platform abstractions.The platform-specific macros cleanly abstract away differences between Windows and POSIX systems. The include structure properly guards Windows-specific headers.
81-121: LGTM - Test isolation counter and temp directory helper.The temp directory resolution follows standard precedence (TMPDIR → TMP → TEMP → /tmp), which is appropriate for cross-platform test execution.
205-268: LGTM - GUID-based directory uniqueness on Windows.Good use of
CoCreateGuidon Windows for globally unique directory names, addressing the previous review concern about counter-based collisions across translation units.
270-342: LGTM - Safe recursive directory removal.The implementations correctly handle recursive removal using platform-native APIs (
nftwwithFTW_DEPTH | FTW_PHYSon POSIX,FindFirstFile/FindNextFileon Windows), avoiding the command injection vulnerability from the previoussystem("rm -rf ...")approach.
351-386: LGTM - Safe path validation.The validation correctly checks both path separators (
/and\) and includes reasonable fallback paths.
449-457: LGTM - Cross-platform environment variable setting.Correctly uses
_putenv_son Windows andsetenvon POSIX. Based on learnings, Fluent Bit only supports MSVC on Windows, so this is sufficient.
475-507: LGTM - Deterministic mock polling.The polling-based approach is superior to fixed
sleep()calls as it makes tests deterministic and fail-fast when the mock chain isn't consumed as expected.tests/runtime/out_s3_format.c (6)
18-96: LGTM - Well-structured test with proper cleanup.The test correctly:
- Uses
s3_test_next()for isolation- Checks all return values (
TEST_CHECK)- Uses
S3_TEST_WAIT_MOCK_EXHAUSTEDinstead of fixed sleep- Follows proper cleanup sequence including
flb_aws_client_mock_clear_generator_instance()before freeing resources
180-267: LGTM - Good test for aggregation behavior.The streaming conversion test correctly validates that multiple
flb_lib_pushcalls are aggregated into a single chunk. The comment at lines 195-199 clearly documents the expected behavior.
269-354: LGTM - Comprehensive compression coverage.Good coverage of gzip, zstd, and snappy compression modes. Each test follows the established pattern consistently.
Also applies to: 356-435, 437-515
517-601: LGTM - log_key extraction tests.Good coverage of the
log_keyfeature both standalone and combined with compression.Also applies to: 603-687
851-910: LGTM - Negative test case.Good practice to test invalid configuration handling. The test correctly expects
flb_start()to return-1for invalid compression type, and properly handles cleanup in both success and failure cases (lines 900-903).
912-926: LGTM - Complete test registration.All 11 test functions are properly registered in
TEST_LIST.tests/internal/aws_util.c (3)
47-68: Good coverage of edge cases for file path handling.The test macros comprehensively cover various edge cases including relative paths, triple dots, parent directory traversal, hidden files, and mixed dots/slashes. This helps ensure the path sanitization logic in
flb_get_s3_keyproperly strips leading dots and slashes.
402-533: Well-structured test functions for file path features.The new test functions follow the established pattern in this file and thoroughly exercise the
$FILE_PATHand$FILE_NAMEsubstitution logic. The edge case tests verify proper handling of potentially problematic inputs like parent traversal (../../../etc/passwd) and hidden files.
200-200: Function signature update looks correct.All existing
flb_get_s3_keycalls have been updated to passNULLfor the newfile_pathparameter, maintaining backward compatibility while enabling the new functionality.tests/runtime/out_s3_queue.c (4)
14-74: Test structure is clean and follows good practices.The
flb_test_queue_basic_operationstest demonstrates a well-organized pattern: setup mock chain, create temp resources, configure and start Fluent Bit, push data, wait for completion, and clean up. This pattern is consistently applied across all tests in the file.
145-206: Good test coverage for retry mechanism.The retry test properly simulates a transient 500 error on UploadPart followed by success, verifying that the queue correctly retries failed operations. The mock chain at lines 156-161 correctly models the failure-then-success scenario.
469-479: Test list registration is complete.All seven test functions are properly registered in the TEST_LIST, ensuring they will be executed by the test framework.
7-8: This inclusion pattern is intentional and safe.The
aws_client_mock.cfile is designed to be directly included in test files. The file itself documents this pattern (see lines 1-10) and notes that each test is compiled as a separate standalone executable, which prevents symbol conflicts. The file uses static declarations for internal functions and variables, making them local to each compiled test binary. There is no risk of multiple definition errors because the tests are not linked together into a single executable.plugins/out_s3/s3_auth.h (1)
26-31: Well-designed enum for presigned URL types.The
s3_presigned_url_type_tenum provides type safety for the different multipart upload operations. This is cleaner than using magic numbers or string constants.plugins/out_s3/s3_store.c (7)
33-49: Good cross-platform path length handling.The
FLB_PATH_MAXdefinition properly handles Windows (MAX_PATH/_MAX_PATH) and POSIX (PATH_MAX) with sensible fallback values. This addresses previous portability concerns.
116-140: Buffer space check with warning threshold is a good addition.The
check_buffer_spacefunction proactively warns when the buffer is 80% full, giving operators time to react before data loss occurs. The error message at line 128-129 provides useful debugging information.
142-159: Clean helper for failed file cleanup.The
cleanup_failed_filefunction properly handles partial cleanup scenarios by checking for NULL before freeing resources. This prevents memory leaks in error paths.
244-252: Proper cleanup on append failure.The code now properly cleans up newly created files when append fails (lines 247-250), addressing the previous concern about leaving partially written buffer files.
278-288: Cross-platform stat handling is correct.Using
_staton Windows andstaton POSIX with appropriate struct types ensures portability. TheFLB_SYSTEM_WINDOWSguard is the correct macro for Fluent Bit's Windows support.
506-524: Memory leak fixed in s3_store_file_inactive.The function now properly frees
s3_file->stream_pathbefore freeing the struct (lines 517-519), addressing the previously identified memory leak.
584-620: Robust meta_len validation before seeking.The code now properly validates
meta_lenagainst the remaining file size (lines 613-620) before attempting to seek, preventing undefined behavior on corrupt chunk files. The error messages are informative and include the problematic values.plugins/out_s3/s3_stream.h (3)
23-27: Proper includes with sys/types.h for off_t.The header correctly includes
<sys/types.h>foroff_t, addressing the previous review concern about potential compilation issues.
29-43: Well-designed callback pattern for record processing.The
record_processor_fntypedef andstream_processor_contextstruct provide a clean, extensible framework for processing records in different formats (JSON, log_key, and potentially Parquet). The context tracks useful metrics likerecords_processedandbytes_written.
45-62: Good documentation for stream_compress_file.The documentation clearly explains the offset parameters, which is important since
-1foroffset_endmeans EOF. This streaming approach is memory-efficient for large files.plugins/out_s3/s3_store.h (2)
51-52: Good API design for cross-module chunk file access.The context-free chunk file opener with C++ compatibility is well-designed. The implementation (from the provided context in
s3_store.c) shows thorough validation of chunk headers, magic bytes, and metadata length bounds.
26-34: LGTM - struct definition is clean and well-documented.The
s3_filestruct with the renamedstream_pathfield is correctly implemented. The field comments clearly describe the purpose of each member. All usages ofstream_pathin s3_store.c and s3.c are consistent with this change.plugins/out_s3/s3_blob.c (6)
41-122: Well-implemented abort helper with proper validation and cleanup.The function correctly:
- Validates
file_remote_idbefore proceeding- Uses stored
s3_keywhen available with appropriate fallback warning- Cleans up all resources on error paths including the
pre_signed_urlleak fix
124-161: LGTM - Proper allocation checks and cleanup.The function correctly validates the
cfl_sds_createreturn value and cleans up the notification on path allocation failure.
171-199: LGTM - Recovery orchestrator with proper lock handling.The blob DB lock acquisition is correctly checked, and the unlock failure is appropriately logged as a warning.
201-253: Stale file recovery implementation looks correct.The function properly:
- Logs abort failures with detailed context (file_id, path, upload_id, parts, ret)
- Cleans up allocated strings each iteration
- Resets NULL pointers after cleanup to avoid double-free
Note: Passing
NULLfors3_keyat line 225 triggers the legacy fallback inabort_multipart_upload. The comment clarifies this is intentional for stale files without stored keys.
255-401: Comprehensive aborted file handling with proper validation.The function correctly:
- Uses stored
s3_keyfor upload validation when available- Provides fallback with appropriate warning for legacy data
- Logs abort failures with detailed context
- Tracks statistics for observability (resume, fresh start, discarded counts)
The
FLB_OUT_RETRY_UNLIMITEDcheck at line 285 is consistent with the project's retry handling approach. Based on learnings, the S3 plugin intentionally uses its own retry logic pattern.
451-538: Well-structured event processing with proper type handling.The function correctly:
- Uses
int64_t file_id(line 458) to matchflb_blob_db_file_insertreturn type- Checks
file_id < 0(line 496) to catch all negative error codes- Defensively frees partial allocations on
flb_input_blob_file_get_infofailure- Documents the coroutine context limitations in the architecture comment
plugins/out_s3/s3_auth.c (6)
36-77: LGTM - URL parsing with proper cleanup.The
parse_urlfunction correctly callsurl_parts_destroyon all error paths, preventing potential leaks from partial allocations.
79-128: Clean endpoint initialization with proper TLS handling.The function correctly:
- Determines HTTPS usage and creates TLS context only when needed
- Uses direct
flb_output_upstream_setcall (past upstream flags issue resolved)- Cleans up TLS context and URL parts on upstream creation failure
130-153: LGTM - Port validation with proper bounds checking.The function correctly rejects port 0 by requiring
port_val >= 1(line 141), addressing the past review concern about invalid client connection ports.
173-260: Well-structured HTTP request handling with comprehensive cleanup.The function properly:
- Uses a single
cleanuplabel for resource management- Handles both reusing existing
result_urland creating new allocation- Provides detailed error logging including response payload when available
262-357: LGTM - URL path construction with proper encoding and cleanup.The function correctly:
- Strips leading
/from encoded keys to avoid double slashes- Uses explicit
/delimiter between bucket and key components- Uses distinct endpoint paths for each operation type (create, upload, complete, abort)
- Cleans up all allocated resources on error
359-423: Comprehensive presigned URL fetching with all edge cases handled.The function correctly:
- Guards against uninitialized upstream (line 376-378)
- Prevents double-slashes in URL construction (lines 393-410)
- Frees
full_urlonflb_sds_printffailure (lines 411-414)- Returns success with
NULLresult when no endpoint is configuredtests/runtime/out_s3_multipart.c (3)
14-498: Well-structured test suite with comprehensive coverage.The tests correctly:
- Use unique temporary paths via
s3_test_create_temp_db_pathands3_test_create_temp_store_dir- Check
flb_lib_pushreturn values in all tests- Use
S3_TEST_WAIT_MOCK_EXHAUSTEDinstead of baresleep()calls- Clean up all resources including temporary files and mock state
- Cover success paths, failure scenarios, and retry behavior
500-512: LGTM - Complete test registration.All nine test functions are properly registered with matching names and NULL terminator.
7-8: This is an intentional, documented pattern in the test suite. The header filetests/include/aws_client_mock.hexplicitly specifies: "Usage:#includeboth .h and .c files in test files." This approach is used consistently across 13 test files, including all S3-related tests. No action needed.Likely an incorrect or invalid review comment.
plugins/out_s3/s3_multipart.h (4)
20-37: LGTM - Clean header setup with appropriate includes.The header correctly includes
sys/types.hforoff_tands3.hfor the S3 context type. Helper function declarations are well-documented.
39-76: Well-designed multipart lifecycle API.The API provides clean separation between:
- Lifecycle management (
new/destroy)- AWS API wrappers (initiate, upload_part, complete, abort)
- Convenience helpers (
abort_with_url)Parameter types are consistent using
const char*throughout.
77-89: Excellent documentation for recovery-critical function.The
IMPORTANTcomment clearly warns developers about the key regeneration pitfall, which is critical for upload recovery correctness. The return value semantics (1/0/-1) are also well-documented.
90-107: LGTM - Streaming upload API with appropriate types.The high-level file upload functions use:
off_tfor byte offsets (correct type for large file support)flb_sds_tfor presigned URL (appropriate for dynamic string ownership)plugins/out_s3/s3_queue.h (4)
1-24: LGTM!License header, include guards, and dependencies are properly structured.
26-87: LGTM!Function declarations are well-documented with clear parameter semantics. The unlocked variants properly indicate the caller must hold
upload_queue_lock. Return value conventions are clearly documented.
89-101: LGTM!The three-phase recovery architecture is clearly documented with consistent phase numbering (0, 1, 2). The documentation accurately describes each phase's responsibility: zombie cleanup, state transitions, and queue rebuild.
103-103: LGTM!Include guard properly closed.
tests/runtime/out_s3_error_handling.c (4)
44-106: LGTM!Test properly validates AccessDenied error handling with:
- NULL checks for temp paths with early bailout
- Polling-based wait instead of fixed sleep
- Complete cleanup of mock instance and temp resources
108-519: LGTM!All error handling tests follow a consistent and correct pattern:
- Proper NULL checks for temp resources with early bailout
- Polling-based mock exhaustion waits
- Complete cleanup of all resources
Good coverage of AWS error codes: NoSuchBucket, NoSuchUpload, InvalidAccessKeyId, SignatureDoesNotMatch, SlowDown, and InternalError.
521-741: LGTM!Auto-retry and failure cleanup tests are well-structured:
flb_test_auto_retry_enabledandflb_test_auto_retry_disabledproperly test theauto_retry_requestsconfiguration optionflb_test_failure_cleanup_and_abortvalidates the plugin's behavior when UploadPart failsAll tests maintain consistent cleanup patterns.
743-756: LGTM!Test list properly registers all 10 test functions with NULL terminator.
tests/runtime/out_s3_blob.c (4)
1-82: LGTM!Includes and first test (
flb_test_blob_database_file_config) follow the established pattern with proper mock chain setup and cleanup.
84-164: LGTM!The
flb_test_blob_database_custom_pathtest properly handles the async DB creation race condition with a retry loop using the cross-platformS3_TEST_SLEEP_MSmacro.
166-509: LGTM!Delivery attempt limit and freshness tests properly verify that:
file_delivery_attempt_limit,part_delivery_attempt_limit, andupload_part_freshness_limitconfigurations are accepted- Unlimited delivery attempts (
-1) is handled correctly- The upload flow completes successfully with these configurations
All tests maintain consistent setup and cleanup patterns.
511-595: LGTM!The
flb_test_upload_chunk_size_configtest and TEST_LIST are properly structured. All 8 blob configuration tests are registered.plugins/out_s3/s3_stream.c (8)
20-51: LGTM!Platform-specific includes and macros are properly structured:
- Windows uses
windows.h,io.h,fcntl.h,share.h,process.h- POSIX uses
unistd.h,fcntl.h- Abstraction macros (
s3_unlink,s3_close,s3_fdopen) provide clean cross-platform interface
147-218: LGTM!The
stream_compress_filefunction properly:
- Validates offset parameters with clear error messages
- Handles the "unknown" compression type in debug logging
- Cleans up the output file on failure
265-492: LGTM!The
stream_context_setupfunction properly handles:
- Secure temp file creation using platform-appropriate methods (
GetTempFileName+_sopen_son Windows,mkstempon POSIX)- Windows compatibility by closing the file before
rename()- Resource cleanup on allocation failures via
stream_context_destroy()- Dynamic buffer resizing for Windows temp paths
The TOCTOU race window on Windows between
DeleteFile()and_sopen_s()is acknowledged in comments and mitigated byGetTempFileName's unique naming.
494-529: LGTM!The
process_unpacked_recordsfunction correctly handles all msgpack return codes:
MSGPACK_UNPACK_SUCCESSandMSGPACK_UNPACK_EXTRA_BYTESboth process the unpacked record and continue the loopMSGPACK_UNPACK_NOMEM_ERRORreturns -1 with an error logMSGPACK_UNPACK_PARSE_ERRORreturns -1 with an error logThe loop properly continues processing remaining bytes when
EXTRA_BYTESis returned.
531-591: LGTM!The
process_msgpack_streamandcreate_output_file_markerfunctions are straightforward and correct.
593-661: LGTM!The
stream_process_msgpack_filefunction properly:
- Handles empty input as a no-op returning 0 (not an error)
- Handles empty output from processor as a no-op
- Maintains correct cleanup in all code paths
663-795: LGTM!The JSON and log_key processors are well-implemented:
stream_json_processorproperly returns -1 on conversion failurefind_log_key_in_maphandles both STR and BIN key typesstream_log_key_processorcorrectly writes string/bin values directly and falls back to JSON for other types
52-145: The Snappy compression implementation is correct.flb_aws_compression_compress()usesflb_snappy_compress_wrapper(), which callsflb_snappy_compress_framed_data()internally—not raw Snappy compression. This produces output in Google's Snappy framing format with stream identifiers, length fields, and CRC32C checksums, making it safe to concatenate chunks. The code and its comment are accurate.tests/runtime/out_s3_edge_cases.c (5)
18-98: LGTM!The
flb_test_empty_data_uploadtest correctly:
- Uses fixed sleep (not mock exhaustion) since no data is pushed and mock may not be consumed
- Verifies no crash by asserting
ret == 0after cleanup- Properly cleans up all resources
100-431: LGTM!These tests properly exercise edge cases:
- Large file chunking with multiple data pushes
- Maximum concurrent uploads with 10 workers
- Minimal timeout configuration
- Timeout triggering before file size threshold
All follow the established pattern with polling-based mock exhaustion and complete cleanup.
433-682: LGTM!Good coverage of S3 key format edge cases:
- Special characters and tag segment extraction with
$TAG[n]- Custom tag delimiter configuration
- Custom store directory
All tests maintain consistent patterns.
684-933: LGTM!Boundary value and compression tests are well-structured:
flb_test_config_boundary_minimumtests minimum configuration valuesflb_test_config_boundary_maximumtests high values (100M, 50M, 100 workers)flb_test_compression_small_datacorrectly usessizeof() - 1for payload length (per past review fix)
935-1096: LGTM!The
flb_test_rapid_start_stoptest properly validates:
- Complete cleanup between start/stop cycles
- Fresh resource allocation with different configurations
- No state leakage between runs
TEST_LIST correctly registers all 12 edge case tests.
tests/runtime/out_s3_config.c (2)
19-83: LGTM - Well-structured test with proper mock setup and cleanup.The test correctly sets up the mock chain with METHOD expectations, configures the S3 plugin with
s3_key_format, pushes data, and properly cleans up all resources including the temp database and store directory.
1282-1300: Test list is complete and well-organized.All 15 test functions are properly registered in the TEST_LIST with descriptive names that match the test function purposes.
plugins/out_s3/s3_multipart.c (9)
36-65: LGTM - Cross-platform I/O abstractions are well-implemented.The platform-specific macros properly address:
- Binary mode (
O_BINARY) on Windows to prevent text-mode corruption of binary data (Parquet files)- Cross-platform sleep wrapper using
Sleep()on Windows andusleep()on POSIX
106-148: LGTM - ETag extraction correctly preserves quotes.The function now correctly includes the surrounding quotes in the extracted ETag (line 141:
start_quotetoend_quote + 1), which is required by the AWS CompleteMultipartUpload API. The bounds checking at lines 130-138 prevents buffer overruns.
164-216: LGTM - Part size calculation correctly enforces AWS constraints.The function properly handles:
- AWS minimum part size (5 MiB) at line 208
- AWS maximum part size (5 GiB) at line 211
- Maximum 10,000 parts limit with adaptive sizing at lines 187-204
- Rounding to MiB/GiB boundaries for cleaner sizes
218-248: LGTM - MD5 base64 encoding with proper NUL termination.The function now correctly NUL-terminates the output buffer (lines 238-244), making it safe for callers to use
strlen()on the result. The bounds check at line 239 prevents buffer overflows.
250-299: LGTM - URI building with proper encoding and error handling.The function correctly:
- Uses
flb_aws_uri_encode_pathfor proper S3 key encoding- Handles presigned URLs as a fast path
- Handles both
/keyandkeyformats- Cleans up
encoded_keyon all error paths
357-418: LGTM - Header creation with proper counter incrementing.The function correctly increments
nafter each header assignment, including the storage_class header at line 413. The dynamic allocation matches the countedheaders_len.
894-1016: LGTM - File part upload with proper overflow protection and error handling.The function correctly:
- Guards against 32-bit overflow with
uint64_tcast comparison at line 933- Validates offset ranges before processing
- Reads directly into
part_bufferwithout redundant intermediate buffers- Handles short reads properly in the read loop
- Cleans up resources on all error paths
1083-1201: LGTM - Part upload with safe exponential backoff and AWS limits enforcement.The function correctly:
- Ensures at least 1 retry attempt with
max_attemptsat line 1097- Enforces AWS S3 maximum of 10,000 parts at line 1101
- Uses safe overflow-protected exponential backoff calculation at lines 1168-1184
- Properly cleans up presigned URLs after each use
1018-1043: LGTM - Abort helper with presigned URL support and fallback.The function correctly fetches the abort presigned URL before calling
s3_multipart_abort(), and gracefully falls back to attempting abort without the presigned URL if the fetch fails. This ensures abort is attempted even when the auth endpoint is unavailable.plugins/out_s3/s3.h (3)
29-46: LGTM - Well-defined constants and forward declaration.The header correctly:
- Forward declares
struct flb_parquet_schemato avoid circular includes- Includes
<pthread.h>and<sys/types.h>forpthread_mutex_tandoff_t- Defines MAX_FILE_SIZE as 50TB matching current AWS S3 limits
- Defines AWS multipart constraints (5 MiB min, 5 GiB max, 10,000 parts)
52-92: LGTM - Queue and multipart structures are well-designed.The
upload_queuestruct supports both file-level and part-level tracking with appropriate fields. Themultipart_uploadstruct's fixedetags[10000]array is appropriate given AWS S3's 10,000 parts maximum limit.
94-178: LGTM - Context structure with proper thread safety.The
flb_s3structure correctly adds:
pthread_mutex_t upload_queue_lockfor thread-safe queue access (line 157)formatfield for JSON/Parquet distinctioncached_arrow_schemapointer using the forward-declared typeplugins/out_s3/s3_queue.c (7)
22-30: LGTM - Cross-platform file access wrapper.The
flb_accessmacro provides proper Windows/POSIX compatibility for file existence checks, addressing the previous portability concern.
68-212: LGTM - Queue entry addition with proper locking and cleanup.The functions correctly:
- Provide both locked and unlocked variants for flexibility
- Use consistent lock/unlock patterns with
pthread_mutex_lock/unlock- Clean up all allocated resources on error paths (tag_copy, path_copy, entry)
396-485: LGTM - Part upload with proper ETag memory management.The function correctly:
- Frees the ETag SDS after persisting to database (lines 452-454)
- Cleans up the ETag on failure paths (lines 470-474)
- Uses correct argument order for
flb_blob_db_file_part_in_progresscalls
541-713: LGTM - Multipart completion with thorough cleanup.The function correctly:
- Validates
part_countbounds (0 < count ≤ 10000) before allocation at lines 599-603- Uses
s3_keyfrom the entry parameter for consistency- Cleans up all allocated resources (m_upload fields, fetched strings, ETags) in the cleanup section
- Handles both success and error paths appropriately
867-922: LGTM - Well-organized three-phase recovery architecture.The recovery process is clearly structured:
- Phase 0: Zombie cleanup (reset in_progress parts from crashed runs)
- Phase 1: State transitions (stale → pending, aborted handling)
- Phase 2: Queue rebuild from persistent storage
The DB locking around zombie cleanup (lines 877-888) ensures safe concurrent access.
1015-1103: LGTM - Database recovery with proper state management.The function correctly:
- Resets
in_progressflag before skipping on endpoint mismatch (line 1047)- Marks parts as
in_progressto prevent re-query (line 1052)- Cleans up all allocated strings in the cleanup section
- Handles both resume and new upload creation paths
1167-1228: LGTM - Fstore recovery with proper chunk handling.The function correctly:
- Uses
mk_list_foreach_safefor safe iteration- Skips the metadata stream and locked chunks
- Marks chunks exceeding retry limit as inactive
- Locks chunks before adding to queue and unlocks on failure
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
Add Parquet format support to S3 output plugin with: - Parquet writer integration - Enhanced multipart upload handling - Improved queue and store management - Stream processing updates - Configuration examples Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph> Signed-off-by: Arbin <arbin.cheng@coins.ph>
ff0643c to
e7de562
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@tests/runtime/out_s3_config.c`:
- Around line 1184-1280: flb_test_invalid_parameter_combinations may call
flb_destroy() without stopping the engine if flb_start() unexpectedly returns 0;
for both failing sub-tests (the "invalid compression" branch and the
"total_file_size > 5TB" branch) check the ret value after flb_start() and if ret
== 0 call flb_stop(ctx) before calling flb_destroy(ctx) (use the existing ret
variable and the flb_stop and flb_destroy functions referenced in
flb_test_invalid_parameter_combinations).
- Around line 42-43: Several tests call
flb_aws_client_mock_clear_generator_instance() in teardown which only nullifies
the pointer and leaks the mock wrapper and its request chain when the S3 plugin
later destroys its surrogate; replace every call to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() so the mock wrapper and associated
resources are freed correctly, updating occurrences around the teardown of tests
that used init_options.client_generator/ flb_aws_client_get_mock_generator()
(e.g., the calls near the teardown blocks currently using
clear_generator_instance).
In `@tests/runtime/out_s3_edge_cases.c`:
- Around line 56-78: Check return values of s3_test_create_temp_db_path("empty")
and s3_test_create_temp_store_dir("empty") for NULL before calling
flb_output_set; if either db_path or store_dir is NULL, perform the same
early-bail cleanup used in out_s3_error_handling.c (free any non-NULL resources,
destroy ctx with flb_destroy(ctx) if created, and mark the test as failed) and
return to avoid dereferencing NULL in flb_output_set. Locate the temp helpers
(s3_test_create_temp_db_path, s3_test_create_temp_store_dir), the ctx creation
(flb_create), and the flb_output_set call to insert the NULL checks and cleanup
path.
In `@tests/runtime/out_s3_error_handling.c`:
- Around line 521-669: Modify the mock chains in flb_test_auto_retry_enabled and
flb_test_auto_retry_disabled so they differ: for flb_test_auto_retry_enabled
build the FLB_AWS_CLIENT_MOCK sequence with CreateMultipartUpload (set STATUS
200), UploadPart returning a transient failure first (set STATUS 503 or 500)
followed by a successful UploadPart (set STATUS 200), then
CompleteMultipartUpload (set STATUS 200); for flb_test_auto_retry_disabled build
the chain with CreateMultipartUpload (200), a single failing UploadPart
(503/500) and then Complete (200) so the failing UploadPart is not retried. Keep
using flb_aws_client_mock_configure_generator / init_options.client_generator
and S3_TEST_WAIT_MOCK_EXHAUSTED, and update any expectations/assertions so the
enabled test consumes the extra retry response (i.e. mock chain exhausted) while
the disabled test does not retry and thus the observed request sequence differs.
In `@tests/runtime/out_s3_multipart.c`:
- Around line 69-109: The test flb_test_create_multipart_with_metadata currently
only covers canned_acl but the comment says “ACL, storage class”; update the
test to set a storage class and assert it on the CreateMultipartUpload request:
add the expected HEADER "x-amz-storage-class" in the mock chain response for the
first POST (the chain variable created by FLB_AWS_CLIENT_MOCK) and pass
"storage_class", "<SOME_CLASS>" into flb_output_set when creating the s3 output
(same call that sets "canned_acl"); ensure the header name and storage class
string match the value you set so the mock assertion triggers.
♻️ Duplicate comments (4)
tests/internal/parquet.c (1)
288-304: Tempfile strategy is collision-prone and not portable.Using fixed
/tmp/flb_test_${name}.*paths risks collisions if tests run concurrently (or reruns leave stale files), and will break on non-POSIX environments. Consider usingmkstemp()or including PID/random suffix.Proposed direction
static int init_test_context(test_context *ctx, const char *test_name) { msgpack_sbuffer_init(&ctx->sbuf); msgpack_packer_init(&ctx->packer, &ctx->sbuf, msgpack_sbuffer_write); - snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file), - "/tmp/flb_test_%s.msgpack", test_name); - snprintf(ctx->parquet_file, sizeof(ctx->parquet_file), - "/tmp/flb_test_%s.parquet", test_name); + snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file), + "/tmp/flb_test_%s_%ld.msgpack", test_name, (long)getpid()); + snprintf(ctx->parquet_file, sizeof(ctx->parquet_file), + "/tmp/flb_test_%s_%ld.parquet", test_name, (long)getpid());tests/runtime/out_s3_config.c (1)
18-557: Key format + endpoint tests don’t validate URI/host, so they can’t catch config regressionsThe
s3_key_format_*andendpoint_*tests assert only METHOD (and occasionally a header), but never assert the generated request URI/host/port—so$TAG/$INDEX/$UUIDexpansion and endpoint parsing can regress while tests still pass. Consider making the format deterministic (e.g., remove%...formatters or freeze time) and addingexpect(URI/HOST/PORT, ...)assertions via the mock.tests/runtime/out_s3_edge_cases.c (1)
6-11: Stop includingaws_client_mock.cdirectly (use header + link the object/library instead)This file still directly includes
../include/aws_client_mock.c(Line 8). That’s non-idiomatic and risks duplicate-symbol issues depending on how runtime tests are linked.tests/runtime/out_s3_format.c (1)
8-12: Stop includingaws_client_mock.cdirectlyThis file still includes
../include/aws_client_mock.c(Line 10). Prefer including only the header and linkingaws_client_mock.conce via the test build system.
🧹 Nitpick comments (10)
plugins/out_s3/s3_store.h (1)
51-52: Consider addingextern "C"guards to align header with advertised C++ compatibility.The comment claims the function "can be used from anywhere including C++", but the header lacks
extern "C"guards. Without these guards, attempting to use this function from C++ code would result in linker errors due to C++ name mangling.♻️ Proposed fix
Add
extern "C"guards to the header:`#ifndef` FLB_S3_STORE_H `#define` FLB_S3_STORE_H +#ifdef __cplusplus +extern "C" { +#endif + `#include` <fluent-bit/flb_output_plugin.h> `#include` <fluent-bit/flb_fstore.h>And at the end:
void s3_store_file_lock(struct s3_file *s3_file); void s3_store_file_unlock(struct s3_file *s3_file); +#ifdef __cplusplus +} +#endif + `#endif`plugins/out_s3/s3_multipart.h (1)
94-107: Consider consistency in string parameter types.
s3_multipart_upload_file_partusesflb_sds_tforpre_signed_url(line 100), whiles3_multipart_upload_fileusesconst char *for all string parameters (lines 104-106). This appears intentional based on the implementation where presigned URLs are managed asflb_sds_t, but documenting this design choice would improve maintainability.plugins/out_s3/s3.h (1)
192-198: Consider documenting the initiate function's relationship with multipart module.The
s3_initiate_multipart_uploadfunction in s3.h appears to be the orchestration entry point that coordinates withs3_multipart_upload_newand related functions ins3_multipart.h. A brief comment clarifying this relationship would help maintainers understand the layered architecture.tests/runtime/out_s3_config.c (2)
687-983: Option coverage gaps: tests set config but don’t assert resulting request headersExamples:
storage_classvariations (Lines 695-744) don’t assertx-amz-storage-class.content_typetests (Lines 841-912) don’t assertContent-Type.send_content_md5flag tests (Lines 914-982) don’t assertContent-MD5presence/absence.Add
expect(HEADER, ...)on the appropriate requests so these tests fail when behavior regresses.Example patch (storage_class header assertion)
@@ - chain = FLB_AWS_CLIENT_MOCK( + chain = FLB_AWS_CLIENT_MOCK( response(expect(METHOD, FLB_HTTP_POST), + expect(HEADER, "x-amz-storage-class", classes[i]), set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP), set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
215-220: Avoid fixedS3_TEST_SLEEP_MS(2000); wait for a condition insteadThe fixed delay (Line 218) makes the test slower and can be flaky across CI variance. Prefer waiting until the first upload has completed (e.g., poll mock state/store artifacts), then push the second batch.
tests/runtime/out_s3_multipart.c (1)
34-38: Consider explicit bailout ifdb_path/store_diris NULL (don’t assumeTEST_CHECKaborts)If
TEST_CHECKis non-fatal, laterflb_output_set(..., db_path, store_dir, ...)can deref NULL.tests/runtime/out_s3_error_handling.calready uses an explicitif (...) return;pattern—worth standardizing here.tests/runtime/out_s3_edge_cases.c (2)
83-92:empty_data_uploadis “no crash” only; add an assertion that no S3 requests were madeCurrently the test just sleeps (Line 85) and stops/destroys. If the plugin accidentally uploads on empty flush, the test might still pass. If the mock API exposes “unused request count”, assert it remains unchanged (or configure the mock to fail on any request for this test).
771-833:workers=100inconfig_boundary_maximummay be unnecessarily heavy for CIIf this spawns 100 worker threads, it can slow the suite or cause resource issues. Consider using a smaller “high but reasonable” value (or guard/mark as expensive) while still validating boundary handling.
tests/runtime/out_s3_format.c (2)
17-687: Format/compression/log_key tests don’t assert the transformed upload payloadAll tests only validate that the multipart request sequence happens (mock exhausted). That won’t catch regressions where:
format=jsondoesn’t serialize correctly,compression=gzip|zstd|snappyisn’t applied,log_keyextraction doesn’t happen.If the mock can assert on request body bytes, add minimal checks (e.g., gzip magic
1f 8b, zstd magic, snappy framed header) or assert on derived key suffix/headers where applicable.
56-60: Add explicit bailout ifdb_path/store_diris NULLTo avoid NULL deref if
TEST_CHECKisn’t fatal (or temp creation fails), add an explicitif (db_path == NULL || store_dir == NULL) { ...; return; }cleanup likeout_s3_error_handling.c.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (33)
plugins/out_s3/CMakeLists.txtplugins/out_s3/s3.cplugins/out_s3/s3.hplugins/out_s3/s3_auth.cplugins/out_s3/s3_auth.hplugins/out_s3/s3_blob.cplugins/out_s3/s3_blob.hplugins/out_s3/s3_multipart.cplugins/out_s3/s3_multipart.hplugins/out_s3/s3_queue.cplugins/out_s3/s3_queue.hplugins/out_s3/s3_store.cplugins/out_s3/s3_store.hplugins/out_s3/s3_stream.cplugins/out_s3/s3_stream.htests/include/aws_client_mock.ctests/include/aws_client_mock.htests/include/s3_test_helpers.htests/internal/CMakeLists.txttests/internal/aws_compress.ctests/internal/aws_util.ctests/internal/parquet.ctests/runtime/CMakeLists.txttests/runtime/out_s3.ctests/runtime/out_s3_blob.ctests/runtime/out_s3_config.ctests/runtime/out_s3_edge_cases.ctests/runtime/out_s3_error_handling.ctests/runtime/out_s3_format.ctests/runtime/out_s3_multipart.ctests/runtime/out_s3_queue.ctests/runtime/out_s3_recovery.ctests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- plugins/out_s3/s3_stream.c
- plugins/out_s3/s3_auth.h
- plugins/out_s3/s3_stream.h
- plugins/out_s3/CMakeLists.txt
- plugins/out_s3/s3_multipart.c
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/internal/parquet.ctests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/internal/parquet.ctests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/internal/parquet.ctests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/internal/parquet.ctests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/internal/parquet.ctests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
plugins/out_s3/s3_store.ctests/internal/aws_compress.ctests/runtime/out_s3_error_handling.cplugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:36.904Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:3275-3282
Timestamp: 2025-09-04T12:35:36.904Z
Learning: The out_s3 plugin intentionally uses a simple numeric comparison for retry_limit (chunk->failures >= ctx->ins->retry_limit) rather than the standard Fluent Bit pattern that checks for FLB_OUT_RETRY_UNLIMITED (-1). The maintainer wants to keep this current behavior for consistency within the plugin.
Applied to files:
plugins/out_s3/s3_blob.cplugins/out_s3/s3_queue.c
📚 Learning: 2025-09-04T12:35:22.872Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10825
File: plugins/out_s3/s3.c:1339-1344
Timestamp: 2025-09-04T12:35:22.872Z
Learning: In the Fluent Bit S3 plugin, the user prefers to maintain current retry_limit behavior without special handling for FLB_OUT_RETRY_UNLIMITED (-1), as there's no documentation indicating -1 should be used for infinite retries and consistency with current logic is preferred.
Applied to files:
plugins/out_s3/s3_blob.cplugins/out_s3/s3_auth.cplugins/out_s3/s3_queue.c
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/internal/aws_compress.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/internal/aws_compress.c
📚 Learning: 2025-12-17T18:45:02.283Z
Learnt from: edsiper
Repo: fluent/fluent-bit PR: 11286
File: src/flb_http_client.c:1657-1715
Timestamp: 2025-12-17T18:45:02.283Z
Learning: In the Fluent Bit OAuth2 implementation (src/flb_oauth2.c), the function flb_oauth2_get_access_token returns a pointer to the cached token stored in the OAuth2 context (ctx->access_token), not a new allocation. The OAuth2 context manages the token lifecycle, so callers must not free the returned token pointer.
Applied to files:
plugins/out_s3/s3_auth.cplugins/out_s3/s3.h
🧬 Code graph analysis (15)
tests/include/aws_client_mock.h (1)
tests/include/aws_client_mock.c (1)
flb_aws_client_mock_clear_generator_instance(71-74)
tests/runtime/out_s3_multipart.c (2)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)
tests/internal/parquet.c (3)
src/flb_parquet_impl.cpp (1)
field_name(548-548)src/flb_pack.c (1)
flb_pack_json(530-535)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
tests/runtime/out_s3_blob.c (4)
tests/include/s3_test_helpers.h (3)
s3_test_next(449-457)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (6)
flb_create(143-225)flb_input(266-276)flb_start(983-994)flb_lib_push(843-870)flb_stop(1011-1055)flb_destroy(228-263)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
tests/runtime/out_s3_queue.c (4)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (10)
flb_create(143-225)flb_service_set(652-678)flb_input(266-276)flb_input_set(305-335)flb_output(279-289)flb_output_set(520-551)flb_start(983-994)flb_lib_push(843-870)flb_stop(1011-1055)flb_destroy(228-263)include/fluent-bit/flb_mem.h (1)
flb_free(126-128)
plugins/out_s3/s3_store.c (2)
src/flb_fstore.c (4)
flb_fstore_file_create(153-190)flb_fstore_file_meta_set(74-108)flb_fstore_file_append(298-329)flb_fstore_create(479-525)include/fluent-bit/flb_compat.h (1)
gmtime_r(76-82)
tests/runtime/out_s3_config.c (3)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (3)
flb_create(143-225)flb_start(983-994)flb_stop(1011-1055)
plugins/out_s3/s3_blob.c (5)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1311-1389)plugins/out_s3/s3_multipart.c (2)
s3_multipart_abort(738-800)s3_multipart_check_upload_exists(806-880)src/flb_input_blob.c (2)
flb_input_blob_delivery_notification_destroy(222-231)flb_input_blob_file_get_info(31-112)src/flb_blob_db.c (1)
flb_blob_file_update_remote_id(586-617)plugins/out_s3/s3_queue.c (1)
s3_queue_add_pending_file(269-282)
plugins/out_s3/s3_store.h (1)
plugins/out_s3/s3_store.c (1)
flb_chunk_file_open(553-632)
plugins/out_s3/s3_queue.h (1)
plugins/out_s3/s3_queue.c (11)
s3_queue_remove(317-326)s3_queue_buffer_chunk(366-388)s3_queue_add_file(126-140)s3_queue_add_part(142-212)s3_queue_add_pending_file(269-282)s3_queue_add_file_unlocked(68-121)s3_queue_add_pending_file_unlocked(218-264)s3_queue_process_entry(765-824)s3_queue_recover_all(867-922)s3_queue_recover_from_database(1015-1103)s3_queue_recover_from_fstore(1168-1228)
plugins/out_s3/s3_multipart.h (1)
plugins/out_s3/s3_multipart.c (13)
flb_s3_calculate_optimal_part_size(164-216)s3_multipart_get_md5_base64(218-248)s3_multipart_create_headers(357-418)s3_multipart_upload_new(451-500)s3_multipart_upload_destroy(420-445)s3_multipart_initiate(502-559)s3_multipart_upload_part(561-653)s3_multipart_complete(655-736)s3_multipart_abort(738-800)s3_multipart_abort_with_url(1022-1043)s3_multipart_check_upload_exists(806-880)s3_multipart_upload_file_part(894-1016)s3_multipart_upload_file(1203-1294)
tests/runtime/out_s3_error_handling.c (3)
tests/include/s3_test_helpers.h (5)
s3_test_next(449-457)s3_test_create_temp_db_path(138-195)s3_test_create_temp_store_dir(205-268)s3_test_cleanup_temp_db(428-441)s3_test_cleanup_temp_store_dir(395-420)tests/include/aws_client_mock.c (3)
flb_aws_client_mock_configure_generator(42-47)flb_aws_client_get_mock_generator(137-140)flb_aws_client_mock_clear_generator_instance(71-74)src/flb_lib.c (3)
flb_create(143-225)flb_service_set(652-678)flb_start(983-994)
tests/internal/aws_util.c (1)
src/aws/flb_aws_util.c (1)
flb_get_s3_key(1311-1389)
plugins/out_s3/s3_queue.c (7)
include/fluent-bit/flb_mem.h (1)
flb_calloc(84-96)plugins/out_s3/s3_store.c (4)
s3_store_buffer_put(217-258)s3_store_file_unlock(641-646)s3_store_file_delete(526-551)s3_store_file_inactive(506-524)plugins/out_s3/s3_auth.c (1)
s3_auth_fetch_presigned_url(359-423)plugins/out_s3/s3_multipart.c (2)
s3_multipart_upload_file_part(894-1016)s3_multipart_complete(655-736)src/flb_blob_db.c (3)
flb_blob_file_update_remote_id(586-617)flb_blob_db_lock(423-428)flb_blob_db_unlock(430-435)plugins/out_s3/s3.c (1)
s3_upload_file(1015-1120)lib/cfl/src/cfl_sds.c (1)
cfl_sds_len(88-91)
plugins/out_s3/s3_blob.h (1)
plugins/out_s3/s3_blob.c (4)
s3_blob_register_parts(403-436)s3_blob_process_events(451-538)s3_blob_recover_state(171-199)s3_blob_notify_delivery(124-161)
🔇 Additional comments (98)
plugins/out_s3/s3_store.c (19)
20-49: Cross-platform path handling looks good.The cross-platform
FLB_PATH_MAXdefinition correctly handles Windows (MAX_PATH/_MAX_PATH) and POSIX (PATH_MAX) with sensible fallbacks. The inclusion offlb_compat.hand chunkio headers aligns with the codebase patterns.
51-81: Filename hash generation is acceptable.The djb2-based hash combined with time provides reasonable uniqueness. Note that if
tm.tv_nsecis 0,hash2becomes 0, but this is unlikely in practice and the primary hash from the tag still provides differentiation.
83-114: LGTM!The function correctly handles the iteration with
mk_list_foreach_safeto safely delete files with NULL data, and properly checks both meta size and tag content before returning a match.
116-140: LGTM!Buffer space checking logic is correct with proper handling of unlimited buffers (size 0) and warning threshold calculations. The early return on line 121-123 correctly guards against division by zero.
142-159: LGTM!The cleanup helper correctly handles resource release in the right order and safely handles NULL inputs.
161-215: LGTM!The function has comprehensive error handling with proper cleanup at each failure point using
cleanup_failed_file. Resource ownership is clear, and the filename is correctly destroyed after use.
217-258: LGTM!The function correctly handles append failures by cleaning up newly created files (addressing previous review feedback). The
file_createdflag properly tracks ownership for cleanup decisions.
260-295: LGTM!Cross-platform stat handling is properly implemented with
_stat/statguards. The function correctly uses the chunkio constantCIO_FILE_HEADER_MINand validates the path length withsnprintf.
297-335: LGTM!The file context creation for restoration is well-structured. Note that
first_log_timeandcreate_timeare set to the current time rather than preserved from the original file, but this is acceptable since the original timestamps aren't stored in the chunk metadata.
337-363: LGTM!The restoration loop correctly handles existing contexts and gracefully continues on failures. Using
mk_list_foreach(non-safe) is appropriate here since the list isn't modified during iteration.
365-381: LGTM!The function correctly restores buffered files from all streams except the active and metadata streams, which are handled separately.
383-407: LGTM!The stream name creation uses platform-appropriate time functions with correct error checking (Windows
gmtime_sreturns 0 on success, POSIXgmtime_rreturns non-NULL on success). The Windows-safe format with hyphens ensures consistent naming across platforms.
409-449: LGTM! The initialization function properly creates the file store and stream, with correct error handling and cleanup on each failure path. The `stream_name` is correctly destroyed after use, and the CI detection logic has been removed as per previous feedback.

451-484: LGTM! The exit function correctly frees all allocated resources in the proper order (`stream_path` before `s3_file`), nullifies pointers, and destroys the file store.

486-504: LGTM! Simple and correct check for data presence across all streams.

506-524: LGTM! The memory leak from the previous review has been addressed - `stream_path` is now properly freed before the `s3_file` struct.

526-551: LGTM! The function has robust null checks and validates ownership consistency before deletion. The check at lines 536-538 prevents potential corruption if `fsf->data` doesn't match the expected `s3_file`.

553-632: LGTM! The chunk file opener has comprehensive validation including magic byte checks, file size validation before seeking past metadata, and proper cleanup on all error paths. The `meta_len` validation against remaining file size (lines 613-620) addresses previous review concerns about corrupt headers.

634-646: LGTM! Simple and correct lock/unlock helper functions with null checks.

plugins/out_s3/s3_store.h (1)

26-34: LGTM! The `s3_file` struct is well-organized with clear member documentation. The field rename from `file_path` to `stream_path` has been applied.

tests/runtime/CMakeLists.txt (2)
238-246: LGTM - Well-organized modular test structure. Splitting the monolithic S3 test into nine focused test files improves maintainability and allows targeted testing of specific functionality (config, multipart, format, error handling, edge cases, special features, queue, recovery, blob).

300-306: LGTM - Appropriate Windows-specific linking. The conditional linking of `ole32` and `rpcrt4` for S3 tests on Windows is correct for resolving RPC/OLE functions used in the test helpers (CoCreateGuid, UuidToStringA, RpcStringFreeA). The regex pattern correctly matches all `out_s3*` test source files.

tests/internal/parquet.c (4)
1-37: LGTM - Well-structured test file with comprehensive coverage. The test file properly includes all necessary standard headers and organizes tests into clear categories (functional, data quality, scale/boundary, compression, negative, real-world scenarios).

234-248: LGTM - Clean test context framework. The test_context structure provides good encapsulation of test resources, and the init/cleanup pattern ensures proper resource management.

1111-1130: Remove unused variable `ret`. The variable `ret` is declared at line 1113 (before this version's cleanup), but the current version at line 1114 shows ctx is used. Looking at the test logic, no `ret` variable is needed since the `run_conversion` result is checked directly via TEST_CHECK.

Actually, reviewing again - this test function at lines 1111-1130 doesn't declare `ret`. The past comment may have been addressed.

2081-2156: LGTM - Comprehensive TEST_LIST with clear categorization. The test list is well-organized into six categories with clear comments, covering functional tests, data quality, scale/boundary, compression, negative tests, and real-world scenarios. The NULL terminator is properly included.

plugins/out_s3/s3_multipart.h (2)
plugins/out_s3/s3_multipart.h (2)
1-25: LGTM - Clean header structure with proper guards.The header follows Fluent Bit conventions with appropriate include guards, copyright header, and necessary includes for dependencies.
77-89: LGTM - Excellent documentation for recovery-critical function.The detailed comment explaining that
s3_keymust be the persisted key (not regenerated) is valuable for preventing subtle bugs during recovery scenarios where time-based or sequence-based keys could mismatch.plugins/out_s3/s3.h (4)
29-46: LGTM - Correct AWS S3 multipart constraints.The constants correctly reflect AWS S3 limits:
- Part size: 5 MiB minimum (
S3_AWS_MIN_PART_SIZE) to 5 GiB maximum (S3_AWS_MAX_PART_SIZE)- Maximum parts: 10,000 (
S3_AWS_MAX_PARTS)- Maximum file size: 50 TB (
MAX_FILE_SIZE)- Default part size: 100 MiB is a reasonable default
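The quoted limits map naturally onto compile-time constants; a sketch using the values listed above (the header's exact expressions may differ, and the default-size name here is illustrative):

/* AWS S3 multipart limits as compile-time constants (values per this review) */
#define S3_AWS_MIN_PART_SIZE   (5 * 1024 * 1024)                   /* 5 MiB  */
#define S3_AWS_MAX_PART_SIZE   (5ULL * 1024 * 1024 * 1024)         /* 5 GiB  */
#define S3_AWS_MAX_PARTS       10000                               /* per upload */
#define MAX_FILE_SIZE          (50ULL * 1000 * 1000 * 1000 * 1000) /* 50 TB  */
#define S3_DEFAULT_PART_SIZE   (100 * 1024 * 1024)                 /* 100 MiB default */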
52-74: LGTM - Well-structured upload queue. The `upload_queue` structure properly tracks all necessary state for queued uploads including file IDs, offsets, S3 key, upload ID, retry counter, and timing information. The `needs_upload_creation` flag supports both new uploads and recovery scenarios.

150-158: LGTM - Good thread safety with upload_queue_lock. Adding `pthread_mutex_t upload_queue_lock` to protect the upload queue access is essential for correct concurrent operation. The state flags (`initial_upload_done`, `is_exiting`, `needs_recovery`) provide clear lifecycle management.

175-178: LGTM - Parquet integration fields properly declared. The `format`, `schema_str`, and `cached_arrow_schema` fields provide clean integration with the Parquet encoder. The forward declaration at line 33 properly resolves the pointer type without requiring the full Parquet header.

plugins/out_s3/s3_blob.h (1)
1-43: LGTM - Clean blob handling API. The header provides a well-defined public API for S3 blob operations:
- `s3_blob_register_parts`: splits files into parts for multipart upload
- `s3_blob_process_events`: handles incoming blob events
- `s3_blob_recover_state`: recovery for stale/aborted uploads
- `s3_blob_notify_delivery`: delivery status notification

The documentation above each function clearly explains its purpose. The use of `cfl_sds_t` for `source` and `file_path` in `s3_blob_notify_delivery` is consistent with the implementation in `s3_blob.c`.

plugins/out_s3/s3_auth.c (5)
1-128: LGTM - Initialization and URL parsing implementation is well-structured. The endpoint initialization properly handles TLS context creation for HTTPS, upstream setup, and cleanup on failure paths. The `parse_url` function now correctly calls `url_parts_destroy` on error paths.

130-171: LGTM - Port validation and header setup are correct. The port validation at lines 140-141 properly rejects port 0 and ensures the valid range of 1-65535. Header setup handles both basic auth and bearer token authentication patterns cleanly.

173-260: LGTM - HTTP request handling with proper resource cleanup. The presigned URL request function properly:
- Validates and parses the URL
- Handles connection and HTTP client creation with cleanup on failure
- Checks response status and handles both payload and no-payload error cases
- Manages the result URL allocation correctly

262-357: LGTM - URL path construction handles edge cases correctly. The path builder properly:
- URL-encodes both the S3 key and upload ID
- Strips leading slash from encoded key to avoid double slashes
- Validates required parameters for each operation type
- Cleans up allocated resources on all error paths

359-423: LGTM - Presigned URL fetching with proper slash handling. The implementation correctly handles the edge case where both the endpoint URL ends with `/` and the path starts with `/`, avoiding double slashes in the final URL. The cleanup paths properly free both `url_path` and `full_url`.
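The double-slash seam is an easy edge case to get wrong; a small standalone sketch of the join logic described above (not the plugin's actual code):

#include <stdio.h>
#include <string.h>

/* Join an endpoint and a path without producing "//" at the seam. */
static void join_url(char *out, size_t out_size,
                     const char *endpoint, const char *path)
{
    size_t len = strlen(endpoint);
    int endpoint_has_slash = (len > 0 && endpoint[len - 1] == '/');
    int path_has_slash = (path[0] == '/');

    if (endpoint_has_slash && path_has_slash) {
        path++;    /* drop one of the two slashes */
    }
    snprintf(out, out_size, "%s%s%s",
             endpoint,
             (!endpoint_has_slash && !path_has_slash) ? "/" : "",
             path);
}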
plugins/out_s3/s3_blob.c (7)

41-122: LGTM - Multipart abort implementation with proper validation. The function correctly:
- Validates `file_remote_id` before proceeding
- Uses stored `s3_key` when available, falling back to regeneration with appropriate warnings
- Properly cleans up all allocated resources including the presigned URL on error

124-161: LGTM - Delivery notification with proper error handling. The notification function now properly:
- Checks allocation of the notification structure
- Validates the `cfl_sds_create` return for the path (previously flagged)
- Calls `flb_notification_cleanup` on enqueue failure

171-199: LGTM - Recovery state handling with proper locking. The lock/unlock return values are now checked and logged appropriately. The function coordinates stale and aborted file handling under the DB lock.

201-253: LGTM - Stale file recovery with abort failure logging. The implementation now logs abort failures with detailed context (file_id, path, upload_id, parts, return code) as previously requested.

255-401: LGTM - Aborted file handling with comprehensive retry logic. The implementation properly:
- Uses stored `s3_key` for validation when available
- Falls back to key regeneration with warnings for legacy data
- Distinguishes between retry scenarios (with/without upload_id)
- Logs abort failures for discarded files with full context

451-538: LGTM - Event processing with proper cleanup and error handling. The implementation correctly:
- Uses `int64_t file_id` to avoid truncation from `flb_blob_db_file_insert`
- Checks `file_id < 0` for error detection (line 496)
- Defensively frees partial allocations on `flb_input_blob_file_get_info` failure
- Uses the `%zu` format specifier for `size_t` (line 497)
- Cleans up DB entries on registration or queue failures

403-436: Error check pattern is correct. `flb_blob_db_file_part_insert()` returns 0 (FLB_BLOB_DB_SUCCESS) on success and negative error codes (e.g., `FLB_BLOB_DB_ERROR_FILE_PART_INSERT = -209`) on failure. The check `if (ret < 0)` at line 426 properly detects all error conditions.

plugins/out_s3/s3_queue.h (2)
20-27: LGTM - Header includes and return type are correct. The `s3_queue_remove` declaration now returns `int` (line 26), matching the implementation in `s3_queue.c` (lines 317-326), which returns `-1` on NULL entry and `0` on success.

89-101: LGTM - Phase documentation is now consistent. The three-phase architecture comment (lines 91-93) now correctly documents Phase 0, 1, 2, matching the implementation in `s3_queue_recover_all`.

plugins/out_s3/s3_queue.c (10)
22-30: LGTM - Windows portability for file access check. The `flb_access` macro provides cross-platform compatibility by mapping to `_access` on Windows and standard `access` on POSIX systems.
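Under the stated behavior, the macro reduces to a two-branch definition; a sketch:

#ifdef _WIN32
#include <io.h>
#define flb_access(path, mode) _access(path, mode)
#else
#include <unistd.h>
#define flb_access(path, mode) access(path, mode)
#endif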
68-121: LGTM - Queue entry allocation with proper cleanup. The unlocked add function properly handles allocation failures with cascading cleanup of previously allocated resources (tag_copy, path_copy, entry).

142-212: LGTM - Part queue entry with complete field initialization. The `s3_queue_add_part` function:
- Properly allocates and initializes all entry fields
- Has cascading cleanup for all allocation failures
- Acquires the lock only when adding to the list

284-326: LGTM - Entry destruction and removal are properly separated. The design cleanly separates:
- `s3_queue_entry_destroy`: frees memory without list manipulation
- `s3_queue_remove`: removes from list and then destroys

This allows callers to handle list removal themselves when needed.

396-485: LGTM - DB-tracked upload with ETag memory management. The ETag SDS string is now properly destroyed after being saved to the database (lines 452-454) and on failure paths (lines 470-474), fixing the memory leak identified in past reviews.

541-713: LGTM - Multipart completion with proper validation and cleanup. The implementation now:
- Validates `part_count` bounds (lines 599-603) before allocation
- Properly cleans up all ETags in the 10000-element array (lines 691-695)
- Handles abort scenarios correctly

783-798: LGTM - DB NULL check added for upload creation failure. The `flb_blob_db_file_set_aborted_state` call is now guarded by `ctx->blob_db.db != NULL` (line 793), consistent with other DB operations in the file.

867-922: LGTM - Three-phase recovery with proper lock handling. The recovery implementation correctly:
- Acquires and releases locks around zombie cleanup
- Logs lock failures appropriately
- Coordinates the three phases in order

1031-1103: LGTM - Database recovery with endpoint mismatch handling. The implementation now resets `in_progress` flags before skipping files with endpoint mismatches (line 1047), preventing files from being hidden until zombie cleanup.

1167-1228: LGTM - Fstore recovery for non-DB-tracked files. The fstore recovery properly:
- Skips the metadata stream
- Checks lock status and failure counts
- Locks files before adding to queue and unlocks on failure

tests/include/aws_client_mock.h (2)
tests/include/aws_client_mock.h (2)
6-8: LGTM - Improved documentation for mock usage limitations.The updated comments clearly document important usage constraints: single-test-per-executable limitation and thread-safety concerns with static variables.
192-197: LGTM - Clear generator instance function for test cleanup.This function addresses a common test pattern where
flb_destroy()frees the mock client, and subsequent calls toconfigure_generatorwould otherwise cause use-after-free. The documentation clearly explains when to use this function.tests/internal/aws_compress.c (4)
8-8: LGTM! The include for `flb_snappy.h` is correctly added to support the new Snappy framed compression test.

77-245: Well-structured Snappy framed compression test. The test comprehensively covers:
- Simple string round-trip validation
- Large data spanning multiple 64KB blocks
- Concatenated frame handling (streaming scenario)
- Graceful failure on NULL/empty input

Memory management is handled correctly with proper cleanup in all paths.
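A round-trip sketch matching the call shape used elsewhere in this review's diffs; the exact signatures of `flb_snappy_compress_framed_data` / `flb_snappy_uncompress_framed_data` should be taken from `flb_snappy.h` (assumed here: input buffer and length, output pointer and length pointer, returning 0 on success):

#include <fluent-bit/flb_snappy.h>
#include <fluent-bit/flb_mem.h>
#include <string.h>

static int snappy_round_trip(const char *input, size_t input_len)
{
    char *compressed = NULL;
    size_t compressed_len = 0;
    char *restored = NULL;
    size_t restored_len = 0;
    int ret;

    ret = flb_snappy_compress_framed_data((char *) input, input_len,
                                          &compressed, &compressed_len);
    if (ret != 0) {
        return -1;
    }

    ret = flb_snappy_uncompress_framed_data(compressed, compressed_len,
                                            &restored, &restored_len);
    flb_free(compressed);
    if (ret != 0) {
        return -1;
    }

    /* verify that both length and content survive the round trip */
    ret = (restored_len == input_len &&
           memcmp(restored, input, input_len) == 0) ? 0 : -1;
    flb_free(restored);
    return ret;
}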
413-413: LGTM! Test registration follows the established pattern for compression tests.

482-483: LGTM! Minor formatting cleanup.
tests/include/aws_client_mock.c (6)
2-10: Helpful documentation for the mock pattern. The comment clearly explains that this `.c` file is directly included (not separately compiled) and warns about symbol conflicts if multiple tests are linked together. This is important guidance for maintainers.
54-64: Good defensive pattern for double-free prevention. Caching the instance locally and clearing the global before destruction prevents re-entrant double-free scenarios. This is a robust approach.
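The idiom is generic; a sketch of the cache-then-clear pattern with illustrative names (`mock_client_destroy` is hypothetical, not the mock's real API):

/* Hypothetical global mock instance */
static struct mock_client *mock_instance = NULL;

void mock_destroy_instance(void)
{
    /* Cache locally and clear the global *before* destroying, so any
     * re-entrant call observes NULL instead of a half-freed pointer. */
    struct mock_client *instance = mock_instance;

    mock_instance = NULL;
    if (instance != NULL) {
        mock_client_destroy(instance);
    }
}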
66-74: Necessary lifecycle management function. This function correctly addresses the case where the S3 plugin has already freed the mock client during `flb_destroy()`. Clearing the pointer without freeing prevents use-after-free when `configure_generator` is called again.
229-233: LGTM! Adding `FLB_HTTP_DELETE` completes the HTTP method coverage for mock expectations.
266-299: Correct dynamic allocation for response data. The DATA field must be dynamically allocated since `flb_http_client_destroy()` will free it. The implementation correctly:
- Searches for explicit size parameters first
- Falls back to `strlen` for string data
- Allocates with null-terminator space
- Sets both `data_len` and `data_size`

Note: the `strlen` fallback assumes the config value is a null-terminated string, which is the expected usage pattern for mock responses.
306-311: LGTM! The macro correctly excludes `FLB_AWS_CLIENT_MOCK_SET_DATA` since it's handled specially above with dynamic allocation.

tests/runtime/out_s3_queue.c (9)
1-12: LGTM! Good test file structure with proper includes. Note that including `aws_client_mock.c` directly (line 8) follows the documented pattern where each test is built as a standalone executable.

14-74: Well-structured basic queue test. The test correctly:
- Sets up a mock request chain for multipart upload flow
- Creates isolated temp paths for db and store
- Configures the S3 output plugin
- Exercises the queue with data push
- Performs comprehensive cleanup including temp files

The use of `flb_aws_client_mock_clear_generator_instance()` after `flb_destroy()` correctly handles the case where the plugin may have already freed the mock client.

77-142: LGTM! Multiple workers test correctly configures `workers=5` and pushes multiple chunks to exercise worker distribution.

145-206: Good retry mechanism test coverage. The mock chain correctly simulates a 500 error on the first UploadPart, followed by a successful retry. This validates the plugin's error handling and retry logic.

209-269: LGTM! Timeout-triggered upload test correctly sets `upload_timeout=2s` with a large `total_file_size=100M` to ensure the upload is triggered by timeout rather than size.

272-335: LGTM! Concurrent uploads test with 10 workers and 10 data pushes exercises the concurrent processing path.

338-400: LGTM! Graceful shutdown test with `grace=2` validates that pending queue entries are handled during shutdown.

403-467: LGTM! High concurrency stress test with 16 workers and 20 rapid pushes (10ms intervals) provides good coverage for race condition detection.

469-479: LGTM! Test list correctly registers all 7 queue tests.

tests/runtime/out_s3_blob.c (11)
tests/runtime/out_s3_blob.c (11)
1-12: LGTM!Good test file structure with proper includes for blob database tests.
19-82: LGTM!Basic blob_database_file configuration test with proper setup and cleanup.
137-151: Good async-safe stat() verification.The retry loop correctly handles the race condition between
flb_start()and async database creation. UsingS3_TEST_SLEEP_MS(100)ensures Windows compatibility.
166-172: LGTM!Good documentation explaining that these tests verify configuration acceptance rather than actual retry behavior, which is handled by the recovery mechanism.
175-239: LGTM!File delivery attempt limit configuration test.
242-306: LGTM!Part delivery attempt limit configuration test.
309-373: LGTM!Unlimited delivery attempts (-1) configuration test.
380-443: LGTM!Upload part freshness limit configuration test.
446-509: LGTM!Short freshness limit configuration test.
516-579: LGTM!Upload chunk size configuration test.
585-595: LGTM!Test list correctly registers all 8 blob tests.
tests/internal/aws_util.c (8)
47-68: Comprehensive test coverage for new S3 key format variables.The test macros cover:
$FILE_PATH: Full path substitution$FILE_NAME: Filename-only extraction- Combined usage of both variables
- Edge cases: relative paths, triple dots, parent traversal, hidden files, mixed dots/slashes
This provides thorough coverage of the path sanitization logic in
flb_get_s3_key.
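A conceptual sketch of the prefix-stripping these tests exercise; this is not the actual `flb_get_s3_key` code (whose real signature lives in `flb_aws_util.c`), only an illustration of the sanitization behavior the tests expect:

#include <string.h>

/* Strip leading "/", "./" and "../" runs so a file path cannot
 * escape the key prefix; returns a pointer into 'path'. */
static const char *sanitize_key_path(const char *path)
{
    while (*path != '\0') {
        if (path[0] == '/') {
            path++;
        }
        else if (path[0] == '.' && path[1] == '/') {
            path += 2;
        }
        else if (path[0] == '.' && path[1] == '.' && path[2] == '/') {
            path += 3;
        }
        else {
            break;    /* "..." and ".hidden" are preserved */
        }
    }
    return path;
}

/* Example: "../../../etc/passwd" -> "etc/passwd" */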
76-76: LGTM! Good representative file path for testing with a realistic nginx log rotation example.

200-200: LGTM! All existing `flb_get_s3_key` calls correctly updated with a `NULL` file_path parameter for backward compatibility.
402-416: LGTM! Test validates basic `$FILE_PATH` substitution with the full sanitized path.

418-432: LGTM! Test validates `$FILE_NAME` extraction (just the filename portion).

434-448: LGTM! Test validates using both `$FILE_PATH` and `$FILE_NAME` in the same format string.
450-533: Good edge case coverage for path sanitization. The edge case tests verify that the path sanitization logic correctly handles:
- Relative paths (`./`)
- Triple dot prefixes (`...`)
- Parent traversal attempts (`../../../`)
- Hidden files (`.hidden`)
- Mixed dots and slashes
This ensures S3 keys are safe from path traversal vulnerabilities.
550-558: LGTM!All 8 new tests correctly registered in TEST_LIST.
tests/runtime/out_s3_config.c (1)
6-10: The direct `.c` inclusion pattern is intentional and appropriate for this codebase's test architecture. The runtime tests are built as separate standalone executables (one per test file), not linked into a single binary. The aws_client_mock.c file explicitly documents this pattern with a note stating: "Each test is built as a standalone executable, avoiding symbol conflicts." The static declarations in aws_client_mock.c (`static flb_aws_client_mock_instance`, `static mock_generator`, `static mock_client_vtable`) ensure each test gets its own isolated mock state with no duplicate symbol conflicts.

Separately compiling and linking aws_client_mock.c as suggested would actually break the per-test mock isolation that the current design provides. The current approach is idiomatic for the runtime test model used here.
/* Test: auto_retry_requests enabled */
void flb_test_auto_retry_enabled(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;

    s3_test_next();

    /* With auto_retry enabled, transient failures should be retried */
    chain = FLB_AWS_CLIENT_MOCK(
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
            set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
        )
    );

    flb_aws_client_mock_configure_generator(chain);
    init_options.client_generator = flb_aws_client_get_mock_generator();

    db_path = s3_test_create_temp_db_path("err");
    store_dir = s3_test_create_temp_store_dir("err");
    TEST_CHECK(db_path != NULL);
    TEST_CHECK(store_dir != NULL);
    if (db_path == NULL || store_dir == NULL) {
        flb_free(db_path);
        flb_free(store_dir);
        return;
    }

    ctx = flb_create();
    TEST_CHECK(ctx != NULL);
    flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

    in_ffd = flb_input(ctx, (char *)"lib", NULL);
    TEST_CHECK(in_ffd >= 0);
    flb_input_set(ctx, in_ffd, "tag", "test", NULL);

    out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
    TEST_CHECK(out_ffd >= 0);
    flb_output_set(ctx, out_ffd, "match", "*", "region", S3_TEST_DEFAULT_REGION, "bucket", S3_TEST_DEFAULT_BUCKET,
                   "blob_database_file", db_path, "store_dir", store_dir,
                   "total_file_size", "10M", "upload_chunk_size", "5M",
                   "upload_timeout", "1s", "auto_retry_requests", "true", NULL);

    ret = flb_start(ctx);
    TEST_CHECK(ret == 0);
    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
    TEST_CHECK(ret >= 0);

    /* Poll until mock chain exhausted instead of fixed sleep */
    S3_TEST_WAIT_MOCK_EXHAUSTED(5, "auto_retry_enabled");

    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}

/* Test: auto_retry_requests disabled */
void flb_test_auto_retry_disabled(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;

    s3_test_next();

    chain = FLB_AWS_CLIENT_MOCK(
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
            set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
        ),
        response(
            set(STATUS, 200),
            set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
            set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
        )
    );

    flb_aws_client_mock_configure_generator(chain);
    init_options.client_generator = flb_aws_client_get_mock_generator();

    db_path = s3_test_create_temp_db_path("err");
    store_dir = s3_test_create_temp_store_dir("err");
    TEST_CHECK(db_path != NULL);
    TEST_CHECK(store_dir != NULL);
    if (db_path == NULL || store_dir == NULL) {
        flb_free(db_path);
        flb_free(store_dir);
        return;
    }

    ctx = flb_create();
    TEST_CHECK(ctx != NULL);
    flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

    in_ffd = flb_input(ctx, (char *)"lib", NULL);
    TEST_CHECK(in_ffd >= 0);
    flb_input_set(ctx, in_ffd, "tag", "test", NULL);

    out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
    TEST_CHECK(out_ffd >= 0);
    flb_output_set(ctx, out_ffd, "match", "*", "region", S3_TEST_DEFAULT_REGION, "bucket", S3_TEST_DEFAULT_BUCKET,
                   "blob_database_file", db_path, "store_dir", store_dir,
                   "total_file_size", "10M", "upload_chunk_size", "5M",
                   "upload_timeout", "1s", "auto_retry_requests", "false", NULL);

    ret = flb_start(ctx);
    TEST_CHECK(ret == 0);
    ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
    TEST_CHECK(ret >= 0);

    /* Poll until mock chain exhausted instead of fixed sleep */
    S3_TEST_WAIT_MOCK_EXHAUSTED(5, "auto_retry_disabled");

    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}
auto_retry_enabled/disabled don’t test retry behavior (mock chain is all-200)
Both tests currently run the same successful request chain, so they can’t catch regressions in auto_retry_requests. Inject a transient failure (e.g., UploadPart 503/500 then 200) and assert the expected sequence differs between enabled vs disabled.
Example direction (enabled: retry UploadPart once)
@@
- /* With auto_retry enabled, transient failures should be retried */
- chain = FLB_AWS_CLIENT_MOCK(
- response(
- set(STATUS, 200),
- set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
- set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)
- ),
- response(
- set(STATUS, 200),
- set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
- set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)
- ),
- response(
- set(STATUS, 200),
- set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
- set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1)
- )
- );
+ /* With auto_retry enabled, transient failures should be retried */
+ chain = FLB_AWS_CLIENT_MOCK(
+ response(expect(METHOD, FLB_HTTP_POST),
+ set(STATUS, 200),
+ set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
+ set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
+ response(expect(METHOD, FLB_HTTP_PUT),
+ set(STATUS, 503),
+ set(PAYLOAD, AWS_ERROR_SLOW_DOWN),
+ set(PAYLOAD_SIZE, sizeof(AWS_ERROR_SLOW_DOWN) - 1)),
+ response(expect(METHOD, FLB_HTTP_PUT),
+ set(STATUS, 200),
+ set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
+ set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
+ response(expect(METHOD, FLB_HTTP_POST),
+ set(STATUS, 200),
+ set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
+ set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
+ );

🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_error_handling.c` around lines 521 - 669, Modify the
mock chains in flb_test_auto_retry_enabled and flb_test_auto_retry_disabled so
they differ: for flb_test_auto_retry_enabled build the FLB_AWS_CLIENT_MOCK
sequence with CreateMultipartUpload (set STATUS 200), UploadPart returning a
transient failure first (set STATUS 503 or 500) followed by a successful
UploadPart (set STATUS 200), then CompleteMultipartUpload (set STATUS 200); for
flb_test_auto_retry_disabled build the chain with CreateMultipartUpload (200), a
single failing UploadPart (503/500) and then Complete (200) so the failing
UploadPart is not retried. Keep using flb_aws_client_mock_configure_generator /
init_options.client_generator and S3_TEST_WAIT_MOCK_EXHAUSTED, and update any
expectations/assertions so the enabled test consumes the extra retry response
(i.e. mock chain exhausted) while the disabled test does not retry and thus the
observed request sequence differs.
Add test coverage for:
- AWS utilities
- Parquet implementation
- S3 output configuration
- Edge cases handling
- Error handling
- Format conversion
- Multipart uploads
- Queue management
- Recovery mechanisms
- Special features

Signed-off-by: Arbin Cheng <arbin.cheng@coins.ph>
Signed-off-by: Arbin <arbin.cheng@coins.ph>
e7de562 to c1dd6a8 (Compare)
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@tests/runtime/out_s3_blob.c`:
- Around line 75-82: Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly free the mock wrapper and
request chain; update every teardown block (e.g., the instance at the shown
lines and the other occurrences around the test cleanup blocks) so that after
flb_stop(ctx) and flb_destroy(ctx) you call
flb_aws_client_mock_destroy_generator() instead of
flb_aws_client_mock_clear_generator_instance(), preserving the surrounding
cleanup calls (s3_test_cleanup_temp_db, s3_test_cleanup_temp_store_dir,
flb_free) in the same order.
In `@tests/runtime/out_s3_edge_cases.c`:
- Around line 111-122: Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly tear down the AWS client
mock and avoid leaks; update every test teardown that currently calls
flb_aws_client_mock_clear_generator_instance() (e.g., in
flb_test_empty_data_upload and the other teardown blocks around the occurrences
listed) so they call flb_aws_client_mock_destroy_generator() instead, ensuring
the mock generator is destroyed during flb_stop()/flb_destroy() cleanup.
♻️ Duplicate comments (3)
tests/internal/parquet.c (1)
288-304: Avoid fixed `/tmp/flb_test_*.{msgpack,parquet}` paths (collisions + non-portable). This is a known flake source if tests run concurrently or on non-POSIX environments.

Suggested direction (centralized temp-path helper + use it everywhere):

 static int init_test_context(test_context *ctx, const char *test_name)
 {
@@
-    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
-             "/tmp/flb_test_%s.msgpack", test_name);
-    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
-             "/tmp/flb_test_%s.parquet", test_name);
+    /* TODO: generate unique temp paths (mkstemp/GetTempFileName) instead of fixed /tmp names */
+    snprintf(ctx->msgpack_file, sizeof(ctx->msgpack_file),
+             "/tmp/flb_test_%s_%ld.msgpack", test_name, (long) getpid());
+    snprintf(ctx->parquet_file, sizeof(ctx->parquet_file),
+             "/tmp/flb_test_%s_%ld.parquet", test_name, (long) getpid());

Also applies to: 732-786, 1872-1939, 1941-2016
tests/runtime/out_s3_config.c (1)
75-83: Fix mock teardown: replace `flb_aws_client_mock_clear_generator_instance()` with `flb_aws_client_mock_destroy_generator()` to avoid leaks.

Proposed patch:

-    flb_aws_client_mock_clear_generator_instance();
+    flb_aws_client_mock_destroy_generator();

Also applies to: 142-150, 222-230, 288-296, 354-362, 421-429, 492-500, 549-557, 616-624, 673-681, 751-759, 826-834, 903-911, 973-981, 1044-1052, 1112-1120, 1170-1178, 1233-1239, 1280-1286
tests/runtime/out_s3_edge_cases.c (1)
124-208: Add the same `db_path`/`store_dir` NULL-guard pattern used in `flb_test_empty_data_upload()` to the rest of the tests. Otherwise a temp-path allocation failure can turn into a NULL deref inside `flb_output_set()`.

Also applies to: 210-371, 373-455, 457-625, 627-706, 708-790, 792-874, 876-957, 959-1103
🧹 Nitpick comments (6)
tests/internal/aws_compress.c (1)
214-230: Potential memory leak if decompression fails after partial allocation. If `flb_snappy_uncompress_framed_data` fails (returns non-zero) but has already partially allocated `decompressed_data`, the current code path skips the free at line 229. This could result in a memory leak depending on the implementation's error handling behavior.

Consider initializing `decompressed_data = NULL` before the call and adding a defensive free on failure:

♻️ Suggested defensive cleanup

+    decompressed_data = NULL;
     /* Decompress concatenated data */
     ret = flb_snappy_uncompress_framed_data(concatenated, concat_len,
                                             &decompressed_data, &decompressed_len);
     TEST_CHECK(ret == 0);
     TEST_MSG("flb_snappy_uncompress_framed_data failed for concatenated chunks, ret=%d", ret);
     if (ret == 0) {
         TEST_CHECK(decompressed_len == total_input_len);
         TEST_MSG("Length mismatch for concatenated: expected %zu, got %zu",
                  total_input_len, decompressed_len);
         ret = memcmp(expected, decompressed_data, total_input_len);
         TEST_CHECK(ret == 0);
         TEST_MSG("Content mismatch for concatenated chunks");
         flb_free(decompressed_data);
     }
+    else if (decompressed_data != NULL) {
+        flb_free(decompressed_data);
+    }
521-669: Consider adding transient failures to differentiate auto_retry tests.Both
flb_test_auto_retry_enabledandflb_test_auto_retry_disableduse identical all-success mock chains. They verify the configuration is accepted but don't actually test retry behavior differences.To meaningfully test
auto_retry_requests, inject a transient failure (e.g., 503 SlowDown on UploadPart) and verify that:
- With
auto_retry_requests=true: the mock chain includes retry response(s)- With
auto_retry_requests=false: the request is not retriedThis is a test coverage gap rather than a bug.
tests/runtime/out_s3_blob.c (1)

18-82: Consider a small setup/teardown helper to reduce copy/paste across tests. Not blocking, but these tests repeat the same context/mocks/temp-dir lifecycle with only 1-2 knobs changed, which increases maintenance risk.

Also applies to: 174-239, 241-373, 379-579
tests/internal/parquet.c (1)

408-441: CI runtime risk: the record-count tests are very large (70k-131k+ records) and duplicated across cases. If this slows CI, consider reducing counts, merging overlaps, or gating the "long" tests behind an env flag.

Also applies to: 1872-1939
tests/runtime/out_s3_config.c (1)

215-221: Avoid fixed sleeps for sequencing if possible. `S3_TEST_SLEEP_MS(2000)` can be slow/flaky. If there's a deterministic signal (e.g., waiting for first upload completion before pushing again), prefer that.
15-122:flb_test_empty_data_upload: consider asserting “no upload happened” (or explicitly allow both).
Right now it’s effectively “no crash” + a sleep. If the plugin starts uploading empty payloads due to a regression, this test won’t catch it.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- tests/include/aws_client_mock.c
- tests/include/aws_client_mock.h
- tests/include/s3_test_helpers.h
- tests/internal/CMakeLists.txt
- tests/internal/aws_compress.c
- tests/internal/aws_util.c
- tests/internal/parquet.c
- tests/runtime/CMakeLists.txt
- tests/runtime/out_s3.c
- tests/runtime/out_s3_blob.c
- tests/runtime/out_s3_config.c
- tests/runtime/out_s3_edge_cases.c
- tests/runtime/out_s3_error_handling.c
- tests/runtime/out_s3_format.c
- tests/runtime/out_s3_multipart.c
- tests/runtime/out_s3_queue.c
- tests/runtime/out_s3_recovery.c
- tests/runtime/out_s3_special_features.c
💤 Files with no reviewable changes (1)
- tests/runtime/out_s3.c
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/internal/CMakeLists.txt
- tests/include/aws_client_mock.h
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components such as ARROW/PARQUET (which use `#ifdef FLB_HAVE_ARROW` guards), ZSTD support is always available and doesn't need build-time conditionals. ZSTD headers are included directly without guards across multiple plugins and core components.
Applied to files:
tests/internal/parquet.c
tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:02.561Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:7-7
Timestamp: 2025-08-29T06:25:02.561Z
Learning: In Fluent Bit, ZSTD (zstandard) compression library is bundled directly in the source tree at `lib/zstd-1.5.7` and is built unconditionally as a static library. Unlike optional external dependencies, ZSTD does not use conditional compilation guards like `FLB_HAVE_ZSTD` and is always available. Headers like `<fluent-bit/flb_zstd.h>` can be included directly without guards.
Applied to files:
tests/internal/parquet.c
tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:26.170Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:39-42
Timestamp: 2025-08-29T06:24:26.170Z
Learning: In Fluent Bit, ZSTD compression support is enabled by default and does not require conditional compilation guards (like `#ifdef` FLB_HAVE_ZSTD) around ZSTD-related code declarations and implementations.
Applied to files:
tests/internal/parquet.c
tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:24:55.855Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:52-56
Timestamp: 2025-08-29T06:24:55.855Z
Learning: ZSTD compression is always available in Fluent Bit and does not require conditional compilation guards. Unlike Arrow/Parquet which use `#ifdef` FLB_HAVE_ARROW guards, ZSTD is built unconditionally with flb_zstd.c included directly in src/CMakeLists.txt and a bundled ZSTD library at lib/zstd-1.5.7/.
Applied to files:
tests/internal/parquet.c
tests/internal/aws_compress.c
📚 Learning: 2025-08-29T06:25:27.250Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: tests/internal/aws_compress.c:93-107
Timestamp: 2025-08-29T06:25:27.250Z
Learning: In Fluent Bit, ZSTD compression is enabled by default and is treated as a core dependency, not requiring conditional compilation guards like `#ifdef FLB_HAVE_ZSTD`. Unlike some other optional components, ZSTD support is always available and doesn't need build-time conditionals.
Applied to files:
tests/internal/parquet.c
tests/internal/aws_compress.c
📚 Learning: 2025-09-08T11:21:33.975Z
Learnt from: cosmo0920
Repo: fluent/fluent-bit PR: 10851
File: include/fluent-bit/flb_simd.h:60-66
Timestamp: 2025-09-08T11:21:33.975Z
Learning: Fluent Bit currently only supports MSVC compiler on Windows, so additional compiler compatibility guards may be unnecessary for Windows-specific code paths.
Applied to files:
tests/runtime/out_s3_error_handling.c
tests/internal/aws_compress.c
tests/include/s3_test_helpers.h
📚 Learning: 2025-08-29T06:24:44.797Z
Learnt from: shadowshot-x
Repo: fluent/fluent-bit PR: 10794
File: src/aws/flb_aws_compress.c:26-26
Timestamp: 2025-08-29T06:24:44.797Z
Learning: In Fluent Bit, ZSTD support is always available and enabled by default. The build system automatically detects and uses either the system libzstd library or builds the bundled ZSTD version. Unlike other optional dependencies like Arrow which use conditional compilation guards (e.g., FLB_HAVE_ARROW), ZSTD does not require conditional includes or build flags.
Applied to files:
tests/internal/aws_compress.c
📚 Learning: 2025-08-31T12:46:11.940Z
Learnt from: ThomasDevoogdt
Repo: fluent/fluent-bit PR: 9277
File: .github/workflows/pr-compile-check.yaml:147-151
Timestamp: 2025-08-31T12:46:11.940Z
Learning: In fluent-bit CMakeLists.txt, the system library preference flags are defined as FLB_PREFER_SYSTEM_LIB_ZSTD and FLB_PREFER_SYSTEM_LIB_KAFKA with the FLB_ prefix.
Applied to files:
tests/internal/aws_compress.c
🧬 Code graph analysis (7)

tests/internal/parquet.c (3)
- src/flb_parquet_impl.cpp (1): field_name (548-548)
- src/flb_pack.c (1): flb_pack_json (530-535)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)

tests/runtime/out_s3_blob.c (2)
- tests/include/s3_test_helpers.h (5): s3_test_next (449-457), s3_test_create_temp_db_path (138-195), s3_test_create_temp_store_dir (205-268), s3_test_cleanup_temp_db (428-441), s3_test_cleanup_temp_store_dir (395-420)
- tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)

tests/runtime/out_s3_format.c (3)
- tests/include/s3_test_helpers.h (5): s3_test_next (449-457), s3_test_create_temp_db_path (138-195), s3_test_create_temp_store_dir (205-268), s3_test_cleanup_temp_db (428-441), s3_test_cleanup_temp_store_dir (395-420)
- tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)

tests/runtime/out_s3_edge_cases.c (3)
- tests/include/s3_test_helpers.h (5): s3_test_next (449-457), s3_test_create_temp_db_path (138-195), s3_test_create_temp_store_dir (205-268), s3_test_cleanup_temp_db (428-441), s3_test_cleanup_temp_store_dir (395-420)
- tests/include/aws_client_mock.c (3): flb_aws_client_mock_configure_generator (42-47), flb_aws_client_get_mock_generator (137-140), flb_aws_client_mock_clear_generator_instance (71-74)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)

tests/internal/aws_compress.c (3)
- src/flb_snappy.c (2): flb_snappy_compress_framed_data (134-255), flb_snappy_uncompress_framed_data (257-481)
- include/fluent-bit/flb_mem.h (1): flb_free (126-128)
- src/aws/flb_aws_compress.c (1): flb_aws_compression_get_type (87-108)

tests/include/s3_test_helpers.h (2)
- plugins/out_s3/s3.c (1): setenv (66-69)
- tests/include/aws_client_mock.c (1): flb_aws_client_mock_generator_count_unused_requests (143-147)

tests/internal/aws_util.c (3)
- src/aws/flb_aws_util.c (1): flb_get_s3_key (1311-1389)
- src/flb_sds.c (1): flb_sds_destroy (389-399)
- tests/internal/fuzzers/aws_util_fuzzer.c (1): initialization_crutch (27-36)
🔇 Additional comments (28)

tests/internal/aws_compress.c (6)

77-124: LGTM! The comment block correctly references Google's Snappy framing format specification. Test case 1 properly validates the round-trip compression for a simple string with correct memory management - variables are reset to NULL after freeing, and both success and failure paths handle cleanup appropriately.

126-165: LGTM! Test case 2 properly exercises multi-block compression with a 100KB buffer that spans multiple 64KB Snappy blocks. Memory management is correct across all code paths, and the test validates both length and content integrity of the round-trip.

241-245: LGTM! Test case 4 correctly validates that the API returns -1 for NULL/empty input, matching the documented behavior of `flb_snappy_compress_framed_data`.

413-413: LGTM! Test registration follows the existing pattern and correctly adds the new Snappy test to the test suite.

482-483: LGTM! Minor formatting cleanup - removes extraneous whitespace after the function call.

8-8: No changes needed - Snappy is unconditionally available in Fluent Bit. Snappy is bundled in `lib/snappy-fef67ac/` and always compiled as part of the core build, similar to ZSTD. Unlike optional components like Arrow/Parquet that use `#ifdef FLB_HAVE_ARROW` guards, Snappy has no conditional compilation flag and is always available. The include `#include <fluent-bit/flb_snappy.h>` at line 8 is correct and does not require guards.

Likely an incorrect or invalid review comment.
tests/include/aws_client_mock.c (3)

56-74: LGTM! Safe lifecycle management for mock instances. The separation of clearing and destroying prevents double-free scenarios when the S3 plugin has already freed the mock client. The `flb_aws_client_mock_clear_generator_instance()` function enables clean test reuse.

266-299: LGTM! Proper dynamic allocation for the DATA field. The DATA field handling correctly allocates memory that `flb_http_client_destroy()` can safely free, avoiding use-after-free issues with static test data.

306-313: LGTM! Correctly excludes DATA from generic expansion. The macro condition prevents double-handling of the DATA field, which is specially processed above.
tests/include/s3_test_helpers.h (4)

138-195: LGTM! Robust cross-platform temp file creation. The implementation correctly uses platform-specific APIs (`GetTempFileNameA`/`mkstemp`) for atomic unique file creation, with proper cleanup on allocation failure.
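A minimal sketch of the POSIX half of this pattern, assuming the helper's general shape (not the actual `s3_test_create_temp_db_path` implementation):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create a unique temp file path; caller frees the returned string.
 * mkstemp() both reserves the name and creates the file atomically,
 * avoiding the TOCTOU race of tmpnam()/mktemp(). */
static char *create_temp_path(const char *prefix)
{
    char templ[256];
    int fd;
    char *path;

    snprintf(templ, sizeof(templ), "/tmp/flb_%s_XXXXXX", prefix);
    fd = mkstemp(templ);
    if (fd == -1) {
        return NULL;
    }
    close(fd);

    path = strdup(templ);
    if (path == NULL) {
        unlink(templ);   /* clean up if we cannot return the name */
    }
    return path;
}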
395-420: LGTM! Safe directory cleanup with proper validation. The implementation correctly validates paths against known temp directories before deletion, uses platform-specific APIs instead of `system()`, and properly handles recursive directory removal.

475-507: LGTM! Deterministic mock chain verification. The polling approach replaces arbitrary `sleep()` calls with condition-based waiting, ensuring tests fail deterministically if retry logic regresses rather than passing based on timing luck.
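A sketch of the polling idea, using the mock helper named in this PR's test includes (`flb_aws_client_mock_generator_count_unused_requests`); the real `S3_TEST_WAIT_MOCK_EXHAUSTED` macro may differ in details:

/* Poll until every response in the mock chain has been consumed,
 * or fail after 'timeout_sec' seconds instead of sleeping blindly. */
#define WAIT_MOCK_EXHAUSTED_SKETCH(timeout_sec, label)                          \
    do {                                                                        \
        int elapsed_ms = 0;                                                     \
        while (flb_aws_client_mock_generator_count_unused_requests() > 0 &&     \
               elapsed_ms < (timeout_sec) * 1000) {                             \
            S3_TEST_SLEEP_MS(50);                                               \
            elapsed_ms += 50;                                                   \
        }                                                                       \
        TEST_CHECK(flb_aws_client_mock_generator_count_unused_requests() == 0); \
        TEST_MSG("mock chain not exhausted in test: %s", (label));              \
    } while (0)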
88-88: Static counter in header is appropriate for the build model. `g_s3_test_counter` is declared `static`, giving each translation unit its own copy. Since each S3 test is compiled as a separate executable (confirmed: `add_executable` is called once per test source file in the foreach loop, with each test linked independently), this per-executable isolation is correct and prevents any counter collisions between tests.

tests/runtime/out_s3_error_handling.c (2)

1-13: LGTM! Clean test structure with proper includes. The test file uses the cross-platform helpers from `s3_test_helpers.h` and avoids direct POSIX includes, addressing previous portability concerns.

44-106: LGTM! Robust test pattern with proper cleanup. The test follows best practices with proper NULL checks (lines 71-75), early return on failure, and comprehensive cleanup including `flb_aws_client_mock_clear_generator_instance()`.

tests/runtime/out_s3_format.c (3)
9-10: Direct `.c` include is intentional for the test architecture. The `#include "../include/aws_client_mock.c"` pattern is documented in the mock header: each test is built as a standalone executable, so this avoids the need for a separate test library while preventing symbol conflicts.

84-86: LGTM! Return value properly checked. All `flb_lib_push()` calls now capture and validate the return value, addressing the previous review concern.

851-910: LGTM! Good negative test for configuration validation. The test properly verifies that invalid compression types cause startup failure, with correct handling of the unexpected success path.
tests/runtime/out_s3_multipart.c (4)

69-126: LGTM! Metadata test now covers both ACL and storage class. The test properly expects both `x-amz-acl` and `x-amz-storage-class` headers on the CreateMultipartUpload request, and configures both `canned_acl` and `storage_class` options, addressing the previous review concern.

336-388: LGTM! Proper abort flow testing. The test correctly expects the DELETE request (AbortMultipartUpload) after CompleteMultipartUpload fails, validating the cleanup behavior.

487-491: LGTM! Push return values properly checked in loop. Each `flb_lib_push()` call in the loop now validates the return value, addressing the previous review concern about silent failures.

503-515: LGTM! Complete test registration. All test functions are properly registered in the `TEST_LIST` with a correct null terminator.
47-68: Confirm whether `$FILE_PATH` should normalize internal `./` segments (current tests expect them preserved). If the intent is "safe + normalized path," consider collapsing `./` and `../` segments (after initial stripping) rather than emitting them into S3 keys. If the intent is "literal path minus dangerous prefix," the current tests are consistent.

Also applies to: 450-533
199-233: Nice coverage expansion for `$FILE_PATH`/`$FILE_NAME` and edge prefixes. These cases line up with the current implementation's prefix-stripping behavior and should help prevent regressions.

Also applies to: 237-267, 269-305
tests/internal/parquet.c (2)

25-35: Build gating for Parquet/Arrow dependencies is already properly configured. The parquet test is correctly gated by `FLB_PARQUET_ENCODER` in the CMakeLists.txt: the test file is only added to compilation when the flag is enabled (lines 138-143), and linking with `ARROW_LIBRARIES` and `PARQUET_LIBRARIES` is also conditional on the same flag (lines 240-241). The test will not be built if the encoder is disabled, preventing any linker failures.

69-136: CRC fields are safe to hardcode to 0 in this test fixture. The test writes a raw file that is read via `flb_chunk_file_open`, which validates only the magic bytes and metadata length - it does not validate CRC. Additionally, ChunkIO's CRC validation is optional (controlled by the `CIO_CHECKSUM` flag) and is only applied during write operations through the ChunkIO API, not when reading files with standard file I/O. Since the test fixture is not created through ChunkIO's normal initialization, CRC validation will not be triggered. This approach is consistent with how test fixtures safely bypass validation checks.

Likely an incorrect or invalid review comment.
tests/runtime/CMakeLists.txt (2)

238-246: LGTM! Well-organized modular test structure. Splitting the monolithic S3 test file into nine focused test files improves test maintainability and makes it easier to identify failures in specific S3 subsystems (config, multipart, format, error handling, etc.).

300-306: LGTM! Correct Windows library linkage for UUID functions. The `ole32` and `rpcrt4` libraries are appropriate for the Windows UUID APIs mentioned in the comment. These functions are actually used in `s3_test_helpers.h` in the `s3_test_create_temp_store_dir()` function for generating unique temporary directories on Windows. The `MATCHES "out_s3"` pattern correctly captures all nine S3 test files.
    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();
    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}
Fix mock teardown: flb_aws_client_mock_clear_generator_instance() likely leaks request chains; use flb_aws_client_mock_destroy_generator() instead.
clear_generator_instance() only NULLs the global pointer; it doesn’t free the mock wrapper / request chain. Given how often these tests run, this can accumulate and also undermines leak-checking (Valgrind).
Proposed patch
- flb_aws_client_mock_clear_generator_instance();
+ flb_aws_client_mock_destroy_generator();

Also applies to: 156-164, 231-239, 298-306, 365-373, 435-443, 501-509, 571-579
🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_blob.c` around lines 75 - 82, Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly free the mock wrapper and
request chain; update every teardown block (e.g., the instance at the shown
lines and the other occurrences around the test cleanup blocks) so that after
flb_stop(ctx) and flb_destroy(ctx) you call
flb_aws_client_mock_destroy_generator() instead of
flb_aws_client_mock_clear_generator_instance(), preserving the surrounding
cleanup calls (s3_test_cleanup_temp_db, s3_test_cleanup_temp_store_dir,
flb_free) in the same order.
/* Test: storage_class variations */
void flb_test_storage_class_variations(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;
    const char *classes[] = {
        "STANDARD",
        "STANDARD_IA",
        "GLACIER",
        "INTELLIGENT_TIERING"
    };
    int i;

    for (i = 0; i < 4; i++) {
        s3_test_next();

        chain = FLB_AWS_CLIENT_MOCK(
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_PUT),
                     set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
                     set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
        );

        flb_aws_client_mock_configure_generator(chain);
        init_options.client_generator = flb_aws_client_get_mock_generator();

        db_path = s3_test_create_temp_db_path("cfg");
        store_dir = s3_test_create_temp_store_dir("cfg");
        TEST_CHECK(db_path != NULL);
        TEST_CHECK(store_dir != NULL);

        ctx = flb_create();
        TEST_CHECK(ctx != NULL);
        flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

        in_ffd = flb_input(ctx, (char *)"lib", NULL);
        TEST_CHECK(in_ffd >= 0);
        flb_input_set(ctx, in_ffd, "tag", "test", NULL);

        out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
        TEST_CHECK(out_ffd >= 0);
        flb_output_set(ctx, out_ffd, "match", "*",
                       "region", S3_TEST_DEFAULT_REGION,
                       "bucket", S3_TEST_DEFAULT_BUCKET,
                       "blob_database_file", db_path,
                       "store_dir", store_dir,
                       "storage_class", classes[i],
                       "total_file_size", "1M",
                       "upload_timeout", "1s", NULL);

        ret = flb_start(ctx);
        TEST_CHECK(ret == 0);

        ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
        TEST_CHECK(ret >= 0);

        S3_TEST_WAIT_MOCK_EXHAUSTED(5, "storage_class_variations");
        flb_stop(ctx);
        flb_destroy(ctx);
        flb_aws_client_mock_clear_generator_instance();
        s3_test_cleanup_temp_db(db_path);
        s3_test_cleanup_temp_store_dir(store_dir);
        flb_free(db_path);
        flb_free(store_dir);
    }
}

/* Test: canned_acl options */
void flb_test_canned_acl_options(void)
{
    flb_ctx_t *ctx;
    int in_ffd, out_ffd, ret;
    char *db_path, *store_dir;
    struct flb_out_s3_init_options init_options = {0};
    struct flb_aws_client_mock_request_chain *chain;
    const char *acls[] = {
        "private",
        "public-read",
        "bucket-owner-full-control"
    };
    int i;

    for (i = 0; i < 3; i++) {
        s3_test_next();

        chain = FLB_AWS_CLIENT_MOCK(
            response(expect(METHOD, FLB_HTTP_POST),
                     expect(HEADER, "x-amz-acl", acls[i]),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_PUT),
                     set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
                     set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
            response(expect(METHOD, FLB_HTTP_POST),
                     set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
                     set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
        );

        flb_aws_client_mock_configure_generator(chain);
        init_options.client_generator = flb_aws_client_get_mock_generator();

        db_path = s3_test_create_temp_db_path("cfg");
        store_dir = s3_test_create_temp_store_dir("cfg");
        TEST_CHECK(db_path != NULL);
        TEST_CHECK(store_dir != NULL);

        ctx = flb_create();
        TEST_CHECK(ctx != NULL);
        flb_service_set(ctx, "flush", "1", "grace", "1", NULL);

        in_ffd = flb_input(ctx, (char *)"lib", NULL);
        TEST_CHECK(in_ffd >= 0);
        flb_input_set(ctx, in_ffd, "tag", "test", NULL);

        out_ffd = flb_output(ctx, (char *)"s3", (struct flb_lib_out_cb *)&init_options);
        TEST_CHECK(out_ffd >= 0);
        flb_output_set(ctx, out_ffd, "match", "*",
                       "region", S3_TEST_DEFAULT_REGION,
                       "bucket", S3_TEST_DEFAULT_BUCKET,
                       "blob_database_file", db_path,
                       "store_dir", store_dir,
                       "canned_acl", acls[i],
                       "total_file_size", "1M",
                       "upload_timeout", "1s", NULL);

        ret = flb_start(ctx);
        TEST_CHECK(ret == 0);

        ret = flb_lib_push(ctx, in_ffd, (char *)JSON_TD, sizeof(JSON_TD) - 1);
        TEST_CHECK(ret >= 0);

        S3_TEST_WAIT_MOCK_EXHAUSTED(5, "canned_acl_options");
        flb_stop(ctx);
        flb_destroy(ctx);
        flb_aws_client_mock_clear_generator_instance();
        s3_test_cleanup_temp_db(db_path);
        s3_test_cleanup_temp_store_dir(store_dir);
        flb_free(db_path);
        flb_free(store_dir);
    }
}
Strengthen config tests with header expectations (low-flake, validates behavior).
Right now these mostly validate “plugin starts + uploads,” not that the option affected requests. Suggest:
- storage_class: expect `x-amz-storage-class: <class>` on CreateMultipartUpload
- content_type: expect `Content-Type: <type>` (where set)
- send_content_md5: when true, expect the `Content-MD5` header to be present (and absent when false), if that's how the plugin implements it
Example (storage_class_variations)
- chain = FLB_AWS_CLIENT_MOCK(
- response(expect(METHOD, FLB_HTTP_POST),
+ chain = FLB_AWS_CLIENT_MOCK(
+ response(expect(METHOD, FLB_HTTP_POST),
+ expect(HEADER, "x-amz-storage-class", classes[i]),
set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_CREATE_RESP),
set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_CREATE_RESP) - 1)),
response(expect(METHOD, FLB_HTTP_PUT),
set(STATUS, 200), set(DATA, S3_TEST_MOCK_UPLOAD_PART_RESP),
set(DATA_SIZE, sizeof(S3_TEST_MOCK_UPLOAD_PART_RESP) - 1)),
response(expect(METHOD, FLB_HTTP_POST),
set(STATUS, 200), set(PAYLOAD, S3_TEST_MOCK_COMPLETE_RESP),
set(PAYLOAD_SIZE, sizeof(S3_TEST_MOCK_COMPLETE_RESP) - 1))
    );

Also applies to: 841-983
    flb_stop(ctx);
    flb_destroy(ctx);
    flb_aws_client_mock_clear_generator_instance();

    /* Verify no crash occurred - reaching here is the test */
    TEST_CHECK(ret == 0);

    s3_test_cleanup_temp_db(db_path);
    s3_test_cleanup_temp_store_dir(store_dir);
    flb_free(db_path);
    flb_free(store_dir);
}
Fix mock teardown: flb_aws_client_mock_clear_generator_instance() should be flb_aws_client_mock_destroy_generator() (leak prevention).
This is especially important in flb_test_empty_data_upload() where the mock chain may never be consumed.
Proposed patch
- flb_aws_client_mock_clear_generator_instance();
+ flb_aws_client_mock_destroy_generator();

Also applies to: 200-208, 282-290, 363-371, 447-455, 530-538, 617-625, 698-706, 782-790, 866-874, 949-957, 1028-1036, 1092-1103
🤖 Prompt for AI Agents
In `@tests/runtime/out_s3_edge_cases.c` around lines 111 - 122, Replace calls to
flb_aws_client_mock_clear_generator_instance() with
flb_aws_client_mock_destroy_generator() to properly tear down the AWS client
mock and avoid leaks; update every test teardown that currently calls
flb_aws_client_mock_clear_generator_instance() (e.g., in
flb_test_empty_data_upload and the other teardown blocks around the occurrences
listed) so they call flb_aws_client_mock_destroy_generator() instead, ensuring
the mock generator is destroyed during flb_stop()/flb_destroy() cleanup.
Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).

Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes / Reliability
Tests
Deprecated