@ShelbyZ ShelbyZ commented Dec 12, 2025

Documentation for simple_aggregation added to out_kinesis_firehose and out_kinesis_streams - fluent/fluent-bit#11284

Summary by CodeRabbit

  • Documentation
    • Added simple_aggregation configuration option for Firehose and Kinesis outputs. When enabled, multiple records are concatenated with newlines and sent as a single API call (up to 1,024,000 bytes per record). Defaults to false.


@ShelbyZ ShelbyZ requested review from a team as code owners December 12, 2025 23:45
coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds documentation for a new simple_aggregation configuration parameter to the Firehose and Kinesis output docs. The parameter documents aggregating multiple records into single API calls by newline-concatenation, up to 1,024,000 bytes, defaulting to false.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation: Simple Aggregation Parameter**<br>`pipeline/outputs/firehose.md`, `pipeline/outputs/kinesis.md` | Added documentation for a new `simple_aggregation` configuration parameter that enables aggregating multiple records into single API calls by concatenating records with newlines, with a 1,024,000 byte size limit. Default: `false`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Review the two Markdown files for clarity, correctness of size limit and default value, and consistent wording between Firehose and Kinesis docs.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly matches the changeset: it documents the `simple_aggregation` parameter addition for both Kinesis and Firehose outputs. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f352e35 and ce08c15.

📒 Files selected for processing (2)
  • pipeline/outputs/firehose.md (1 hunks)
  • pipeline/outputs/kinesis.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pipeline/outputs/firehose.md
  • pipeline/outputs/kinesis.md

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pipeline/outputs/firehose.md (1)

24-25: Fix minor typos in adjacent rows (compression, role_arn).
Line 24: “arrow is only an available…” → “only available…”. Line 25: remove the stray backtick in “(for cross account access`)”.

🧹 Nitpick comments (1)
pipeline/outputs/firehose.md (1)

30-30: Clarify that aggregated records concatenated with newlines require downstream consumers to split on newlines to recover individual events.
The documented byte limit (1,024,000 bytes) is correct per AWS Firehose limits. However, the documentation should note that when simple_aggregation concatenates multiple log records with newlines into a single Firehose record, downstream consumers must split the received record on \n to recover individual events. Also mention that if log payloads themselves contain embedded newlines, this framing approach may require additional handling depending on serialization.
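To illustrate the consumer-side concern, here is a minimal sketch of how a downstream consumer might recover individual events from one aggregated record. It assumes each event is serialized as single-line JSON; the function name `split_aggregated_record` and the sample payloads are hypothetical and not part of Fluent Bit or any AWS SDK.

```python
import json


def split_aggregated_record(payload: bytes) -> list[bytes]:
    """Split a newline-delimited aggregate into its individual events.

    Empty segments (for example, from a trailing newline) are dropped.
    """
    return [line for line in payload.split(b"\n") if line]


# Hypothetical aggregated record holding two JSON events.
aggregated = b'{"log":"first"}\n{"log":"second"}\n'
events = [json.loads(line) for line in split_aggregated_record(aggregated)]

# JSON encoding escapes a newline inside a string as the two characters
# backslash and "n", so a multi-line log message stays on one framed line.
multiline = json.dumps({"log": "line one\nline two"}).encode()
roundtrip = split_aggregated_record(multiline + b"\n")
```

If events are not JSON (for example, raw text logs), embedded newlines would split one event into several, which is the case the comment above warns about.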

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d31204 and f352e35.

📒 Files selected for processing (2)
  • pipeline/outputs/firehose.md (1 hunks)
  • pipeline/outputs/kinesis.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-12T14:30:10.698Z
Learnt from: kalavt
Repo: fluent/fluent-bit-docs PR: 2294
File: pipeline/inputs/kafka.md:147-168
Timestamp: 2025-12-12T14:30:10.698Z
Learning: In Fluent Bit v4.0.4+, when using AWS MSK IAM authentication (rdkafka.sasl.mechanism: aws_msk_iam), the rdkafka.security.protocol is automatically set to SASL_SSL and the AWS region is auto-detected from the broker hostname for standard MSK endpoints. The aws_msk_iam_cluster_arn parameter was removed - users only need to set rdkafka.sasl.mechanism: aws_msk_iam (and optionally aws_region for custom DNS/PrivateLink).

Applied to files:

  • pipeline/outputs/kinesis.md

| `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
| `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which might help improve throughput when there are transient/random networking issues. | `true` |
| `external_id` | Specify an external ID for the STS API, can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
| `simple_aggregation` | Enable simple aggregation to combine multiple records into single API calls. This reduces the number of requests and can improve throughput. When enabled, multiple log records are concatenated with newlines and sent as a single record to Kinesis, up to the maximum record size limit (1,048,556 bytes). | `false` |

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

AWS Kinesis Data Streams maximum record size limit bytes PutRecords

💡 Result:

Maximum per-record size for PutRecords: 10 MiB (10,485,760 bytes). [1][2]

Sources:
[1] AWS Kinesis Data Streams quotas and limits.
[2] PutRecords API reference (each record can be as large as 10 MiB; whole request limit 10 MiB).


🌐 Web query:

Fluent Bit out_kinesis_streams simple_aggregation maximum record size 1048556 bytes

💡 Result:

Fluent Bit's Firehose/Kinesis output aggregation limit is 1 MiB (1,048,576 bytes) — the plugin's simple_aggregation will join records only while the aggregate doesn't exceed that 1 MiB maximum. [1][2]

If you see 1,048,556 bytes (1048556) that's almost certainly a typo/off-by-20 error; the documented limit is 1,048,576 bytes (1 MiB). [1][2]

Sources:

  • amazon-kinesis-firehose-for-fluent-bit plugin docs. [1]
  • AWS Kinesis Data Streams limits (data payload up to 10 MiB; per-record/aggregation guidance). [2]

Correct the Kinesis max record size to 1,048,576 bytes (1 MiB), not 1,048,556.
The value 1,048,556 appears to be a typo. AWS Kinesis Data Streams supports up to 10 MiB per record; Fluent Bit's simple_aggregation enforces a 1 MiB (1,048,576 bytes) limit for aggregated payloads. Also note that when simple_aggregation concatenates records with newlines, downstream consumers must split and parse them individually.

🤖 Prompt for AI Agents
In pipeline/outputs/kinesis.md around line 30, the documented maximum record
size is incorrect (1,048,556 bytes); update the value to 1,048,576 bytes (1 MiB)
to match Fluent Bit's simple_aggregation enforced limit, and keep the note that
concatenated records are newline-delimited so downstream consumers must split
and parse each record individually.
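The size-limited, newline-joined aggregation discussed in this comment can be sketched as follows. This is an illustration only, not the plugin's implementation: the function `aggregate`, the greedy flushing strategy, counting the newline toward the limit, and the handling of oversized records are all assumptions. The default limit uses the 1,048,576-byte value suggested in this review.

```python
# Assumed limit: the 1 MiB value suggested in the review comment.
MAX_AGGREGATE_BYTES = 1_048_576


def aggregate(records: list[bytes], limit: int = MAX_AGGREGATE_BYTES) -> list[bytes]:
    """Greedily join records with newlines without exceeding `limit` bytes.

    A single record larger than `limit` is emitted alone in its own batch;
    real plugins may reject or truncate it instead (assumption).
    """
    batches: list[bytes] = []
    current = b""
    for record in records:
        framed = record + b"\n"
        # Flush the current batch if adding this record would exceed the limit.
        if current and len(current) + len(framed) > limit:
            batches.append(current)
            current = b""
        current += framed
    if current:
        batches.append(current)
    return batches


# Small limit chosen only to make the batching visible.
batches = aggregate([b"aaaa", b"bbbb", b"cc"], limit=10)
```

With the 10-byte limit used here, the first two records fit in one batch and the third starts a new one, so fewer API calls would be needed than with one call per record.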

Signed-off-by: Shelby Hagman <shelbyzh@amazon.com>