fix(kafka source): track contiguous offsets before committing to Kafka #25130

Open

rohitmanohar wants to merge 1 commit into vectordotdev:master from rohitmanohar:kafka-offset-tracking

Conversation

@rohitmanohar

Summary

The Kafka source previously committed each successfully delivered offset directly, without considering whether earlier offsets in the partition had also been delivered. If message N failed downstream but message N+1 succeeded, offset N+1 would be committed, causing message N to be skipped on consumer restart.

This change introduces a per-partition offset tracker that maintains a high watermark representing the last contiguously delivered offset. The watermark only advances when the next sequential offset is delivered; any gap (caused by a failed batch) holds it back. Only the watermark value is passed to store_offset, ensuring Kafka replays from the correct position on restart.

This provides at-least-once delivery semantics when end-to-end acknowledgements are enabled. When acknowledgements are disabled, every batch resolves as Delivered immediately, so the watermark always advances and behavior is unchanged from before.
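The watermark logic described above can be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: the type and method names (`PartitionOffsetTracker`, `mark_delivered`) are invented for this sketch, and the real implementation inside the Kafka source may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-partition tracker: the watermark only advances across
/// contiguously delivered offsets; a gap (failed batch) holds it back.
struct PartitionOffsetTracker {
    /// Last contiguously delivered offset.
    watermark: i64,
    /// Delivered offsets sitting ahead of the watermark (behind a gap).
    pending: BTreeSet<i64>,
}

impl PartitionOffsetTracker {
    /// `first_offset` is the first offset this consumer will receive
    /// for the partition.
    fn new(first_offset: i64) -> Self {
        Self {
            watermark: first_offset - 1,
            pending: BTreeSet::new(),
        }
    }

    /// Record a delivered offset. Returns `Some(new_watermark)` when the
    /// watermark advanced (the value to hand to `store_offset`), or `None`
    /// while an undelivered earlier offset holds it back.
    fn mark_delivered(&mut self, offset: i64) -> Option<i64> {
        self.pending.insert(offset);
        let before = self.watermark;
        // Absorb every contiguous offset sitting just above the watermark.
        while self.pending.remove(&(self.watermark + 1)) {
            self.watermark += 1;
        }
        (self.watermark > before).then_some(self.watermark)
    }
}
```

For example, if offsets 1 and 2 are delivered while offset 0 is still outstanding, `mark_delivered` returns `None` for both; once offset 0 resolves, the watermark jumps straight to 2 and only then is a commit position reported.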

How did you test this PR?

Manually tested with a pipeline from a Kafka source to a non-local sink (e.g. S3):

T0: Everything is functional: Vector reads from Kafka and writes to the sink, and Kafka offsets increment as expected.
T1: Break connectivity to the sink. Vector retries and eventually gives up writing to the sink; the offset doesn't increment.
T2: Restore connectivity to the sink. Vector moves past the failed events and delivers new events to the sink, but the offset still isn't incremented.

The offset remains stuck at the event that wasn't delivered to the sink, ensuring at-least-once semantics are honored.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin/master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@rohitmanohar rohitmanohar requested a review from a team as a code owner April 6, 2026 19:26
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@github-actions github-actions bot added the domain: sources Anything related to the Vector's sources label Apr 6, 2026
@rohitmanohar
Author

I have read the CLA Document and I hereby sign the CLA

@rohitmanohar rohitmanohar force-pushed the kafka-offset-tracking branch from 3f28186 to 4da6160 Compare April 6, 2026 19:31

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f28186138

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@rohitmanohar rohitmanohar force-pushed the kafka-offset-tracking branch from 4da6160 to e2733e6 Compare April 6, 2026 19:39
@rohitmanohar
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2733e6bf9


@rohitmanohar rohitmanohar force-pushed the kafka-offset-tracking branch from e2733e6 to f816418 Compare April 6, 2026 19:50
@rohitmanohar
Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f816418e14



Labels

domain: sources Anything related to the Vector's sources

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant