@esensar esensar commented Jan 27, 2026

Summary

This adds a new configuration parameter to the kafka source: multithreading.

When enabled, message parsing runs in separate tasks (limited by the max_message_handling_tasks option) to boost throughput. Results are still processed in order, so that acknowledgements are handled correctly.
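
The idea of parsing in parallel while keeping results in input order can be sketched as follows. This is a minimal illustration, not the code in this PR: it uses scoped `std` threads instead of async tasks, and `parse` and `parse_in_order` are hypothetical names introduced only for this example.

```rust
use std::thread;

/// Hypothetical stand-in for decoding a single Kafka payload.
fn parse(raw: &[u8]) -> String {
    String::from_utf8_lossy(raw).to_uppercase()
}

/// Parses `messages` on at most `max_tasks` worker threads and returns the
/// results in the same order as the input.
fn parse_in_order(messages: &[Vec<u8>], max_tasks: usize) -> Vec<String> {
    if messages.is_empty() {
        return Vec::new();
    }
    let tasks = max_tasks.max(1);
    // Ceiling division, so the input is split into at most `tasks` slices.
    let per_task = (messages.len() + tasks - 1) / tasks;

    thread::scope(|scope| {
        // Spawn one worker per contiguous slice of the input.
        let handles: Vec<_> = messages
            .chunks(per_task)
            .map(|slice| {
                scope.spawn(move || slice.iter().map(|m| parse(m)).collect::<Vec<String>>())
            })
            .collect();

        // Joining in spawn order keeps the output aligned with the input,
        // regardless of which worker finishes first.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("parser thread panicked"))
            .collect::<Vec<String>>()
    })
}
```

Because the workers are joined in the order they were spawned, anything downstream (such as acknowledgement handling) sees the results in the original message order even if a later slice finishes parsing first.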

Acknowledgement overhead still caused issues even with multiple message-processing threads. To reduce it, messages are now processed in chunks of up to CHUNK_SIZE (1000), whether or not multithreading is enabled. This may slightly change behavior: commits now happen per batch rather than per individual message.
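
To illustrate the effect of committing per chunk, here is a small sketch that computes where commits would land for a run of offsets. It is an assumption-laden illustration rather than the PR's implementation: `commit_points` is a hypothetical helper, and it reports the last offset of each chunk, whereas a real consumer would typically commit the position following the last processed message.

```rust
/// Mirrors the chunk size named in the description above.
const CHUNK_SIZE: usize = 1000;

/// Returns one commit point per chunk (the last offset in that chunk)
/// instead of one per message.
fn commit_points(offsets: &[i64], chunk_size: usize) -> Vec<i64> {
    offsets
        // Guard against a zero chunk size, which `chunks` would reject.
        .chunks(chunk_size.max(1))
        .filter_map(|chunk| chunk.last().copied())
        .collect()
}
```

With `CHUNK_SIZE` as the chunk size, 2500 consecutive messages produce three commit points instead of 2500, which is where the reduction in acknowledgement overhead comes from.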

Vector configuration

api:
  enabled: true

sources:
  kafka:
    type: "kafka"
    bootstrap_servers: "kafka:9092"
    group_id: "aeohkh4k1j"
    topics: ["conn"]
    auto_offset_reset: "beginning"
    commit_interval_ms: 5000
    session_timeout_ms: 30000
    decoding:
      codec: bytes
    metrics:
      topic_lag_metric: true
    multithreading: {}


sinks:
  my_sink_id:
    type: "blackhole"
    inputs: ["kafka"]

How did you test this PR?

Ran the above configuration (based on #22958) with the producer from #22958, set to produce 300k events per second. With multithreading disabled, the source processes ~190k messages per second (observed via vector top). With it enabled (default of 4 tasks, though even 1 task is an improvement), it processes ~350k messages per second, roughly 1.8x the rate with multithreading disabled.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR changes Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

Sponsored by Quad9

Fixes: vectordotdev#22958
@esensar esensar requested review from a team as code owners January 27, 2026 10:47
@github-actions github-actions bot added the domain: sources and domain: external docs labels Jan 27, 2026
@freejool

It has a significant effect!

How about setting the thread num to host's cpu core num by default?

@esensar
Contributor Author

esensar commented Jan 28, 2026

> It has a significant effect!
>
> How about setting the thread num to host's cpu core num by default?

That makes sense. The only reason I kept it hardcoded is that I based this on the dnstap source config options, but defaulting to the number of cores makes more sense. I will update it.
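
One way such a default could look, as a rough sketch only: the function name below is hypothetical, and the fallback of 4 is an assumption that simply reuses the current hardcoded default for when the core count cannot be determined.

```rust
use std::thread;

/// Hypothetical default for the task limit: the parallelism available to
/// this process, falling back to 4 if it cannot be queried.
fn default_max_message_handling_tasks() -> usize {
    thread::available_parallelism()
        .map(|cores| cores.get())
        .unwrap_or(4)
}
```

`std::thread::available_parallelism` reports the parallelism available to the process, which can be lower than the host's physical core count, for example under CPU affinity limits.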

Development

Successfully merging this pull request may close these issues.

Poor kafka source consumer performance