enhancement(kafka source): add multithreading option for kafka source #24550

esensar · 2026-01-27T10:47:09Z

Summary

This adds a new configuration parameter for the kafka source: multithreading.

When enabled, message parsing runs in separate tasks (capped by the max_message_handling_tasks setting) to boost throughput. Results are still processed in order, so that acknowledgements are handled correctly.
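As a rough illustration of the idea (not the code in this PR), the sketch below parses payloads on several workers and then reassembles the results in input order. It assumes std scoped threads rather than the async tasks the source actually uses, and `parse` and `parse_in_order` are hypothetical names introduced only for this example.

```rust
use std::thread;

/// Hypothetical stand-in for decoding one Kafka payload into an event.
fn parse(payload: &[u8]) -> String {
    String::from_utf8_lossy(payload).to_uppercase()
}

/// Parses `payloads` on at most `max_tasks` scoped threads and returns the
/// results in the original input order.
fn parse_in_order(payloads: &[Vec<u8>], max_tasks: usize) -> Vec<String> {
    if payloads.is_empty() {
        return Vec::new();
    }
    let max_tasks = max_tasks.max(1);
    // Ceiling division: each worker gets one contiguous slice of the input.
    let per_task = (payloads.len() + max_tasks - 1) / max_tasks;

    thread::scope(|scope| {
        // Spawn every worker first so they all run concurrently.
        let handles: Vec<_> = payloads
            .chunks(per_task)
            .map(|slice| {
                scope.spawn(move || slice.iter().map(|p| parse(p)).collect::<Vec<String>>())
            })
            .collect();

        // Joining in spawn order restores input order, no matter which
        // worker finishes first.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("parser thread panicked"))
            .collect()
    })
}
```

Because the results come back in input order, whatever consumes them can acknowledge messages sequentially, which is the property the summary relies on.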

To reduce acknowledgement overhead, which still caused issues even with multiple message-processing threads, messages are now processed in chunks of up to CHUNK_SIZE (1000). This applies even when multithreading is disabled. It may slightly change behavior: commits now happen per batch rather than per individual message.
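The behavior change can be pictured with a small sketch, assuming a commit is issued once per chunk at the offset of the chunk's last message. `commit_points` is a hypothetical helper for illustration only, and the real source may track offsets differently.

```rust
/// Upper bound on messages handled per batch; 1000 matches the CHUNK_SIZE
/// value quoted in the summary.
const CHUNK_SIZE: usize = 1000;

/// Returns the offsets at which a commit would happen if one commit is issued
/// per chunk (at the chunk's last message) instead of one per message.
fn commit_points(offsets: &[i64], chunk_size: usize) -> Vec<i64> {
    offsets
        .chunks(chunk_size.max(1)) // guard against a zero chunk size
        .filter_map(|chunk| chunk.last().copied())
        .collect()
}
```

Under these assumptions, 2,500 consecutive messages would produce three commits rather than 2,500, with the final chunk being a partial one.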

Vector configuration

api:
  enabled: true

sources:
  kafka:
    type: "kafka"
    bootstrap_servers: "kafka:9092"
    group_id: "aeohkh4k1j"
    topics: ["conn"]
    auto_offset_reset: "beginning"
    commit_interval_ms: 5000
    session_timeout_ms: 30000
    decoding:
      codec: bytes
    metrics:
      topic_lag_metric: true
    multithreading: {}


sinks:
  my_sink_id:
    type: "blackhole"
    inputs: ["kafka"]

How did you test this PR?

Ran the above configuration (based on #22958) with the producer from #22958, set to produce 300k events per second. With multithreading disabled, the source processes ~190k messages per second (observed via vector top). With it enabled (using the default of 4 tasks, though even a single task is an improvement), it reaches ~350k messages per second.

Change Type

Bug fix
New feature
Non-functional (chore, refactoring, docs)
Performance

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

Notes

Please read our Vector contributor resources.
Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
Some CI checks run only after we manually approve them.
- We recommend adding a pre-push hook, please see this template.
- Alternatively, we recommend running the following locally before pushing to the remote branch:
  - make fmt
  - make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
  - make test
After a review is requested, please avoid force pushes to help us review incrementally.
- Feel free to push as many commits as you want. They will be squashed into one before merging.
- For example, you can run git merge origin master and git push.
If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details here.

Sponsored by Quad9

Fixes: vectordotdev#22958

freejool · 2026-01-28T08:22:32Z

It has a significant effect!

How about setting the thread num to host's cpu core num by default?

esensar · 2026-01-28T10:44:14Z

> It has a significant effect!
>
> How about setting the thread num to host's cpu core num by default?

That makes sense. The only reason I kept it hardcoded is that I based this on the dnstap source config options, but using the core count makes more sense. I will update it.
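For reference, one way such a default could be resolved is sketched below. It assumes the setting is optional and falls back to `std::thread::available_parallelism`; `resolve_max_tasks` is a hypothetical helper, not necessarily how the follow-up commit implements it.

```rust
use std::thread;

/// Resolves the task limit: a positive explicit setting wins; otherwise fall
/// back to the parallelism available to the process, or to 1 if that cannot
/// be determined.
fn resolve_max_tasks(configured: Option<usize>) -> usize {
    configured.filter(|&n| n > 0).unwrap_or_else(|| {
        thread::available_parallelism()
            .map(|n| n.get())
            .unwrap_or(1)
    })
}
```

Note that `available_parallelism` reports the parallelism available to the process, which can be lower than the host's physical core count, for example under CPU affinity masks or container limits.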

esensar requested review from a team as code owners January 27, 2026 10:47

github-actions bot added the "domain: sources" and "domain: external docs" labels Jan 27, 2026

Add changelog entry

43bd002

esensar mentioned this pull request Jan 27, 2026

Poor kafka source consumer performance #22958

Open

esensar added 3 commits January 27, 2026 12:07

Make active tasks num naming more consistent

23502e1

Add a custom OwnedMessage type to reduce payload copying

62b550b

Merge branch 'master' into enhancement/kafka-source-multithreading

78c9b92

domalessi approved these changes Jan 27, 2026

Default to available cores for max_message_handling_tasks

0c00ec8
