
fix(topology): prevent source pump deadlock during sink config reload#6

Open
joshcoughlan wants to merge 3 commits into master from fix/topology-reload-deadlock



@joshcoughlan

Summary

During config reload, when a sink in wait_for_sinks is being changed, remove_inputs previously sent Pause to the upstream fanout. This caused the source pump to block in wait_for_replacements, waiting for a Replace message that only arrives once connect_diff runs, and connect_diff runs only after shutdown_diff completes. Because shutdown_diff waits for the old sink to finish, this created a circular dependency that stalled all sources.
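The circular dependency can be illustrated with a toy wait-for graph, where an edge a -> b means "a cannot make progress until b does". The node names mirror the functions named above, but the graph is a simplified model and not Vector's code. In particular, the edge from the old sink back to the source pump is an assumption made for illustration (a paused input never reaching end-of-stream, so the old sink cannot finish); the description above does not spell that step out.

```rust
use std::collections::HashMap;

/// Hypothetical wait-for edges: `a -> b` means "a cannot make progress until b does".
/// Each node has at most one outgoing edge in this simplified model.
fn wait_for_edges(use_pause: bool) -> HashMap<&'static str, &'static str> {
    let mut edges = HashMap::new();
    edges.insert("connect_diff", "shutdown_diff"); // connect_diff runs after shutdown_diff
    edges.insert("shutdown_diff", "old_sink"); // shutdown_diff waits for the old sink to finish
    if use_pause {
        // With Pause, the pump blocks in wait_for_replacements until connect_diff sends Replace.
        edges.insert("source_pump", "connect_diff");
        // ASSUMPTION (illustrative): a paused input is never closed, so the old sink
        // cannot finish until the pump makes progress.
        edges.insert("old_sink", "source_pump");
    }
    edges
}

/// Follows the single outgoing edge from `start` and returns the cycle if a node repeats.
fn find_cycle(
    edges: &HashMap<&'static str, &'static str>,
    start: &'static str,
) -> Option<Vec<&'static str>> {
    let mut path = vec![start];
    let mut current = start;
    while let Some(&next) = edges.get(current) {
        if let Some(pos) = path.iter().position(|&n| n == next) {
            // Revisited a node: everything from its first visit onward is the cycle.
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        current = next;
    }
    None // reached a node with no outgoing edge, so no deadlock along this path
}
```

In this model, find_cycle(&wait_for_edges(true), "source_pump") returns the four-node cycle source_pump -> connect_diff -> shutdown_diff -> old_sink, while with use_pause set to false the pump has no outgoing edge and no cycle exists.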

The fix uses Remove instead of Pause for sinks in wait_for_sinks during reload, and correspondingly uses Add instead of Replace when reconnecting them in connect_diff. This allows the source pump to continue sending events to other sinks while the old sink drains, breaking the circular dependency.
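The difference between the two control-message sequences can be sketched with a toy fanout. The variant names Add, Remove, Pause and Replace come from the description above, but ToyFanout, its payloads (plain sink names rather than senders) and the rule "refuse to send while any sink is paused" are hypothetical simplifications standing in for wait_for_replacements, not Vector's fanout API. The sink name sink_other is also hypothetical.

```rust
use std::collections::BTreeMap;

/// Toy control messages named after the ones in the PR description.
/// Payloads are simplified to sink names (an assumption; no real senders).
#[derive(Debug, Clone, PartialEq)]
pub enum ControlMsg {
    Add(String),
    Remove(String),
    Pause(String),
    Replace(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SinkState {
    Active,
    Paused,
}

/// Hypothetical fanout: delivers each event to every attached sink, but the pump
/// cannot send while any sink is paused (modelling wait_for_replacements).
#[derive(Default)]
pub struct ToyFanout {
    sinks: BTreeMap<String, SinkState>,
    pub delivered: BTreeMap<String, Vec<u32>>,
}

impl ToyFanout {
    pub fn apply(&mut self, msg: ControlMsg) {
        match msg {
            // Both attach (or re-attach) the sink as active in this toy model.
            ControlMsg::Add(name) | ControlMsg::Replace(name) => {
                self.sinks.insert(name, SinkState::Active);
            }
            // Detach the sink entirely; other sinks keep receiving events.
            ControlMsg::Remove(name) => {
                self.sinks.remove(&name);
            }
            // Keep the sink attached but mark it as waiting for a replacement.
            ControlMsg::Pause(name) => {
                if let Some(state) = self.sinks.get_mut(&name) {
                    *state = SinkState::Paused;
                }
            }
        }
    }

    /// Returns false when the pump would block waiting for a Replace message.
    pub fn send(&mut self, event: u32) -> bool {
        if self.sinks.values().any(|s| *s == SinkState::Paused) {
            return false;
        }
        for name in self.sinks.keys() {
            self.delivered.entry(name.clone()).or_default().push(event);
        }
        true
    }
}
```

Replaying the old sequence (Add both sinks, then Pause("sink_http")) makes send return false until a Replace arrives. The new sequence (Remove("sink_http"), then Add("sink_http") after the old sink has drained) keeps delivering to sink_other throughout, which is the property the fix relies on.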

The fix is buffer-type agnostic: it resolves the deadlock for memory- and disk-buffered sinks alike, and it only affects the reload path.

Vector configuration

sources:
  source_socket:
    type: socket
    address: 0.0.0.0:9000
    mode: tcp
    decoding:
      codec: json

sinks:
  sink_http:
    type: http
    inputs:
      - source_socket
    uri: http://localhost:9222
    method: get
    encoding:
      codec: json

How did you test this PR?

  • Added two new regression tests:
    • topology_reload_conflicting_sink_does_not_stall — verifies reload completes when a sink has conflicting resources
    • topology_reload_reuse_buffer_does_not_stall — verifies reload completes when the buffer is reused (SIGHUP-style)
  • Both tests use a 10-second timeout to detect the deadlock
  • All 11 reload tests pass (9 existing + 2 new)
  • make check-clippy passes with all features (Docker x86_64 environment)
  • make test passes (Docker x86_64 environment)
  • make check-fmt passes
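The regression tests detect the deadlock with a 10-second timeout. The general pattern can be sketched with only the Rust standard library; Vector's actual tests are async and may use a different mechanism (for example tokio's timeout), so completes_within is a hypothetical helper and not code from this PR.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `reload` on a worker thread and reports whether it
/// finished within `limit`. A timeout is treated as a stall (the worker thread is
/// leaked in that case); a panic in `reload` drops the sender and also yields false.
fn completes_within<F>(limit: Duration, reload: F) -> bool
where
    F: FnOnce() + Send + 'static,
{
    let (done_tx, done_rx) = mpsc::channel();
    thread::spawn(move || {
        reload();
        // Ignore the error if the receiver already gave up waiting.
        let _ = done_tx.send(());
    });
    done_rx.recv_timeout(limit).is_ok()
}
```

A test would then assert completes_within(Duration::from_secs(10), ...) around the reload; a deadlocked reload fails the assertion after the timeout instead of hanging the test suite.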

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Refs: vectordotdev#24125

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
joshcoughlan and others added 2 commits April 10, 2026 13:45



Successfully merging this pull request may close these issues.

Vector fails to reload sink with failed events
