fix(api): restart api server during reload with address from the new config if necessary #25131

Open
garryliu-stripe wants to merge 8 commits into vectordotdev:master from garryliu-stripe:garryliu/fix_server_addr_reloading

Conversation

@garryliu-stripe

Summary

Vector configuration

During config reload, the API server was started using self.topology.config() (the old config) instead of new_config. This meant that changing api.address in the config and triggering a reload would either:

  1. Start the API server on the old/default address (port 8686) when enabling the API during reload
  2. Not restart the API server at all when only the address changed, since the old code only started the server when self.api_server.is_none()

This fix passes &new_config to GrpcServer::start() and adds logic to detect address changes, stopping the old server and starting a new one on the correct port.
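
As a rough illustration of the address-change detection described above, the start/stop/restart decision can be modeled as a pure function of the currently running address and the new config. This is only a sketch: `ApiConfig`, `ApiAction`, and `plan_api_action` are hypothetical names that do not appear in the PR, and the real code acts on `GrpcServer` directly.

```rust
use std::net::SocketAddr;

/// Hypothetical stand-in for the `api` section of the config.
#[derive(Clone, Debug, PartialEq)]
struct ApiConfig {
    enabled: bool,
    address: SocketAddr,
}

/// What the reload path should do with the API server (illustrative only).
#[derive(Debug, PartialEq)]
enum ApiAction {
    Keep,
    Start(SocketAddr),
    Stop,
    Restart(SocketAddr),
}

/// `running_on` is the configured address of the live server, if any.
/// Every address in the result comes from `new`, never from the old config.
fn plan_api_action(running_on: Option<SocketAddr>, new: &ApiConfig) -> ApiAction {
    match (running_on, new.enabled) {
        (None, false) => ApiAction::Keep,
        (None, true) => ApiAction::Start(new.address),
        (Some(_), false) => ApiAction::Stop,
        // Enabled before and after, but the configured address moved.
        (Some(old), true) if old != new.address => ApiAction::Restart(new.address),
        (Some(_), true) => ApiAction::Keep,
    }
}
```

With no server running and a new config that enables the API on 127.0.0.1:18686, this model yields a start on 18686 rather than on the default 8686.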

How did you test this PR?

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • make fmt
      • make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
      • make test
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.

@garryliu-stripe garryliu-stripe requested a review from a team as a code owner April 6, 2026 19:32
@github-actions github-actions bot added the domain: topology Anything related to Vector's topology code label Apr 6, 2026
@github-actions
Contributor

github-actions bot commented Apr 6, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@pront
Member

pront commented Apr 6, 2026

@codex review

@pront pront added the domain: api Anything related to Vector's GraphQL API label Apr 6, 2026
@pront
Member

pront commented Apr 6, 2026

Hi @garryliu-stripe, please sign the CLA and we will take a look. To sign, post a PR comment with the following:

I have read the CLA Document and I hereby sign the CLA

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1f0d45be29

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@garryliu-stripe
Author

@pront Checking with our legal team about the CLA. Will update shortly.

@garryliu-stripe
Author

I have read the CLA Document and I hereby sign the CLA

@garryliu-stripe
Author

recheck

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fb41c2f1ba

garryliu-stripe and others added 2 commits April 10, 2026 12:50
Move the API server start/stop/restart logic to run only after the
topology reload succeeds. This prevents the API server from being
bounced when the topology rolls back, keeping it consistent with the
active config.

Also compare configured addresses (old vs new) instead of the bound
socket address, which avoids spurious restarts when port 0 or similar
configs resolve to a different address on bind.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
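
A minimal sketch of the two ideas in this commit message, under simplifying assumptions: `Outcome` is a cut-down stand-in for the reload result (the discussion mentions `ReloadOutcome::RolledBack` and `ReloadOutcome::FatalError`, but their exact shape is not shown here), and `RunningApi` with its fields is hypothetical.

```rust
use std::net::SocketAddr;

/// Simplified stand-in for the result of a topology reload.
#[derive(Debug, PartialEq)]
enum Outcome {
    Success,
    RolledBack,
}

/// Hypothetical record of a live API server.
struct RunningApi {
    /// Address as written in the config it was started from.
    configured: SocketAddr,
    /// Address the OS actually assigned at bind time.
    bound: SocketAddr,
}

/// Restart only after a successful reload, and only when the *configured*
/// address differs between the old and new config.
fn needs_restart(outcome: &Outcome, running: &RunningApi, new_configured: SocketAddr) -> bool {
    *outcome == Outcome::Success && running.configured != new_configured
}

/// For contrast: comparing against the bound address reports a change
/// whenever the config asks for port 0, even if nothing was edited.
fn naive_needs_restart(running: &RunningApi, new_configured: SocketAddr) -> bool {
    running.bound != new_configured
}
```

In this model, a server configured with port 0 and bound to some OS-assigned port is left alone by `needs_restart` when the config is unchanged, while the naive comparison would bounce it on every reload.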
@garryliu-stripe
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8d1f42b0b

Comment on lines +101 to +103
if let Some(server) = self.api_server.take() {
    debug!("Stopping gRPC API server for restart with new address.");
    drop(server);
P1: Await API shutdown before starting replacement listener

When api.address changes and a server already exists, this block drops the old GrpcServer and then immediately starts a new one. GrpcServer shutdown is asynchronous (src/api/grpc_server.rs, the listener lives in a spawned task and only exits after the oneshot is observed), so on reloads like 0.0.0.0:8686 -> 127.0.0.1:8686 the old socket can still be bound when restart happens, causing a transient EADDRINUSE and ReloadOutcome::FatalError (process shutdown) even though the new config is valid. The restart path needs to wait for listener teardown (or otherwise coordinate) before rebinding.
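
One way to coordinate the handoff is to make shutdown a step that returns only once the listener is gone. The sketch below uses a plain thread and `std::net::TcpListener` as a stand-in. `Server`, `start`, `shutdown`, and `restart` are hypothetical and do not reflect `GrpcServer`'s actual API, which per the comment above runs the listener in a spawned task signaled through a oneshot.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical server handle: the listener lives in a background thread.
struct Server {
    stop: Sender<()>,
    task: JoinHandle<()>,
    addr: SocketAddr,
}

impl Server {
    fn start(addr: &str) -> io::Result<Server> {
        let listener = TcpListener::bind(addr)?;
        let addr = listener.local_addr()?;
        let (stop, rx) = mpsc::channel::<()>();
        let task = thread::spawn(move || {
            // Hold the socket until a stop signal arrives or the sender is dropped.
            let _ = rx.recv();
            drop(listener);
        });
        Ok(Server { stop, task, addr })
    }

    /// Signal shutdown, then block until the thread has released the socket.
    fn shutdown(self) {
        let _ = self.stop.send(());
        let _ = self.task.join();
    }
}

/// Tear the old listener down completely before binding the replacement.
fn restart(old: Server, new_addr: &str) -> io::Result<Server> {
    old.shutdown();
    Server::start(new_addr)
}
```

In an async server the same shape would presumably mean awaiting the spawned task's handle after sending the shutdown signal, rather than just dropping the server and rebinding immediately.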

Author

I think we should disallow hot reload of the API server. To change the address, we should first set api.enabled: false, then set the new address and re-enable the API server.

garryliu-stripe and others added 2 commits April 10, 2026 13:08
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
Pass &new_config instead of self.topology.config() when starting the
API server during a reload (disabled->enabled). The old code used the
pre-reload config, so the server would bind to the old address instead
of the newly configured one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8983bcacd2


  match api::GrpcServer::start(
-     self.topology.config(),
+     &new_config,
P1: Start gRPC server only after reload succeeds

Passing &new_config to GrpcServer::start before reload_config_and_respawn means a later ReloadOutcome::RolledBack can leave self.api_server running with settings from a rejected config while the topology stays on the previous config. Fresh evidence in this commit is the changed argument at this call site, which now applies new API settings prior to confirming reload success; this can expose or move the API endpoint even though the reload failed.

Author

@garryliu-stripe garryliu-stripe Apr 10, 2026

I think this is an acceptable compromise? The current behavior can cause Vector to bind to the wrong port during hot reload.

Since we only allow changing the address while the API is disabled (the existing behaviour), the API server will be dropped during a rollback.

Member

I think we will have to do something better here: either throw a hard error to indicate it's not supported or, even better, detect proper gRPC server shutdown and rebind.

@garryliu-stripe
Author

@pront mind taking a look?

Move the API server start/stop logic into the Ok arm of the topology
reload so it only runs after a successful reload. Use
self.topology.config() (which reflects the new config post-reload) to
bind to the correct address.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Committed-By-Agent: claude
@garryliu-stripe
Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 335090fd3c

        debug!("Stopping gRPC API server.");
        drop(server);
    }
} else if self.api_server.is_none() {
P1: Restart API server when enabled address changes

The reload path still skips API restart whenever a server is already running because it only starts in the self.api_server.is_none() branch. In the common case where api.enabled stays true and only api.address changes, reload succeeds but the gRPC server remains bound to the old address, so the configured new endpoint is not actually applied until a full process restart.

@@ -0,0 +1 @@
Fixed a bug where the API server would not respect a configured `api.address` on config reload. The API server now starts with the correct address when re-enabled, and only starts after a successful topology reload to prevent inconsistent state on rollback.
Member

Does this PR handle the case where the API is already running and the address changes?

Author

It does not. It still follows the current behavior: if the API is already running, an address change will not bounce the server. To change the address, you first need to disable the API, then re-enable it with the updated address.

The change primarily solves the situation below.

initial config:

# not actually set, below are default values
# api:
#   enabled: false
#   address: 127.0.0.1:8686

Try to enable the API server on port 18686 with a reload:

api:
  enabled: true
  address: 127.0.0.1:18686

However, because the server is started with self.topology.config() instead of new_config,

match api::GrpcServer::start(
    self.topology.config(),
    self.topology.watch(),
    Arc::clone(&self.topology.running),
)

the server will use the default address.

In addition, it avoids a potentially inconsistent address when the topology needs to roll back.

Labels

domain: api Anything related to Vector's GraphQL API
domain: topology Anything related to Vector's topology code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

api server binds to the wrong port during config reloading

2 participants