fix(config): watch vector config paths for sinks and transforms#25133

Open
powerumc wants to merge 5 commits intovectordotdev:masterfrom
powerumc:powerumc/fix-config-watcher-reload

@powerumc powerumc commented Apr 7, 2026

Summary

Fixed an issue where concurrent modifications to Vector configuration files and enrichment tables resulted in only enrichment table changes being reloaded.

When ComponentConfig.component_type is Sink or Transform, config_paths is empty. As a result, simultaneous changes to Vector config files and enrichment tables are not fully detected, and only enrichment table changes are picked up.

This PR fixes that by adding the Vector configuration file (or directory) to config_paths, and by checking whether the changed Vector configuration path lies under a directory listed in config_paths.
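As a rough illustration of the decision this affects, the sketch below models how a batch of changed components might map to a reload signal. The enum and function are simplified, hypothetical stand-ins rather than Vector's actual types. The signal names are the ones mentioned in the review discussion, and the branch order is an assumption.

```rust
/// Hypothetical, simplified stand-in for Vector's component kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComponentType {
    Source,
    Transform,
    Sink,
    EnrichmentTable,
}

/// Hypothetical, simplified stand-in for the watcher's reload signal.
#[derive(Debug, PartialEq, Eq)]
enum SignalTo {
    ReloadFromDisk,
    ReloadComponents(Vec<String>),
    ReloadEnrichmentTables,
}

/// Pick a reload signal for the components attributed to a file change.
/// Assumed order: nothing attributed -> full reload from disk,
/// only enrichment tables -> table reload, otherwise reload the components.
fn classify(changed: &[(String, ComponentType)]) -> SignalTo {
    if changed.is_empty() {
        return SignalTo::ReloadFromDisk;
    }
    if changed
        .iter()
        .all(|(_, kind)| *kind == ComponentType::EnrichmentTable)
    {
        return SignalTo::ReloadEnrichmentTables;
    }
    SignalTo::ReloadComponents(changed.iter().map(|(id, _)| id.clone()).collect())
}
```

Under this model, if sinks and transforms have no watched paths, a batch that touches both a config file and a .csv file is attributed to the enrichment tables alone and takes the table-only branch.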

Vector configuration

For testing, add a Vector configuration file example.toml under the test directory. This configuration is a simple setup that prints Hello, Vector! every 2 seconds.

[sources.example_source]
type = "exec"
command = ["echo", "Hello, Vector!"]
mode = "scheduled"
scheduled.exec_interval_secs = 2

[sinks.console1]
type = "console"
inputs = ["example_source"]
encoding.codec = "text"

[sinks.console2]
type = "console"
inputs = ["example_source"]
encoding.codec = "text"

[enrichment_tables.example1_csv]
type = "file"
file.path = "/path/to/test/example1.csv"
file.encoding.type = "csv"
file.encoding.include_headers = true
schema."allow_header_fields" = "string"

[enrichment_tables.example2_csv]
type = "file"
file.path = "/path/to/test/example2.csv"
file.encoding.type = "csv"
file.encoding.include_headers = true
schema."allow_header_fields" = "string"

In the same test directory, create two Enrichment Table (.csv) files: example1.csv and example2.csv.

name, value
aaa, 111

How did you test this PR?

Run Vector:

cargo run -- -C test/ --watch-config

In another terminal session, from the test directory, change the Vector configuration file and the Enrichment Table (.csv) files at the same time:

echo '' >> example.toml && echo '' >> example1.csv && echo '' >> example2.csv

You would expect a reload because the Vector configuration changed, but only the enrichment table files reload:

Hello, Vector!
Hello, Vector!
INFO vector::config::watcher: Configuration file changed.
INFO vector::config::watcher: Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "example2_csv" }] configuration changed.
INFO vector::config::watcher: Only enrichment tables have changed.
Hello, Vector!
Hello, Vector!

After this PR, the Vector configuration reloads as expected. (When the Vector configuration reloads, enrichment tables are loaded again as well.)

# echo '' >> example.toml && echo '' >> example1.csv && echo '' >> example2.csv

Hello, Vector!
Hello, Vector!
INFO vector::config::watcher: Configuration file changed.
INFO vector::config::watcher: Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1" }, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }] configuration changed.
INFO vector::topology::running: Reloading running topology with new configuration.
INFO vector::topology::running: Running healthchecks.
INFO vector::topology::builder: Internal log [Healthcheck passed.] has been suppressed 1 times.
INFO vector::topology::builder: Healthcheck passed.
INFO vector::topology::builder: Internal log [Healthcheck passed.] is being suppressed to avoid flooding.
INFO vector::topology::running: New configuration loaded successfully.
INFO vector: Vector has reloaded. path=[Dir("test")] internal_log_rate_limit=false
Hello, Vector!
Hello, Vector!

When only the enrichment tables change, only the enrichment tables reload, as expected:

# echo '' >> example1.csv && echo '' >> example2.csv

Hello, Vector!
Hello, Vector!
INFO vector::config::watcher: Configuration file changed.
INFO vector::config::watcher: Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "example2_csv" }] configuration changed.
INFO vector::config::watcher: Only enrichment tables have changed.
Hello, Vector!
Hello, Vector!

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

powerumc added 2 commits April 7, 2026 15:47
* Append vector `config_paths` to the list of watched files for transforms and sinks.
* Enhance `ComponentConfig::contains` to correctly match modified files against watched directory paths (e.g., when using `--config-dir`).
@powerumc powerumc changed the title from "fix(config): watch vector config paths for sinks and transforms with enrichment table" to "fix(config): watch vector config paths for sinks and transforms" Apr 7, 2026
@powerumc powerumc force-pushed the powerumc/fix-config-watcher-reload branch from d3bf255 to d089638 Compare April 7, 2026 11:54
@powerumc powerumc marked this pull request as ready for review April 7, 2026 11:56
@powerumc powerumc requested a review from a team as a code owner April 7, 2026 11:56

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d089638b6a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

&& config_paths
.iter()
.filter(|p| Format::from_path(p).is_ok())
.any(|p| p.parent() == Some(path.as_path()))

P1: Match config-dir updates by ancestry, not immediate parent

In ComponentConfig::contains, directory-based config paths only match when p.parent() == path, so changes under nested files like --config-dir .../transforms/foo/bar.toml are not attributed to sink/transform components. Because load_from_dir supports component subdirectories (and recursive transform subdirectories), a concurrent enrichment-table edit can still produce changed_components containing only enrichment tables, which triggers ReloadEnrichmentTables and skips reloading the changed Vector config.
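To make the difference concrete, here is a minimal sketch that uses only std::path. The function names are hypothetical and are not taken from Vector's code. It contrasts an immediate-parent check with an ancestry check.

```rust
use std::path::Path;

/// Matches only files that sit directly inside `dir`,
/// mirroring the `p.parent() == Some(path)` comparison.
fn is_direct_child(changed: &Path, dir: &Path) -> bool {
    changed.parent() == Some(dir)
}

/// Matches files at any depth below `dir` by walking up the ancestors.
/// `ancestors()` yields the path itself first, so that entry is skipped.
fn is_descendant(changed: &Path, dir: &Path) -> bool {
    changed.ancestors().skip(1).any(|ancestor| ancestor == dir)
}
```

With a watched directory /etc/vector (an example path), a change to /etc/vector/transforms/foo/bar.toml fails the first check and passes the second.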

Contributor Author

Before this PR, Vector reloads even when only enrichment table files are modified within subdirectories (including nested subdirectories) of the config directory.

For example, given the following directory structure, modifying only the .csv files still triggers a reload:

test/
├── example.toml
├── example1.csv
├── example2.csv
└── sink/
    └── example2.toml

$ echo '' >> example1.csv && echo '' >> example2.csv

INFO vector::config::watcher: Configuration file changed.
INFO vector::topology::running: Reloading running topology with new configuration.
INFO vector::topology::running: Running healthchecks.
INFO vector::topology::running: New configuration loaded successfully.
INFO vector: Vector has reloaded. path=[Dir("test")] internal_log_rate_limit=false

I’m not sure whether this is intended behavior in Vector or not.
If this is intended, then changes in subdirectories (including nested ones) may not need to be checked in contains().

Member

IMO it's worth addressing this finding by doing the following:

  • Add a small helper in src/config/watcher.rs that checks whether a changed_path either exactly matches a watched config file or is a recognized config file under a watched config dir.
  • Call it before the "Only enrichment tables have changed" branch.

And for testing coverage:

  • A test in which transforms/foo/bar.toml is changed in the same window as an enrichment-table file, asserting that we get a ReloadFromDisk signal.
  • A test showing enrichment-only changes still use ReloadEnrichmentTables.

For simplicity, I would defer discussion on whether config-file changes under --config-dir should go straight to ReloadFromDisk instead of piggybacking on component attribution.
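A minimal sketch of what such a helper could look like, assuming the watcher can tell watched files apart from watched directories. The name is_config_change and the extension list are hypothetical. The list stands in for whatever Format::from_path accepts.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};

/// Assumed set of config extensions; a stand-in for `Format::from_path`.
fn has_config_extension(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|ext| ext.to_str()),
        Some("toml" | "yaml" | "yml" | "json")
    )
}

/// Hypothetical helper: true when `changed_path` is exactly a watched config
/// file, or a config-like file at any depth under a watched config directory.
fn is_config_change(
    changed_path: &Path,
    watched_files: &HashSet<PathBuf>,
    watched_dirs: &HashSet<PathBuf>,
) -> bool {
    watched_files.contains(changed_path)
        || (has_config_extension(changed_path)
            && watched_dirs.iter().any(|dir| changed_path.starts_with(dir)))
}
```

The extension filter keeps .csv files under the config directory from counting as config changes, so enrichment-only edits can still take the ReloadEnrichmentTables path.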

Contributor Author

In my opinion, instead of adding a new helper in watcher.rs, it might be sufficient to adjust the condition in the existing ComponentConfig.contains() function as follows:

//.any(|p| p.parent() == Some(path.as_path()))
.any(|p| p.starts_with(path))

(I haven’t tested this yet.)

This way, we would only need to add tests for the contains() function.
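For reference, a small sketch of how Path::starts_with behaves in this kind of check. The wrapper function any_under is hypothetical. The comparison is by whole path components rather than by string prefix, and a relative base never matches an absolute path.

```rust
use std::path::Path;

/// Hypothetical wrapper around the proposed condition:
/// does any changed path lie under (or equal) `watched_dir`?
fn any_under(changed_paths: &[&Path], watched_dir: &Path) -> bool {
    changed_paths.iter().any(|p| p.starts_with(watched_dir))
}
```

Two consequences to cover in tests for contains(): a sibling directory such as /cfg2 does not match /cfg, and a relative watched path such as test does not match an absolute changed path.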

Member

Let's take a step back and look at the broader approach. Bear with me, please: this is a complex area because of the number of possible scenarios (especially when a config dir contains multiple configs, but let's stick to a single config file for now). That's why my first hunch was to fix the watcher design.

Bug in current implementation

Walk through the provided reproduction steps, starting from:

echo '' >> example.toml && echo '' >> example1.csv && echo '' >> example2.csv

This appends a newline to all three files, but example.toml's parsed config is unchanged - zero components changed.

After the PR, the log shows:

Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1"
}, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }]
configuration changed.

But console1 and console2 didn't actually change. This is a bug.

The current master behavior of sending ReloadEnrichmentTables is actually correct here: only enrichment data files changed, and the config didn't actually change.
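To illustrate the distinction between "file touched" and "config changed", here is a deliberately crude sketch. It is an assumption-laden stand-in for comparing parsed configs: it only ignores blank lines, comment lines, and trailing whitespace, and the function names are hypothetical.

```rust
/// Crude normalization: drop blank lines and `#` comment lines and trim
/// trailing whitespace. A real check would compare the parsed configs;
/// this ignores cases such as multi-line strings.
fn normalized(config_text: &str) -> Vec<&str> {
    config_text
        .lines()
        .map(str::trim_end)
        .filter(|line| !line.is_empty() && !line.trim_start().starts_with('#'))
        .collect()
}

/// True when the two texts differ after normalization.
fn is_effective_change(before: &str, after: &str) -> bool {
    normalized(before) != normalized(after)
}
```

Appending a newline, as echo '' >> example.toml does, is not an effective change under this check, while renaming a sink is.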

Next steps

Change the trigger to:

sed -i '' 's/console2/console3/' test/example.toml && echo '' >> test/example1.csv

Does Vector reload the full config (picking up the renamed sink), or (the bug) does it send only ReloadEnrichmentTables, so that the sink change is lost?

Contributor Author

@powerumc powerumc Apr 12, 2026

This is the case where both the Configuration File and Enrichment Table are changed on the master branch. The config_paths must be converted to absolute paths in order to determine whether the .toml file in changed_paths has been modified.

As shown in the image below, config_paths (relative paths) that point to a .toml file or directory and changed_paths (absolute paths) can never match.

cargo run -- -c test/example.toml --watch-config:
image

cargo run -- -C test/ --watch-config:
image

Therefore, the part you identified as a bug is not actually a bug. This is the expected log output, since the change in the .toml file was correctly detected.

> After the PR, the log shows:
>
> Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1"
> }, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }]
> configuration changed.
>
> But console1 and console2 didn't actually change. This is a bug.

So, this PR updates config_paths to store absolute paths instead of relative ones, and ensures they are correctly compared with the absolute paths in changed_paths.
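A minimal sketch of the idea, assuming a purely lexical resolution against the current working directory. The function name to_absolute is hypothetical and this is not necessarily how the PR implements it.

```rust
use std::env;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `path` against the current working directory when it is relative.
/// Purely lexical: it does not touch the filesystem or resolve symlinks.
fn to_absolute(path: &Path) -> io::Result<PathBuf> {
    if path.is_absolute() {
        Ok(path.to_path_buf())
    } else {
        Ok(env::current_dir()?.join(path))
    }
}
```

If the watcher reports symlink-resolved paths, std::fs::canonicalize may be the better fit, at the cost of requiring the path to exist.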

Member

The relative-vs-absolute path mismatch may be a separate bug that affects cases with a real .toml change, but it doesn't address the case I'm pointing out (please say so if I am missing something). In my reproduction, example.toml is touched but its parsed config is unchanged, so reporting console1/console2 as changed is still a false positive.

Contributor Author

If the false positive you mentioned refers to the log output, I’ve modified it so that only components with changes in the Enrichment table are reported.

However, if your intention includes detecting changes in the parsed configuration, I believe that falls outside the scope of this PR. The current implementation on the master branch only detects file changes, so I had to assume that the purpose of the implementation is limited to tracking file-level modifications.

Therefore, I think the bug you pointed out should be discussed as a separate issue rather than within this PR. That said, if you believe that changes in the parsed configuration should also be included in this PR, I respect your opinion.

Member

pront commented Apr 7, 2026

Hi @powerumc and thank you for contributing this fix. I will take a look shortly.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4386ea0487

Comment on lines +593 to +595
.into_iter()
.chain(config_paths.iter().map(<&PathBuf>::from))
.cloned()

P1: Avoid forcing all sinks/transforms into reload set

Adding the global config_paths into each component watch list causes any config-file write to classify every sink/transform as changed, which switches the watcher path to SignalTo::ReloadComponents instead of ReloadFromDisk. In that path, extend_reload_set feeds pending_reload, and ConfigDiff::new then forces those keys into to_change via need_change.contains(...), so even no-op edits (comments/whitespace) restart all sinks/transforms and can keep doing so on later reloads. This is a functional regression that can introduce unnecessary topology churn and data-path disruption.
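The attribution effect can be sketched with a toy model. All names below are hypothetical and the map only stands in for per-component watch lists. It is not Vector's ConfigDiff or watcher code.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical model: component id -> paths watched on its behalf.
type WatchLists = BTreeMap<String, Vec<PathBuf>>;

/// Ids of all components whose watch list contains `changed`.
fn changed_components(watch: &WatchLists, changed: &Path) -> Vec<String> {
    watch
        .iter()
        .filter(|(_, paths)| paths.iter().any(|p| p.as_path() == changed))
        .map(|(id, _)| id.clone())
        .collect()
}

/// Mimic the change under review: append the global config path to the
/// watch list of each listed sink or transform.
fn with_global_config(mut watch: WatchLists, ids: &[&str], config: &Path) -> WatchLists {
    for id in ids {
        if let Some(paths) = watch.get_mut(*id) {
            paths.push(config.to_path_buf());
        }
    }
    watch
}
```

In this model, a write to the config file is attributed to no component before the change and to every listed sink and transform after it, regardless of what the write contained.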
