fix(config): watch vector config paths for sinks and transforms#25133
fix(config): watch vector config paths for sinks and transforms#25133powerumc wants to merge 5 commits intovectordotdev:masterfrom
Conversation
* Append vector `config_paths` to the list of watched files for transforms and sinks. * Enhance `ComponentConfig::contains` to correctly match modified files against watched directory paths (e.g., when using `--config-dir`).
d3bf255 to
d089638
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d089638b6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| && config_paths | ||
| .iter() | ||
| .filter(|p| Format::from_path(p).is_ok()) | ||
| .any(|p| p.parent() == Some(path.as_path())) |
There was a problem hiding this comment.
Match config-dir updates by ancestry, not immediate parent
In ComponentConfig::contains, directory-based config paths only match when p.parent() == path, so changes under nested files like --config-dir .../transforms/foo/bar.toml are not attributed to sink/transform components. Because load_from_dir supports component subdirectories (and recursive transform subdirectories), a concurrent enrichment-table edit can still produce changed_components containing only enrichment tables, which triggers ReloadEnrichmentTables and skips reloading the changed Vector config.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Before this PR, Vector reloads even when only enrichment table files are modified within subdirectories (including nested subdirectories) of the config directory.
For example, given the following directory structure, modifying only the .csv files still triggers a reload:
test -- example.toml
-- example1.csv
-- example2.csv
-- sink -- example2.toml
$ echo '' >> example1.csv && echo '' >> example2.csv
INFO vector::config::watcher: Configuration file changed.
INFO vector::topology::running: Reloading running topology with new configuration.
INFO vector::topology::running: Running healthchecks.
INFO vector::topology::running: New configuration loaded successfully.
INFO vector: Vector has reloaded. path=[Dir("test")] internal_log_rate_limit=false
I’m not sure whether this is intended behavior in Vector or not.
If this is intended, then changes in subdirectories (including nested ones) may not need to be checked in contains().
There was a problem hiding this comment.
IMO it's worth addressing this finding by doing the following:
- Add a small helper in
src/config/watcher.rsthat checks whether a changed_path either exactly matches a watched config file or is a recognized config file under a watched config dir. - Call it before the "Only enrichment tables have changed branch"
And for testing coverage:
- A test involving a
transforms/foo/bar.tomlwhich is changed in the same window as an enrichment-table file and assert that we get aReloadFromDisksignal. - A test showing enrichment-only changes still use
ReloadEnrichmentTables.
For simplicity, I would defer discussion on whether config-file changes under --config-dir should go straight to ReloadFromDisk instead of piggybacking on component attribution.
There was a problem hiding this comment.
In my opinion, instead of adding a new helper in watcher.rs, it might be sufficient to adjust the condition in the existing ComponentConfig.contains() function as follows:
//.any(|p| p.parent() == Some(path.as_path()))
.any(|p| p.starts_with(path))(I haven’t tested this yet.)
This way, we would only need to add tests for the contains() function.
There was a problem hiding this comment.
Let's take a step back and look at the broader approach. Bear with me please, since this is a complex area due to the number of possible scenarios (especially when a config dir contains multiple configs but let's stick to a single config file for now.). That's why my first hunch was to fix the watcher design.
Bug in current implementation
Walkthrough the provided reproduction steps starting from:
echo '' >> example.toml && echo '' >> example1.csv && echo '' >> example2.csvThis appends a newline to all three files, but example.toml's parsed config is unchanged - zero components changed.
After the PR, the log shows:
Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1"
}, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }]
configuration changed.
But console1 and console2 didn't actually change. This is a bug.
The current master behavior of sending ReloadEnrichmentTables is actually
correct here - only enrichment data files changed, and the config didn't actually change.
Next steps
Change the trigger to:
sed -i '' 's/console2/console3/' test/example.toml && echo '' >> test/example1.csvDoes Vector reload the full config (picking up the codec change) or (bug) it only sends ReloadEnrichmentTables and the sink change is lost?
There was a problem hiding this comment.
This is the case where both the Configuration File and Enrichment Table are changed on the master branch. The config_paths must be converted to absolute paths in order to determine whether the .toml file in changed_paths has been modified.
As shown in the image below, config_paths (relative paths) that point to a .toml file or directory and changed_paths (absolute paths) can never match.
cargo run -- -c test/example.toml --watch-config:

cargo run -- -C test/ --watch-config:

Therefore, the part you identified as a bug is not actually a bug. This is the expected log output, since the change in the .toml file was correctly detected.
After the PR, the log shows:
Component [ComponentKey { id: "example1_csv" }, ComponentKey { id: "console1" }, ComponentKey { id: "console2" }, ComponentKey { id: "example2_csv" }] configuration changed.But console1 and console2 didn't actually change. This is a bug.
So, this PR updates config_paths to store absolute paths instead of relative ones, and ensures they are correctly compared with the absolute paths in changed_paths.
There was a problem hiding this comment.
The relative-vs-absolute path mismatch may be a separate bug for cases with a real .toml change but doesn't address (please discuss further if I am missing something) the case I'm pointing out. In my reproduction, example.toml is touched, but its parsed config is unchanged. In that case, reporting console1/console2 as changed is still a false positive.
There was a problem hiding this comment.
If the false positive you mentioned refers to the log output, I’ve modified it so that only components with changes in the Enrichment table are reported.
However, if your intention includes detecting changes in the parsed configuration, I believe that falls outside the scope of this PR. The current implementation on the master branch only detects file changes, so I had to assume that the purpose of the implementation is limited to tracking file-level modifications.
Therefore, I think the bug you pointed out should be discussed as a separate issue rather than within this PR. That said, if you believe that changes in the parsed configuration should also be included in this PR, I respect your opinion.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d089638b6a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4386ea0487
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| .into_iter() | ||
| .chain(config_paths.iter().map(<&PathBuf>::from)) | ||
| .cloned() |
There was a problem hiding this comment.
Avoid forcing all sinks/transforms into reload set
Adding the global config_paths into each component watch list causes any config-file write to classify every sink/transform as changed, which switches the watcher path to SignalTo::ReloadComponents instead of ReloadFromDisk. In that path, extend_reload_set feeds pending_reload, and ConfigDiff::new then forces those keys into to_change via need_change.contains(...), so even no-op edits (comments/whitespace) restart all sinks/transforms and can keep doing so on later reloads. This is a functional regression that can introduce unnecessary topology churn and data-path disruption.
Useful? React with 👍 / 👎.
Summary
Fixed an issue where concurrent modifications to Vector configuration files and enrichment tables resulted in only enrichment table changes being reloaded.
When
ComponentConfig.component_typeisSinkorTransform,config_pathsis empty. As a result, simultaneous changes to Vector config files and enrichment tables are not fully detected, and only enrichment table changes are picked up.This PR fixes that by adding the Vector configuration file (or directory) to
config_paths, and by checking whether the changed Vector configuration path lies under a directory listed inconfig_paths.Vector configuration
For testing, add a Vector configuration file
example.tomlunder thetestdirectory. This configuration is a simple setup that printsHello, Vector!every 2 seconds.In the same
testdirectory, create two Enrichment Table (.csv) files:example1.csvandexample2.csv.How did you test this PR?
Run Vector:
In another terminal session, from the test directory, change the Vector configuration file and the Enrichment Table (.csv) files at the same time:
You would expect a reload because the Vector configuration changed, but only the enrichment table files reload:
After this PR, the Vector configuration reloads as expected. (When the Vector configuration reloads, enrichment tables are loaded again as well.)
When only the enrichment tables change, only the enrichment tables reload, as expected:
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changeloglabel to this PR.References
Notes
@vectordotdev/vectorto reach out to us regarding this PR.pre-pushhook, please see this template.make fmtmake check-clippy(if there are failures it's possible some of them can be fixed withmake clippy-fix)make testgit merge origin masterandgit push.Cargo.lock), pleaserun
make build-licensesto regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.