Skip to content

feat(cli): add --data-dir flag and VECTOR_DATA_DIR env var#25537

Open
xfocus3 wants to merge 1 commit into
vectordotdev:masterfrom
xfocus3:feat/data-dir-cli-env-9766
Open

feat(cli): add --data-dir flag and VECTOR_DATA_DIR env var#25537
xfocus3 wants to merge 1 commit into
vectordotdev:masterfrom
xfocus3:feat/data-dir-cli-env-9766

Conversation

@xfocus3
Copy link
Copy Markdown

@xfocus3 xfocus3 commented May 30, 2026

Summary

Closes #9766.

Adds a --data-dir command-line flag and corresponding VECTOR_DATA_DIR environment variable to override the data_dir global option.

This follows the same pattern Vector already uses for other deployment-level settings such as --require-healthy / VECTOR_REQUIRE_HEALTHY. As @jszwedko noted in the issue, exposing VECTOR_DATA_DIR is desirable.

Motivation

As described in the issue, data_dir is a deployment concern rather than a pipeline concern. Today users work around this by templating it inside the config file:

data_dir: ${VECTOR_DATA_DIR:-/var/lib/vector}

This also makes vector validate painful in CI, where the configured data_dir often does not exist:

$ vector validate vector.yaml
√ Loaded ["vector.yaml"]
x data_dir "/var/lib/vector/" does not exist

With this change, the path can be supplied out-of-band without editing the config:

$ VECTOR_DATA_DIR=/tmp vector validate vector.yaml
# or
$ vector validate --data-dir /tmp vector.yaml

Behavior

  • When --data-dir / VECTOR_DATA_DIR is set, it overrides the data_dir value from the configuration file (env/CLI takes precedence), matching the precedence requested in the issue.
  • When unset, behavior is unchanged — the config file value (or the /var/lib/vector default) is used.
  • The override is applied once after the config is loaded, so it works for every load path (run, validate, etc.).

Scope

This PR intentionally scopes to data_dir, which was the primary request. The other globals mentioned in the issue (timezone, healthchecks.*) can follow the same pattern in separate PRs if desired.

How did you test this PR?

  • Added unit tests verifying the flag parses into Option<PathBuf> and defaults to None.
  • cargo check, cargo clippy -- -D warnings, and cargo fmt --check all pass locally.

Add a --data-dir command-line flag (and corresponding VECTOR_DATA_DIR
environment variable) to override the data_dir global configuration
option. When set, it takes precedence over any data_dir value in the
configuration file.

This keeps deployment-specific paths out of the configuration file and
makes it possible to validate a configuration in a CI environment where
the configured data_dir may not exist.

Closes vectordotdev#9766
@xfocus3 xfocus3 requested a review from a team as a code owner May 30, 2026 14:05
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 946a6e4464

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/cli.rs
Comment on lines +134 to +135
#[arg(long, env = "VECTOR_DATA_DIR")]
pub data_dir: Option<PathBuf>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply data_dir override to validate

This option is added only to RootOpts, but the validate subcommand is executed with its own validate::Opts and SubCommand::execute passes only those subcommand opts into validate::validate. As a result, VECTOR_DATA_DIR=/tmp vector validate ... (and vector --data-dir /tmp validate ...) parses the value but never applies it before validate_config/create_tmp_directory, so the documented CI validation case still uses the config file's data_dir and can fail on the nonexistent path.

Useful? React with 👍 / 👎.

Comment thread src/app.rs
Comment on lines +654 to +659
if let Some(data_dir) = data_dir {
debug!(
message = "Overriding data_dir from command line.",
?data_dir
);
config.global.data_dir = Some(data_dir);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve data_dir override during reloads

This override is applied only to the initial config returned by load_configs; reloads in handle_signal call config::load_from_paths_with_provider_and_secrets directly and TopologyController::reload rejects changes to global options. With --watch-config/SIGHUP and a CLI/env data_dir that differs from the file (or default), the reloaded config loses this assignment, differs from the running global options, and reload fails with GlobalOptionsChanged instead of continuing to use the requested deployment-level data directory.

Useful? React with 👍 / 👎.

Comment thread src/cli.rs
Comment on lines +134 to +135
#[arg(long, env = "VECTOR_DATA_DIR")]
pub data_dir: Option<PathBuf>,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Mark data_dir as a global arg

Because this is a top-level RootOpts argument without global = true, clap only accepts it before a subcommand, so the advertised vector validate --data-dir /tmp vector.yaml form is rejected as an unexpected argument before any override can happen. If this flag is meant to work with subcommands in the position shown in the PR description, it needs to be global or defined on those subcommands.

Useful? React with 👍 / 👎.

Comment thread src/app.rs
Comment on lines +654 to +659
if let Some(data_dir) = data_dir {
debug!(
message = "Overriding data_dir from command line.",
?data_dir
);
config.global.data_dir = Some(data_dir);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply data_dir before merging configs

The override happens only after load_from_paths_with_provider_and_secrets has fully built the config, but GlobalOptions::merge rejects two config files that both set different non-default data_dir values during that build. In a split configuration where --data-dir/VECTOR_DATA_DIR is supplied specifically to take precedence, startup still fails with conflicting values for 'data_dir' found before this assignment runs, so the deployment-level override cannot actually supersede all config-file values.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

configuration: global options settable via environment variables natively

1 participant