
config: load and validate generated config models #4898

Open
MikeGoldsmith wants to merge 26 commits into open-telemetry:main from honeycombio:mike/otel-config/1-loading-validation

@MikeGoldsmith
Member

@MikeGoldsmith MikeGoldsmith commented Feb 6, 2026

Description

The next step of implementing configuration file support. This PR loads and validates the generated config models. Using the model to configure the SDK will come in later PRs.

Based on the following PR which added model generation from the schema:

Contributes to:

Type of change

  • New feature (non-breaking change which adds functionality)

Changes

  • Stores a local version of the schema for use with data model generation and schema validation
  • Updates code generation tooling to use the local bundled schema instead of fetching from URL
  • Adds file configuration support: YAML/JSON loading, env var substitution, and schema validation against the vendored OTel config JSON schema
  • Adds jsonschema >= 4.0 to the file-configuration optional extra
  • Bumps typing_extensions pin to 4.12.0 in test requirements for Python 3.13+ compatibility (referencing, a transitive dep of jsonschema, uses TypeVar defaults which require this version)

How Has This Been Tested?

36 unit tests covering file loading, parsing (YAML/JSON), env substitution, and schema validation.

  • Configuration Loading Tests (test_loader.py - 18 tests)
  • Environment Variable Substitution Tests (test_env_substitution.py - 15 tests)
  • Model Tests (test_models.py - 3 tests)

Does This PR Require a Contrib Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

codeboten and others added 13 commits January 15, 2026 09:57
Proposing that the first step towards implementing OpenTelemetry Configuration is to
produce the model code from the JSON schema. I did a quick search for tools available
to do this and came across datamodel-codegen, which seems to do what I expected.

I will open follow-up pull requests (in draft) to use this model code; I just want
to keep these as clearly separated as possible to make reviewing them easier.

Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>
… into codeboten/generate-config-model-from-schema
…ate-config-model-from-schema

fix code-generation command and regenerate models
…ate-config-model-from-schema.2

Fix lint errors and update uv.lock file
Signed-off-by: alex boten <223565+codeboten@users.noreply.github.com>
… into codeboten/generate-config-model-from-schema
…com:codeboten/opentelemetry-python into mike/otel-config/1-loading-validation
datamodel-codegen section was inserted between [tool.pyright] and its
include/exclude config, causing pyright to check entire repo (599 files)
instead of just included paths. Moved datamodel-codegen section after
pyright config.
…com:codeboten/opentelemetry-python into mike/otel-config/1-loading-validation
- Fix re.sub callback return type (must return str, not str | None)
- Rename DOLLAR_PLACEHOLDER to dollar_placeholder (pylint naming)
- Rename f to temp_file/config_file (pylint naming)
- Update uv.lock
Fixes yaml import warning by installing PyYAML type stubs and the
file-configuration optional dependencies.
@github-actions

github-actions bot commented Mar 3, 2026

This PR has been automatically marked as stale because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 days of this comment.
If you're still working on this, please add a comment or push new commits.

@github-actions github-actions bot added the Stale label Mar 3, 2026
- Bundle schema.json (v1.0.0-rc.3) alongside models.py
- Add jsonschema >= 4.0 to file-configuration optional extra
- Validate parsed config against schema before constructing model,
  with field path included in error messages for nested violations
- Switch datamodel-codegen to use local schema (drop [http] extra)
- Add schema validation tests (wrong type, missing required, nested
  path, enum violation)

Assisted-by: Claude Sonnet 4.6
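
The commit above mentions including the field path in error messages for nested violations. jsonschema reports the location of a failing value as a sequence of keys and list indices (`ValidationError.absolute_path`). The helper below is a hypothetical sketch of turning such a sequence into a readable path; the function name and the output format are assumptions, not the PR's code.

```python
from typing import Iterable, Union


def format_field_path(path: Iterable[Union[str, int]]) -> str:
    """Render a sequence of keys and list indices as a dotted path.

    Hypothetical helper: the name and output format are illustrative only.
    """
    rendered = ""
    for part in path:
        if isinstance(part, int):
            # List indices are rendered in brackets.
            rendered += f"[{part}]"
        elif rendered:
            # Later keys are joined with a dot.
            rendered += f".{part}"
        else:
            # The first key has no leading separator.
            rendered = str(part)
    # An empty path means the violation is at the document root.
    return rendered or "<root>"


nested = format_field_path(["tracer_provider", "processors", 0, "batch"])
root = format_field_path([])
```

With jsonschema installed, a caller could pass `error.absolute_path` from a caught `ValidationError` and prepend the result to `error.message`.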
@MikeGoldsmith MikeGoldsmith force-pushed the mike/otel-config/1-loading-validation branch from b22fc34 to 9778ae5 Compare March 3, 2026 13:51
- Replace global statement with list cache in _get_schema
- Extract _validate_schema helper to reduce branches/statements
  in load_config_file

Assisted-by: Claude Sonnet 4.6
jsonschema's `referencing` dep uses TypeVar defaults which requires
typing_extensions>=4.12.0 on Python 3.13+ (4.10.0 raises AttributeError
on TypeVar.__default__).

Assisted-by: Claude Sonnet 4.6
@MikeGoldsmith MikeGoldsmith marked this pull request as ready for review March 3, 2026 14:18
@MikeGoldsmith MikeGoldsmith requested a review from a team as a code owner March 3, 2026 14:18
@tammy-baylis-swi
Contributor

Thank you for this, @MikeGoldsmith! I'm new to opentelemetry-configuration, so I'm still figuring this out.

The local schema copy is nice for performance. Is this something to semi-automatically manage later, kinda like the semconvgen?

Once this gets merged, is this correct?: opentelemetry-sdk[file-configuration] is optional. It can be used to load config from yaml/json but not both. The local schema defines which env vars can be substituted -- these may or may not overlap with current SDK env vars. Config is loaded but values are not applied to SDK components, i.e. that'll be later.

There's mention of required_env_var in a few places. Where do I look to know which are required?

@MikeGoldsmith
Member Author

> The local schema copy is nice for performance. Is this something to semi-automatically manage later, kinda like the semconvgen?

We added the model generation code in a previous PR; this PR builds on it to support loading a customer config and validating it. Previously we looked up the schema from a URL; in this PR I decided to vendor it so we can do runtime validation too. There is a script here that can be used to regenerate the schema models:


> Once this gets merged, is this correct?: opentelemetry-sdk[file-configuration] is optional. It can be used to load config from yaml/json but not both.

It supports both YAML and JSON — the loader detects the format based on file extension (.yaml/.yml for YAML, .json for JSON).
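
A minimal sketch of extension-based format detection as described in the reply above. The function names, the case-insensitive suffix match, and raising on unknown extensions are assumptions for illustration; `ConfigurationError` here is a local stand-in for the PR's exception.

```python
import json
from pathlib import Path
from typing import Any


class ConfigurationError(Exception):
    """Local stand-in for the PR's ConfigurationError (assumption)."""


def detect_format(path: str) -> str:
    """Return "yaml" or "json" based on the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in (".yaml", ".yml"):
        return "yaml"
    if suffix == ".json":
        return "json"
    raise ConfigurationError(f"Unsupported config file extension: {suffix!r}")


def parse_config_text(text: str, fmt: str) -> Any:
    """Parse config text in the detected format."""
    if fmt == "json":
        return json.loads(text)
    # PyYAML is imported lazily because it is an optional dependency.
    import yaml

    return yaml.safe_load(text)


fmt = detect_format("otel-config.yml")
parsed = parse_config_text('{"file_format": "1.0-rc.3"}', "json")
```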


> The local schema defines which env vars can be substituted

The schema defines the expected structure of the config file. Which env vars get substituted depends on what ${VAR} references appear in the user's config file — the schema doesn't control those.


> Config is loaded but values are not applied to SDK components, i.e. that'll be later.

Correct - I've put an implementation plan in this issue to help break up the work into smaller chunks:


> There's mention of required_env_var in a few places. Where do I look to know which are required?

"Required" here means any ${VAR} reference without a default value (i.e. ${VAR} vs ${VAR:-default}). There's no fixed list — it's determined by what's in the user's config file. If a referenced env var is unset and has no default, a ConfigurationError is raised.
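
The rule above can be sketched with the regex that appears in the reviewed diff. This is an illustration under stated assumptions, not the PR's implementation: `substitute_env` is a hypothetical name, `ConfigurationError` is a local stand-in, only unset variables are treated as missing, and `$$` escaping is left out.

```python
import os
import re
from typing import Mapping, Optional


class ConfigurationError(Exception):
    """Local stand-in for the PR's ConfigurationError (assumption)."""


# Matches ${VAR_NAME} or ${VAR_NAME:-default_value}.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}")


def substitute_env(text: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace ${VAR} references; raise if a variable has no value and no default."""
    env = os.environ if environ is None else environ

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        value = env.get(name)
        if value is not None:
            return value
        # Group 2 is the whole ":-default" part; it is None for a bare ${VAR}.
        if match.group(2) is not None:
            return match.group(3)
        raise ConfigurationError(
            f"Environment variable {name!r} is not set and has no default"
        )

    return _ENV_PATTERN.sub(replace, text)


env = {"OTLP_ENDPOINT": "http://localhost:4318"}
with_value = substitute_env("endpoint: ${OTLP_ENDPOINT}", env)
with_default = substitute_env("name: ${SERVICE_NAME:-unknown}", env)
```

The callback always returns a `str`, which matches the lint fix noted earlier in the commit list (the `re.sub` callback must not return `None`).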

Hope that helps, thanks @tammy-baylis-swi 😄

@tammy-baylis-swi
Contributor

Thank you, @MikeGoldsmith, especially for those links. If only I had scrolled down 😁

> It supports both YAML and JSON — the loader detects the format based on file extension (.yaml/.yml for YAML, .json for JSON).

Yes of course! Noticed that I had poorly-worded my original question, so (basic) follow-up: these changes only support loading from a single config file, not multiple?

> "Required" here means any ${VAR} reference without a default value (i.e. ${VAR} vs ${VAR:-default}). There's no fixed list — it's determined by what's in the user's config file. If a referenced env var is unset and has no default, a ConfigurationError is raised.

Got it, thanks! So this implements validating a config file, like this example, against the schema.

Contributor

@tammy-baylis-swi tammy-baylis-swi left a comment

LGTM! I think the error-raising behaviour makes sense with https://opentelemetry.io/docs/specs/otel/error-handling/#basic-error-handling-principles

@MikeGoldsmith
Member Author

> Yes of course! Noticed that I had poorly-worded my original question, so (basic) follow-up: these changes only support loading from a single config file, not multiple?

Correct, only one configuration file can be used to configure the SDK - either using the env var or by importing and using the parse functions directly.

@MikeGoldsmith MikeGoldsmith moved this from Ready for review to Approved PRs in Python PR digest Mar 9, 2026
>>> from opentelemetry.sdk._configuration.file import load_config_file
>>> config = load_config_file("otel-config.yaml")
>>> print(config.file_format)
'1.0-rc.3'
Contributor

Suggested change
- '1.0-rc.3'
+ '1.0.0'


# Pattern matches: ${VAR_NAME} or ${VAR_NAME:-default_value}
# Variable names must start with letter or underscore, followed by alphanumerics or underscores
pattern = r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}"
Contributor

Suggested change
- pattern = r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}"
+ pattern = r"[^\$]\$\{([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?\}"

Will something like this avoid the need for replacing the $$?
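
To help answer the question, here is a small check of how the current and suggested patterns behave on three inputs. The lookbehind variant is an extra assumption added for comparison; it is not proposed anywhere in this PR, and none of the three handles every `$$` edge case (for example `$$${VAR}`).

```python
import re

NAME = r"([A-Za-z_][A-Za-z0-9_]*)(:-([^}]*))?"
CURRENT = re.compile(r"\$\{" + NAME + r"\}")
SUGGESTED = re.compile(r"[^\$]\$\{" + NAME + r"\}")
# Hypothetical alternative: assert "no $ before" without consuming a character.
LOOKBEHIND = re.compile(r"(?<!\$)\$\{" + NAME + r"\}")


def matched_text(pattern, text):
    """Return the full text of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


# Reference at the very start of the string.
start_current = matched_text(CURRENT, "${HOST}")        # "${HOST}"
start_suggested = matched_text(SUGGESTED, "${HOST}")    # None: no preceding char
start_lookbehind = matched_text(LOOKBEHIND, "${HOST}")  # "${HOST}"

# Reference in the middle of a line.
mid_suggested = matched_text(SUGGESTED, "url: ${HOST}")  # " ${HOST}" (space included)

# Escaped reference.
esc_current = matched_text(CURRENT, "$${HOST}")        # "${HOST}"
esc_suggested = matched_text(SUGGESTED, "$${HOST}")    # None
esc_lookbehind = matched_text(LOOKBEHIND, "$${HOST}")  # None
```

The suggested pattern does skip `$${HOST}`, but it also misses a reference at the start of the string and consumes the character before each match, so a replacement callback would have to put that character back.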


## Unreleased

- `opentelemetry-sdk`: Add file configuration support with YAML/JSON loading, environment variable substitution, and schema validation against the vendored OTel config JSON schema
Contributor

Please add the link to the PR on a new line.

py-cpuinfo==9.0.0
pytest==7.4.4
jsonschema==4.23.0
pyyaml==6.0.2
Contributor

Suggested change
- pyyaml==6.0.2
+ pyyaml==6.0.3

psutil==7.2.2
py-cpuinfo==9.0.0
pytest==7.4.4
jsonschema==4.23.0
Contributor

Suggested change
- jsonschema==4.23.0
+ jsonschema==4.26.0

@@ -0,0 +1 @@
file_format: "1.0-rc.3"
Contributor

Can we bump to 1.0.0?
