Skip to content

Fix broken dag_processing.total_parse_time metric#62128

Open
nickstenning wants to merge 1 commit intoapache:mainfrom
nickstenning:fix-dag-processor-metrics
Open

Fix broken dag_processing.total_parse_time metric#62128
nickstenning wants to merge 1 commit intoapache:mainfrom
nickstenning:fix-dag-processor-metrics

Conversation

@nickstenning
Copy link
Contributor

@nickstenning nickstenning commented Feb 18, 2026

DagFileProcessorManager has been emitting a nonsense value for dag_processing.total_parse_time since 8774f28, which reversed the order in which emit_metrics and prepare_file_queue (then named prepare_file_path_queue) were called.

As prepare_file_path_queue was responsible for resetting the value of self._parsing_start_time, the assumption made by emit_metrics was that it would be called once the file queue had been cleared, but crucially before prepare_file_queue was called to refill the queue.

I've put the order of these calls back to one that generates valid metrics, and restructured the code in order to make the data dependencies explicit so that it's harder to make the same mistake in future refactoring.

DagFileProcessorManager has been emitting a nonsense value for
`dag_processing.total_parse_time` since 8774f28, which reversed the
order in which `emit_metrics` and `prepare_file_queue` (then called
`prepare_file_path_queue`) were called.

As `prepare_file_path_queue` was responsible for resetting the value of
`self._parsing_start_time`, the assumption made by `emit_metrics` was
that it would be called once the file queue had been cleared, but
crucially before `prepare_file_queue` was called to refill the queue.

I've put the order of these calls back to one that generates valid
metrics, and restructured the code in order to make the data
dependencies explicit so that it's harder to make the same mistake in
future refactoring.
self._refresh_dag_bundles(known_files=known_files)

if not self._file_queue:
if self._parsing_start_time:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a simple test:

  1. First cycle with empty queue does not emit parse-time metric and does not crash,
  2. Subsequent empty-queue transition emits dag_processing.total_parse_time from the previous cycle window,
  3. _parsing_start_time reset semantics are correct.

self._refresh_dag_bundles(known_files=known_files)

if not self._file_queue:
if self._parsing_start_time:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self._parsing_start_time is defined with init=False:

_parsing_start_time: float = attrs.field(init=False)

So this will fail because it hasn't been assigned yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, yes. It should be a hasattr or, more likely, we should update to have a default of None for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments