Fix missing task.queued_duration metric in Airflow 3#67592

Open
myps6415 wants to merge 1 commit into
apache:mainfrom
myps6415:fix-queued-duration-metric-63503

Conversation

@myps6415

Summary

task.queued_duration (and its registry-derived legacy name dag.<dag_id>.<task_id>.queued_duration) stopped firing entirely after the Airflow 3 worker switched to the Task SDK / supervisor / Execution API.

The metric was only emitted by TaskInstance.emit_state_change_metric, which is only reachable from _check_and_change_state_before_execution — the legacy LocalTaskJob path. Airflow 3 workers flip TI state to RUNNING through the ti_run Execution API endpoint instead, which bypasses the emit site.

This is the same regression pattern as #62019 (missing ti.start / ti.finish).

Fix

Emit task.queued_duration from ti_run at the moment it transitions the TI from QUEUED to RUNNING. The skip guards (end_date is None, queued_dttm is not None) mirror the existing logic in emit_state_change_metric so that deferral resumes don't get a misleading second reading. The legacy dotted name is emitted automatically by stats.timing via the metrics_template.yaml registry — no manual second call needed.
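The guard described above can be sketched in isolation. This is a minimal illustration, not the code in the PR: `TaskInstanceStub` and `queued_duration` are hypothetical names, and the stub carries only the two fields the guard reads (`queued_dttm` and `end_date`). The real change lives in the `ti_run` endpoint and reports the value through `stats.timing`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in carrying only the two fields the guard reads."""

    queued_dttm: Optional[datetime]
    end_date: Optional[datetime]


def queued_duration(ti: TaskInstanceStub, now: datetime) -> Optional[timedelta]:
    """Return the duration to report, or None when the emit should be skipped."""
    if ti.queued_dttm is None:
        # Nothing to measure from.
        return None
    if ti.end_date is not None:
        # The TI already finished once (e.g. a deferral resume), so a second
        # reading would be misleading.
        return None
    return now - ti.queued_dttm


now = datetime(2026, 5, 27, 12, 0, 30, tzinfo=timezone.utc)
first_run = TaskInstanceStub(queued_dttm=now - timedelta(seconds=30), end_date=None)
resumed = TaskInstanceStub(
    queued_dttm=now - timedelta(hours=1),
    end_date=now - timedelta(minutes=50),
)
never_queued = TaskInstanceStub(queued_dttm=None, end_date=None)
```

Only `first_run` yields a duration here; the other two cases correspond to the two skip conditions covered by the parametrized test.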

Test plan

  • New test_ti_run_emits_queued_duration_metric confirmed to fail before the fix and pass after (verified by stashing the production change and re-running the test).
  • New parametrized test_ti_run_skips_queued_duration_metric covers both skip conditions (end_date set / queued_dttm missing).
  • All 33 existing TestTIRunState tests still pass.
  • ruff format / ruff check / mypy-airflow-core / prek run --from-ref upstream/main --stage pre-commit all green.

closes: #63503


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.7)

Generated-by: Claude Code (Opus 4.7) following the guidelines

The metric was only emitted by TaskInstance.emit_state_change_metric,
which was only called from the legacy LocalTaskJob path
(_check_and_change_state_before_execution). Airflow 3 workers run via
the Task SDK and supervisor, which flip TI state to RUNNING through the
ti_run Execution API endpoint instead — that path bypassed the emit
site, so task.queued_duration (and its registry-derived legacy name
dag.<dag_id>.<task_id>.queued_duration) stopped firing entirely.

Emit the metric from the ti_run endpoint at the same moment it flips
state from QUEUED to RUNNING. Skip the emit when end_date is already
set (deferral resume) or queued_dttm is missing, mirroring the existing
guards in emit_state_change_metric.
@boring-cyborg boring-cyborg Bot added area:API Airflow's REST/HTTP API area:task-sdk labels May 27, 2026
@boring-cyborg

boring-cyborg Bot commented May 27, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example Dag that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Contributor

@henry3260 henry3260 left a comment

Thanks for the fix!

# this is a deferral resume or similar, and the timing would be misleading).
# The registry-based legacy name dag.<dag_id>.<task_id>.queued_duration is
# emitted automatically by stats.timing via metrics_template.yaml.
if ti.queued_dttm is not None and ti.end_date is None:

Could we include ti.next_method is None in the guard to avoid emitting this metric again for deferred task resumes? In the Execution API path, deferred tasks are marked by next_method / next_kwargs, and end_date may still be None, so end_date is None alone does not reliably identify the first run. wdyt?
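For illustration, the stricter guard suggested here could look like the sketch below. `DeferredAwareStub` and `should_emit_queued_duration` are hypothetical names, and the `"execute_complete"` value is only an example of a resume method. The premise that `end_date` can still be `None` on a deferral resume in the Execution API path is taken from the comment above and is not verified by this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DeferredAwareStub:
    """Hypothetical stand-in with the three fields the stricter guard reads."""

    queued_dttm: Optional[datetime]
    end_date: Optional[datetime]
    next_method: Optional[str]


def should_emit_queued_duration(ti: DeferredAwareStub) -> bool:
    """Emit only for a queued TI that has not finished and is not resuming."""
    return (
        ti.queued_dttm is not None
        and ti.end_date is None
        and ti.next_method is None  # set when a deferred task is resumed
    )


queued_at = datetime(2026, 5, 27, 12, 0, tzinfo=timezone.utc)
first_run = DeferredAwareStub(queued_dttm=queued_at, end_date=None, next_method=None)
# Assumed shape of a deferral resume: end_date unset, next_method populated.
deferral_resume = DeferredAwareStub(
    queued_dttm=queued_at, end_date=None, next_method="execute_complete"
)
```

With the original two-condition guard, `deferral_resume` would pass and emit a second reading; adding the `next_method` check filters it out.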

Development

Successfully merging this pull request may close these issues.

Metric dag.dag_id.task_id.queued_duration missing

2 participants