Fix in-process Execution API secrets routing in client contexts#65587
Fix in-process Execution API secrets routing in client contexts#65587henry3260 wants to merge 3 commits into
Conversation
6758f3a to
c8ece7f
Compare
c8ece7f to
6dc8178
Compare
|
Could someone please take a look at this PR when they have a chance? Thanks! |
d68c027 to
73528bb
Compare
ca97e7f to
5fc80b5
Compare
jason810496
left a comment
There was a problem hiding this comment.
Nice! Thanks for the fix.
Additionally, could we add test for the scenario mentioned in the issue (#65482) to make sure the expected behavior?
Thanks.
b21962b to
0a6ed36
Compare
jason810496
left a comment
There was a problem hiding this comment.
Please correct me if my understanding is wrong.
Another direction: How about adding lifespan to ensure SUPERVISOR_COMMS will be initialized for in-process Execution API?
Additionally, it would be nice to add airflow-e2e test for airflow dag test command with common scenarios as follow-up.
I think lifespan is the wrong scope for this. It initializes app-scoped state once at startup, but the issue here is request-scoped |
There was a problem hiding this comment.
How about adding
lifespanto ensureSUPERVISOR_COMMSwill be initialized for in-process Execution API?I think lifespan is the wrong scope for this. It initializes app-scoped state once at startup, but the issue here is request-scoped we need to isolate an in-process Execution API request from the outer client context. Initializing
SUPERVISOR_COMMSin lifespan would make that client marker app-wide and could leak it into server-side request handling wdyt?
For dag.test() command, it will use InProcessTestSupervisor under the hook, which means all the request will be processed in process. I'm not sure will there be race condition for override_process_context (when the _PROCESS_CONTEXT_OVERRIDE.reset teardown).
IMO, it's no harm to set the lifespan for InProcessTestSupervisor only app (override the app instance in _api_client factory method)
I understand your concern! Since |
0a6ed36 to
7bf1b8e
Compare
|
@henry3260 — There are 3 unresolved review threads on this PR from @jason810496. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? Once you believe the feedback is addressed, mark the thread as resolved so the reviewer isn't re-pinged needlessly. Thanks! Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
8a56c4b to
321df41
Compare
|
This early return fixes the hang by avoiding With this return, Can we avoid the deadlock without suppressing mask propagation for the reinitialized virtualenv channel? For example, skip only when supervisor comms are unavailable/stale, or make the send bounded, but continue sending |
b81948c to
cad3cc5
Compare
|
There are conflicts to resolve |
544a6ec to
2adee89
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a hang/recursion risk when the Execution API is run in-process and Task SDK “client context” signals (e.g. SUPERVISOR_COMMS) are present, by introducing a request-scoped server-context override so Variable/Connection secret lookups stay on the server-side path.
Changes:
- Add
airflow.process_contexthelpers (including a request-scoped override viaContextVar) and use them to decide whether core Variable/Connection APIs should route through Task SDK. - Add an in-process Execution API wrapper that forces server context per request (
request_scoped_server_context=True) and wire task-sdk’s in-process client to use it. - Add regression tests covering server-context routing and the virtualenv/supervisor-comms hang scenario.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| task-sdk/tests/task_sdk/execution_time/test_supervisor.py | Adds coverage ensuring the in-process client requests server-scoped context. |
| task-sdk/tests/task_sdk/definitions/test_variables.py | Adds regression tests for virtualenv Variable.get hang avoidance + masking behavior. |
| task-sdk/src/airflow/sdk/log.py | Avoids sending MaskSecret via stale virtualenv comms. |
| task-sdk/src/airflow/sdk/execution_time/supervisor.py | Passes request-scoped server context when creating the in-process Execution API server/client. |
| airflow-core/tests/unit/models/test_connection.py | Adds isolation fixture + test ensuring server context uses core Connection.from_json. |
| airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_variables.py | Ensures Execution API variable endpoints do not route through Task SDK in server context. |
| airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_connections.py | Ensures Execution API connection endpoint does not route through Task SDK in server context. |
| airflow-core/src/airflow/process_context.py | Introduces request-scoped process-context overrides and routing decision helper. |
| airflow-core/src/airflow/models/variable.py | Switches Task SDK routing condition to should_use_task_sdk_api_path(). |
| airflow-core/src/airflow/models/connection.py | Switches Task SDK routing condition to should_use_task_sdk_api_path(). |
| airflow-core/src/airflow/api_fastapi/execution_api/app.py | Adds request-scoped server-context ASGI wrapper and exposes it via asgi_app. |
|
@henry3260 — There are 3 unresolved review thread(s) on this PR from @jason810496, @jscheffl. Could you either push a fix or reply in each thread explaining why the feedback doesn't apply? When you believe the feedback is addressed, please mark the threads as resolved and ping the reviewer (@jason810496, @jscheffl) for a final look. Thanks! Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
2adee89 to
b4c9600
Compare
b4c9600 to
348bc84
Compare
|
PTAL @kaxil |
Why
Execution API requests can run inside in-process paths where
SUPERVISOR_COMMSis present, which may incorrectly classify server-side code as client-side Task SDK execution.When that
happens, Variable or Connectionreads can route back into Task SDK paths instead of staying server-side, creating a recursive chain and potential hangs.The previous context signal was primarily environment-based, which is not always safe for request-scoped behavior in threaded in-process execution.
Because of that,
VariableandConnectionoperations triggered while serving in-process Execution APIrequests could still be routed back through Task SDK APIs instead of using the server-side secrets/backend
path, leading to recursive self-calls and hanging lookups.
closes: #65482
Was generative AI tooling used to co-author this PR?
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.