Add async connection testing via workers for security isolation#62343
Add async connection testing via workers for security isolation#62343anishgirianish wants to merge 3 commits intoapache:mainfrom
Conversation
39ba192 to
3efcd26
Compare
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/execution_api/routes/connection_tests.py
Show resolved
Hide resolved
|
@jason810496 Thanks for the thorough review! Addressed your feedback in the latest push:
Could you please take another look when you get a chance? Thanks! |
33392ec to
59d2c88
Compare
o-nikolas
left a comment
There was a problem hiding this comment.
The scheduler/executor code is looking better, thanks for making those changes so quickly. I removed my request for changes 🙂
4997d07 to
c093320
Compare
|
Hi, @kaxil @pierrejeambrun, I would like to request your review whenever you get a chance. Thank you so much. |
bd68d4c to
7d0ea9b
Compare
|
@anishgirianish This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack. |
7d0ea9b to
e8185f7
Compare
7e93e3a to
911cfb1
Compare
There was a problem hiding this comment.
First thank you for this big effort 🎉 , that looks amazing and I'm sure it will be really helpful to have this new feature in.
I find the current implementation hard to follow, mostly because there is a number of hidden interaction between the 'Connection' and 'ConnectionTest` entity. And also this forces multiple 'models' on the API layer and different endpoint for 'testing' before creation, or testing before 'edit' which I think we can avoid.
Instead of having a test_connection table that act as a buffer to save and restore values from the main table which are both patched through a single 'test' call and then copied back and forth old states, new states etc...
Why don't we instead rely on the test_connection table only to store all of the 'test' requests. Nothing is happening on the real 'connections' until a test connection is actually successful. (so basically the patch_test endpoint shoudn't exist)
Users can 'test' a connection, and they can specify. 'commit on success' for the 'test request' and this will override the real Connection resource if the test is successful. You can only store 1 connection test request at a time otherwise -> 409 for the same conn_id.
Worker reads the 'test request' from the test_connection table and not the real connection table. Also test_connection table could be test_connection_request similarly to dag_priority_parsing_request we already have.
I believe this has the advantage of keeping concerns separate. test connection and connection are two separate entities and avoiding multiple complex interaction sounds preferable to me. (until the very end of the workflow where it can eventually commit the successful test connection nothing happens on the real Connection table). And it is somewhat consistent with parsing request.
In the UI this could be a 'test connection before creating the resource'. (Where we instead call the test connection endpoint instead of the 'POST connectionone withcommit_on_success=True`, similarly for edits etc.)
edit: secret backend might be an issue I'm not sure here.
airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
On Pierre's architectural feedback (March 13): I agree with the direction he proposed. The current PATCH-then-revert design introduces several edge cases (snapshot security, concurrent edit races, revert reliability) that would be eliminated by using connection_test as a pure request buffer — workers read connection details at test time rather than relying on the real connection table being pre-modified. That simpler model would remove the snapshot/revert machinery entirely.
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/migrations/versions/0112_3_3_0_add_connection_test_table.py
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Show resolved
Hide resolved
|
@anishgirianish This PR has a few issues that need to be addressed before it can be reviewed — please see our Pull Request quality criteria. Issues found:
What to do next:
There is no rush — take your time and work at your own pace. We appreciate your contribution and are happy to wait for updates. If you have questions, feel free to ask on the Airflow Slack. |
There was a problem hiding this comment.
Pull request overview
This PR introduces asynchronous connection testing that runs on executors/workers (instead of the API server) by adding a dedicated “TestConnection” workload type, scheduler dispatch/reaping logic, and new API surfaces for queueing/polling and worker result reporting.
Changes:
- Add
ConnectionTestRequestpersistence model + migration, plus scheduler dispatch/reaper logic to route pending tests to executors that support them. - Add worker-side execution path (LocalExecutor) and Execution API endpoints for workers to fetch connection details and report results.
- Add Core API endpoints for queueing/polling async tests, plus generated SDK/CTL/UI OpenAPI types and comprehensive unit tests.
Reviewed changes
Copilot reviewed 40 out of 41 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
uv.lock |
Lockfile updates for dependencies/extras. |
task-sdk/src/airflow/sdk/api/datamodels/_generated.py |
SDK datamodels for connection-test API. |
task-sdk/src/airflow/sdk/api/client.py |
SDK client ops for connection tests. |
devel-common/src/tests_common/test_utils/db.py |
Test DB cleanup for connection tests. |
airflow-ctl/src/airflowctl/api/datamodels/generated.py |
CLI datamodels for async test endpoints. |
airflow-core/tests/unit/models/test_connection_test.py |
Unit tests for connection test model/helpers. |
airflow-core/tests/unit/jobs/test_scheduler_job.py |
Scheduler dispatch/reaper unit tests. |
airflow-core/tests/unit/executors/test_local_executor.py |
LocalExecutor connection-test execution tests. |
airflow-core/tests/unit/executors/test_base_executor.py |
BaseExecutor workload acceptance tests. |
airflow-core/tests/unit/api_fastapi/execution_api/versions/v2026_04_06/test_connection_tests.py |
Execution API versioning coverage for new endpoints. |
airflow-core/tests/unit/api_fastapi/execution_api/versions/head/test_connection_tests.py |
Execution API tests for GET/PATCH endpoints. |
airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_connections.py |
Core API tests for async queue/poll + edit/delete blocking. |
airflow-core/src/airflow/utils/db.py |
Revision-head mapping update. |
airflow-core/src/airflow/utils/db_cleanup.py |
Add cleanup config for connection test table. |
airflow-core/src/airflow/ui/openapi-gen/requests/types.gen.ts |
UI OpenAPI request/response TS types. |
airflow-core/src/airflow/ui/openapi-gen/requests/services.gen.ts |
UI OpenAPI service wrappers for new endpoints. |
airflow-core/src/airflow/ui/openapi-gen/requests/schemas.gen.ts |
UI OpenAPI schemas for async test endpoints. |
airflow-core/src/airflow/ui/openapi-gen/queries/suspense.ts |
React-query suspense hook for polling endpoint. |
airflow-core/src/airflow/ui/openapi-gen/queries/queries.ts |
React-query hooks for async test endpoints. |
airflow-core/src/airflow/ui/openapi-gen/queries/prefetch.ts |
Prefetch helper for polling endpoint. |
airflow-core/src/airflow/ui/openapi-gen/queries/ensureQueryData.ts |
ensureQueryData helper for polling endpoint. |
airflow-core/src/airflow/ui/openapi-gen/queries/common.ts |
Query key/types for new endpoints. |
airflow-core/src/airflow/models/connection_test.py |
New ORM model + state helpers + worker test runner helper. |
airflow-core/src/airflow/models/__init__.py |
Ensure model import for metadata registration. |
airflow-core/src/airflow/migrations/versions/0111_3_2_0_add_connection_test_table.py |
Alembic migration creating connection test table. |
airflow-core/src/airflow/jobs/scheduler_job_runner.py |
Scheduler enqueue + stale reaper for connection tests. |
airflow-core/src/airflow/executors/workloads/types.py |
Add ConnectionTestRequest as scheduler workload type. |
airflow-core/src/airflow/executors/workloads/connection_test.py |
New TestConnection workload schema. |
airflow-core/src/airflow/executors/workloads/__init__.py |
Export/include TestConnection workload. |
airflow-core/src/airflow/executors/local_executor.py |
Worker-side execution implementation with timeout + reporting. |
airflow-core/src/airflow/executors/base_executor.py |
Add queueing/slot accounting + trigger for connection tests. |
airflow-core/src/airflow/config_templates/config.yml |
New config knobs for timeout/concurrency/reaper interval. |
airflow-core/src/airflow/api_fastapi/execution_api/versions/v2026_04_06.py |
Cadwyn version change for new execution endpoints. |
airflow-core/src/airflow/api_fastapi/execution_api/versions/__init__.py |
Register new version change. |
airflow-core/src/airflow/api_fastapi/execution_api/routes/connection_tests.py |
New Execution API routes for workers (GET connection, PATCH result). |
airflow-core/src/airflow/api_fastapi/execution_api/routes/__init__.py |
Include connection test router. |
airflow-core/src/airflow/api_fastapi/execution_api/datamodels/connection_test.py |
Execution API pydantic models for new endpoints. |
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py |
Add async queue/poll endpoints + block edits during active test. |
airflow-core/src/airflow/api_fastapi/core_api/openapi/v2-rest-api-generated.yaml |
Generated OpenAPI spec updates for new endpoints. |
airflow-core/src/airflow/api_fastapi/core_api/datamodels/connections.py |
Core API datamodels for async test request/response. |
airflow-core/docs/migrations-ref.rst |
Auto-updated migration reference table. |
Comments suppressed due to low confidence (1)
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py:284
- PR description says connection testing is moved off the API server onto workers for isolation, but the existing synchronous endpoint (
POST /connections/test) still executesconn.test_connection()in-process on the API server. If the intent is to fully avoid executing provider/driver code on the API server, this endpoint should either be reimplemented to enqueue a worker test as well, or explicitly documented/deprecated as unsafe even when enabled.
@connections_router.post("/test", dependencies=[Depends(requires_access_connection(method="POST"))])
def test_connection(test_body: ConnectionBody) -> ConnectionTestResponse:
"""
Test an API connection.
This method first creates an in-memory transient conn_id & exports that to an env var,
as some hook classes tries to find out the `conn` from their __init__ method & errors out if not found.
It also deletes the conn id env connection after the test.
"""
_ensure_test_connection_enabled()
transient_conn_id = get_random_string()
conn_env_var = f"{CONN_ENV_PREFIX}{transient_conn_id.upper()}"
try:
# Try to get existing connection and merge with provided values
try:
existing_conn = Connection.get_connection_from_secrets(test_body.connection_id)
existing_conn.conn_id = transient_conn_id
update_orm_from_pydantic(existing_conn, test_body)
conn = existing_conn
except AirflowNotFoundException:
data = test_body.model_dump(by_alias=True)
data["conn_id"] = transient_conn_id
conn = Connection(**data)
os.environ[conn_env_var] = conn.get_uri()
test_status, test_message = conn.test_connection()
return ConnectionTestResponse.model_validate({"status": test_status, "message": test_message})
airflow-core/src/airflow/migrations/versions/0111_3_2_0_add_connection_test_table.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
airflow-core/src/airflow/api_fastapi/core_api/routes/public/connections.py
Outdated
Show resolved
Hide resolved
pierrejeambrun
left a comment
There was a problem hiding this comment.
As @o-nikolas we might still address the executor code drift part.
Was generative AI tooling used to co-author this PR?
Summary
Follows the direction proposed by @potiuk in #59643 to move connection testing off the API server and onto workers.
Connection testing has been disabled by default since Airflow 2.7.0 because executing user-supplied driver code (ODBC/JDBC) on the API server poses security risks, and workers typically have network access to external systems that API servers don't.
This moves the whole thing onto workers. A dedicated
TestConnectionworkload goes through the scheduler, gets dispatched to a supporting executor, and the worker runs test_connection()` with a proper timeout. Results come back through the Execution API. Design was discussed on dev@ : "[DISCUSS] Move connection testing to workers" (Feb 2026).Demo
breeze-e2e-rundown-compressed.mp4
Overview
ExecuteCallback, so connection tests never compete with correctness-critical callbacksmax_connection_test_concurrency(default 4). A reaper catches stuck tests after timeout + grace periodsignal.alarmenforcement in LocalExecutor, results reported back via Execution APIqueuefield on the API, wired through to scheduler dispatch.supports_connection_testflag on BaseExecutor, immediate FAILED if no executor supports itConfig
[scheduler] connection_test_timeout: worker timeout, default 60s[scheduler] max_connection_test_concurrency: dispatch budget, default 4[scheduler] connection_test_reaper_interval: reaper frequency, default 30sNot in this PR
References
{pr_number}.significant.rstor{issue_number}.significant.rst, in airflow-core/newsfragments.