Remote logging fix #68370

Open
m8719-github wants to merge 1 commit into apache:main from m8719-github:remote-logging-fix

m8719-github commented Jun 11, 2026

Contributor

closes: #68366
related discussion on Slack: https://apache-airflow.slack.com/archives/CCQ7EGB1P/p1780575542424439

Do not cache None connection values at the start of the worker lifecycle so that the worker is able to upload task logs upon task completion. Please see linked issue for full details.

Tested via monkey patching on a test environment, logs showed up in the UI and in S3 with this change.


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Signed-off-by: AndreiLeib <andrei.leibovski@appdirect.com>

jason810496 left a comment

Member

Nice! The patch makes sense to me overall.
I'd appreciate it if you could add a corresponding unit test when you have a moment, thanks.

Comment on lines 1161 to 1162:
    if conn_id in _REMOTE_LOGGING_CONN_CACHE:
        return _REMOTE_LOGGING_CONN_CACHE[conn_id]

Member

I would prefer to change the cache read side instead of the write side.

Suggested change:
    if (cached_conn_id := _REMOTE_LOGGING_CONN_CACHE.get(conn_id, None)) is not None:
        return cached_conn_id

Member

What is the value in storing None in the dict at all?

@ashb ashb marked this pull request as ready for review June 11, 2026 11:02
@ashb ashb requested review from amoghrajesh and kaxil as code owners June 11, 2026 11:02
@ashb ashb added the type:bug-fix Changelog: Bug Fixes label Jun 11, 2026
@ashb ashb added this to the Airflow 3.3.0 milestone Jun 11, 2026
ashb commented Jun 11, 2026

Member

However the entire pre-fetch might be moot now. This is caching to a single supervisor process which shuts down when the task finishes. There's not much point in trying to pre-cache as it will always fail now?

However this leads to an interesting problem -- some loggers such as GCP's StackDriver or AWS's Cloudwatch Logs want to upload logs as they come in, not in a batch at the end, and for those to work we need to know that we weren't able to get logs and try again once the task has started running and we can access connections.

(Which isn't to say we shouldn't have a tactical fix like this to help most loggers, but the main issue is not resolved until we fix it for all)
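One possible shape for the retry behaviour described for streaming loggers is sketched below: lines are buffered while the connection cannot be resolved and flushed once a later attempt succeeds. This is purely illustrative. `RetryingLogSink`, `resolve_conn`, and `send` are hypothetical names and do not correspond to the Stackdriver or CloudWatch handlers in Airflow.

```python
from typing import Callable, List, Optional


class RetryingLogSink:
    """Buffer log lines until a connection can be resolved, then stream them."""

    def __init__(
        self,
        resolve_conn: Callable[[], Optional[object]],
        send: Callable[[object, str], None],
    ) -> None:
        self._resolve_conn = resolve_conn  # hypothetical connection lookup
        self._send = send                  # hypothetical upload call
        self._conn: Optional[object] = None
        self._pending: List[str] = []

    def emit(self, line: str) -> None:
        self._pending.append(line)
        if self._conn is None:
            # Retry on every emit instead of remembering the first failure.
            self._conn = self._resolve_conn()
        if self._conn is None:
            return  # still unavailable; keep buffering
        for pending_line in self._pending:
            self._send(self._conn, pending_line)
        self._pending.clear()
```

The sketch retries on every emitted line for simplicity; a real handler would probably want backoff and a bound on the buffer size.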

@m8719-github

Contributor Author

> However the entire pre-fetch might be moot now. This is caching to a single supervisor process which shuts down when the task finishes. There's not much point in trying to pre-cache as it will always fail now?
>
> However this leads to an interesting problem -- some loggers such as GCP's StackDriver or AWS's Cloudwatch Logs want to upload logs as they come in, not in a batch at the end, and for those to work we need to know that we weren't able to get logs and try again once the task has started running and we can access connections.
>
> (Which isn't to say we shouldn't have a tactical fix like this to help most loggers, but the main issue is not resolved until we fix it for all)

Yes, this occurred to me as well but I wanted to be as tactical with the fix as possible. I'll take a closer look at GCP StackDriver and AWS Cloudwatch remote logging sub-systems in a separate branch and see what can be done there to fix this issue.

@m8719-github

Contributor Author

> Nice! The patch makes sense to me overall. I'd appreciate it if you could add a corresponding unit test when you have a moment, thanks.

yep, will add one before marking ready for review; should be done in a day or two.

Labels

area:task-sdk type:bug-fix Changelog: Bug Fixes

Development

Successfully merging this pull request may close these issues.

Remote logging not working in Airflow 3.2.2

4 participants