Skip to content

Conversation

@MarijnS95
Copy link
Member

@MarijnS95 MarijnS95 commented Mar 12, 2025

We see that some Android callbacks like onStart() deadlock, specifically when returning out of the main thread before running any event loop (but likely also whenever terminating the event loop), because they don't check if the thread is still even running and are otherwise guaranteed receive an activity_state update or other state change to unblock themselves.

This is a followup to #94 which only concerned itself with a deadlock caused by a destructor not running because that very object was kept alive to poll on the destroyed field that destructor was supposed to set, but its new thread_state can be reused to disable these condvar waits when the "sending" thread has disappeared.

Separately, that PR mentions Activity recreates because of configuration changes which isn't supported anyway because Activity is still wrongly assumed to be a global singleton (I am still working on fixing all this 🤞!).


Note that I'm not at all going to claim this to be really fixing anything, it's more of a bandaid on top of another bandaid with doubts if it'll stick.

#94 seems to have only considered thread state, which makes sense given that finish() is also unconditionally called when the "main" thread created by android-activity returns, but the mentioned missing drop() handling of WaitableNativeActivityState also seems to very specifically exist because AndroidApp is explicitly and intentionally cheaply Cloneable and Send/Sync. If anything the user could move things to a different thread to handle their looping behaviour there. Liveness of these objects is an indicator that the user may/will still poll elsewhere, or at least has the ability to. That should likely have been the predicate to exit out of various while loops rather than the thread state, but here we are.

Add to that the request to not start a thread for the user in the first place (requires a redesign to signify that users must timely return from their not-really-main()-anymore), so that they can load things from JNIEnv which uses prior Java stackframes to resolve classes that cannot otherwise be found without poking at Context.getClassLoader()...

@MarijnS95 MarijnS95 force-pushed the fix-deadlock-on-terminate branch from c841f66 to 5a82603 Compare March 12, 2025 14:12
@MarijnS95 MarijnS95 requested a review from rib March 13, 2025 13:18
guard.write_cmd(AppCmd::InitWindow);
}
while guard.window != guard.pending_window {
while guard.thread_state == NativeThreadState::Running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess all of these (maybe with an exception for input events) should probably be checking for != NativeThreadState::Stopped so that callbacks effectively buffer during initialization and allow the app to start handling events or else exit and have this transition to Stopped.

The protocol for notifying the app about a window is supposed to be synchronous and it would be a significant change to have these start being async until the thread is initialized and 'running'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure about that. Android so far seems to service all callbacks into Activity from the same "thread", i.e. the one that also calls onCreate. After all, before onCreate returns we haven't assigned any callbacks yet to ANativeActivity::callbacks, so it "must" wait.

That function only returns when the state moved on from Init, so I don't think any other function should be waiting for this:

// Don't specifically wait for `Running` just in case `android_main` returns
// immediately and the state is set to `Stopped`
while guard.thread_state == NativeThreadState::Init {
guard = jvm_glue.cond.wait(guard).unwrap();
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah, yep that makes sense.

@rib
Copy link
Member

rib commented Mar 20, 2025

If anything the user could move things to a different thread to handle their looping behaviour there. Liveness of these objects is an indicator that the user may/will still poll elsewhere, or at least has the ability to.

Btw, it's explicitly documented that polling is only allowed on the android_main thread. I had been meaning to add explicit assertions for that, but I guess maybe didn't do that.

Ref:

//! # Send and Sync [AndroidApp]
//!
//! Although an [AndroidApp] implements Send and Sync you do need to take
//! into consideration that some APIs, such as [AndroidApp::poll_events()] are
//! explicitly documented to only be usable from your android_main() thread.

@rib
Copy link
Member

rib commented Mar 20, 2025

It looks like I put a warning that poll_events can panic if used from a non-main thread but it's a hollow threat currently :)

/// # Panics
///
/// This must only be called from your android_main() thread and it may panic if called
/// from another thread.

…unning

We see that some Android callbacks like `onStart()` deadlock,
specifically when returning out of the main thread before running
any event loop (but likely also whenever terminating the event loop),
because they don't check if the thread is still even running and are
otherwise guaranteed receive an `activity_state` update or other state
change to unblock themselves.

This is a followup to [#94] which only concerned itself with a deadlock
caused by a destructor not running because that very object was kept
alive to poll on the `destroyed` field that destructor was supposed to
set, but its new `thread_state` can be reused to disable these condvar
waits when the "sending" thread has disappeared.

Separately, that PR mentions `Activity` recreates because of
configuration changes which isn't supported anyway because `Activity` is
still wrongly assumed to be a global singleton.

[#94]: https://togithub.com/rust-mobile/android-activity/pull/94
@MarijnS95 MarijnS95 force-pushed the fix-deadlock-on-terminate branch from 5a82603 to 9afaca8 Compare December 17, 2025 12:51
@MarijnS95
Copy link
Member Author

Right @rib, I may have written that threading possibility out in my PR/commit description from the perspective of what is technically possible on Android, not what android-activity is currently doing by preparing a looper and attaching specific fds to it, in the specific thread we create.

In that sense it's funny that this API portion of AndroidApp is Send/Sync, perhaps for clarity these handles/structs should be split rather than documenting it?

Anyway, it is very easy to make that panicking threat real: the check for ALooper_forThread() shouldn't check against is_null() (and only in native_activity), but if it is equal to the stored looper which was prepared when the object was first set up in the "main" thread right before calling android_main().

Copy link
Member

@rib rib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me!

@MarijnS95
Copy link
Member Author

Awesome! I'll merge and backport this when back at my own computer :)

@rib rib merged commit 9e8c85c into main Dec 18, 2025
10 checks passed
@rib rib deleted the fix-deadlock-on-terminate branch December 18, 2025 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants