[SDK-4725] Preserve user identity across iOS prewarm launches#1666
[SDK-4725] Preserve user identity across iOS prewarm launches#1666nan-li wants to merge 8 commits into
Conversation
|
@claude review |
| } else if !hasCachedUser, | ||
| _user == nil, | ||
| let cachedSubscriptionId = OSResilientStorage.string(forKey: OSResilientStorage.keySubscriptionId), | ||
| !cachedSubscriptionId.isEmpty | ||
| { | ||
| // TODO: In this state, we don't want to persist anything to UserDefaults and overwrite correct data with half data. | ||
| // TODO: However, in this state where we can't read UserDefaults, does writing to it fail automatically? | ||
| // Path 2.5. Model stores loaded empty but the resilient cache shows we've | ||
| // initialized before: the prewarm-before-first-unlock case. Recover identity | ||
| // locally; missing properties/aliases will rehydrate on next fetchUser. | ||
| let cachedOnesignalId = OSResilientStorage.string(forKey: OSResilientStorage.keyOneSignalId) | ||
| OneSignalLog.onesignalLog(.LL_DEBUG, message: "OneSignalUserManager: recovering user from cache subscription_id=\(cachedSubscriptionId) onesignal_id=\(cachedOnesignalId ?? "(nil)")") | ||
|
|
||
| let pushSubscriptionModel = createDefaultPushSubscription(subscriptionId: cachedSubscriptionId) | ||
| let recoveredUser = setNewInternalUser(externalId: nil, pushSubscriptionModel: pushSubscriptionModel) | ||
| if let onesignalId = cachedOnesignalId, !onesignalId.isEmpty { | ||
| recoveredUser.identityModel.hydrate([OS_ONESIGNAL_ID: onesignalId]) | ||
| } |
There was a problem hiding this comment.
🔴 Path 2.5 recovery (OneSignalUserManagerImpl.swift lines 269-286) introduces a data-loss risk: setNewInternalUser() triggers identityModelStore.add()/propertiesModelStore.add()/pushSubscriptionModelStore.add(), each of which writes the freshly-created stub models to shared UserDefaults via saveCodeableData. But Path 2.5 only fires when those same UserDefaults reads returned nil — the prewarm-before-first-unlock case — and per Apple/cfprefsd behavior the answer to the inline TODO ("does writing to it fail automatically?") is unfortunately no: writes are typically buffered in memory and flushed to disk after first unlock, replacing the real cached models with stubs that have only subscription_id/onesignal_id. As a secondary concern at the same site, setNewInternalUser(externalId: nil, ...) makes a previously-logged-in user transiently appear anonymous (user.isAnonymous == true), so an app calling OneSignal.login(savedExternalId) during the window hits the identifyUser path at line 328 instead of being a no-op, firing prepareForNewUser(), notifying OS_ON_USER_WILL_CHANGE observers, and sending an unnecessary Identify request. Consider gating Path 2.5 behind UIApplication.isProtectedDataAvailable (or waiting for UIApplicationProtectedDataDidBecomeAvailable), keeping the recovered models in-memory only until protected data is readable, or skipping the store.add() writes in the recovery path.
Extended reasoning...
What the bug is
Path 2.5 (OneSignalUserManagerImpl.swift:269-286) is reached precisely when shared-UserDefaults reads for the cached identity/properties/push-subscription models returned empty — the prewarm-before-first-unlock case under NSFileProtectionCompleteUntilFirstUserAuthentication. In that state the SDK builds a stub user via setNewInternalUser(externalId: nil, pushSubscriptionModel: ...) and hydrates only OS_ONESIGNAL_ID. Two separable problems follow.
Problem 1 — Stub writes can clobber the real cached models on disk
setNewInternalUser invokes identityModelStore.add(), propertiesModelStore.add(), and pushSubscriptionModelStore.add(). Each OSModelStore.add() calls OneSignalUserDefaults.initShared().saveCodeableData(forKey: storeKey, withValue: self.models), which in OneSignalUserDefaults.m performs setObject:[NSKeyedArchiver archivedDataWithRootObject:value] forKey:key followed by synchronize. So Path 2.5 unconditionally writes freshly-created stub models (only subscription_id + onesignal_id populated; fresh modelIds; nil external_id, aliases, language, tags, timezone, token, deviceModel, appVersion, notificationTypes) to the shared App Group UserDefaults plist.
The author flagged this exact concern inline:
// TODO: In this state, we don't want to persist anything to UserDefaults and overwrite correct data with half data.
// TODO: However, in this state where we can't read UserDefaults, does writing to it fail automatically?The answer to the second TODO is no. Under NSFileProtectionCompleteUntilFirstUserAuthentication, NSUserDefaults / cfprefsd buffers writes in memory while the file is locked and flushes them to disk after first unlock. The write is not silently dropped — it just gets deferred. When the device unlocks, the stub overwrites the real on-disk plist.
Problem 2 — Recovered user looks anonymous to login()
setNewInternalUser is called with externalId: nil because external_id is intentionally not mirrored to OSResilientStorage (PII; see comment in OSIdentityModel.swift). So during the recovery window:
OSUserInternalImpl.isAnonymous(identityModel.externalId == nil) returnstrue.login()at line 328 branches onuser.isAnonymous: an app callingOneSignal.login(savedExternalId)(a very common per-launch pattern) takes theidentifyUserpath instead of being a no-op.identifyUsercallsprepareForNewUser()(postsOS_ON_USER_WILL_CHANGE, clears the subscription model store) and dispatches anOSRequestIdentifyUsernetwork call.fireUserStateChanged(viahydrate([OS_ONESIGNAL_ID: onesignalId])) notifiesOSUserStateObservers withnewExternalId=nil, even though the original user had one.
Step-by-step proof
- Launch 0 (normal): User logged in with externalId "alice". Cached: identity model with aliases={onesignal_id: "abc", external_id: "alice"}; properties with language="en", tags={tier: "gold"}; push sub with token="xyz...", subscription_id="sub-123". OSResilientStorage holds {subscription_id: "sub-123", onesignal_id: "abc"}.
- Launch 1 (prewarm before first unlock): App is prewarmed by iOS. Shared UserDefaults plist is unreadable →
identityModelStore.getModels()[OS_IDENTITY_MODEL_KEY]returns nil →hasCachedUser=false. OSResilientStorage (file-protection.none) IS readable →cachedSubscriptionId="sub-123"→ Path 2.5 entered. - Path 2.5 executes
setNewInternalUser(externalId: nil, pushSubscriptionModel: createDefaultPushSubscription(subscriptionId: "sub-123")). This callsidentityModelStore.add(...),propertiesModelStore.add(...),pushSubscriptionModelStore.add(...). Each invokesOneSignalUserDefaults.initShared().saveCodeableData. These writes are buffered bycfprefsdin memory (file still locked). - App invokes
OneSignal.login("alice")from its launch routine._user.isAnonymous == true(externalId is nil) → branches intoidentifyUser→prepareForNewUser()postsOS_ON_USER_WILL_CHANGEand clearssubscriptionModelStore→ anOSRequestIdentifyUseris sent for a user that already has that external_id (should have been a no-op). - First unlock.
cfprefsdflushes buffered writes to disk. The on-disk identity/properties/pushSubscription plists are now stub models. external_id="alice", language="en", tags={tier: "gold"}, token="xyz..." are gone from disk. Onlysubscription_idandonesignal_idsurvive (in OSResilientStorage). - Launch 2. Shared UserDefaults is readable; the SDK loads the stub models from cache. External_id of "alice" is no longer available locally. It will rehydrate eventually via
fetchUser, but tags/language/timezone deltas that were queued but not yet sent are permanently lost — and any in-flight executor request keyed on the originalmodelIdis now disconnected from the new model.
Addressing the bug_004 refutation
One verifier argued bug_004 is a known design constraint (external_id intentionally not cached for PII) and largely overlaps with bug_001. That's correct that the two share the same code site and the fix likely addresses both — which is why this is filed as a single merged comment. But the two angles aren't redundant: bug_001 is the on-disk overwrite risk; bug_004 is the in-memory misclassification that misfires regardless of whether writes are buffered or dropped. The author's TODOs flag the write concern as unresolved (not as accepted), and isAnonymous returning the wrong answer for a previously-identified user is a real behavioral regression independent of the on-disk question. Mentioning both in one comment lets the author choose a single fix that covers both (e.g., gating Path 2.5 behind isProtectedDataAvailable).
How to fix
Pick one (or combine):
- Gate Path 2.5 on
UIApplication.shared.isProtectedDataAvailable. If protected data is unavailable, do not enter the recovery branch; subscribe toUIApplicationProtectedDataDidBecomeAvailableand re-runstart()then. This eliminates both the buffered-write risk and the anonymous-misclassification window because the SDK has no "recovered" user until UserDefaults is genuinely readable. - Keep recovered models in-memory only. Build the stub user but do NOT call
identityModelStore.add()/propertiesModelStore.add()/pushSubscriptionModelStore.add()until first unlock has fired. Persist on transition to readable state, after re-reading the real cache to merge. - Skip Path 2.5 entirely until protected data is available and stay in the no-user state for the prewarm process — most NSE/prewarm work doesn't need the user object at all, only
subscription_idandapp_id(already covered separately by the NSE-side OSResilientStorage fallback in this PR).
| // Drop the cached identifiers: a real app-id change invalidates them. | ||
| [OSResilientStorage setStrings:@{ | ||
| OSResilientStorage.keySubscriptionId: @"", | ||
| OSResilientStorage.keyOneSignalId: @"", | ||
| }]; | ||
|
|
||
| // Clear all cached data, does not start User Module nor call logout. |
There was a problem hiding this comment.
🟡 In handleAppIdChange, the setStrings: call only clears keySubscriptionId and keyOneSignalId from OSResilientStorage, but keyReceiveReceiptsEnabled is left stale. Since receive_receipts_enabled is a per-app remote parameter (fetched by downloadIOSParamsWithAppId), the previous app's cached value persists until the new app's iOS params download completes. Fix: add OSResilientStorage.keyReceiveReceiptsEnabled: @"" to the existing setStrings dictionary.
Extended reasoning...
What the bug is
This PR introduces a mirror of the receive_receipts_enabled flag in OSResilientStorage (in downloadIOSParamsWithAppId) and adds a fallback read in OneSignalReceiveReceiptsController.isReceiveReceiptsEnabled so the NSE can read it while the device is locked. However, handleAppIdChange in OneSignal.m (lines 562-566) only invalidates a subset of cached keys on app id change:
[OSResilientStorage setStrings:@{
OSResilientStorage.keySubscriptionId: @"",
OSResilientStorage.keyOneSignalId: @"",
}];keyReceiveReceiptsEnabled is not included, so the previous app's value sticks around.
Why existing code doesn't prevent it
receive_receipts_enabled is per-app — downloadIOSParamsWithAppId fetches it for the current appId and writes it to both shared UserDefaults and OSResilientStorage. It's only overwritten when the new app's iOS params download completes. Between app-id change and that download, the old value is what isReceiveReceiptsEnabled will see via the new fallback path.
Step-by-step proof
- Developer/tester runs the app with App A (
receive_receipts_enabled=YES).downloadIOSParamsWithAppIdwrites"1"toOSResilientStorage.keyReceiveReceiptsEnabled. - Developer switches the SDK to App B (
receive_receipts_enabled=NO). - App launches →
handleAppIdChangeruns. It clearskeySubscriptionIdandkeyOneSignalIdfromOSResilientStoragebut leaveskeyReceiveReceiptsEnabled = "1".appIdis updated to App B. - Before the new app's iOS params download completes, a notification arrives and the NSE is invoked while the device is in the prewarm/locked state (shared
UserDefaultsreturnsNO/nil). OneSignalReceiveReceiptsController.isReceiveReceiptsEnabledreadsNOfromUserDefaults, then falls back toOSResilientStorage.string(forKey: keyReceiveReceiptsEnabled)which returns"1"→ returnsYES.- A confirmed delivery receipt is sent for App B even though App B's configuration disables it.
Pre-PR, the equivalent flow would have returned the safer NO default during the locked window, so this PR slightly worsens the staleness exposure for this specific flag.
Impact
Low. The trigger requires (a) an app-id change — overwhelmingly a developer/test scenario — AND (b) an NSE invocation in the booted-but-locked window AND (c) before the new app's iOS params download completes. The blast radius is a stray confirmed-delivery receipt (or a suppressed one in the opposite direction).
Fix
Trivial — include the key with an empty string in the existing setStrings dict in handleAppIdChange:
[OSResilientStorage setStrings:@{
OSResilientStorage.keySubscriptionId: @"",
OSResilientStorage.keyOneSignalId: @"",
OSResilientStorage.keyReceiveReceiptsEnabled: @"",
}];Note: a symmetric staleness exists in the shared UserDefaults copy (OSUD_RECEIVE_RECEIPTS_ENABLED) which this PR also doesn't clear, so the underlying inconsistency is partially pre-existing — but the PR adds a new code path (the resilient-storage fallback) that makes the cache value load-bearing during the lockup window, so it's worth tightening here.
READ AND DELETE THIS SECTION BEFORE SUBMITTING PR
Description
One Line Summary
REQUIRED - Very short description that summaries the changes in this PR.
Details
Motivation
REQUIRED - Why is this code change being made? Or what is the goal of this PR? Examples: Fixes a specific bug, provides additional logging to debug future issues, feature to allow X.
Scope
RECOMMEND - OPTIONAL - What is intended to be effected. What is known not to change. Example: Notifications are grouped when parameter X is set, not enabled by default.
OPTIONAL - Other
OPTIONAL - Feel free to add any other sections or sub-sections that can explain your PR better.
Testing
Unit testing
OPTIONAL - Explain unit tests added, if not clear in the code.
Manual testing
RECOMMEND - OPTIONAL - Explain what scenarios were tested and the environment.
Example: Tested opening a notification while the app was foregrounded, app build with Android Studio 2020.3 with a fresh install of the OneSignal example app on a Pixel 6 with Android 12.
Affected code checklist
Checklist
Overview
Testing
Final pass