wip#1668
Conversation
|
@claude review |
| BOOL enabled = [OneSignalUserDefaults.initShared getSavedBoolForKey:OSUD_RECEIVE_RECEIPTS_ENABLED defaultValue:NO]; | ||
| if (enabled) { | ||
| return YES; | ||
| } | ||
| // UserDefaults can return NO for two reasons: the flag is genuinely off, or the shared | ||
| // UserDefaults file isn't readable right now (NSE running while device is locked under | ||
| // NSFileProtectionCompleteUntilFirstUserAuthentication). Fall back to the unencrypted cache. | ||
| NSString *cached = [OSResilientStorage stringForKey:OSResilientStorage.keyReceiveReceiptsEnabled]; | ||
| return [cached isEqualToString:@"1"]; |
There was a problem hiding this comment.
🟡 🟡 Minor consistency window in isReceiveReceiptsEnabled (OneSignalReceiveReceiptsController.m:38-46): when the server flips IOS_RECEIVE_RECEIPTS_ENABLE from YES→NO in downloadIOSParamsWithAppId, the UserDefaults write is synchronous but OSResilientStorage.setString is queue.async — between those two writes an NSE reading the flag sees UD=NO, falls back to the cache, reads stale "1", and returns YES. The window is microseconds-to-milliseconds and the worst-case impact is one extra receive_receipt request the server is free to ignore. Easy mitigation if you want it tight: write OSResilientStorage before UserDefaults at OneSignal.m:631-637 so the race direction is safe, or make this specific setString synchronous.
Extended reasoning...
The race. The new fallback in isReceiveReceiptsEnabled treats a UD value of NO as ambiguous and consults OSResilientStorage as a tiebreaker — this is the right call for the locked-NSE case where cfprefsd returns nil under NSFileProtectionCompleteUntilFirstUserAuthentication. But UD can also legitimately be NO because the server just disabled the flag. The two storage layers are not updated atomically:
// OneSignal.m:631-637 (server response handler)
[OneSignalUserDefaults.initShared saveBoolForKey:OSUD_RECEIVE_RECEIPTS_ENABLED withValue:enabled]; // synchronous
[OSResilientStorage setString:enabled ? @"1" : @"0" forKey:...]; // queue.asyncOneSignalUserDefaults.saveBoolForKey is -setBool:forKey: + synchronize — synchronous from the caller's view, and cross-process visible after cfprefsd propagation. OSResilientStorage.setString (OSResilientStorage.swift:132-142) hops to a serial queue, then does a Data(contentsOf:) → mutate → atomic data.write cycle.
Step-by-step proof of a stale YES.
- Initial state on disk: UD says
OSUD_RECEIVE_RECEIPTS_ENABLED = YES, file saysreceive_receipts_enabled = "1". NSE returns YES correctly. - Server flips the flag to disabled. Main app receives
downloadIOSParamsWithAppIdresponse. - T₀: Main app calls
saveBoolForKey:withValue:NO. UD now says NO; cfprefsd propagates this to the NSE within microseconds. - T₀ + ε: Main app calls
[OSResilientStorage setString:@"0" ...]. The work is enqueued on the serial queue but not yet executed — the file on disk still says"1". - T₀ + ε + δ: An NSE wakes for a push delivery (this is the realistic timing — NSEs fire on push arrival, not on a schedule).
- NSE's
isReceiveReceiptsEnabledreads UD → getsNO(enabled = NO, falls through theif (enabled) return YESearly-out). - NSE reads
OSResilientStorage.stringForKey:keyReceiveReceiptsEnabled→ file still says"1"→ returns YES. - NSE proceeds to send
report_received, which the server has just disabled.
Why existing code doesn't prevent it. The whole point of the fallback added at OneSignalReceiveReceiptsController.m:38-46 is to not trust UD's NO. So the fallback unconditionally consults the file, with no way to distinguish "UD was unreadable" from "UD was explicitly set to NO." OSResilientStorage.setString's queue.async ensures the file write isn't synchronous with the UD write that just preceded it — there is no barrier between the two stores.
Why this is a nit, not a blocker. The refutation captures it well and I agree:
- The window is bounded by one serial dispatch + one file write, roughly microseconds to a few ms. Cross-process cfprefsd propagation widens it slightly but it's still a narrow window.
- It requires the rare YES→NO server flip and an NSE waking inside that window.
- The impact is at most one extra
report_receivedrequest per occurrence — server-idempotent, no data corruption, no user-visible effect, no security implication. The server is the authoritative source and will reject the receipt if it cares. - The PR fixes the dominant locked-NSE false-negative case, which is frequent and deterministic; this is a small, asymmetric trade-off in the opposite direction.
Cheap mitigation if you want a clean story. Reverse the write order in downloadIOSParamsWithAppId so the cache is updated before UD — then the race direction becomes "UD still says YES, file already says 0" → the early-return at line 40 still hits and the fallback isn't consulted, so the stale read can't manifest. Or, only for this one key, dispatch the OSResilientStorage write via queue.sync. Either is a one-line change and neither affects the locked-NSE fix.
Not a merge blocker.
|
@claude review |
| if let subId = pushSubscription.subscriptionId, !subId.isEmpty { | ||
| OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId) | ||
| } |
There was a problem hiding this comment.
🟡 Path 1's OSResilientStorage backfill at OneSignalUserManagerImpl.swift:243-245 is one-way — it writes the mirror only when the cached pushSubscription.subscriptionId is non-empty, but never clears it when nil. If a 'missing user' server response resets subscriptionId to nil and the process is force-killed within the OSSubscriptionModel.didSet queue.async clear window (OSSubscriptionModel.swift:141), the on-disk mirror retains the stale id; next launch's Path 1 skips backfill and an NSE wake during prewarm reads the stale subscription_id via OneSignalIdentifiers.subscriptionId. Nit — narrow trigger window, self-heals on the next subscriptionId update, but the symmetric fix is one line: add an else branch that clears the mirror when the cached value is nil/empty so Path 1 is bidirectional.
Extended reasoning...
What goes wrong
Path 1 of OneSignalUserManagerImpl.start() (lines 243-245 in this PR) populates the OSResilientStorage.keySubscriptionId mirror whenever it loads a cached user, but only when the cached id is present:
if let subId = pushSubscription.subscriptionId, !subId.isEmpty {
OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId)
}The asymmetry matters because the only other writer of the mirror — OSSubscriptionModel.subscriptionId.didSet at OSSubscriptionModel.swift:141 — uses OSResilientStorage.setString(subscriptionId ?? "", forKey: ...) which dispatches via queue.async (OSResilientStorage.swift:136). The UD save on the preceding line is synchronous, but the mirror clear is enqueued, not executed.
The race
When a server response (OSUserExecutor lines 336/421/472/489, OSIdentityOperationExecutor:235, OSPropertyOperationExecutor:290, OSSubscriptionOperationExecutor:337) signals 'missing user' and resets subscriptionId to nil:
didSetfires: UD write happens synchronously (saveString→synchronize), soOSUD_PUSH_SUBSCRIPTION_IDis cleared in cfprefsd.OSResilientStorage.setString("", forKey: keySubscriptionId)is dispatched to the serialcom.onesignal.resilient-storagequeue but not yet drained.- Process is force-killed (SIGKILL, OOM, debugger detach) within the microsecond-to-low-ms drain window. The mirror clear is lost.
- The push subscription model itself was persisted synchronously through the model store's
saveCodeableData(viaset(property:)→onModelUpdated), so the next launch's cachedpushSubscription.subscriptionIdis legitimately nil.
Why Path 1 misses the reconciliation opportunity
On the next launch, Path 1 takes (cached user exists), but at lines 243-245 the if let subId = ..., !subId.isEmpty guard fails for nil/empty — so the backfill is a no-op. The stale id sitting in the OSResilientStorage file is never overwritten. hasCalledStart is set; subsequent start() calls early-return.
Why an NSE in this state misroutes
OneSignalIdentifiers.subscriptionId (OneSignalIdentifiers.swift:65-75) reads shared UD first, then falls back to OSResilientStorage. During prewarm-before-first-unlock, the UD read returns nil because the file is locked under NSFileProtectionCompleteUntilFirstUserAuthentication. Falls through to OSResilientStorage.string(forKey: keySubscriptionId) → returns the stale X. The NSE's OneSignalReceiveReceiptsController sends report_received with the wrong subscription_id.
Step-by-step proof
- Session 1: server issues
subscription_id = X. UD has X, OSResilientStorage file has X. - App makes a request; server returns 'missing user'.
OSUserExecutorresetspushSubscriptionModel.subscriptionId = nil. subscriptionId.didSetfires (line 127 guardsubscriptionId != oldValuepasses since X → nil):- Line 140:
saveString(forKey: OSUD_PUSH_SUBSCRIPTION_ID, withValue: nil)— synchronous, UD cleared. - Line 141:
OSResilientStorage.setString("", forKey: keySubscriptionId)— enqueued on serial queue, file still says X.
- Line 140:
- Process is killed before the queue drains (e.g. user force-quits, OOM, OS terminates background slice). File on disk retains X.
- Some time later, an NSE wakes for a push during prewarm-before-first-unlock.
OneSignalIdentifiers.subscriptionId→ UD locked → nil → OSResilientStorage fallback → returns stale X. - NSE sends
report_receivedwith player_id=X. Server may 404, attribute to a deleted/stale player, or accept under the wrong subscription.
If instead the next launch is a normal main-app launch:
- Path 1 hits; cached pushSubscription has subscriptionId = nil.
- The
if let subIdguard fails; the mirror clear is never re-issued. - The OSResilientStorage file continues to hold X until a future legitimate
subscriptionIdchange fires didSet again.
Why existing code does not heal it
OSSubscriptionModel.subscriptionId.didSetis the only place that writes the mirror after start(), and it short-circuits onsubscriptionId == oldValue(line 127). After the dropped clear, oldValue is nil on the next session and the guard prevents re-issuing the clear.handleAppIdChange's destructive clear (OneSignal.m:625) re-issues the mirror clear, but only on a true app-id change.- Path 1's backfill is one-way as established above.
- The next forward write that actually updates the mirror is whenever the server issues a new subscription_id (didSet fires with a different value, queue drains, file updated). That self-heals the issue but only after a legitimate id rotation.
Impact / severity
Bounded:
- Compound trigger probability is small — requires a 'missing user' reset, a process kill within the queue drain window (microseconds-to-low-ms), and a subsequent NSE in the prewarm-before-first-unlock window before the next clean main-app launch reissues an id.
- Worst-case impact is one extra/incorrect
receive_receiptrequest that the server is free to reject. No data corruption, no security implication. - Self-heals on the next normal
subscriptionIdupdate.
Hence nit severity. The PR's correctness story rests on OSResilientStorage being a durable mirror — this is one of the cases where the mirror can drift from UD, so it's worth flagging.
Fix
One symmetric line — add an else branch (or simply unconditionally re-issue), mirroring the existing handleAppIdChange clear pattern:
if let subId = pushSubscription.subscriptionId, !subId.isEmpty {
OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId)
} else {
OSResilientStorage.setString(nil, forKey: OSResilientStorage.keySubscriptionId)
}This makes Path 1 bidirectional and defensively re-issues the clear in case the prior session's async clear was dropped. Same shape as the existing setStrings clear used elsewhere in this PR.
| OSResilientStorage.keySubscriptionId: @"", | ||
| OSResilientStorage.keyReceiveReceiptsEnabled: @"", | ||
| OSResilientStorage.keyDidStart: @"" | ||
| }]; | ||
|
|
||
| // Clear all cached data, does not start User Module nor call logout. | ||
| [OneSignalUserManagerImpl.sharedInstance clearAllModelsFromStores]; |
There was a problem hiding this comment.
🟣 🟣 handleAppIdChange's destructive branch now clears the UD entry for OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY (closing the cross-launch gap), but pushSubscriptionModelStore.models in-memory is still never cleared in the same process. prepareForNewUser only clears the email/SMS subscriptionModelStore; pushSubscriptionModelStore deliberately omits .registerAsUserObserver() (OneSignalUserManagerImpl.swift:181) so OS_ON_USER_WILL_CHANGE never fires for it, and start() short-circuits on hasCalledStart. After a runtime setAppId(A→B) every subsequent SDK operation in that process uses the OLD app's subscription_id (404 / wrong-app routing) until process restart. Pre-existing — one-line follow-up: add pushSubscriptionModelStore.clearModelsFromStore() to clearAllModelsFromStores (or call it from handleAppIdChange right after line 661) so the in-memory and UD halves stay in lockstep.
Extended reasoning...
What is wrong
handleAppIdChanges destructive branch in this PR (OneSignal.m:629-661) now clears the shared-UD entry for OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY (the new line 651 [sharedUserDefaults removeValueForKey:OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY]) and then calls [sharedInstance clearAllModelsFromStores]. That closes the cross-launch persistence gap — on the next process launch pushSubscriptionModelStore loads empty and the bug self-heals. But the in-memory pushSubscriptionModelStore.models dict in OneSignalUserManagerImpl.sharedInstance is still never cleared during the same process, so any code that runs between the setAppId(A→B) call and the process exiting reads the OLD app As subscription model out of the in-memory dict.
clearAllModelsFromStores calls prepareForNewUser (OneSignalUserManagerImpl.swift:486-492), which only invokes subscriptionModelStore.clearModelsFromStore() (email/SMS) and posts OS_ON_USER_WILL_CHANGE. The notification observer registered via registerAsUserObserver() (OSModelStore.swift:173-176) only calls removeModelsFromUserDefaults — UserDefaults clear, NOT in-memory. And the push subscription store is uniquely declared without .registerAsUserObserver() at OneSignalUserManagerImpl.swift:181 (its push-sub is device-tied, so its preserved on login/logout) — so it doesnt even receive OS_ON_USER_WILL_CHANGE. The in-memory dict survives untouched.
Why nothing else self-heals the dict in-process
start()is guarded byhasCalledStartat OneSignalUserManagerImpl.swift:212 — once true, the re-entry call returns immediately andrefresh()is never re-driven.refresh()itself guards onmodels.isEmpty(OSModelStore.swift:108) so even if it were called it would not clobber the OLD entry.OSPushSubscriptionImpl.id/token/optedIn(lines 905-919) all readpushSubscriptionModelStore.getModel(key: OS_PUSH_SUBSCRIPTION_MODEL_KEY)directly — they return the OLD subscription.setNewInternalUsers guard at line 517-519 (.keys.contains(OS_PUSH_SUBSCRIPTION_MODEL_KEY)) sees the OLD entry and short-circuits — even a freshlogin()after the app-id change reuses the OLD model.- Mutations to the OLD model trigger
OSModelStore.onModelUpdated, which re-writes the OLD archived dict back to UD — silently defeating the PRs new line 651 UD-clear within the same process lifetime.
Step-by-step proof
- App is running with
appId = "A". Server has issuedsubscriptionId = S_A.pushSubscriptionModelStore.models == {push: <model with subscriptionId=S_A>}. - Developer calls
OneSignal.initialize("B", launchOptions)runtime — the OneSignal.m:624 comment scopes this as a developer-testing scenario, but its still a valid runtime path.setAppId→handleAppIdChange.prevAppId="A", newappId="B", differ → destructive branch. - Line 651 removes
OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEYfrom shared UD. UD-side gap closed. - Line 661 calls
[sharedInstance clearAllModelsFromStores]→prepareForNewUser→ only clears email/SMSsubscriptionModelStorein-memory. PostsOS_ON_USER_WILL_CHANGE. Observers fire for identity/properties stores; push store gets nothing (no observer registered). pushSubscriptionModelStore.modelsstill holds{push: <model with subscriptionId=S_A>}.- Caller code or any subsequent SDK API path reads
pushSubscription.id→OSPushSubscriptionImpl.id→pushSubscriptionModelStore.getModel(key:)→ returns the OLD model →S_A. - Any update on that OLD model fires
OSModelStore.onModelUpdated→ re-writes the (still OLD) dict back to UD at OneSignalUserManagerImpl.swift:194 — undoing the line 651 clear in the same process lifetime. - API requests issued under
appId=Bcarrysubscription_id=S_A→ server returns 404 / wrong-app routing. - The next clean process restart reads an empty
pushSubscriptionModelStorefrom UD (because step 3 emptied UD, unless step 7 re-wrote it during the broken session — which is the additional in-process re-persistence hazard above) → Path 3 creates a fresh push subscription. Self-heals across process restart, but in-process is broken for the entire remaining session.
Why this is pre_existing
The in-memory leak predates this PR: prepareForNewUsers push-store-exclusion and the missing registerAsUserObserver on the push store are both unchanged. Pre-PR, the UD wasnt cleared either, so the bug was worse (UD-side also leaked). The PR addressed the UD half by adding the line 651 removeValueForKey:OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY and the OSResilientStorage clears — that fixes the cross-launch gap and is a real improvement. But the in-process half is unchanged. Distinct from the prior reviewer comment (inline-comment 3337791560) which was about the UD-side clear that this PR has now addressed.
How to fix
One line. Either inside clearAllModelsFromStores:
func clearAllModelsFromStores() {
prepareForNewUser()
pushSubscriptionModelStore.clearModelsFromStore() // <-- add this
}Or in the destructive branch at OneSignal.m:661 directly after the existing call. Closes the in-process gap symmetric with the UD-side clear at line 651, so both halves stay in lockstep on a runtime app-id change.
wip