Skip to content

wip#1668

Open
nan-li wants to merge 10 commits into
nan/identifier-accessorsfrom
nan/sdk-4725-v2
Open

wip#1668
nan-li wants to merge 10 commits into
nan/identifier-accessorsfrom
nan/sdk-4725-v2

Conversation

@nan-li
Copy link
Copy Markdown
Contributor

@nan-li nan-li commented Jun 1, 2026

wip

Comment thread iOS_SDK/OneSignalSDK/OneSignalUser/Source/OneSignalUserManagerImpl.swift Outdated
Comment thread iOS_SDK/OneSignalSDK/OneSignalUser/Source/OneSignalUserManagerImpl.swift Outdated
@nan-li nan-li force-pushed the nan/sdk-4725-v2 branch from 6f5845f to 309da62 Compare June 1, 2026 17:47
@nan-li
Copy link
Copy Markdown
Contributor Author

nan-li commented Jun 1, 2026

@claude review

Comment thread iOS_SDK/OneSignalSDK/OneSignalUser/Source/OSIdentityModel.swift Outdated
Comment on lines +38 to +46
BOOL enabled = [OneSignalUserDefaults.initShared getSavedBoolForKey:OSUD_RECEIVE_RECEIPTS_ENABLED defaultValue:NO];
if (enabled) {
return YES;
}
// UserDefaults can return NO for two reasons: the flag is genuinely off, or the shared
// UserDefaults file isn't readable right now (NSE running while device is locked under
// NSFileProtectionCompleteUntilFirstUserAuthentication). Fall back to the unencrypted cache.
NSString *cached = [OSResilientStorage stringForKey:OSResilientStorage.keyReceiveReceiptsEnabled];
return [cached isEqualToString:@"1"];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 Minor consistency window in isReceiveReceiptsEnabled (OneSignalReceiveReceiptsController.m:38-46): when the server flips IOS_RECEIVE_RECEIPTS_ENABLE from YES→NO in downloadIOSParamsWithAppId, the UserDefaults write is synchronous but OSResilientStorage.setString is queue.async — between those two writes an NSE reading the flag sees UD=NO, falls back to the cache, reads stale "1", and returns YES. The window is microseconds-to-milliseconds and the worst-case impact is one extra receive_receipt request the server is free to ignore. Easy mitigation if you want it tight: write OSResilientStorage before UserDefaults at OneSignal.m:631-637 so the race direction is safe, or make this specific setString synchronous.

Extended reasoning...

The race. The new fallback in isReceiveReceiptsEnabled treats a UD value of NO as ambiguous and consults OSResilientStorage as a tiebreaker — this is the right call for the locked-NSE case where cfprefsd returns nil under NSFileProtectionCompleteUntilFirstUserAuthentication. But UD can also legitimately be NO because the server just disabled the flag. The two storage layers are not updated atomically:

// OneSignal.m:631-637 (server response handler)
[OneSignalUserDefaults.initShared saveBoolForKey:OSUD_RECEIVE_RECEIPTS_ENABLED withValue:enabled]; // synchronous
[OSResilientStorage setString:enabled ? @"1" : @"0" forKey:...];                                   // queue.async

OneSignalUserDefaults.saveBoolForKey is -setBool:forKey: + synchronize — synchronous from the caller's view, and cross-process visible after cfprefsd propagation. OSResilientStorage.setString (OSResilientStorage.swift:132-142) hops to a serial queue, then does a Data(contentsOf:) → mutate → atomic data.write cycle.

Step-by-step proof of a stale YES.

  1. Initial state on disk: UD says OSUD_RECEIVE_RECEIPTS_ENABLED = YES, file says receive_receipts_enabled = "1". NSE returns YES correctly.
  2. Server flips the flag to disabled. Main app receives downloadIOSParamsWithAppId response.
  3. T₀: Main app calls saveBoolForKey:withValue:NO. UD now says NO; cfprefsd propagates this to the NSE within microseconds.
  4. T₀ + ε: Main app calls [OSResilientStorage setString:@"0" ...]. The work is enqueued on the serial queue but not yet executed — the file on disk still says "1".
  5. T₀ + ε + δ: An NSE wakes for a push delivery (this is the realistic timing — NSEs fire on push arrival, not on a schedule).
  6. NSE's isReceiveReceiptsEnabled reads UD → gets NO (enabled = NO, falls through the if (enabled) return YES early-out).
  7. NSE reads OSResilientStorage.stringForKey:keyReceiveReceiptsEnabled → file still says "1" → returns YES.
  8. NSE proceeds to send report_received, which the server has just disabled.

Why existing code doesn't prevent it. The whole point of the fallback added at OneSignalReceiveReceiptsController.m:38-46 is to not trust UD's NO. So the fallback unconditionally consults the file, with no way to distinguish "UD was unreadable" from "UD was explicitly set to NO." OSResilientStorage.setString's queue.async ensures the file write isn't synchronous with the UD write that just preceded it — there is no barrier between the two stores.

Why this is a nit, not a blocker. The refutation captures it well and I agree:

  • The window is bounded by one serial dispatch + one file write, roughly microseconds to a few ms. Cross-process cfprefsd propagation widens it slightly but it's still a narrow window.
  • It requires the rare YES→NO server flip and an NSE waking inside that window.
  • The impact is at most one extra report_received request per occurrence — server-idempotent, no data corruption, no user-visible effect, no security implication. The server is the authoritative source and will reject the receipt if it cares.
  • The PR fixes the dominant locked-NSE false-negative case, which is frequent and deterministic; this is a small, asymmetric trade-off in the opposite direction.

Cheap mitigation if you want a clean story. Reverse the write order in downloadIOSParamsWithAppId so the cache is updated before UD — then the race direction becomes "UD still says YES, file already says 0" → the early-return at line 40 still hits and the fallback isn't consulted, so the stale read can't manifest. Or, only for this one key, dispatch the OSResilientStorage write via queue.sync. Either is a one-line change and neither affects the locked-NSE fix.

Not a merge blocker.

Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/OneSignalUser/Source/OneSignalUserManagerImpl.swift Outdated
@nan-li nan-li force-pushed the nan/sdk-4725-v2 branch from 4070f88 to fa929fd Compare June 1, 2026 19:11
@nan-li
Copy link
Copy Markdown
Contributor Author

nan-li commented Jun 1, 2026

@claude review

Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m
Comment thread iOS_SDK/OneSignalSDK/OneSignalOSCore/Source/OneSignalConfig.swift
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m
Comment thread iOS_SDK/OneSignalSDK/OneSignalOSCore/Source/OneSignalConfig.swift Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m Outdated
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m
Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m
Comment on lines +243 to +245
if let subId = pushSubscription.subscriptionId, !subId.isEmpty {
OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Path 1's OSResilientStorage backfill at OneSignalUserManagerImpl.swift:243-245 is one-way — it writes the mirror only when the cached pushSubscription.subscriptionId is non-empty, but never clears it when nil. If a 'missing user' server response resets subscriptionId to nil and the process is force-killed within the OSSubscriptionModel.didSet queue.async clear window (OSSubscriptionModel.swift:141), the on-disk mirror retains the stale id; next launch's Path 1 skips backfill and an NSE wake during prewarm reads the stale subscription_id via OneSignalIdentifiers.subscriptionId. Nit — narrow trigger window, self-heals on the next subscriptionId update, but the symmetric fix is one line: add an else branch that clears the mirror when the cached value is nil/empty so Path 1 is bidirectional.

Extended reasoning...

What goes wrong

Path 1 of OneSignalUserManagerImpl.start() (lines 243-245 in this PR) populates the OSResilientStorage.keySubscriptionId mirror whenever it loads a cached user, but only when the cached id is present:

if let subId = pushSubscription.subscriptionId, !subId.isEmpty {
    OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId)
}

The asymmetry matters because the only other writer of the mirror — OSSubscriptionModel.subscriptionId.didSet at OSSubscriptionModel.swift:141 — uses OSResilientStorage.setString(subscriptionId ?? "", forKey: ...) which dispatches via queue.async (OSResilientStorage.swift:136). The UD save on the preceding line is synchronous, but the mirror clear is enqueued, not executed.

The race

When a server response (OSUserExecutor lines 336/421/472/489, OSIdentityOperationExecutor:235, OSPropertyOperationExecutor:290, OSSubscriptionOperationExecutor:337) signals 'missing user' and resets subscriptionId to nil:

  1. didSet fires: UD write happens synchronously (saveStringsynchronize), so OSUD_PUSH_SUBSCRIPTION_ID is cleared in cfprefsd.
  2. OSResilientStorage.setString("", forKey: keySubscriptionId) is dispatched to the serial com.onesignal.resilient-storage queue but not yet drained.
  3. Process is force-killed (SIGKILL, OOM, debugger detach) within the microsecond-to-low-ms drain window. The mirror clear is lost.
  4. The push subscription model itself was persisted synchronously through the model store's saveCodeableData (via set(property:)onModelUpdated), so the next launch's cached pushSubscription.subscriptionId is legitimately nil.

Why Path 1 misses the reconciliation opportunity

On the next launch, Path 1 takes (cached user exists), but at lines 243-245 the if let subId = ..., !subId.isEmpty guard fails for nil/empty — so the backfill is a no-op. The stale id sitting in the OSResilientStorage file is never overwritten. hasCalledStart is set; subsequent start() calls early-return.

Why an NSE in this state misroutes

OneSignalIdentifiers.subscriptionId (OneSignalIdentifiers.swift:65-75) reads shared UD first, then falls back to OSResilientStorage. During prewarm-before-first-unlock, the UD read returns nil because the file is locked under NSFileProtectionCompleteUntilFirstUserAuthentication. Falls through to OSResilientStorage.string(forKey: keySubscriptionId) → returns the stale X. The NSE's OneSignalReceiveReceiptsController sends report_received with the wrong subscription_id.

Step-by-step proof

  1. Session 1: server issues subscription_id = X. UD has X, OSResilientStorage file has X.
  2. App makes a request; server returns 'missing user'. OSUserExecutor resets pushSubscriptionModel.subscriptionId = nil.
  3. subscriptionId.didSet fires (line 127 guard subscriptionId != oldValue passes since X → nil):
    • Line 140: saveString(forKey: OSUD_PUSH_SUBSCRIPTION_ID, withValue: nil) — synchronous, UD cleared.
    • Line 141: OSResilientStorage.setString("", forKey: keySubscriptionId) — enqueued on serial queue, file still says X.
  4. Process is killed before the queue drains (e.g. user force-quits, OOM, OS terminates background slice). File on disk retains X.
  5. Some time later, an NSE wakes for a push during prewarm-before-first-unlock. OneSignalIdentifiers.subscriptionId → UD locked → nil → OSResilientStorage fallback → returns stale X.
  6. NSE sends report_received with player_id=X. Server may 404, attribute to a deleted/stale player, or accept under the wrong subscription.

If instead the next launch is a normal main-app launch:

  • Path 1 hits; cached pushSubscription has subscriptionId = nil.
  • The if let subId guard fails; the mirror clear is never re-issued.
  • The OSResilientStorage file continues to hold X until a future legitimate subscriptionId change fires didSet again.

Why existing code does not heal it

  • OSSubscriptionModel.subscriptionId.didSet is the only place that writes the mirror after start(), and it short-circuits on subscriptionId == oldValue (line 127). After the dropped clear, oldValue is nil on the next session and the guard prevents re-issuing the clear.
  • handleAppIdChange's destructive clear (OneSignal.m:625) re-issues the mirror clear, but only on a true app-id change.
  • Path 1's backfill is one-way as established above.
  • The next forward write that actually updates the mirror is whenever the server issues a new subscription_id (didSet fires with a different value, queue drains, file updated). That self-heals the issue but only after a legitimate id rotation.

Impact / severity

Bounded:

  • Compound trigger probability is small — requires a 'missing user' reset, a process kill within the queue drain window (microseconds-to-low-ms), and a subsequent NSE in the prewarm-before-first-unlock window before the next clean main-app launch reissues an id.
  • Worst-case impact is one extra/incorrect receive_receipt request that the server is free to reject. No data corruption, no security implication.
  • Self-heals on the next normal subscriptionId update.

Hence nit severity. The PR's correctness story rests on OSResilientStorage being a durable mirror — this is one of the cases where the mirror can drift from UD, so it's worth flagging.

Fix

One symmetric line — add an else branch (or simply unconditionally re-issue), mirroring the existing handleAppIdChange clear pattern:

if let subId = pushSubscription.subscriptionId, !subId.isEmpty {
    OSResilientStorage.setString(subId, forKey: OSResilientStorage.keySubscriptionId)
} else {
    OSResilientStorage.setString(nil, forKey: OSResilientStorage.keySubscriptionId)
}

This makes Path 1 bidirectional and defensively re-issues the clear in case the prior session's async clear was dropped. Same shape as the existing setStrings clear used elsewhere in this PR.

Comment thread iOS_SDK/OneSignalSDK/Source/OneSignal.m
Comment on lines +655 to 661
OSResilientStorage.keySubscriptionId: @"",
OSResilientStorage.keyReceiveReceiptsEnabled: @"",
OSResilientStorage.keyDidStart: @""
}];

// Clear all cached data, does not start User Module nor call logout.
[OneSignalUserManagerImpl.sharedInstance clearAllModelsFromStores];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 🟣 handleAppIdChange's destructive branch now clears the UD entry for OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY (closing the cross-launch gap), but pushSubscriptionModelStore.models in-memory is still never cleared in the same process. prepareForNewUser only clears the email/SMS subscriptionModelStore; pushSubscriptionModelStore deliberately omits .registerAsUserObserver() (OneSignalUserManagerImpl.swift:181) so OS_ON_USER_WILL_CHANGE never fires for it, and start() short-circuits on hasCalledStart. After a runtime setAppId(A→B) every subsequent SDK operation in that process uses the OLD app's subscription_id (404 / wrong-app routing) until process restart. Pre-existing — one-line follow-up: add pushSubscriptionModelStore.clearModelsFromStore() to clearAllModelsFromStores (or call it from handleAppIdChange right after line 661) so the in-memory and UD halves stay in lockstep.

Extended reasoning...

What is wrong

handleAppIdChanges destructive branch in this PR (OneSignal.m:629-661) now clears the shared-UD entry for OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY (the new line 651 [sharedUserDefaults removeValueForKey:OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY]) and then calls [sharedInstance clearAllModelsFromStores]. That closes the cross-launch persistence gap — on the next process launch pushSubscriptionModelStore loads empty and the bug self-heals. But the in-memory pushSubscriptionModelStore.models dict in OneSignalUserManagerImpl.sharedInstance is still never cleared during the same process, so any code that runs between the setAppId(A→B) call and the process exiting reads the OLD app As subscription model out of the in-memory dict.

clearAllModelsFromStores calls prepareForNewUser (OneSignalUserManagerImpl.swift:486-492), which only invokes subscriptionModelStore.clearModelsFromStore() (email/SMS) and posts OS_ON_USER_WILL_CHANGE. The notification observer registered via registerAsUserObserver() (OSModelStore.swift:173-176) only calls removeModelsFromUserDefaults — UserDefaults clear, NOT in-memory. And the push subscription store is uniquely declared without .registerAsUserObserver() at OneSignalUserManagerImpl.swift:181 (its push-sub is device-tied, so its preserved on login/logout) — so it doesnt even receive OS_ON_USER_WILL_CHANGE. The in-memory dict survives untouched.

Why nothing else self-heals the dict in-process

  • start() is guarded by hasCalledStart at OneSignalUserManagerImpl.swift:212 — once true, the re-entry call returns immediately and refresh() is never re-driven.
  • refresh() itself guards on models.isEmpty (OSModelStore.swift:108) so even if it were called it would not clobber the OLD entry.
  • OSPushSubscriptionImpl.id/token/optedIn (lines 905-919) all read pushSubscriptionModelStore.getModel(key: OS_PUSH_SUBSCRIPTION_MODEL_KEY) directly — they return the OLD subscription.
  • setNewInternalUsers guard at line 517-519 (.keys.contains(OS_PUSH_SUBSCRIPTION_MODEL_KEY)) sees the OLD entry and short-circuits — even a fresh login() after the app-id change reuses the OLD model.
  • Mutations to the OLD model trigger OSModelStore.onModelUpdated, which re-writes the OLD archived dict back to UD — silently defeating the PRs new line 651 UD-clear within the same process lifetime.

Step-by-step proof

  1. App is running with appId = "A". Server has issued subscriptionId = S_A. pushSubscriptionModelStore.models == {push: <model with subscriptionId=S_A>}.
  2. Developer calls OneSignal.initialize("B", launchOptions) runtime — the OneSignal.m:624 comment scopes this as a developer-testing scenario, but its still a valid runtime path. setAppIdhandleAppIdChange. prevAppId="A", new appId="B", differ → destructive branch.
  3. Line 651 removes OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY from shared UD. UD-side gap closed.
  4. Line 661 calls [sharedInstance clearAllModelsFromStores]prepareForNewUser → only clears email/SMS subscriptionModelStore in-memory. Posts OS_ON_USER_WILL_CHANGE. Observers fire for identity/properties stores; push store gets nothing (no observer registered).
  5. pushSubscriptionModelStore.models still holds {push: <model with subscriptionId=S_A>}.
  6. Caller code or any subsequent SDK API path reads pushSubscription.idOSPushSubscriptionImpl.idpushSubscriptionModelStore.getModel(key:) → returns the OLD model → S_A.
  7. Any update on that OLD model fires OSModelStore.onModelUpdated → re-writes the (still OLD) dict back to UD at OneSignalUserManagerImpl.swift:194 — undoing the line 651 clear in the same process lifetime.
  8. API requests issued under appId=B carry subscription_id=S_A → server returns 404 / wrong-app routing.
  9. The next clean process restart reads an empty pushSubscriptionModelStore from UD (because step 3 emptied UD, unless step 7 re-wrote it during the broken session — which is the additional in-process re-persistence hazard above) → Path 3 creates a fresh push subscription. Self-heals across process restart, but in-process is broken for the entire remaining session.

Why this is pre_existing

The in-memory leak predates this PR: prepareForNewUsers push-store-exclusion and the missing registerAsUserObserver on the push store are both unchanged. Pre-PR, the UD wasnt cleared either, so the bug was worse (UD-side also leaked). The PR addressed the UD half by adding the line 651 removeValueForKey:OS_PUSH_SUBSCRIPTION_MODEL_STORE_KEY and the OSResilientStorage clears — that fixes the cross-launch gap and is a real improvement. But the in-process half is unchanged. Distinct from the prior reviewer comment (inline-comment 3337791560) which was about the UD-side clear that this PR has now addressed.

How to fix

One line. Either inside clearAllModelsFromStores:

func clearAllModelsFromStores() {
    prepareForNewUser()
    pushSubscriptionModelStore.clearModelsFromStore()   // <-- add this
}

Or in the destructive branch at OneSignal.m:661 directly after the existing call. Closes the in-process gap symmetric with the UD-side clear at line 651, so both halves stay in lockstep on a runtime app-id change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant