Skip to content

chore: add downgrade compatibility CI#556

Draft
dustinbyrne wants to merge 3 commits into
mainfrom
chore/downgrade-compatibility-ci
Draft

chore: add downgrade compatibility CI#556
dustinbyrne wants to merge 3 commits into
mainfrom
chore/downgrade-compatibility-ci

Conversation

@dustinbyrne

Copy link
Copy Markdown
Contributor

💡 Motivation and Context

Adds Android parity for the iOS downgrade-compatibility guard in PostHog/posthog-ios#604.

The SDK can safely change persisted storage schemas, but a user downgrading to an older SDK should not crash when that older SDK starts against newer persisted data. This PR adds a CI smoke test that writes current-checkout persisted state, then starts pinned older Android SDK versions against that same state.

The pinned versions target meaningful persisted-state boundaries instead of the last N releases.

💚 How did you test it?

  • DOWNGRADE_VERSION=3.43.3 ./scripts/test-downgrade-compatibility.sh
  • DOWNGRADE_VERSION=3.45.1 ./scripts/test-downgrade-compatibility.sh
  • DOWNGRADE_VERSION=3.47.0 ./scripts/test-downgrade-compatibility.sh
  • make checkFormat
  • bash -n scripts/test-downgrade-compatibility.sh
  • git diff --check

📝 Checklist

  • I reviewed the submitted code.
  • I added tests to verify the changes.
  • I updated the docs if needed.
  • No breaking change or entry added to the changelog.

If releasing new changes

  • Ran pnpm changeset to generate a changeset file

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Copy link
Copy Markdown
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
scripts/test-downgrade-compatibility.sh:102
**Unconditional log-file requirement may mismatch write output**

`captureLogsIfAvailable()` in the Java write phase returns `false` (and writes zero log files) if the `captureLog` method is not found via reflection. The shell script then checks `LOGS_QUEUE_FILE_COUNT` unconditionally; if the API is ever renamed or its signature changes, the write step silently produces 0 log files while this line exits with a misleading "Expected queued log files to be persisted" error rather than pointing to the actual cause. Consider making this check conditional on whether the write phase actually produced log files (e.g., via a sentinel file or a non-zero exit-code signal from the Java task).

### Issue 2 of 2
scripts/downgrade-compatibility-smoke/src/test/java/com/posthog/downgrade/DowngradeCompatibilitySmokeTest.java:148-152
**Fixed-sleep window for background thread safety is sensitive to CI load**

The 1-second sleep after `flush()` is the only guard that background queue threads have had enough time to attempt deserialising the old persisted files and surface a crash via the uncaught-exception handler. On an overloaded runner the executor threads may not have been scheduled at all within 1 second, making the test pass while the actual crash goes undetected. A `CountDownLatch` or polling loop on the uncaught-exceptions list (up to a reasonable deadline) would make this deterministic rather than timing-dependent.

Reviews (1): Last reviewed commit: "chore: add downgrade compatibility CI" | Re-trigger Greptile


require_positive_count "queued analytics event files" "$EVENT_QUEUE_FILE_COUNT"
require_positive_count "queued replay snapshot files" "$REPLAY_QUEUE_FILE_COUNT"
require_positive_count "queued log files" "$LOGS_QUEUE_FILE_COUNT"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Unconditional log-file requirement may mismatch write output

captureLogsIfAvailable() in the Java write phase returns false (and writes zero log files) if the captureLog method is not found via reflection. The shell script then checks LOGS_QUEUE_FILE_COUNT unconditionally; if the API is ever renamed or its signature changes, the write step silently produces 0 log files while this line exits with a misleading "Expected queued log files to be persisted" error rather than pointing to the actual cause. Consider making this check conditional on whether the write phase actually produced log files (e.g., via a sentinel file or a non-zero exit-code signal from the Java task).

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/test-downgrade-compatibility.sh
Line: 102

Comment:
**Unconditional log-file requirement may mismatch write output**

`captureLogsIfAvailable()` in the Java write phase returns `false` (and writes zero log files) if the `captureLog` method is not found via reflection. The shell script then checks `LOGS_QUEUE_FILE_COUNT` unconditionally; if the API is ever renamed or its signature changes, the write step silently produces 0 log files while this line exits with a misleading "Expected queued log files to be persisted" error rather than pointing to the actual cause. Consider making this check conditional on whether the write phase actually produced log files (e.g., via a sentinel file or a non-zero exit-code signal from the Java task).

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +148 to +152

// Give queue executors a chance to load, decode, and flush cached files. Real downgrade
// crashes generally surface on these daemon threads rather than on the test thread.
Thread.sleep(1_000);
assertNoUncaughtExceptions();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Fixed-sleep window for background thread safety is sensitive to CI load

The 1-second sleep after flush() is the only guard that background queue threads have had enough time to attempt deserialising the old persisted files and surface a crash via the uncaught-exception handler. On an overloaded runner the executor threads may not have been scheduled at all within 1 second, making the test pass while the actual crash goes undetected. A CountDownLatch or polling loop on the uncaught-exceptions list (up to a reasonable deadline) would make this deterministic rather than timing-dependent.

Prompt To Fix With AI
This is a comment left during a code review.
Path: scripts/downgrade-compatibility-smoke/src/test/java/com/posthog/downgrade/DowngradeCompatibilitySmokeTest.java
Line: 148-152

Comment:
**Fixed-sleep window for background thread safety is sensitive to CI load**

The 1-second sleep after `flush()` is the only guard that background queue threads have had enough time to attempt deserialising the old persisted files and surface a crash via the uncaught-exception handler. On an overloaded runner the executor threads may not have been scheduled at all within 1 second, making the test pass while the actual crash goes undetected. A `CountDownLatch` or polling loop on the uncaught-exceptions list (up to a reasonable deadline) would make this deterministic rather than timing-dependent.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant