Deflake upgrade/downgrade tests by making restart deterministic #2991

Open: masih wants to merge 7 commits into main from masih/fix-upgrade-test-flake

@masih (Collaborator) commented Feb 27, 2026

The flake was a restart race in upgrade tests. seid_upgrade.sh would pkill the old process, immediately launch the new one, and immediately print PASS. In CI the old process sometimes hadn't fully released DB/file locks, so the new process exited with "failed to initialize database: resource temporarily unavailable". Because the script still returned PASS, the test continued until verify_running.sh eventually reported FAIL.

Make restart deterministic in both seid_upgrade.sh and seid_downgrade.sh:

  • Wait for prior seid process to fully exit (pgrep loop) after pkill
  • Fail fast if old process is still alive after timeout
  • Start seid with a retry loop to handle the transient DB lock window
  • Only return PASS when pgrep confirms the process is actually running
  • Return FAIL after max attempts
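
The steps above can be sketched roughly as follows. This is a minimal illustration, not the contents of seid_upgrade.sh or seid_downgrade.sh: the helper names (`is_running`, `wait_for_exit`, `start_with_retry`, `restart_process`), the 30-second timeout, the 5 attempts, and the 2-second settle delay are all assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: helper names, timeouts, and attempt counts
# are assumptions, not values taken from the actual scripts.

# Liveness probe: succeeds while a process with exactly this name exists.
is_running() {
  pgrep -x "$1" >/dev/null 2>&1
}

# Poll until the named process is gone; return 1 if it outlives the timeout.
wait_for_exit() {
  local name="$1" timeout_secs="${2:-30}" waited=0
  while is_running "$name"; do
    if [ "$waited" -ge "$timeout_secs" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Launch the given command in the background up to max_attempts times,
# returning 0 as soon as the probe confirms the process is up.
start_with_retry() {
  local name="$1" max_attempts="$2" settle_secs="$3"
  shift 3
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    "$@" >/dev/null 2>&1 &
    sleep "$settle_secs"   # give a failed start time to exit on the DB lock
    if is_running "$name"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Stop the old process, wait for it to exit, then start the new one.
# PASS is printed only after the probe has seen the new process running.
restart_process() {
  local name="$1"
  shift
  pkill -x "$name" || true   # "no matching process" is not an error here
  if ! wait_for_exit "$name" 30; then
    echo "FAIL"
    return 1
  fi
  if start_with_retry "$name" 5 2 "$@"; then
    echo "PASS"
  else
    echo "FAIL"
    return 1
  fi
}
```

Assuming the node is launched with `seid start`, a call would look like `restart_process seid seid start`. Keeping the probe in its own function means the wait and retry logic can be exercised without a real seid binary.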

Also fix a bug in seid_upgrade.sh where a missing $ meant the UPGRADE_VERSION_LIST emptiness check was always false.
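
To illustrate this class of bug (the exact line in seid_upgrade.sh is not reproduced here, so the `-z` test below is an assumed form of the check): without the `$`, the test sees the literal variable name, which is never an empty string.

```shell
#!/usr/bin/env bash
# Hypothetical reproduction; the real check in seid_upgrade.sh may differ.
UPGRADE_VERSION_LIST=""

# Buggy form: tests the literal string "UPGRADE_VERSION_LIST", which is
# never empty, so the emptiness branch is never taken.
if [ -z "UPGRADE_VERSION_LIST" ]; then
  buggy="empty"
else
  buggy="non-empty"
fi

# Fixed form: the $ expands the variable, so the check sees its value.
if [ -z "$UPGRADE_VERSION_LIST" ]; then
  fixed="empty"
else
  fixed="non-empty"
fi

echo "buggy check: $buggy"   # prints "buggy check: non-empty"
echo "fixed check: $fixed"   # prints "fixed check: empty"
```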

Flaked on [main](https://github.com/sei-protocol/sei-chain/actions/runs/22481411026/job/65120263775).

github-actions bot commented Feb 27, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ✅ passed | ✅ passed | ✅ passed | Feb 27, 2026, 10:51 PM |

masih enabled auto-merge (squash) February 27, 2026 13:03

codecov bot commented Feb 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.18%. Comparing base (1c2910c) to head (ab731c8).

@@           Coverage Diff            @@
##             main    #2991    +/-   ##
========================================
  Coverage   58.17%   58.18%            
========================================
  Files        2113     2111     -2     
  Lines      173688   173001   -687     
========================================
- Hits       101037   100654   -383     
+ Misses      63622    63438   -184     
+ Partials     9029     8909   -120     

| Flag | Coverage Δ |
| --- | --- |
| sei-db | 70.62% <ø> (+0.21%) ⬆️ |

Flags with carried forward coverage won't be shown.
46 files have indirect coverage changes.
