SDSTOR-21465: scrubber phase 1 #413
|
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## stable/v4.x #413 +/- ##
==============================================
Coverage ? 53.14%
==============================================
Files ? 39
Lines ? 6902
Branches ? 943
==============================================
Hits ? 3668
Misses ? 2823
Partials ? 411
|
xiaoxichen
left a comment
The NPE bug should be fixed before merging; the rest LGTM.
The NPE can be triggered if some SMs enable scrubbing during a config change or upgrade, which would then cause cluster-wide crashes.
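To illustrate the kind of guard this comment asks for, here is a minimal sketch. It assumes the crash comes from a node dereferencing a scrubber object that was never created locally (for example, because that node has not been upgraded yet). All names here (`Node`, `Scrubber`, `ScrubStatus`, `handle_scrub_request`) are hypothetical and are not taken from this PR.

```cpp
#include <memory>
#include <string>

// Hypothetical result codes; not from the PR.
enum class ScrubStatus { Ok, NotSupported };

// Hypothetical scrubber; the real one would verify data for the given PG.
struct Scrubber {
    ScrubStatus run(const std::string& /*pg*/) { return ScrubStatus::Ok; }
};

// Hypothetical node that may receive scrub requests from peers.
struct Node {
    // Assumption: stays null until scrubbing is enabled on this node.
    std::unique_ptr<Scrubber> scrubber;

    ScrubStatus handle_scrub_request(const std::string& pg) {
        // Reject the request instead of dereferencing a null pointer.
        if (!scrubber) { return ScrubStatus::NotSupported; }
        return scrubber->run(pg);
    }
};
```

With a guard like this, a peer that enables scrubbing early would get an error back from nodes that are not ready, rather than crashing them.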
|
The UT failure is in the scrubber test, please take a look. |
|
I triggered this UT again and CI passed. I also ran this UT locally several times but could not reproduce the failure. I think it is a flaky case. |
|
I tried several more times but still cannot reproduce the failure. I suspect the root cause is an unexpected leader switch during the UT, like the flaky stuck case we sometimes see in homestore_test_pg/shard/blob. For homestore_test_pg/shard/blob, if an unexpected leader switch happens during the UT, the follower waits for something to happen, while the leader thinks it is no longer the leader (because of the switch) and does not schedule some ops. All the members then sync and wait at some point, and the UT gets stuck. So I think we need to handle the leader switch case in the raft test framework. |
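One possible way for a test framework to avoid the stuck state described above is to replace unbounded waits with a bounded wait that also watches for a leader change. The sketch below is only an illustration under that assumption. `wait_or_detect_leader_switch`, `WaitResult`, and the integer leader id are hypothetical and are not part of the existing raft test framework.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical outcome of a bounded wait in a test.
enum class WaitResult { Done, LeaderChanged, TimedOut };

// Polls until `done()` returns true, the leader reported by
// `current_leader()` differs from `expected_leader`, or `timeout` expires.
inline WaitResult wait_or_detect_leader_switch(
    const std::function<bool()>& done,
    const std::function<int()>& current_leader,
    int expected_leader,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (done()) { return WaitResult::Done; }
        // A leader switch means the awaited op may never be scheduled.
        if (current_leader() != expected_leader) { return WaitResult::LeaderChanged; }
        std::this_thread::sleep_for(poll);
    }
    // Final check so a condition met right at the deadline is not missed.
    return done() ? WaitResult::Done : WaitResult::TimedOut;
}
```

A test could then retry the operation against the new leader on `LeaderChanged`, or fail fast on `TimedOut`, instead of having every member block at a sync point.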
|
OK, let's wait a few days for other team members to review. If there are no further comments, let's merge as is. |