IGNITE-28675 Fix flaky TxDeadlockCauseTest #testCause, #testCauseSeveralNodes#13136
IGNITE-28675 Fix flaky TxDeadlockCauseTest #testCause, #testCauseSeveralNodes#13136zstan wants to merge 7 commits into
Conversation
|
From documentation: |
TCBot Test Analysis
Possible Blockers (0)No blockers found. New Tests (0)No new tests found. |
|
Thanks for working on this flaky test. I have two small suggestions:
|
| } | ||
| catch (Exception e) { | ||
| ex.compareAndSet(null, e); | ||
| // TransactionDeadlockException raised at least for one transaction involved in the deadlock |
There was a problem hiding this comment.
this is from our own documentation
|
|
||
| /** Deadlock detection maximum iterations. */ | ||
| private static int deadLockTimeout = | ||
| private static final int DEAD_LOCK_TIMEOUT = |
There was a problem hiding this comment.
Let's roll back adding the final qualifier here, because it changes the semantics of TxDeadlockDetectionNoHangsTest and is not the main goal of this patch.
There was a problem hiding this comment.
and roll back renaming too
There was a problem hiding this comment.
disagree, why do we need to support non-final definition only for test cases ?
test work properly well without this reflection changes, i run it multiple times, you can check it through TC
There was a problem hiding this comment.
My position is not really about final/non-final definition itself. My point is that it is usually safer to keep PRs as minimal as possible for a specific purpose. The main goal here is to fix TxDeadlockCauseTest, and for now it is not very clear why we also need to refactor an unrelated field and affect existing test behavior.
Green TC runs are definitely a good sign, but some flaky tests fail once in weeks or even months. That's why I'd prefer to avoid introducing additional assumptions here unless this change is really required for the fix itself.
Could you provide more reasoning on why it is safe to remove before/afterTests from TxDeadlockDetectionNoHangsTest?
There was a problem hiding this comment.
I appent additional checks into test, plz check last commit
Also, tests restore after for hardcoded - 60000, that equal to current default but can be changed in future thus there can be a situation when test restores not a default.
|
|
||
| cfg.setCacheConfiguration(ccfg); | ||
|
|
||
| listeningLog.registerListener(lsnr); |
There was a problem hiding this comment.
public void registerListener(@NotNull LogListener lsnr) {
lsnr.reset();
Can we lose an already caught "Deadlock detection was timed out" message here? doTest starts grid(NODES_CNT) many times and each start calls getConfiguration(). But registerListener(LogListener) resets the listener, so a later node start can clear a timeout message that was caught before
There was a problem hiding this comment.
I missunderstand - lsnr.check() calls after each test on: afterTest()
Can you explain how it can be a problem ? Thanks !
There was a problem hiding this comment.
I mean that lsnr.reset() can clear the listener state. Can getConfiguration() be called after Deadlock detection was timed out was already logged?
doTest() runs for about 120 seconds. During this time, the restart thread starts NODES_CNT grids many times, and each start calls getConfiguration(). The deadlock detection timeout is 60 seconds, so the timeout log can appear first. After that, another node restart can call getConfiguration() and reset the listener before afterTest() calls lsnr.check()
There was a problem hiding this comment.
got it ! thanks, if now it`s ok i will repeatedly run a TC
There was a problem hiding this comment.
Theoretically, we still have a small race:
assertFalse(DEAD_LOCK_LSNR.check());
// catch the log
listeningLog.registerListener(DEAD_LOCK_LSNR);
Probably it would be better to use
public void registerListener(@NotNull Consumer<String> lsnr)
and make smth like
registerListener(log -> <update atomicBoolean if catch the log>)
so the atomicBoolean would be a common state for all grids
But on the other hand it makes the test more complicated. WDYT?
There was a problem hiding this comment.
you say the proper things of cource, thanks, change a tests.
|




No description provided.