Skip to content

test(kademlia): deflake TestAddressBookQuickPrune and TestOutofDepthPrune#5508

Open
martinconic wants to merge 2 commits into
masterfrom
fix/kademlia-test-flakes
Open

test(kademlia): deflake TestAddressBookQuickPrune and TestOutofDepthPrune#5508
martinconic wants to merge 2 commits into
masterfrom
fix/kademlia-test-flakes

Conversation

@martinconic

@martinconic martinconic commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Deflakes two timing-sensitive tests in pkg/topology/kademlia that intermittently fail in CI on unrelated branches:

  • TestAddressBookQuickPrune — e.g. kademlia_test.go: timed out waiting for counter ... got 3 want >= 4
  • TestOutofDepthPrune — e.g. kademlia_test.go: bin 5, got 14, want more than 16

Both failures are test-harness timing bugs: the tests gated async connect/prune progress on fixed time.Sleep barriers and on the assumption that each Trigger() yields exactly one dial, which breaks under load (a trigger landing inside a peer's TimeToRetry window is skipped; a 500ms barrier is not always enough on a loaded runner).

Product change required by this approach. synctest demands that every goroutine started inside the bubble terminates before the test ends. Kademlia's metrics collector opens an in-memory leveldb in New (shed.NewDB), but the *shed.DB handle was passed to the collector and then dropped — Close never closed it. This is a real (if low-severity) resource/goroutine leak that was masked in tests by the goleak ignores in main_test.go. We now retain the store handle on the Kad struct and close it in Close, after the collector is finalized. With the store closed, the leveldb compaction/session goroutines exit.

AI Disclosure

  • This PR contains code that has been generated by an LLM.
  • I have reviewed the AI generated code thoroughly.
  • I possess the technical expertise to responsibly review the code generated in this PR.

Comment thread pkg/topology/kademlia/kademlia_test.go Outdated
@martinconic martinconic requested a review from janos June 22, 2026 13:09
@martinconic martinconic requested a review from sbackend123 June 22, 2026 19:07
logger log.Logger // logger
bootnode bool // indicates whether the node is working in bootnode mode
collector *im.Collector
metricsDB *shed.DB // backing store for the metrics collector; closed on shutdown

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand why this is needed. it does not appear anywhere else in the diff. if this is added because a test needs it then it probably lives in the wrong place. if this is some leftover from another place then probably should be removed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants