Skip to content

Commit 160d535

Browse files
authored
test(iroh): Fix test_active_relay_inactive test being flaky (#3680)
## Description This reverts a change from this PR: #3384 I originally thought I could make this test more reliable by pausing the tokio time across the `tokio::time::timeout` calls, but it turns out that actually makes the test *more* flaky: - When time is paused, the timeout will immediately fire once the tokio runtime has no more CPU work to do. - It's possible that there's no CPU work to do anymore, while there's something else that is actually still doing work, e.g. networking. - Before the `ActiveRelayActor` finishes its `run_connected` loop, it will call `client_sink.close().await`, which will do actual I/O. When the tokio runtime is paused at that moment, it'll immediately trigger the test's timeout. ## Notes & open questions I couldn't reproduce this problem even across a couple thousand runs of the test locally. I'm not super confident that this fixes things, but I've analyzed the logs and this seems to be the most likely thing that's happening to me. Closes #3613 ## Change checklist <!-- Remove any that are not relevant. --> - [x] Self-review.
1 parent d328bf2 commit 160d535

File tree

1 file changed

+11
-7
lines changed
  • iroh/src/magicsock/transports/relay

1 file changed

+11
-7
lines changed

iroh/src/magicsock/transports/relay/actor.rs

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1459,7 +1459,6 @@ mod tests {
14591459

14601460
#[tokio::test]
14611461
#[traced_test]
1462-
#[ignore = "flaky"]
14631462
async fn test_active_relay_inactive() -> Result {
14641463
let (_relay_map, relay_url, _server) = test_utils::run_relay_server().await?;
14651464

@@ -1481,11 +1480,11 @@ mod tests {
14811480
);
14821481

14831482
// Wait until the actor is connected to the relay server.
1484-
tokio::time::timeout(Duration::from_millis(200), async {
1483+
tokio::time::timeout(Duration::from_secs(5), async {
14851484
loop {
14861485
let (tx, rx) = oneshot::channel();
14871486
inbox_tx.send(ActiveRelayMessage::PingServer(tx)).await.ok();
1488-
if tokio::time::timeout(Duration::from_millis(100), rx)
1487+
if tokio::time::timeout(Duration::from_millis(200), rx)
14891488
.await
14901489
.map(|resp| resp.is_ok())
14911490
.unwrap_or_default()
@@ -1497,12 +1496,12 @@ mod tests {
14971496
.await
14981497
.std_context("timeout")?;
14991498

1500-
// From now on, we pause time
1501-
tokio::time::pause();
15021499
// We now have an idling ActiveRelayActor. If we advance time just a little it
15031500
// should stay alive.
15041501
info!("Stepping time forwards by RELAY_INACTIVE_CLEANUP_TIME / 2");
1502+
tokio::time::pause();
15051503
tokio::time::advance(RELAY_INACTIVE_CLEANUP_TIME / 2).await;
1504+
tokio::time::resume();
15061505

15071506
assert!(
15081507
tokio::time::timeout(Duration::from_millis(100), &mut task)
@@ -1513,15 +1512,20 @@ mod tests {
15131512

15141513
// If we advance time a lot it should finish.
15151514
info!("Stepping time forwards by RELAY_INACTIVE_CLEANUP_TIME");
1515+
tokio::time::pause();
15161516
tokio::time::advance(RELAY_INACTIVE_CLEANUP_TIME).await;
1517+
tokio::time::resume();
1518+
1519+
// We resume time for these timeouts, as there's actual I/O happening,
1520+
// for example closing the TCP stream, so we actually need the tokio
1521+
// runtime to idle a bit while the kernel is doing its thing.
15171522
assert!(
1518-
tokio::time::timeout(Duration::from_millis(1000), task)
1523+
tokio::time::timeout(Duration::from_secs(1), task)
15191524
.await
15201525
.is_ok(),
15211526
"actor task still running"
15221527
);
15231528

1524-
tokio::time::resume();
15251529
cancel_token.cancel();
15261530
Ok(())
15271531
}

0 commit comments

Comments
 (0)