Small fixes for reconnect policy and heartbeat #834

Open
Lorak-mmk wants to merge 3 commits into scylladb:master from Lorak-mmk:fix-host-reconnect

@Lorak-mmk

When looking into #295 and https://scylladb.atlassian.net/browse/SCYLLADB-1251 I found some minor issues that this PR fixes.

  1. Heartbeat exceptions showed absurd values, such as `Connection heartbeat timeout after -56.143085956573486 seconds, last_host=127.0.16.3:19042`. This happened because the message used the `timeout` argument passed to `wait`, which is not the whole timeout duration but the time LEFT after waiting for other futures.
  2. Reconnect policies had a default maximum number of attempts. Once that number was reached, the driver would just give up connecting.
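
The negative value in item 1 can be reproduced with a small model. This is only a sketch, not the driver's code: it assumes several futures are waited on one after another against a single shared deadline, and the name `remaining_budgets` and the sample durations are hypothetical.

```python
def remaining_budgets(original_timeout, wait_durations):
    """Return the timeout value handed to each successive wait.

    Hypothetical model: every wait consumes its full duration from one
    shared budget, so later waits receive whatever is left over, and that
    leftover can drop below zero.
    """
    elapsed = 0.0
    budgets = []
    for duration in wait_durations:
        # This is the "time left" value, not the configured timeout.
        budgets.append(original_timeout - elapsed)
        elapsed += duration
    return budgets


# Configured timeout of 30 s; earlier waits took 10 s and 25 s.
budgets = remaining_budgets(30.0, [10.0, 25.0, 5.0])

# Formatting the leftover budget into the message gives a negative duration.
message = "Connection heartbeat timeout after %s seconds" % (budgets[-1],)
```

In this model the third wait receives -5.0, so a message built from that argument reports a negative duration even though the configured timeout is 30 seconds.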

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

This was kept this way to preserve legacy behavior, but this behavior is
absurd and hurtful. We need to get rid of it now.

This better conveys what this is: not a timeout duration from config, but
how much of this timeout is left right now.

The timeout argument in `wait` tells how much we need to wait taking
into consideration that we already waited for some other futures.
This created confusing heartbeat messages that could even show negative
wait times.
@Lorak-mmk Lorak-mmk self-assigned this Apr 28, 2026
@Lorak-mmk Lorak-mmk requested a review from dkropachev April 28, 2026 21:18
Comment thread cassandra/policies.py
"""

- def __init__(self, delay, max_attempts=64):
+ def __init__(self, delay, max_attempts=None):
Collaborator

Could you elaborate in the commit message on why it is harmful and what happens when max_attempts is reached?
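
For context on the question above, here is a minimal sketch of how a constant-delay schedule behaves with and without an attempt limit. It is an illustration under assumptions, not the driver's implementation: `constant_schedule` is a hypothetical helper, and the only behavior taken from the PR description is that the driver stops reconnecting once the attempts run out.

```python
from itertools import islice, repeat


def constant_schedule(delay, max_attempts=None):
    """Return an iterator of reconnect delays (hypothetical helper).

    With max_attempts=None the schedule never ends; with a number it
    yields that many delays and is then exhausted.
    """
    if max_attempts is not None:
        return repeat(delay, max_attempts)
    return repeat(delay)


# Bounded schedule: after three delays there is nothing left to consume.
bounded = list(constant_schedule(2.0, max_attempts=3))

# Unbounded schedule: take a finite slice to inspect it safely.
unbounded_head = list(islice(constant_schedule(2.0), 5))
```

A caller that pulls delays from the bounded schedule has no further delay to use after the last one, which corresponds to the "give up connecting" behavior the PR description mentions. The unbounded schedule keeps supplying delays for as long as the caller asks.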

Comment thread cassandra/connection.py
raise self._exception
else:
- raise OperationTimedOut("Connection heartbeat timeout after %s seconds" % (timeout,),
+ raise OperationTimedOut("Connection heartbeat timeout after %s seconds" % (original_timeout,),
Collaborator

@dkropachev dkropachev Apr 29, 2026

Now I find this exception message very confusing: it says `Connection heartbeat timeout after ${original_timeout} seconds`, but it waited `${timeout}`.

2 participants