Skip to content

fix(amber): avoids false negative on _check_heartbeat when connection is established but close raises exception#5027

Draft
nathant27 wants to merge 3 commits into
apache:mainfrom
nathant27:fix/heartbeat-reports-serverdown-on-close-error
Draft

fix(amber): avoids false negative on _check_heartbeat when connection is established but close raises exception#5027
nathant27 wants to merge 3 commits into
apache:mainfrom
nathant27:fix/heartbeat-reports-serverdown-on-close-error

Conversation

@nathant27
Copy link
Copy Markdown

What changes were proposed in this PR?

Old version of Heartbeat._check_heartbeat(self) wraps the socket connection and socket close in a single try except, and handles with the same try except. The issue is that if close raises after a successful connection, there is a false negative since it will handle it as if socket connection had failed. In Heartbeat.run(self), this could cause the server to shutdown on close raising exception since _check_heartbeat will return False in this case.
New version separates the connection handling and the close in two adjacent try except blocks, and returns true as long as connection is successfully established as well, getting rid of the false negative.
Old Version:

    def _check_heartbeat(self) -> bool:
        """
        Attempt to connect to JVM on the specific port. If succeeds, it means the
        socket is still available and the JVM is still alive. Otherwise, the JVM
        might have been gone.

        :return: bool, indicating if the socket is available.
        """
        try:
            temp_socket = socket.create_connection(
                (self._parsed_server_host, self._parsed_server_port), timeout=1
            )
            temp_socket.close()
            return True
        except Exception as e:
            logger.warning(f"Server is down with exception: {e}")
            return False

New Version:

    def _check_heartbeat(self) -> bool:
        """
        Attempt to connect to the JVM port. If connection failure, then JVM is dead, or if connection success 
        then JVM is alive even if close() raises. Logs on connection failure and on close error.

        :return: bool, True if connect succeeded, False if connect failed.
        """
        try:
            temp_socket = socket.create_connection(
                (self._parsed_server_host, self._parsed_server_port), timeout=1
            )
        except Exception as e:
            logger.warning(f"Server is down with exception: {e}")
            return False

        try:
            temp_socket.close()
        except Exception as e:
            logger.warning(f"Failed to close socket: {e}")
        
        return True

Any related issues, documentation, discussions?

Closes #4726

How was this PR tested?

Changed old test to reflect new behavior(True on connection succeeds but socket close rasises)
OLD TEST(REMOVED):

    def test_returns_false_when_socket_close_raises(self):
        # Pins the false-negative path: connect succeeds but the subsequent
        # close() throws (e.g. broken pipe on a half-open socket). The bare
        # `except Exception` in _check_heartbeat() catches it and reports
        # "server down", and a regression that narrows that handler would be
        # caught here.
        hb = make_heartbeat()
        fake_sock = MagicMock()
        fake_sock.close.side_effect = OSError("close failed")
        with patch(
            "core.runnables.heartbeat.socket.create_connection",
            return_value=fake_sock,
        ):
            assert hb._check_heartbeat() is False

NEW TEST:

    def test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
        hb = make_heartbeat()
        fake_sock = MagicMock()
        fake_sock.close.side_effect = OSError("close failed")
        with patch(
            "core.runnables.heartbeat.socket.create_connection",
            return_value=fake_sock,
        ):
            assert hb._check_heartbeat() is True

Passes on existing old tests

Was this PR authored or co-authored using generative AI tooling?

No

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Heartbeat._check_heartbeat reports server down when only the socket close fails

1 participant