Fix ByteBuffer corruption on reconnect causing IndexOutOfBoundsException #87
jacomago wants to merge 4 commits into epics-base:master
shroffk left a comment
I am hoping there was no edge case use for the explicit flush()
I'm not familiar with the code. The comments are very good, so the next person looking at this gets a good idea of what we're trying to accomplish here. Still, simply based on having used A For the next usage, I get it that we have a bug here where the buffer isn't properly reset. But to fix that, I would expect one of which reset the buffer to either "empty" or to the size of the expected message. Or do we know for sure that
Right. Who knows?
I'm currently trying to reproduce the bug so I can test this fix explicitly. I think rewind makes more sense. But I just went through the code, and there is exactly one message of a specific size, because we get the size from the type information. I guess that, since this happens in resubscribe, the last event or events weren't cleared.
The flush does nothing; the output stream returned is from
Two bugs triggered by connection failures on Java 11+ (including Java 21):

1. EventAddRequest.resubscribeSubscription(): a failed noSyncSend could leave requestMessage with position < capacity. The next reconnect would flip() that partial position into limit, and the reconnect after that would throw IndexOutOfBoundsException at putInt(8, ...) because limit < 12. Fix: reset limit and position to capacity at the top of resubscribeSubscription() so flip() in Transport.submit() always produces position=0, limit=capacity.
2. CATransport.noSyncSend(): channel.socket().getOutputStream().flush() throws IllegalBlockingModeException when the SocketChannel is in non-blocking mode (as used by the Reactor), and is a no-op for TCP output in any mode. The uncaught RuntimeException bypassed the catch(IOException) handler, skipping close(true) and leaving the buffer in a partially-consumed state. Fix: remove the call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
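The first bug can be reproduced in isolation with a plain ByteBuffer. The following is a minimal sketch, not the PR's code: the class name, the 16-byte capacity, the 4-byte partial send, and the flipAndSend helper are all assumptions made for illustration. Only the flip(), putInt(8, ...), and limit/position reset steps come from the commit message.

```java
import java.nio.ByteBuffer;

final class FlipCorruptionDemo {

    /**
     * Hypothetical stand-in for a submit-style send: flip the buffer,
     * then consume only `sent` bytes, as a failed send might.
     */
    static void flipAndSend(ByteBuffer buf, int sent) {
        buf.flip();                                  // limit = old position, position = 0
        buf.position(Math.min(sent, buf.limit()));   // partial consumption
    }

    /** Returns true if the absolute putInt(8, ...) fails on the third use. */
    static boolean reproducesBug() {
        ByteBuffer msg = ByteBuffer.allocate(16);
        msg.position(msg.capacity());   // prepared message: position == limit == capacity

        flipAndSend(msg, 4);            // failed send: position=4, limit=16
        msg.putInt(8, 1);               // still fine, limit is 16
        flipAndSend(msg, 4);            // flip() turns position 4 into limit 4

        try {
            msg.putInt(8, 2);           // needs limit >= 12, but limit is now 4
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    /** Same sequence, but with limit and position reset to capacity before each use. */
    static boolean resetSurvivesReconnects() {
        ByteBuffer msg = ByteBuffer.allocate(16);
        msg.position(msg.capacity());

        for (int i = 0; i < 3; i++) {
            msg.limit(msg.capacity());      // limit first: position may not exceed limit
            msg.position(msg.capacity());
            msg.putInt(8, i);               // always within the limit after the reset
            flipAndSend(msg, 4);            // flip() now yields position=0, limit=capacity
        }
        return msg.limit() == msg.capacity();
    }

    public static void main(String[] args) {
        System.out.println("bug reproduced: " + reproducesBug());
        System.out.println("reset survives: " + resetSurvivesReconnects());
    }
}
```

Setting the limit before the position matters here, because Buffer.position(int) rejects any value larger than the current limit.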
- Remove dead asyncCloseOnError parameter: it was never consulted; the method always called close(true) on error. Both call sites passed true anyway; the one that passed false (flushInternal) already had its own close() in the catch block.
- Replace the double loop (outer 16 KB parts chunking + inner unbounded retry) with a single while (buffer.hasRemaining()) loop. Non-blocking channel.write() already returns fewer bytes when the kernel send buffer is full, making the explicit chunking redundant.
- Enforce the retry limit: the exit condition (tries <= TRIES) had been commented out, making the loop infinite. Now throws IOException after MAX_SEND_RETRIES (10) attempts so a persistently full send buffer triggers a clean disconnect rather than blocking forever.
- Restore interrupt handling: InterruptedException was silently swallowed. Now restores the interrupted status and throws IOException.
- Promote the retry constant to a named static field.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
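The loop shape described in this commit message could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: the class name, RETRY_DELAY_MS, the FakeChannel test double, and the choice to count only consecutive zero-byte writes are all hypothetical. Only the while (buffer.hasRemaining()) loop, the MAX_SEND_RETRIES value of 10, and the interrupt handling are taken from the message.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

final class BoundedSendSketch {

    static final int MAX_SEND_RETRIES = 10;   // value named in the commit message
    static final long RETRY_DELAY_MS = 1;     // assumption: the real back-off is not shown

    /** Writes the whole buffer, giving up after too many consecutive zero-byte writes. */
    static void send(WritableByteChannel channel, ByteBuffer buffer) throws IOException {
        int tries = 0;
        while (buffer.hasRemaining()) {
            int written = channel.write(buffer);  // may write fewer bytes than remaining
            if (written > 0) {
                tries = 0;                        // assumption: progress resets the counter
                continue;
            }
            if (++tries > MAX_SEND_RETRIES) {
                throw new IOException("send buffer full after " + MAX_SEND_RETRIES + " retries");
            }
            try {
                Thread.sleep(RETRY_DELAY_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // restore the interrupted status
                throw new IOException("interrupted while waiting to send", e);
            }
        }
    }

    /** Hypothetical test double: accepts at most bytesPerWrite bytes per call. */
    static final class FakeChannel implements WritableByteChannel {
        private final int bytesPerWrite;

        FakeChannel(int bytesPerWrite) {
            this.bytesPerWrite = bytesPerWrite;
        }

        @Override
        public int write(ByteBuffer src) {
            int n = Math.min(bytesPerWrite, src.remaining());
            src.position(src.position() + n);     // pretend n bytes were sent
            return n;
        }

        @Override
        public boolean isOpen() {
            return true;
        }

        @Override
        public void close() {
        }
    }
}
```

A FakeChannel(3) drains a 10-byte buffer in four writes, while a FakeChannel(0) models a persistently full send buffer and makes send() throw once the retry limit is exceeded.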
0af4853 to 17b5551
Reverts default to infinite as before
FYI @ralphlange
That sounds like our crashes. Let me show this to my colleagues.
Two bugs triggered by connection failures on Java 11+ (including Java 21):
1. EventAddRequest.resubscribeSubscription(): a failed noSyncSend could leave requestMessage with position < capacity. The next reconnect would flip() that partial position into limit, and the reconnect after that would throw IndexOutOfBoundsException at putInt(8, ...) because limit < 12. Fix: reset limit and position to capacity at the top of resubscribeSubscription() so flip() in Transport.submit() always produces position=0, limit=capacity.
2. CATransport.noSyncSend(): channel.socket().getOutputStream().flush() throws IllegalBlockingModeException when the SocketChannel is in non-blocking mode (as used by the Reactor), and is a no-op for TCP output in any mode. The uncaught RuntimeException bypassed the catch(IOException) handler, skipping close(true) and leaving the buffer in a partially-consumed state. Fix: remove the call.
Fixes #86
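The second bug in the description hinges on IllegalBlockingModeException being an unchecked exception (it extends IllegalStateException), so a catch (IOException) block never sees it. The sketch below shows only that control-flow point. It does not open a socket, and the class, the flushLike() and sendLike() methods, and the closed flag are hypothetical stand-ins for the flush() call and the close(true) cleanup.

```java
import java.io.IOException;
import java.nio.channels.IllegalBlockingModeException;

final class UncheckedBypassDemo {

    private boolean closed;   // stand-in for the effect of close(true)

    boolean isClosed() {
        return closed;
    }

    /** Stand-in for the flush() call: declared to throw IOException, throws unchecked instead. */
    void flushLike() throws IOException {
        throw new IllegalBlockingModeException();
    }

    /** Stand-in for the send path: cleanup lives only in the IOException handler. */
    void sendLike() {
        try {
            flushLike();
        } catch (IOException e) {
            closed = true;    // never reached for IllegalBlockingModeException
        }
    }

    public static void main(String[] args) {
        UncheckedBypassDemo demo = new UncheckedBypassDemo();
        try {
            demo.sendLike();
        } catch (IllegalBlockingModeException e) {
            System.out.println("exception escaped, cleanup ran: " + demo.isClosed());
        }
    }
}
```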