HDFS-17916: DataStreamer#processDatanodeOrExternalError() fails to return byte arrays to ByteArrayManager#8466
Conversation
…turn byte arrays to ByteArrayManager
|
💔 -1 overall
This message was automatically generated. |
|
Yetus failure appears unrelated to this PR |
|
Looks right by me as well. Heya @ZanderXu , anything else to do here? Do you mind handling the commits on this one? I'm not technically a committer. |
|
@steveloughran @cnauroth @jojochuang any chance you have a moment to see this one home? Much appreciated! |
| // release() is performed by the streamer thread, so allow a brief | ||
| // moment for that thread to settle after close() returns. | ||
| GenericTestUtils.waitFor(() -> managers.countAllocated() == 0, 50, 5000); | ||
| assertEquals(0, managers.countAllocated(), |
There was a problem hiding this comment.
assertj
assertThat(managers.countAllocated())
.describedAs("count allocated")
.isEqualTo(0);|
|
||
| ByteArrayManager bam = | ||
| fs.getClient().getClientContext().getByteArrayManager(); | ||
| assertTrue(bam instanceof ByteArrayManager.Impl, |
There was a problem hiding this comment.
can you use AssertJ asserts, as they can backport to branch-3.4 (java8, junit4
assertThat(bam)
.describedAs(""expected bounded ByteArrayManager ")
.isInstanceOf(ByteArrayManager.Impl); |
@ndimiduk i don't go near hdfs, but for you I will, especially as this is so simple production-side. Commented on the tests, which are mainly about using assertJ @charlesconnell once this is in, you'll need to backport prs to branch-3.5 (straightforward) and branch-3.4, where the junit4/5 migration adds work. Using assertJ means the only homework here is changing the import of the Test class |
|
Thank you for the review @steveloughran, I've made your requested changes |
steveloughran
left a comment
There was a problem hiding this comment.
+1 pending test results
|
Tests ran for 24 hours and were killed |
|
💔 -1 overall
This message was automatically generated. |
|
@steveloughran I believe that the failed tests here are unrelated to my change. If you are satisfied with it, could you please merge? |
steveloughran
left a comment
There was a problem hiding this comment.
+1
failures look unrelated to me too, given this is the error case. I have just had to review this code carefully though to be confident that this is good.
Can I add that the code around 1292 also seems to need a similar cleanup: the buffer release should be moved into the finally(), rather than the end of the successful path.
|
@charlesconnell merged to trunk. Can you submit a backport PR against branch-3.5 so we can see how things go there? |
|
@steveloughran Thank you for the merge. The backport is open at #8516. The code around line 1292 could be improved to be easier to reason about, but I don't think there is a byte array leak there. The |
Description of PR
A certain code path in the DFS client
DataStreamerappears to discard DFSPacket objects without returning their contained byte arrays to theByteArrayManager. I discovered this bug at my company after we had HBase server threads hung for hours atByteArrayManager#allocate(). Because the leak only happens in an error-handling path, the problem requires an unhealthy HDFS cluster in order to be exposed.I took a heap dump of a high-uptime but relatively healthy HBase server, and found evidence of leaked byte arrays there too. In the heap dump, the two
FixedLengthManagersboth hadnumAllocated= 9, but there were zero liveDFSPacketobjects. This suggests that the byte arrays, and their containingDFSPacketshad been garbage collected, unbeknownst to FixedLengthManager.In
DataStreamer.javastarting at line 1410, theDFSPacketthat isremove()'d fromdataQueueis allowed to be garbage collected without further interaction.This PR adds this line in order to return the packet's buffer to the ByteArrayManager:
Contains content generated by Claude Opus 4.7
How was this patch tested?
New unit tests added
For code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?AI Tooling
If an AI tool was used:
where is the name of the AI tool used.
https://www.apache.org/legal/generative-tooling.html