Fix process leak in PosixCommandBasedStatisticsGetter#1717
Fix process leak in PosixCommandBasedStatisticsGetter#1717orange475 wants to merge 2 commits intoapache:masterfrom
Conversation
|
@xinyuiscool @shanthoosh @sborya Could you take a look at this? Thank you |
| } catch (InterruptedException e) { | ||
| Thread.currentThread().interrupt(); | ||
| throw new IOException("Interrupted while waiting for command to complete", e); | ||
| } |
There was a problem hiding this comment.
It might be good to add:
} finally {
executable.destroy();
}
here just to make sure.
| // Wait for the process to complete to prevent resource leak | ||
| try { | ||
| executable.waitFor(); | ||
| } catch (InterruptedException e) { |
There was a problem hiding this comment.
Before we re-interrupt, we should probably try to kill it off. e.g.
} catch (InterruptedException e) {
executable.destroy();
Thread.currentThread().interrupt();
throw new IOException("Interrupted while waiting for command to complete", e);
}
There was a problem hiding this comment.
We'll do destroy in finally block
| @@ -55,6 +55,15 @@ private List<String> getAllCommandOutput(String[] cmdArray) throws IOException { | |||
| } | |||
| } | |||
| processReader.close(); | |||
There was a problem hiding this comment.
Instead of explicitly calling close, this should probably use try-with-resources instead. Also, since we're already changing this, any thoughts on draining the stderr buffer here too? Although (very) unlikely, if the process spams stderr and fills the pipe, it'll deadlock on the wait until something drains it off. e.g.
try (BufferedReader processReader = new BufferedReader(new InputStreamReader(executable.getInputStream()));
BufferedReader errorReader = new BufferedReader(new InputStreamReader(executable.getErrorStream()))) {
String line;
while ((line = processReader.readLine()) != null) {
if (!line.isEmpty()) {
psOutput.add(line);
}
}
while ((line = errorReader.readLine()) != null) {
if (!line.isEmpty()) {
log.error("stderr while running {}: {}", cmdArray, line);
}
}
}
There was a problem hiding this comment.
Added the stderr buffer draining.
|
Updated |
Improve stderr drain performance by only record the first 100 lines
0b5a014 to
7f6d456
Compare
|
|
||
| // Wait for the process to complete to prevent resource leak | ||
| try { | ||
| boolean finished = executable.waitFor(COMMAND_TIMEOUT_SECONDS, TimeUnit.SECONDS); |
There was a problem hiding this comment.
What will be the behavior if the executable did not complete within the timeout?
There was a problem hiding this comment.
It will throw an exception and finally destroy the executable. This is to avoid infinite wait. We could also remove this TTL to just call executable.waitFor() or with a longer default time. @bringhurst what's your thought on this?
|
|
||
| // Wait for the process to complete to prevent resource leak | ||
| try { | ||
| boolean finished = executable.waitFor(COMMAND_TIMEOUT_SECONDS, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Do we want to make this configurable per job.
There was a problem hiding this comment.
I dont think this would be necessary at this point as we don't want to expose this to users for tuning. Make it configurable would also require a wider refactoring.
|
This change was discussed offline. It was updated to a minute to sync up with the metrics reporting interval. There's no need to restrict it more than that. |
| while ((line = processReader.readLine()) != null) { | ||
| if (!line.isEmpty()) { | ||
| psOutput.add(line); | ||
| try (BufferedReader processReader = new BufferedReader(new InputStreamReader(executable.getInputStream())); |
There was a problem hiding this comment.
Can we please clarify how did we have verify and test the changes end-to-end such that it addresses the leack problem?
Summary
PosixCommandBasedStatisticsGetterwhere shell processes spawned to collect systemstatistics were not being properly reaped
process.waitFor()call after reading command output to ensure child processes complete and are removedfrom the process table
Problem
The
PosixCommandBasedStatisticsGetterclass executes shell commands (ps, grep) to collect memory statistics butwasn't waiting for these processes to complete. This caused zombie processes to accumulate over time in
long-running Samza containers, potentially leading to resource exhaustion.
Solution
process.waitFor()after closing the process reader to ensure proper process cleanupTest Plan