[SPARK-54879][CORE] Add final status to Spark History Server for failed executions #53657
What changes were proposed in this pull request?
This adds the final application status to the Spark History Server (SHS) for applications that fail with a non-zero exit code, to improve debugging.
Decisions made
At first I wanted to always include the final exit code. However, when using spark-submit in local mode, I found that an exception thrown in the driver can still result in exit code 0. Always showing the exit code would be confusing in that case, because a failed application would be shown as a success. Instead, I scoped this change to showing the final status only for known failures. Ideally, a future enhancement would stop using exit code 0 for failed executions in all cases, but that is a riskier breaking change that I did not want to make here.
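As a rough illustration of the scoping rule described above (not the code in this patch), the decision can be sketched as a small function. The name `final_status_label` and the label format are hypothetical and chosen only for this example; the actual SHS implementation and wording may differ.

```python
from typing import Optional


def final_status_label(exit_code: Optional[int]) -> Optional[str]:
    """Return a status label to display, or None to display nothing.

    Hypothetical helper: the name and the label text are assumptions
    made for illustration, not part of Spark.
    """
    # Exit code 0 is not trusted as proof of success, because a driver
    # exception in local mode can still exit with 0. Show nothing rather
    # than a misleading "success". A missing exit code is treated the same.
    if exit_code is None or exit_code == 0:
        return None
    # A non-zero exit code is a known failure, so it is surfaced.
    return f"FAILED (exit code {exit_code})"
```

With this rule, an application that exits with code 1 gets a visible failure label, while an application that exits with code 0 shows no final status at all, even if the driver threw an exception.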
Possible future extensions
Right now I only show the exit code for explicit failures (non-zero exit code). There are some other cases I wanted to solve, but they require more foundational work.
Why are the changes needed?
If a Spark job fails due to an issue on the driver side (e.g. during the commit phase or in custom driver code), this is not surfaced anywhere in SHS. This becomes even more confusing when a query/job has run successfully: SHS shows all jobs as completed successfully, and the only way to know the job failed is to look in the logs. Before this change, this is what an application with a driver-side failure looks like in SHS. It looks like a success.

After the change this is what it looks like:

Does this PR introduce any user-facing change?
Yes, it adds the application's final status to the SHS overview when an execution fails with a non-zero exit code.
How was this patch tested?
Unit tests and manually tested SHS.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code 2.0.76 (for unit tests only)