@robreeves
Contributor

@robreeves robreeves commented Dec 31, 2025

What changes were proposed in this pull request?

This adds the final application status to the Spark History Server (SHS) for applications that fail with a non-zero exit code, to make failures easier to debug.

Decisions made

At first I wanted to always include the final exit code. However, when using spark-submit in local mode, I found that an exception thrown in the driver can still result in exit code 0. Showing that would be confusing because the status would read as a success. Instead, I scoped this change to showing the final status only for known failures. Ideally, a future enhancement would stop using exit code 0 for failed executions in all cases, but that is a riskier breaking change that I did not want to make here.

Possible future extensions

Right now I only show the exit code for explicit failures (non-zero exit code). There are some cases I wanted to solve, but they require more foundational work.

  1. Help users understand when a driver crashed. Right now SHS shows the app as still running, which is another reason I didn't want to show the status outside of known failures. This is tricky: I wanted to infer a crash from the time gap between the last event in the event log and the current time, but that information is not available now (and I'm not sure about the approach).
  2. Enhance exit codes to never use 0 on failure, so they can help users debug more scenarios.
  3. Instead of just showing an exit code, map it to its meaning. This requires more work since the meaning is resource-manager specific.
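
To make extensions 1 and 3 more concrete, here is a minimal sketch. It is not part of this PR: `ExitCodeHints`, `describe`, `possiblyCrashed`, and `maxSilenceMs` are hypothetical names, and the only mapping used is the common POSIX shell convention that an exit code of 128 + N indicates termination by signal N. Real resource managers may define their own codes, so this would need to be made resource-manager specific.

```scala
// Hypothetical helper, not part of this PR or of Spark.
object ExitCodeHints {
  // Assumption: Linux signal numbers, and the shell convention
  // "exit code = 128 + signal number" for signal-terminated processes.
  private val signalNames: Map[Int, String] =
    Map(2 -> "SIGINT", 9 -> "SIGKILL", 15 -> "SIGTERM")

  // Extension 3: attach a best-effort meaning to a raw exit code.
  def describe(code: Int): String = {
    val signal = code - 128
    if (code > 128 && signalNames.contains(signal)) {
      s"exit code $code (likely terminated by ${signalNames(signal)})"
    } else {
      s"exit code $code"
    }
  }

  // Extension 1: guess that a driver crashed when there is no end event
  // and the event log has been silent for longer than a chosen threshold.
  // All inputs are assumed to be supplied by the caller.
  def possiblyCrashed(
      hasEndEvent: Boolean,
      lastEventTimeMs: Long,
      nowMs: Long,
      maxSilenceMs: Long): Boolean = {
    !hasEndEvent && (nowMs - lastEventTimeMs) > maxSilenceMs
  }
}
```

The time-gap check is only a heuristic: a long-running stage that emits no events would look the same as a crashed driver, which is part of why the approach is uncertain.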

Why are the changes needed?

If a Spark job fails due to an issue on the driver side (e.g. during the commit phase or in custom driver code), this is not surfaced anywhere in SHS. This is even more confusing when every query/job itself ran successfully: SHS shows all jobs as completed successfully, and the only way to know the application failed is to look in the logs. Before this change, this is what an application that failed on the driver side looks like in SHS. It looks like a success.
image

After the change this is what it looks like:
image

Does this PR introduce any user-facing change?

Yes, it adds the application's final status to the SHS overview when an execution fails with a non-zero exit code.

How was this patch tested?

Unit tests and manual testing of SHS.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 2.0.76 (for unit tests only)

@github-actions

JIRA Issue Information

=== New Feature SPARK-54879 ===
Summary: Add final application status to Spark History Server
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions

@robreeves robreeves changed the title from "[SPARK-54879][CORE] Add final status to Spark History Server" to "[SPARK-54879][CORE] Add final status to Spark History Server for failed executions" on Jan 2, 2026
@robreeves robreeves marked this pull request as ready for review January 2, 2026 05:08
case Some(code) if code != 0 =>
<li>
<strong>Final Status:</strong>
{s"Failure (exit code: $code)"}
Member

@pan3793 pan3793 Jan 4, 2026

Should it be

  • Succeeded
  • Failed (exit code: $code)

?

Displaying a "Succeeded" status helps the user distinguish a running or crashed app (no SparkListenerApplicationEnd event) from an app that finished normally.
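
The suggestion could be sketched roughly as follows. This is an illustration only, not the PR's code: `FinalStatusLabel` and `render` are hypothetical names, the label is returned as a plain string rather than the XML markup used in the diff, and the exit code is assumed to be available as an `Option[Int]`, as the `case Some(code)` pattern above suggests.

```scala
// Hypothetical helper illustrating the reviewer's suggestion.
object FinalStatusLabel {
  // None models an app with no recorded exit code (still running or
  // crashed); in that case no status label is shown at all.
  def render(exitCode: Option[Int]): Option[String] = exitCode match {
    case Some(0) => Some("Succeeded")
    case Some(code) => Some(s"Failed (exit code: $code)")
    case None => None
  }
}
```

One caveat from the PR description still applies: in local mode a driver exception can end with exit code 0, so under this sketch such a run would be labeled "Succeeded".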
