Skip to content

💥 Use Temporal Failures for Nexus Error Serialization#1336

Open
Quinn-With-Two-Ns wants to merge 2 commits intomainfrom
nexus-error-serialization-quinn
Open

💥 Use Temporal Failures for Nexus Error Serialization#1336
Quinn-With-Two-Ns wants to merge 2 commits intomainfrom
nexus-error-serialization-quinn

Conversation

@Quinn-With-Two-Ns
Copy link
Contributor

@Quinn-With-Two-Ns Quinn-With-Two-Ns commented Feb 19, 2026

Note This PR can't be merged until the corresponding Core PR (temporalio/sdk-core#1109) is merged and updated in this branch and the nexus-rpc PR (nexus-rpc/sdk-python#45) is merged and released

What was changed

  • Nexus error serialization now uses Temporal Failures: Instead of the custom Nexus Failure JSON format (nexus.v1.Failure with metadata/details), Nexus handler errors and operation errors are now serialized as standard Temporal failure.v1.Failure protos via the SDK's DataConverter/FailureConverter. This includes HandlerErrornexus_handler_failure_info and OperationErrorCancelledError/ApplicationError conversions.
  • Simplified _nexus.py worker code: Removed the _nexus_error_to_nexus_failure_proto, _operation_error_to_proto, and _handler_error_to_proto helper methods. Errors are now encoded directly through data_converter.encode_failure() into the completion/response proto's failure field.
  • Simplified _exception_to_handler_error: Removed the workaround that inserted an extra ApplicationError at the head of the HandlerError cause chain (previously needed to preserve the HandlerError message when hoisted to the Nexus Failure).
  • Converter improvements: _error_to_failure for HandlerError now uses error.message and error.stack_trace instead of str(error). from_failure uses match/case instead of if/elif chains.
  • Updated protos and sdk-core: Updated nexus_pb2, workflow_activation_pb2, workflow_commands_pb2, and the sdk-core submodule to support the new Nexus failure fields on completion protos.
  • Removed HTTP-based Nexus tests: Deleted test_handler.py and test_handler_async_operation.py which tested via direct HTTP calls (the HTTP interface is not user-facing). Converted remaining tests (test_workflow_run_operation.py, test_dynamic_creation_of_user_handler_classes.py) to use workflow callers.
  • Refactored error chain tests: Replaced the class-based ErrorConversionTestCase registry pattern with typed dataclasses (ExpectedNexusOperationError, ExpectedHandlerError, ExpectedApplicationError, ExpectedCancelledError) and explicit ErrorTestService handler methods.
  • Added failure converter unit tests: New tests in test_converter.py for round-trip serialization of HandlerError, NexusOperationError, OperationError, and related types.
  • Cleaned up stale TODOs: Removed TODO comments marked as "Won't Do" in the Nexus TODO triage, and removed the unused self._interceptors variable in the Nexus worker.

Why?

The previous Nexus error serialization used a custom JSON-based format (nexus.v1.Failure with metadata and JSON details) that was specific to the Nexus HTTP protocol. With the move to have Core SDK handle protocol-level conversion based on server capabilities, the Python SDK now only needs to produce standard Temporal Failure protos.

Checklist

  1. How was this tested:

    • Existing Nexus workflow caller tests converted and expanded to validate error chains end-to-end
    • New unit tests for FailureConverter round-trip serialization of Nexus error types in tests/test_converter.py
    • New typed error chain validation in tests/nexus/test_workflow_caller_error_chains.py
  2. Any docs updates needed?

    • No

Note

Medium Risk
Generated API/proto shape changes for Nexus responses can break consumers relying on the old operation_error variant or missing new capability fields, and behavior depends on a non-released nexus-rpc git source override.

Overview
Nexus protos are updated to support Temporal failure serialization by adding a new StartOperationResponse.failure variant of type temporal.api.failure.v1.Failure and deprecating the older operation_error variant, plus extending nexus.v1.Failure with stack_trace and recursive cause fields.

Adds Request.Capabilities (including temporal_failure_responses) to allow capability negotiation, wires in the new failure proto dependency/imports in generated files, and exposes CountSchedulesRequest/Response from workflowservice.v1. pyproject.toml loosens nexus-rpc pinning and adds a uv git source override for a specific nexus-rpc branch.

Written by Cursor Bugbot for commit b58ce86. This will update automatically on new commits. Configure here.

@Quinn-With-Two-Ns Quinn-With-Two-Ns changed the title Nexus error ser. 💥 Use Temporal Failures for Nexus Error Serialization Feb 19, 2026
@Quinn-With-Two-Ns Quinn-With-Two-Ns marked this pull request as ready for review February 19, 2026 20:07
@Quinn-With-Two-Ns Quinn-With-Two-Ns requested a review from a team as a code owner February 19, 2026 20:07
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

keywords = ["temporal", "workflow"]
dependencies = [
"nexus-rpc==1.3.0",
"nexus-rpc",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unpinned dependency and development git source for nexus-rpc

High Severity

The nexus-rpc dependency was changed from ==1.3.0 to a bare "nexus-rpc" with no version constraint, and a [tool.uv.sources] git override pointing to a feature branch (amazzeo/add-failure) was added. The git source only applies to uv-based installs; pip users installing the published package would resolve nexus-rpc from PyPI with no version floor, potentially getting an old version that lacks the new failure serialization APIs. Every other dependency in the list has version bounds. The version constraint needs to be restored with appropriate bounds, and the [tool.uv.sources] development override needs to be removed before merge.

Additional Locations (1)

Fix in Cursor Fix in Web

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b58ce8660f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

keywords = ["temporal", "workflow"]
dependencies = [
"nexus-rpc==1.3.0",
"nexus-rpc",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Pin nexus-rpc to a version that supports new failure fields

This change removes the explicit nexus-rpc==1.3.0 requirement while the new converter path now constructs nexusrpc.HandlerError with stack_trace/original_failure (see DefaultFailureConverter.from_failure), which are not guaranteed on older releases. In environments that install from PyPI (which ignore [tool.uv.sources]), resolving to an incompatible nexus-rpc version will raise runtime TypeError the first time a Nexus handler failure is decoded, so this dependency needs a minimum/compatible version constraint.

Useful? React with 👍 / 👎.

Comment on lines +403 to +407
match err.state:
case nexusrpc.OperationErrorState.CANCELED:
raise CancelledError(err.message) from err.__cause__
case nexusrpc.OperationErrorState.FAILED:
raise ApplicationError(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Handle unknown OperationError states explicitly

The match err.state block only raises for CANCELED and FAILED; there is no default branch. If an unexpected state is received (e.g., from a newer nexus-rpc enum value or a malformed custom raise), this except nexusrpc.OperationError path falls through without returning a StartOperationResponse, and _handle_start_operation_task proceeds as if start-operation succeeded with an invalid empty response variant instead of surfacing a failure.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments