Temporal SDK gRPC calls fail with DEADLINE_EXCEEDED after node restart (GraalVM / K8s) #2840

@olegdibrov

Description

When a Spring Boot application using the Temporal Java SDK is compiled as a GraalVM native image and deployed in Kubernetes, the application fails to communicate with Temporal after node restarts or pod rescheduling.

The same application works correctly:

  • ✅ On JVM (non-native)
  • ✅ In Docker outside Kubernetes
  • ❌ Fails in Kubernetes when running as GraalVM native image

This suggests a compatibility issue between the Temporal Java SDK and the GraalVM native-image runtime, potentially related to:

  • gRPC channel lifecycle
  • DNS resolution / service discovery
  • resource or reflection configuration
  • connection reuse after pod rescheduling

Steps to Reproduce

  1. Install Temporal using the official Helm chart: https://github.com/temporalio/helm-charts
  2. Deploy the demo application: https://github.com/olegdibrov/temporal-graalvm-k8s
    Install the app:
    helm install control {path/to/chart}

Application logic (executed on startup):

// List all namespaces through the blocking gRPC stub; this is the call that times out.
List<DescribeNamespaceResponse> namespaces = workflowClient
    .getWorkflowServiceStubs()
    .blockingStub()
    .listNamespaces(ListNamespacesRequest.newBuilder().build())
    .getNamespacesList();
log.info("Found {} namespaces", namespaces.size());
  3. Restart the Kubernetes node or drain it:
    kubectl drain <node> --ignore-daemonsets
  4. Observe the application startup behavior

Actual Behavior

  • Application fails to start for 10–30 minutes
  • Repeated errors:
    io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999s
  • Temporal cluster is healthy (all pods ready)
  • Eventually, the application may recover without restart
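
To report the length of the failure window more precisely than "10–30 minutes", the startup call could be wrapped in a small timing loop. This is only a sketch using the Java standard library; `FailureWindow`, `measure`, and `Result` are hypothetical names that are not part of the demo application or the Temporal SDK.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical helper: retries a call until it succeeds and reports how long that took. */
final class FailureWindow {

    /** Outcome of a measurement: attempts made and total elapsed time. */
    static final class Result {
        final int attempts;
        final Duration elapsed;

        Result(int attempts, Duration elapsed) {
            this.attempts = attempts;
            this.elapsed = elapsed;
        }
    }

    /** Calls {@code call} up to {@code maxAttempts} times, pausing between failed attempts. */
    static Result measure(Callable<?> call, Duration pause, int maxAttempts) {
        long start = System.nanoTime();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.call();
                return new Result(attempt, Duration.ofNanos(System.nanoTime() - start));
            } catch (Exception e) {
                // Log every failure so the timestamps can be attached to the report.
                System.err.printf("attempt %d failed: %s%n", attempt, e);
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", e);
                }
            }
        }
        throw new IllegalStateException("no success after " + maxAttempts + " attempts");
    }
}
```

In the demo application the `Callable` would wrap the `listNamespaces` call, for example with a 10-second pause and enough attempts to cover 30 minutes; the resulting attempt count and elapsed time can then be added to the logs.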

Expected Behavior

  • Application should reconnect to Temporal immediately after pod restart
  • listNamespaces should succeed consistently
  • No prolonged unavailability if Temporal cluster is healthy

Important Observations

  • Issue only occurs in GraalVM native image
  • Does NOT reproduce on JVM
  • Does NOT reproduce outside Kubernetes
  • Temporal services are reachable and healthy during failure window
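
One way to back up the reachability observation from inside the affected pod is a plain TCP connect check that bypasses gRPC entirely. The following is a minimal sketch using only the Java standard library; `TcpProbe` and `canConnect` are hypothetical names, and the host used in the usage note is an assumed service name that depends on the Helm release.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical diagnostic: checks whether a plain TCP connection can be opened. */
final class TcpProbe {

    /** Returns true if a TCP connection to host:port is established within timeoutMillis. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // The host name is resolved here, so an unresolvable name also counts as a failure.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            System.err.println("connect to " + host + ":" + port + " failed: " + e);
            return false;
        }
    }
}
```

For example, `TcpProbe.canConnect("temporal-frontend", 7233, 2000)` assumes a frontend service named `temporal-frontend` listening on the default frontend port 7233. If this succeeds while `listNamespaces` still times out, the problem is more likely in the gRPC channel than in pod networking.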

The length of the delay (~10–30 minutes) suggests one of:

  • stale DNS cache
  • broken gRPC channel reuse
  • or native-image-related networking issue
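
To help narrow down the stale-DNS hypothesis, the addresses the runtime resolves for the Temporal frontend could be logged on each failed attempt, together with the resolver cache settings. This is a sketch under assumptions: `DnsProbe` is a hypothetical helper, and it only shows what the `java.net` resolver returns, which may differ from what the gRPC channel is currently using.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical diagnostic: logs what the java.net resolver currently returns for a host. */
final class DnsProbe {

    /** Resolves host and returns its addresses as a sorted set (empty if unresolvable). */
    static Set<String> resolve(String host) {
        Set<String> addresses = new TreeSet<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            System.err.println("cannot resolve " + host + ": " + e);
        }
        return addresses;
    }

    /** Prints the resolver cache settings and the current addresses for host. */
    static void report(String host) {
        // A null value means the property is unset and the runtime default applies.
        System.out.println("networkaddress.cache.ttl="
                + Security.getProperty("networkaddress.cache.ttl"));
        System.out.println("networkaddress.cache.negative.ttl="
                + Security.getProperty("networkaddress.cache.negative.ttl"));
        System.out.println(host + " -> " + resolve(host));
    }
}
```

Comparing the logged addresses with the current ClusterIP or pod IPs of the frontend service (for example from `kubectl get svc,endpoints`) would show whether the native image keeps resolving to an address that no longer exists after the drain.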

Environment

  • Temporal SDK: 1.33.0
  • GraalVM: 25
  • Java: 25
  • Spring Boot: 3.5.13
  • Kubernetes: v1.30.5

Logs

Example error:

io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999786125s
