Temporal SDK gRPC calls fail with DEADLINE_EXCEEDED after node restart (GraalVM / K8s) #2840

When a Spring Boot application using the Temporal Java SDK is compiled as a GraalVM native image and deployed in Kubernetes, the application fails to communicate with Temporal after node restarts or pod rescheduling.

The same application behaves differently depending on the environment:

- ✅ On JVM (non-native)
- ✅ In Docker outside Kubernetes
- ❌ Fails in Kubernetes when running as GraalVM native image

This suggests a compatibility issue between the Temporal Java SDK and the GraalVM native runtime, potentially related to:

- gRPC channel lifecycle
- DNS resolution / service discovery
- resource or reflection configuration
- connection reuse after pod rescheduling

### Steps to Reproduce

1. Install Temporal using the official Helm chart: https://github.com/temporalio/helm-charts
2. Deploy the demo application (https://github.com/olegdibrov/temporal-graalvm-k8s) by installing its chart:
   `helm install control {path/to/chart}`

Application logic (executed on startup):

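```java
import io.temporal.api.workflowservice.v1.DescribeNamespaceResponse;
import io.temporal.api.workflowservice.v1.ListNamespacesRequest;
import java.util.List;

// Runs once on startup; workflowClient is the application's WorkflowClient instance.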
List<DescribeNamespaceResponse> namespaces = workflowClient
    .getWorkflowServiceStubs()
    .blockingStub()
    .listNamespaces(ListNamespacesRequest.newBuilder().build())
    .getNamespacesList();
log.info("Found {} namespaces", namespaces.size());
```

3. Restart the Kubernetes node, or drain it:
   `kubectl drain <node> --ignore-daemonsets`
4. Observe the application startup behavior after the pod is rescheduled.
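
To measure the length of the failure window in step 4 rather than only seeing a failed startup, the `listNamespaces` call from step 2 could be wrapped in a retry loop that logs every attempt. The following is a minimal sketch using only the JDK. `StartupProbe`, `Outcome`, and `retryUntilSuccess` are hypothetical names chosen for illustration; they are not part of the Temporal SDK or the demo repository.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

// Hypothetical diagnostic helper: retries a call until it succeeds or the
// attempt budget is exhausted, and reports how long recovery took.
final class StartupProbe {

    // Value returned by the call, number of attempts made, and elapsed time
    // from the first attempt to the successful one.
    static final class Outcome<T> {
        final T value;
        final int attempts;
        final Duration elapsed;

        Outcome(T value, int attempts, Duration elapsed) {
            this.value = value;
            this.attempts = attempts;
            this.elapsed = elapsed;
        }
    }

    static <T> Outcome<T> retryUntilSuccess(Callable<T> call, int maxAttempts, Duration delay)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long start = System.nanoTime();
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T value = call.call();
                return new Outcome<>(value, attempt, Duration.ofNanos(System.nanoTime() - start));
            } catch (Exception e) {
                // Keep the most recent failure and log when it happened.
                last = e;
                System.err.printf("attempt %d failed after %s: %s%n",
                        attempt, Duration.ofNanos(System.nanoTime() - start), e);
            }
            // Wait between attempts, but not after the final one.
            if (attempt < maxAttempts) {
                Thread.sleep(delay.toMillis());
            }
        }
        // All attempts failed: rethrow the last failure.
        throw last;
    }
}
```

In the demo application, the `Callable` would wrap the `listNamespaces` call shown in step 2, and the logged timestamps would show when, relative to pod start, the first successful call happens.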

### Actual Behavior

- Application fails to start for 10–30 minutes
- Repeated errors:
  `io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999s`
- Temporal cluster is healthy (all pods ready)
- Eventually, the application may recover without restart

### Expected Behavior

- Application should reconnect to Temporal immediately after pod restart
- `listNamespaces` should succeed consistently
- No prolonged unavailability if Temporal cluster is healthy


### Important Observations

- Issue only occurs in GraalVM native image
- Does NOT reproduce on JVM
- Does NOT reproduce outside Kubernetes
- Temporal services are reachable and healthy during failure window


The length of the delay (~10–30 minutes) points to one of the following:

- a stale DNS cache
- broken gRPC channel reuse
- a native-image-related networking issue
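
One way to narrow down the stale DNS hypothesis is to log, from inside the affected pod, which addresses the Temporal frontend host name resolves to over time and compare them with the current pod or service IPs. The sketch below uses only JDK classes. `DnsProbe` and its methods are hypothetical names; whether a native image applies the same lookup caching as the JVM is an open question here, not something this code establishes.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.Security;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical diagnostic helper for checking whether name resolution
// inside the process keeps returning outdated addresses.
final class DnsProbe {

    // Resolves the host and returns its addresses as a sorted set of strings.
    // Results may come from the runtime's own lookup cache.
    static Set<String> resolve(String host) throws UnknownHostException {
        Set<String> addresses = new TreeSet<>();
        for (InetAddress address : InetAddress.getAllByName(host)) {
            addresses.add(address.getHostAddress());
        }
        return addresses;
    }

    // True when two snapshots differ, i.e. the name now resolves differently.
    static boolean changed(Set<String> before, Set<String> after) {
        return !before.equals(after);
    }

    // Configured TTL (in seconds) for successful lookups, or null when the
    // security property is not set explicitly.
    static String cacheTtlProperty() {
        return Security.getProperty("networkaddress.cache.ttl");
    }
}
```

Calling `resolve` periodically during the failure window and logging whenever `changed` returns true would show whether the process sees the new addresses at all. If the addresses are current while calls still fail with `DEADLINE_EXCEEDED`, channel reuse becomes the more likely suspect.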


### Environment

- Temporal SDK: 1.33.0
- GraalVM: 25
- Java: 25
- Spring Boot: 3.5.13
- Kubernetes: v1.30.5


### Logs

Example error:

`io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED: Deadline CallOptions was exceeded after 9.999786125s`
