
bug: pod spec doesn't match sts template spec #854

@razvan


Workaround

Ensure catalog definitions are created before creating the Trino cluster.

The proper solution is probably to have a separate controller for the catalogs.

Description

TL;DR: The tests create the Trino cluster and then the Iceberg catalog definition. In the first reconciliation pass, the operator doesn't find this catalog, so it applies a StatefulSet version without the necessary export statements. In a second pass, the operator updates the STS definition, but the Pod is not updated. This new behaviour is caused by the MaxUnavailableStatefulSet feature gate in Kubernetes 1.35.
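When this happens, the stale Pods can be spotted because their `controller-revision-hash` label no longer matches the StatefulSet's `status.updateRevision`. A minimal sketch of that check, using illustrative object shapes rather than real API responses (the revision hashes below are made up):

```python
# Sketch: detect Pods that lag behind their StatefulSet's current revision,
# as happens after the operator's second reconciliation pass.
# The dicts mimic the shapes returned by the Kubernetes API.

def stale_pods(sts, pods):
    """Return names of Pods whose controller-revision-hash label differs
    from the StatefulSet's status.updateRevision."""
    update_revision = sts["status"]["updateRevision"]
    return [
        p["metadata"]["name"]
        for p in pods
        if p["metadata"]["labels"].get("controller-revision-hash") != update_revision
    ]

# Illustrative data: the STS template was updated, the Pod was not.
sts = {"status": {"updateRevision": "trino-coordinator-default-7d9f"}}
pods = [
    {"metadata": {"name": "trino-coordinator-default-0",
                  "labels": {"controller-revision-hash": "trino-coordinator-default-5b2c"}}},
]
print(stale_pods(sts, pods))  # → ['trino-coordinator-default-0']
```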


Update 3: The beta feature alone is responsible for this bug. It is present even if the sts/pods are not annotated by other controllers (I uninstalled commons and secret from the cluster).

Update 2: The beta feature MaxUnavailableStatefulSet (enabled by default in v1.35.0) is responsible for this bug. See More Tests below.

Update: Apparently this bug depends on the Kubernetes version. See More Tests below.

In some circumstances, the pod spec of the Trino pods (coordinator and worker) does not match the template spec of the corresponding StatefulSet objects.

Currently only one pod field is known to differ: the container args field.

This was discovered while investigating the failure of the opa-authorization integration tests.

In this test, the Trino pods fail to start. The error message is:

trino java.lang.RuntimeException: Environment variable is not set: CATALOG_ICEBERG_S3_AWS_ACCESS_KEY  
trino at io.airlift.configuration.secrets.SecretsResolver.lambda$getResolvedConfiguration$0(SecretsResolver.java:49)  
trino at io.airlift.configuration.secrets.SecretsResolver.lambda$getResolvedConfiguration$1(SecretsResolver.java:61)  
trino at java.base/java.util.Map.forEach(Unknown Source)  
trino at io.airlift.configuration.secrets.SecretsResolver.getResolvedConfiguration(SecretsResolver.java:56)  
trino at io.airlift.configuration.secrets.SecretsResolver.getResolvedConfiguration(SecretsResolver.java:48)  
trino at io.trino.connector.DefaultCatalogFactory.createCatalog(DefaultCatalogFactory.java:127)  
trino at io.trino.connector.LazyCatalogFactory.createCatalog(LazyCatalogFactory.java:44)  
trino at io.trino.connector.StaticCatalogManager.lambda$loadInitialCatalogs$1(StaticCatalogManager.java:159)  
trino at java.base/java.util.concurrent.FutureTask.run(Unknown Source)  
trino at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)  
trino at java.base/java.util.concurrent.FutureTask.run(Unknown Source)  
trino at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)  
trino at java.base/java.util.concurrent.ExecutorCompletionService.submit(Unknown Source)  
trino at io.trino.util.Executors.executeUntilFailure(Executors.java:46)  
trino at io.trino.connector.StaticCatalogManager.loadInitialCatalogs(StaticCatalogManager.java:153)  
trino at io.trino.server.Server.doStart(Server.java:140)  
trino at io.trino.server.Server.lambda$start$0(Server.java:79)  
trino at io.trino.$gen.Trino_479_stackable0_0_0_dev____20260302_093346_1.run(Unknown Source)  
trino at io.trino.server.Server.start(Server.java:79)  
trino at io.trino.server.TrinoServer.main(TrinoServer.java:37)

The failure comes from the fact that the pod spec is missing the following statements from the args field:

export CATALOG_ICEBERG_S3_AWS_ACCESS_KEY="$(cat /stackable/secrets/s3-credentials-class/accessKey)"
export CATALOG_ICEBERG_S3_AWS_SECRET_KEY="$(cat /stackable/secrets/s3-credentials-class/secretKey)"

These statements are present in the StatefulSet objects trino-coordinator-default and trino-worker-default.
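The drift can be made visible by diffing the args lines of the live Pod against those of the StatefulSet template. A sketch using the two export lines from above (the arg lists are illustrative, not pulled from the API):

```python
# Sketch: export lines present in the STS template args but missing
# from the live Pod's container args.
sts_template_args = [
    'export CATALOG_ICEBERG_S3_AWS_ACCESS_KEY="$(cat /stackable/secrets/s3-credentials-class/accessKey)"',
    'export CATALOG_ICEBERG_S3_AWS_SECRET_KEY="$(cat /stackable/secrets/s3-credentials-class/secretKey)"',
]
pod_args = []  # the live Pod is missing both exports

missing = [line for line in sts_template_args if line not in pod_args]
for line in missing:
    print(line)
```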

Since the STS objects are correct, the operator doesn't reconcile them, so the pods are left in a failing state.

Forcing the operator pod to restart "fixes" the problem.

More tests

| Result | Kubernetes Version | Notes |
| --- | --- | --- |
| ❌ | 1.35.0 | CI |
| ❌ | 1.35.1 | Local minikube cluster |
| ✅ | 1.35.1 (with MaxUnavailableStatefulSet feature gate disabled)¹ | Local minikube cluster |
| ✅ | 1.34.3 | CI |

Attempts to fix

  1. ❌ Raise spec.minReadySeconds for the StatefulSet from 0 (default) to 1 second.
    There is also a bug report related to this field that may have implications here.
  2. ❌ Change spec.updateStrategy.type for the StatefulSet to OnDelete.
  3. ❌ Do not create poddisruptionbudgets.
  4. ❌ Uninstall the commons, secret and listener operator to make sure no annotations are added to the sts/pods.

Footnotes

  1. To test with the disabled feature gate, I created a new cluster with the command below:
    minikube start \
      --kubernetes-version=v1.35.1 \
      --nodes 1 \
      --cpus=6 \
      --memory=33518MB \
      --insecure-registry oci.stackable.tech \
      --addons metrics-server \
      --extra-config=controller-manager.feature-gates=MaxUnavailableStatefulSet=false

Metadata

Status: In Progress
Labels: none
Milestone: none