@kaovilai kaovilai commented Nov 7, 2025

Signed-off-by: Tiger Kaovilai tkaovila@redhat.com

Why the changes were made

This PR enhances the reliability of MySQL-based E2E test applications by adding comprehensive health probes to prevent race conditions and improve startup coordination during backup/restore operations.

Problem

MySQL E2E tests (particularly "MySQL application two Vol CSI") were experiencing intermittent failures with symptoms including:

  • Multi-attach volume errors during restore operations
  • Pods marked as "ready" before MySQL was actually accepting connections
  • TodoList application attempting to connect before MySQL initialization completed
  • Tests timing out waiting for application availability

Root Cause

The MySQL deployments had only partial probe coverage:

  • ✅ startupProbe (MySQL exec command)
  • ⚠️ livenessProbe (TCP socket only, which does not verify database functionality)
  • ❌ No readinessProbe, so Kubernetes could route traffic before MySQL was accepting connections

The TodoList deployments had no health probes at all.

Changes Made

MySQL Container Enhancements

All three MySQL deployment files updated with:

Added readinessProbe:

readinessProbe:
  exec:
    command:
    - /usr/bin/timeout
    - 1s
    - /usr/bin/mysql
    - $(MYSQL_DATABASE)
    - -h
    - 127.0.0.1
    - -u$(MYSQL_USER)
    - -p$(MYSQL_PASSWORD)
    - -e
    - SELECT 1
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3

Enhanced livenessProbe:
Changed from TCP socket to MySQL exec command for better failure detection:

livenessProbe:
  exec:
    command:
    - /usr/bin/timeout
    - 1s
    - /usr/bin/mysql
    - $(MYSQL_DATABASE)
    - -h
    - 127.0.0.1
    - -u$(MYSQL_USER)
    - -p$(MYSQL_PASSWORD)
    - -e
    - SELECT 1
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
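
As a rough illustration of what these timing values imply, the sketch below estimates how long a persistently failing probe takes to trip once probing has started. It multiplies periodSeconds by failureThreshold and ignores timeoutSeconds and scheduling jitter, so treat the numbers as approximations. `probe_failure_window` is a hypothetical helper written for this example, not part of kubectl or the PR.

```shell
#!/bin/sh
# Rough estimate, in seconds, of how long a failing probe goes unnoticed
# once probing has started: failureThreshold consecutive failures, one per period.
# probe_failure_window is a hypothetical helper for illustration only.
probe_failure_window() {
  period_seconds=$1
  failure_threshold=$2
  echo $((period_seconds * failure_threshold))
}

# readinessProbe above: periodSeconds 5, failureThreshold 3
readiness_window=$(probe_failure_window 5 3)
# livenessProbe above: periodSeconds 10, failureThreshold 3
liveness_window=$(probe_failure_window 10 3)

echo "readiness: about ${readiness_window}s until the pod is marked unready"
echo "liveness: about ${liveness_window}s until the container is restarted"
```

Under these assumptions the readiness probe reacts roughly twice as fast as the liveness probe, which fits the usual pattern of removing a pod from service well before restarting it.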

TodoList Container Enhancements

Added HTTP-based health probes:

livenessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
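
The todolist probes above leave timeoutSeconds, successThreshold and failureThreshold unset, so the Kubernetes defaults apply. For comparison with the MySQL probes, here is a sketch of the same readinessProbe with those defaults written out. The three commented fields are the documented Kubernetes defaults, not values taken from the PR's manifests.

```yaml
readinessProbe:
  httpGet:
    path: /
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 5
  # Not set in the PR; shown here with the Kubernetes default values.
  timeoutSeconds: 1
  successThreshold: 1
  failureThreshold: 3
```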

Files Modified

  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml

Expected Benefits

  1. Prevents premature pod ready status: readinessProbe ensures MySQL is actually accepting queries before traffic is routed
  2. Better failure detection: exec-based livenessProbe detects database issues, not just port availability
  3. Coordinated startup: todolist won't receive traffic until it's actually serving requests
  4. Resolves multi-attach errors: Proper probe sequencing prevents volume attachment race conditions during restore
  5. Matches mongo-persistent reliability: Brings MySQL deployments to same quality level as working MongoDB configuration

How to test the changes made

Minikube Testing (Verified)

# Deploy the updated MySQL application
kubectl apply -f tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml

# Wait for pods to become ready (should take ~30-40 seconds)
kubectl get pods -n mysql-persistent -w

# Expected output:
# NAME                       READY   STATUS    RESTARTS   AGE
# mysql-xxxxx-xxxxx          2/2     Running   0          45s
# todolist-xxxxx-xxxxx       1/1     Running   0          45s

# Verify MySQL connectivity
kubectl exec -n mysql-persistent deployment/mysql -c mysql -- \
  mysql todolist -h 127.0.0.1 -uchangeme -pchangeme -e "SELECT 1"

# Verify TodoList application
kubectl exec -n mysql-persistent deployment/mysql -c curl-tool -- \
  curl -s http://todolist:8000/ | head -20

E2E Testing

The enhanced probes should improve reliability of MySQL-related E2E tests, particularly:

  • MySQL application two Vol CSI
  • MySQL backup/restore scenarios
  • Any tests involving MySQL persistent storage

Probe Behavior Verification

# Check probe status in pod description
kubectl describe pod -n mysql-persistent -l app=mysql

# Look for successful probe executions in events
kubectl get events -n mysql-persistent --sort-by='.lastTimestamp'

Note

Responses generated with Claude

coderabbitai bot commented Nov 7, 2025

Walkthrough

Kubernetes health probe configurations are updated across three MySQL persistent deployment manifests. MySQL container livenessProbes transition from tcpSocket to exec-based checks executing SELECT 1 queries, readinessProbes are added for MySQL containers, and httpGet-based livenessProbe and readinessProbe are added to todolist containers.

Changes

Cohort: MySQL Persistent Deployment Health Probes
File(s):
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml
Summary: Replaced MySQL container livenessProbes from tcpSocket-based checks to exec-based health checks executing SELECT 1 queries; added readinessProbes with exec-based MySQL health checks (including initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold); added livenessProbe and readinessProbe (httpGet on port 8000, path "/") to todolist containers with distinct timing parameters.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Multiple YAML files with similar but individually distinct probe configurations require verification across each file
  • Different probe types (exec for MySQL, httpGet for todolist) and their respective timing parameters (initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold) need validation for each container
  • mysql-persistent-twovol-csi.yaml contains the most comprehensive probe configuration and should be reviewed first as a reference

openshift-ci bot commented Nov 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 7, 2025
…create MySQL Minikube test configuration

Signed-off-by: Tiger Kaovilai <tkaovila@redhat.com>

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (3)
tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml (1)

124-140: Critical: Duplicate issue—environment variables will not expand in exec probe command array.

Same variable expansion issue as the readinessProbe (lines 107-123). The livenessProbe must also be wrapped in a shell.

Apply the same fix pattern:

            livenessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml (1)

116-132: Critical: Duplicate issue—environment variables will not expand in exec probe command array.

Same variable expansion issue affecting the livenessProbe. Must wrap in shell for variable substitution to work.

Apply the same fix pattern:

            livenessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3
tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml (1)

137-153: Critical: Duplicate issue—environment variables will not expand in exec probe command array.

The livenessProbe has the same variable expansion issue. Additionally, note that the pre-existing startupProbe (lines 154-170) uses the same problematic command pattern and should be fixed as well.

Apply the same fix pattern:

            livenessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 30
              periodSeconds: 10
              timeoutSeconds: 5
              failureThreshold: 3

Consider also fixing the startupProbe at lines 154-170 to use the same shell-wrapped approach.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 9feb185 and 3916802.

📒 Files selected for processing (3)
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml (2 hunks)
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml (2 hunks)
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml
  • tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml
🔇 Additional comments (3)
tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml (1)

214-225: TodoList health probes look good.

The httpGet-based probes for the todolist container are correctly formatted and have appropriate timing (10s initialDelay for readiness, 30s for liveness).

tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml (1)

205-216: TodoList health probes look good.

The httpGet-based probes for the todolist container are correctly formatted and have appropriate timing.

tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml (1)

227-238: TodoList health probes look good.

The httpGet-based probes for the todolist container are correctly formatted with appropriate timing.

Comment on lines +107 to +123
readinessProbe:
  exec:
    command:
    - /usr/bin/timeout
    - 1s
    - /usr/bin/mysql
    - $(MYSQL_DATABASE)
    - -h
    - 127.0.0.1
    - -u$(MYSQL_USER)
    - -p$(MYSQL_PASSWORD)
    - -e
    - SELECT 1
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3

⚠️ Potential issue | 🔴 Critical

Critical: Environment variables will not expand in exec probe command array.

Kubernetes exec probes execute commands directly without shell interpretation. The shell variables $(MYSQL_DATABASE), $(MYSQL_USER), and $(MYSQL_PASSWORD) in the command array will be passed as literal strings to the mysql binary, causing the health check to fail. For example, the mysql command will receive the literal argument "$(MYSQL_DATABASE)" instead of the actual database name.

The exec command must be wrapped in a shell to enable variable expansion. Apply this fix:

            readinessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 10
              periodSeconds: 5
              timeoutSeconds: 2
              failureThreshold: 3
🤖 Prompt for AI Agents
In tests/e2e/sample-applications/mysql-persistent/mysql-persistent-csi.yaml
around lines 107 to 123, the exec readinessProbe uses literal array entries like
$(MYSQL_DATABASE) which will not be expanded; wrap the whole command in a shell
so environment variables are expanded by replacing the array with a shell
invocation (e.g. /bin/sh -c) and pass the full timeout + mysql invocation as a
single string so the shell expands MYSQL_DATABASE, MYSQL_USER and MYSQL_PASSWORD
at runtime; ensure proper quoting of the SELECT statement and timeout invocation
within that single string.
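
One detail worth checking before applying a shell-wrapped probe: inside a POSIX shell, `$(NAME)` is command substitution, while `$NAME` is variable expansion. The sketch below shows only what `sh -c` does with a string it receives unmodified. Whether the kubelet rewrites `$(VAR)` references before the shell runs is not settled in this thread, so that part is left out. The value `todolist` is an illustrative assumption.

```shell
#!/bin/sh
# Illustrative value only; the real manifests set MYSQL_DATABASE in the container env.
export MYSQL_DATABASE=todolist

# Parameter expansion: the shell reads the environment variable.
with_dollar=$(sh -c 'echo "db=$MYSQL_DATABASE"')

# Command substitution: the shell tries to run a program named MYSQL_DATABASE,
# fails to find it, and substitutes an empty string.
with_parens=$(sh -c 'echo "db=$(MYSQL_DATABASE)"' 2>/dev/null)

echo "$with_dollar"
echo "$with_parens"
```

If the shell does see the `$(...)` form verbatim, the mysql client would be called with empty arguments, so the `$NAME` form with quoting is the safer spelling inside a `/bin/sh -c` string.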

Comment on lines +99 to +115
readinessProbe:
  exec:
    command:
    - /usr/bin/timeout
    - 1s
    - /usr/bin/mysql
    - $(MYSQL_DATABASE)
    - -h
    - 127.0.0.1
    - -u$(MYSQL_USER)
    - -p$(MYSQL_PASSWORD)
    - -e
    - SELECT 1
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3

⚠️ Potential issue | 🔴 Critical

Critical: Environment variables will not expand in exec probe command array.

Same critical variable expansion issue as identified in mysql-persistent-csi.yaml. The readinessProbe shell variables must be wrapped in /bin/sh -c for expansion to occur.

Apply this fix:

            readinessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 10
              periodSeconds: 5
              timeoutSeconds: 2
              failureThreshold: 3
🤖 Prompt for AI Agents
In
tests/e2e/sample-applications/mysql-persistent/mysql-persistent-twovol-csi.yaml
around lines 99 to 115, the readinessProbe uses an exec command array containing
shell variables (e.g. $(MYSQL_DATABASE), $(MYSQL_USER), $(MYSQL_PASSWORD)) which
will not be expanded; change the exec to run a shell so variables expand by
replacing the command array with a single shell invocation like /bin/sh -c and
pass the full command as one string (e.g. use /usr/bin/timeout 1s /usr/bin/mysql
"$(MYSQL_DATABASE)" -h 127.0.0.1 -u"$(MYSQL_USER)" -p"$(MYSQL_PASSWORD)" -e
'SELECT 1'), keeping the existing probe timings intact.

Comment on lines +120 to +136
readinessProbe:
  exec:
    command:
    - /usr/bin/timeout
    - 1s
    - /usr/bin/mysql
    - $(MYSQL_DATABASE)
    - -h
    - 127.0.0.1
    - -u$(MYSQL_USER)
    - -p$(MYSQL_PASSWORD)
    - -e
    - SELECT 1
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3

⚠️ Potential issue | 🔴 Critical

Critical: Environment variables will not expand in exec probe command array.

Same critical variable expansion issue identified across all three YAML files. The readinessProbe command array contains literal $(MYSQL_DATABASE), $(MYSQL_USER), and $(MYSQL_PASSWORD) strings that Kubernetes will not expand. These must be wrapped in a shell for variable substitution.

Apply this fix:

            readinessProbe:
-              exec:
-                command:
-                - /usr/bin/timeout
-                - 1s
-                - /usr/bin/mysql
-                - $(MYSQL_DATABASE)
-                - -h
-                - 127.0.0.1
-                - -u$(MYSQL_USER)
-                - -p$(MYSQL_PASSWORD)
-                - -e
-                - SELECT 1
+              exec:
+                command:
+                - /bin/sh
+                - -c
+                - /usr/bin/timeout 1s /usr/bin/mysql $(MYSQL_DATABASE) -h 127.0.0.1 -u$(MYSQL_USER) -p$(MYSQL_PASSWORD) -e "SELECT 1"
              initialDelaySeconds: 10
-             periodSeconds: 5
-             timeoutSeconds: 2
-             failureThreshold: 3
+             periodSeconds: 5
+             timeoutSeconds: 2
+             failureThreshold: 3
🤖 Prompt for AI Agents
In tests/e2e/sample-applications/mysql-persistent/mysql-persistent.yaml around
lines 120-136, the readinessProbe exec command uses literal $(MYSQL_...) tokens
which Kubernetes will not expand; change the command to run a shell so
environment variables are expanded (e.g. replace the command array with a shell
invocation like /bin/sh -c and provide a single string that runs timeout and
mysql using $MYSQL_DATABASE, $MYSQL_USER and $MYSQL_PASSWORD with proper
quoting), and apply the same change to the other two YAML files that share this
probe.
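
The prompt above describes a shell-wrapped probe that uses `$MYSQL_DATABASE`, `$MYSQL_USER` and `$MYSQL_PASSWORD` with quoting, but no snippet in this thread shows that form. A minimal sketch of what it could look like follows. It assumes the MySQL image provides `/bin/sh` and that the three variables are set in the container environment. The timing values are the ones from the PR.

```yaml
readinessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    # The shell expands $MYSQL_* from the container environment when the probe runs.
    - /usr/bin/timeout 1s /usr/bin/mysql "$MYSQL_DATABASE" -h 127.0.0.1 -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -e 'SELECT 1'
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3
```

Quoting each variable keeps values containing spaces or shell metacharacters from being split into separate arguments.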

kaovilai commented Nov 7, 2025

/retest

ai-retester: The e2e-test-cli-aws-e2e step failed because the "Mongo application DATAMOVER via CLI" test timed out after 540 seconds, and a data upload process ran into problems and did not succeed for some time. A warning that DeploymentConfig is deprecated also appeared, although it did not directly cause the failure.

The underlying issue seems to be that the 'todolist' container in the pod remained waiting to start (PodInitializing) and that external AWS EBS volumes were not provisioned in time. This left the pod in the "Pending" phase and prevented the test from proceeding.

openshift-ci bot commented Nov 8, 2025

@kaovilai: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/4.20-e2e-test-aws | 3916802 | link | false | /test 4.20-e2e-test-aws
ci/prow/4.20-e2e-test-hcp-aws | 3916802 | link | false | /test 4.20-e2e-test-hcp-aws

