Commit 745f6e9
feat: Search-and-Replace Image Patching (#409)
## What does this pull request do?
Replace hardcoded array index-based patching with a search-and-replace
approach for updating ADOT instrumentation images in our EKS test
deployments.
The solution uses `jq` to find the correct argument by pattern matching.
### Problem
The current implementation uses hardcoded array indices to patch
deployment arguments:
- Java: `args[2]`
- Python: `args[3]`
- DotNet: `args[4]`
- NodeJS: `args[5]`
This approach is fragile and will break if:
- Arguments are reordered in the deployment
- New arguments are added before the image arguments
- The deployment structure changes
This is what happened in
[this workflow run](https://github.com/aws-observability/aws-otel-java-instrumentation/actions/runs/15219049376/job/42813192472#step:26:316),
where new arguments were added to the deployment config.
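To make the failure mode concrete, here is a minimal sketch that uses a plain Bash array as a stand-in for the container's `args` list. The `arg_at` helper, the `--example-flag-*` arguments, and the `example/...` image names are hypothetical and exist only for illustration; they are not taken from the real deployment.

```bash
#!/usr/bin/env bash
# Illustration only: a Bash array stands in for
# .spec.template.spec.containers[0].args. All values are placeholders.

# Hypothetical helper: print the argument stored at a given index.
arg_at() {
  local index=$1
  shift
  local args=("$@")
  printf '%s\n' "${args[$index]}"
}

# Argument list as the index-based patch expects it.
original=(
  "--example-flag-a"
  "--example-flag-b"
  "--auto-instrumentation-java-image=example/java:old"
  "--auto-instrumentation-python-image=example/python:old"
)

# Same list after a new flag is inserted ahead of the image arguments.
shifted=(
  "--example-flag-a"
  "--new-feature-flag=enabled"
  "--example-flag-b"
  "--auto-instrumentation-java-image=example/java:old"
  "--auto-instrumentation-python-image=example/python:old"
)

before=$(arg_at 2 "${original[@]}")  # the Java image argument
after=$(arg_at 2 "${shifted[@]}")    # an unrelated flag

echo "index 2 before insertion: $before"
echo "index 2 after insertion:  $after"
```

A patch aimed at index 2 would overwrite the unrelated flag in the second list and leave the Java image argument untouched.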
### Before
```bash
kubectl patch deploy ... --type='json' \
-p='[{"op": "replace", "path": "/spec/template/spec/containers/0/args/2", "value": "--auto-instrumentation-java-image=..."}]'
```
### After
```bash
kubectl get deploy ... -o json | \
jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=..." else . end)' | \
kubectl apply -f -
```
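The same transformation can be tried offline, without a cluster. The sketch below runs it against a small hand-written JSON document; the `patch_image_arg` helper, the sample document, and the `example/...` image names are assumptions made for illustration. It passes the flag and image through `jq --arg` and matches with `startswith` rather than the `test("^...")` regex used above, which avoids regex escaping when the flag name comes from a variable.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read a Deployment as JSON on stdin and rewrite the
# argument that starts with "<flag>=". Every other argument passes through.
patch_image_arg() {
  local flag=$1 image=$2
  jq --arg flag "$flag" --arg image "$image" '
    .spec.template.spec.containers[0].args |= map(
      if startswith($flag + "=") then $flag + "=" + $image else . end
    )'
}

# Minimal stand-in for `kubectl get deploy ... -o json` (placeholder values).
sample='{"spec":{"template":{"spec":{"containers":[{"args":[
  "--example-flag",
  "--auto-instrumentation-java-image=example/java:old",
  "--auto-instrumentation-python-image=example/python:old"
]}]}}}}'

patched=$(printf '%s' "$sample" \
  | patch_image_arg "--auto-instrumentation-java-image" "example/java:new")

# Print the resulting argument list, one argument per line.
printf '%s\n' "$patched" | jq -r '.spec.template.spec.containers[0].args[]'
```

Only the Java argument changes; its position in the list does not matter.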
## Test strategy
### 1. Functional Testing with Real EKS Deployment
Retrieved actual deployment configuration from e2e-playground cluster
and verified both approaches produce identical results:
```bash
# Both approaches successfully update the image:
OLD: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
NEW: --auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓
# NEW approach only modifies the targeted argument:
--auto-instrumentation-java-image=TEST_JAVA:v1.0.0 ✓ Changed
--auto-instrumentation-python-image=...v0.9.0 ✓ Unchanged
--auto-instrumentation-dotnet-image=...v1.7.0 ✓ Unchanged
--auto-instrumentation-nodejs-image=...v0.6.0 ✓ Unchanged
```
### 2. Edge Case Testing Results
Tested five critical edge cases with the actual deployment
configuration:
#### Non-existent argument patch
- **Test**: Try to patch `--auto-instrumentation-go-image` (doesn't
exist)
- **OLD approach**: Would fail with index out of bounds
- **NEW approach**: Safe no-op, no changes made
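A silent no-op is safe, but it also means a renamed or missing flag would go unnoticed. If a workflow wanted to surface that case, one possible guard is sketched below. It is not part of this change; the `has_image_arg` helper and the sample document are hypothetical. It relies on `jq -e`, which exits non-zero when the filter's last output is `false` or `null`.

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if some argument starts with "<flag>=".
# Reads a Deployment as JSON on stdin.
has_image_arg() {
  local flag=$1
  jq -e --arg flag "$flag" '
    .spec.template.spec.containers[0].args
    | any(startswith($flag + "="))
  ' > /dev/null
}

# Minimal stand-in for the real deployment JSON (placeholder values).
sample='{"spec":{"template":{"spec":{"containers":[{"args":[
  "--auto-instrumentation-java-image=example/java:old"
]}]}}}}'

if has_image_arg "--auto-instrumentation-java-image" <<< "$sample"; then
  java_status=found
else
  java_status=missing
fi

if has_image_arg "--auto-instrumentation-go-image" <<< "$sample"; then
  go_status=found
else
  go_status=missing
fi

echo "java: $java_status"
echo "go:   $go_status"
```

Whether a missing flag should fail the job or only log a warning is a policy choice for the workflow.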
#### Reordered arguments
- **Test**: Swapped Java and Python argument positions
- **OLD approach**: Created duplicate Java entries, corrupted deployment
- **NEW approach**: Correctly found and updated only the Java argument
#### New arguments inserted
- **Test**: Added new flags before image arguments
- **OLD approach**: Patched `--new-feature-flag=enabled` instead of Java
image
- **NEW approach**: Still correctly found and patched Java image
#### Sequential patches
- **Test**: Applied multiple patches in sequence (simulating CI/CD)
- **Result**: Both Java and Python successfully updated without
conflicts
#### Malformed arguments
- **Test**: Replaced Java arg with malformed string
- **OLD approach**: Would blindly replace at index
- **NEW approach**: No match found, safely skipped
### 3. Test Commands Used
```bash
# Get real deployment
kubectl get deploy -n amazon-cloudwatch amazon-cloudwatch-observability-controller-manager -o json > deployment.json
# Test transformation
cat deployment.json | \
jq '.spec.template.spec.containers[0].args |= map(if test("^--auto-instrumentation-java-image=") then "--auto-instrumentation-java-image=NEW_IMAGE" else . end)'
# Run comprehensive edge case tests
./test-edge-cases.sh
```
### Test Files
- [test-real-deployment.sh](https://paste.amazon.com/show/yiyuanh/1748359780) - Testing with actual deployment configuration
- [test-edge-cases.sh](https://paste.amazon.com/show/yiyuanh/1748359814) - Comprehensive edge case testing on real deployment
*Rollback procedure:*
We can safely roll back these changes by reverting the commit.
*Ensure you've run the following tests on your changes and include the
link below:*
To do so, create a `test.yml` file with `name: Test` and workflow
description to test your changes, then remove the file for your PR. Link
your test run in your PR description. This process is a short term
solution while we work on creating a staging environment for testing.
NOTE: TESTS RUNNING ON A SINGLE EKS CLUSTER CANNOT BE RUN IN PARALLEL.
See the
[needs](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idneeds)
keyword to run tests in succession.
- Run Java EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run Python EKS on `e2e-playground` in us-east-1 and eu-central-2
- Run metric limiter on EKS cluster `e2e-playground` in us-east-1 and
eu-central-2
- Run EC2 tests in all regions
- Run K8s on a separate K8s cluster (check IAD test account for master
node endpoints; these will change as we create and destroy clusters for
OS patching)
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.

1 parent 462e250 commit 745f6e9
1 file changed, 20 additions and 8 deletions:
- `.github/workflows/actions/patch_image_and_check_diff`