Add state machine for image switch (milestone 4c)#50
Conversation
|
@jlebon I skipped the e2e test since we still need to define how to test the upgrade. The state machine is only checked by the env tests for now |
|
|
||
| mu sync.Mutex | ||
| switchImage string | ||
| switchApply bool |
There was a problem hiding this comment.
I think we discussed this at some point. My inclination is to not try to optimize this. I.e. let's not support an --apply special-case at all. We just do the bootc switch and then use the GenericEvent to immediately do another reconcile if desiredImageState is booted.
There was a problem hiding this comment.
To clarify, --apply is not used during staging, we always stage with plain bootc switch . The
--apply is only used in the reboot step: when the image is already staged and desiredImageState is Booted, we run bootc switch --apply to boot into it.
Are you suggesting we should decouple this further and trigger the reboot through a different mechanism (e.g. systemctl reboot) instead of bootc switch --apply?
There was a problem hiding this comment.
Sorry, this is wrong bootc switch --apply doesn't reboot, but it is instead bootc upgrade --apply, the right command
There was a problem hiding this comment.
The --apply is only used in the reboot step: when the image is already staged and desiredImageState is Booted, we run bootc switch --apply to boot into it.
OK, that makes sense and I guess it does match closer to what we'll actually do once we have --from-downloaded.
I do like the executor verb being called Reboot() though as you have now to make this clearer. So maybe then we just change it to take an image arg so it runs bootc switch --apply.
But yeah, I think the bug you're hitting is bootc-dev/bootc#2249.
7e1ff20 to
2afd447
Compare
|
There are still some issue with this PR... investigating. |
56119dc to
ba87f0d
Compare
|
I think we should prioritize #57 and implement at least the happy path test for the reboot |
e4557ce to
413cf0d
Compare
1226849 to
ed7b12d
Compare
| func (e *HostExecutor) Switch(ctx context.Context, image string) error { | ||
| log := logf.FromContext(ctx) | ||
|
|
||
| cmd := e.nsenterCmd(ctx, "bootc", "switch", image) |
There was a problem hiding this comment.
TODO link to bootc-dev/bootc#2137 would still be good here.
There was a problem hiding this comment.
Should I rename the function Stage then? It doesn't follow bootc API but it makes the intention more explicit like the reboot
| limits: | ||
| cpu: 500m | ||
| memory: 128Mi | ||
| memory: 512Mi |
There was a problem hiding this comment.
The commit message needs a rationale.
| return nil | ||
| } | ||
|
|
||
| func (f *fakeExecutor) Upgrade(_ context.Context) error { |
There was a problem hiding this comment.
Shouldn't this be Reboot instead?
| Type: bootcv1alpha1.NodeDegraded, | ||
| Status: metav1.ConditionTrue, | ||
| Reason: bootcv1alpha1.NodeReasonError, | ||
| Message: fmt.Sprintf("failed to get bootc status: %v", err), |
There was a problem hiding this comment.
OK as is, but I wonder if it would be cleaner to instead define a new error type, e.g. degradedError which gets "caught" in the top-level Reconcile() loop and sets NodeReasonError and the message to from the degradedError object. See also invalidSpecError() in the controller code.
There was a problem hiding this comment.
Actually, we can treat all the error from the reconcileBootcNode as Degraded errors and we don't need an extra type
There was a problem hiding this comment.
Hmm, maybe. I think it's helpful to distinguish between "OS errors", and any other error (e.g. failed to get or patch some k8s object). I think only the former should cause a NodeDegraded condition.
There was a problem hiding this comment.
The reconcileBootcNode does already return an error only if there are an OS error. Except if we we consider the bn.Status.Booted == nil not as an OS error. The other cases is when a bootc operation fails.
| return ctrl.Result{}, nil | ||
| } | ||
|
|
||
| _, desiredDigest, _ := strings.Cut(desiredImage, "@") |
| err := executor.Switch(ctx, image) | ||
|
|
||
| s.mu.Lock() | ||
| if ctx.Err() != nil { |
There was a problem hiding this comment.
I guess one TODO here is to switch to exec'ing the bootc switch async, and then a select here where we monitor the cancel channel so we can actually send a SIGINT to bootc.
|
I haven't re-reviewed past the 2nd commit yet (tests). |
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Rewrite the reconciler to detect image mismatches between spec.desiredImage and the booted image, stage via bootc switch in a background goroutine. Once, it finished to staged the image, the termination of the goroutine triggers once more the reconciliation loop which will detect that the system requires a reboot. The reconciliation function ensures that the bootc node transitions from Staging to Staged, and then to Rebooting. Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Replace raw JSON bytes with a bootc.Status struct in the test fake. Status() serializes the struct via json.Marshal, and Switch() auto-mutates the status (staging sets Staged). Upgrade() records the call for test assertions. Add newBootcStatus() and newBootEntry() helpers to build test state without verbose JSON constants. Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add envtest cases for the daemon reconciler state machine: - TestStagingTriggered: image mismatch triggers bootc switch - TestStagingError: switch failure sets Degraded condition - TestAlreadyStaged: skip switch when image already staged - TestRebootingSet: reboot triggered when desiredImageState is Booted - TestRollback: restage when desired image changes - TestCancelInflightSwitch: spec change cancels in-flight switch Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Alice Frosi <afrosi@redhat.com>
Implements the daemon-side state machine that detects image mismatches between spec.desiredImage and the booted image,
stages updates via bootc switch in a background goroutine, and triggers reboot when desiredImageState == Booted.
one path