Mark VMs in error state when expunge fails during destroy operation#12749
Mark VMs in error state when expunge fails during destroy operation#12749sureshanaparti wants to merge 1 commit intoapache:4.22from
Conversation
|
@blueorangutan package |
|
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12749 +/- ##
=========================================
Coverage 17.61% 17.61%
- Complexity 15665 15668 +3
=========================================
Files 5917 5917
Lines 531400 531423 +23
Branches 64970 64973 +3
=========================================
+ Hits 93603 93614 +11
- Misses 427244 427253 +9
- Partials 10553 10556 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This PR addresses a UX problem where a VM becomes stuck in the Expunging state when expunge fails during a destroy operation with expunge=true. The fix transitions the VM to Error state on expunge failure, allowing the user to retry the destroy operation.
Changes:
- Adds a new FSM transition
Expunging → Error(viaOperationFailedToError) to the VM state machine. - Introduces a
transitionExpungingToError()helper method inUserVmManagerImpland calls it fromdestroyVm()when expunge fails (either by returningfalseor throwing aRuntimeException). - Adds unit tests covering the key code paths of the new helper method.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
api/src/main/java/com/cloud/vm/VirtualMachine.java |
Adds the new Expunging → Error FSM transition with a comment. |
server/src/main/java/com/cloud/vm/UserVmManagerImpl.java |
Adds transitionExpungingToError() helper and updates destroyVm() to call it on expunge failure. |
server/src/test/java/com/cloud/vm/UserVmManagerImplTest.java |
Adds four unit tests for transitionExpungingToError(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| } | ||
| if (!expunged) { | ||
| transitionExpungingToError(vm.getId()); | ||
| throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm); |
There was a problem hiding this comment.
The error message on line 3591 uses destroyedVm (the UserVm object's toString() representation, e.g. something like "VirtualMachineVO[id=N]"), whereas the error message at line 3587 uses vm.getUuid() to clearly identify the VM. Using destroyedVm in the error message is inconsistent within the same block and may produce a less informative or unpredictable output for consumers of this exception. The message should consistently use vm.getUuid() or similar identifier to clearly identify the VM.
| throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm); | |
| throw new CloudRuntimeException("Failed to expunge VM " + vm.getUuid()); |
| if (expunge) { | ||
| boolean expunged; | ||
| try { | ||
| expunged = expunge(vm); | ||
| } catch (RuntimeException e) { | ||
| logger.error("Failed to expunge VM [{}] due to: {}", vm, e.getMessage(), e); | ||
| transitionExpungingToError(vm.getId()); | ||
| throw new CloudRuntimeException("Failed to expunge VM " + vm.getUuid() + " due to: " + e.getMessage(), e); | ||
| } | ||
| if (!expunged) { | ||
| transitionExpungingToError(vm.getId()); | ||
| throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm); | ||
| } |
There was a problem hiding this comment.
The updated destroyVm(DestroyVMCmd) method now has two new failure paths: one where expunge(vm) returns false and one where it throws a RuntimeException. Both paths call transitionExpungingToError() and then throw a CloudRuntimeException. However, there are no tests in this PR covering these new code paths in destroyVm. The existing testDestroyVm test at line 3628 only covers the success case (where expunge returns true). Tests verifying that transitionExpungingToError is called and a CloudRuntimeException is thrown in both failure cases would improve reliability.
| } | ||
| } | ||
| } catch (NoTransitionException e) { | ||
| logger.warn("Failed to transition VM to Error state: {}", e.getMessage()); |
There was a problem hiding this comment.
The warning log message at line 2593 doesn't include the VM's UUID or ID to help identify which VM failed to transition. Since this code is inside a try block that starts at line 2582 and the vm variable is in scope when the NoTransitionException is thrown (it can only be thrown from within the if (vm != null && ...) check on line 2584), the log message should include vm.getUuid() to make troubleshooting easier. However, vm is declared inside the try block and would be accessible in the catch block because the exception can only occur inside the if block where vm != null.
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17011 |
|
@blueorangutan test |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
[SF] Trillian Build Failed (tid-15565) |
Description
This PR marks VMs in error state when expunge fails during destroy operation (with expunge=true).
Currently, when expunge fails, the VM gets stuck in Expunging state and the user is not able to perform any operation on the VM. This change moves the VM to Error state when expunge fails, thus allowing users to retry the destroy operation.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?