
Mark VMs in error state when expunge fails during destroy operation#12749

Open
sureshanaparti wants to merge 1 commit into apache:4.22 from shapeblue:mark-expunge-failed-vms-in-error-state

Conversation

@sureshanaparti
Contributor

Description

This PR marks VMs in error state when expunge fails during destroy operation (with expunge=true).

Currently, when expunge fails, the VM gets stuck in the Expunging state and the user cannot perform any operation on it. This change moves the VM to the Error state when expunge fails, so that users can retry the destroy operation.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@codecov

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 60.00000% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (65e5409) to head (767d37d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../src/main/java/com/cloud/vm/UserVmManagerImpl.java | 58.33% | 7 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff            @@
##               4.22   #12749   +/-   ##
=========================================
  Coverage     17.61%   17.61%           
- Complexity    15665    15668    +3     
=========================================
  Files          5917     5917           
  Lines        531400   531423   +23     
  Branches      64970    64973    +3     
=========================================
+ Hits          93603    93614   +11     
- Misses       427244   427253    +9     
- Partials      10553    10556    +3     

| Flag | Coverage Δ |
|---|---|
| uitests | 3.70% <ø> (ø) |
| unittests | 18.68% <60.00%> (+<0.01%) ⬆️ |


Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a UX problem where a VM becomes stuck in the Expunging state when expunge fails during a destroy operation with expunge=true. The fix transitions the VM to Error state on expunge failure, allowing the user to retry the destroy operation.

Changes:

  • Adds a new FSM transition Expunging → Error (via OperationFailedToError) to the VM state machine.
  • Introduces a transitionExpungingToError() helper method in UserVmManagerImpl and calls it from destroyVm() when expunge fails (either by returning false or throwing a RuntimeException).
  • Adds unit tests covering the key code paths of the new helper method.
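
To make the first change concrete, here is a minimal, self-contained sketch of a table-driven state machine that gains an Expunging → Error transition. It is not the CloudStack implementation: the class `VmStateSketch`, the trimmed-down `State` and `Event` enums, and the `next()` helper are hypothetical, and the retry transition out of Error is an assumption based only on the PR description saying that users can retry the destroy operation.

```java
import java.util.EnumMap;
import java.util.Map;

public class VmStateSketch {
    // Hypothetical, trimmed-down enums; the real VM state machine has many more members.
    public enum State { Destroyed, Expunging, Error }
    public enum Event { ExpungeOperation, OperationFailedToError, DestroyRequested }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);

    private static void addTransition(State from, Event event, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(Event.class)).put(event, to);
    }

    static {
        addTransition(State.Destroyed, Event.ExpungeOperation, State.Expunging);
        // The transition described in this PR: a failed expunge moves the VM to Error.
        addTransition(State.Expunging, Event.OperationFailedToError, State.Error);
        // Assumption for illustration: a VM in Error can be sent back for another attempt.
        addTransition(State.Error, Event.DestroyRequested, State.Expunging);
    }

    /** Returns the next state, or throws if the table has no matching transition. */
    public static State next(State current, Event event) {
        Map<Event, State> byEvent = TRANSITIONS.get(current);
        State target = (byEvent == null) ? null : byEvent.get(event);
        if (target == null) {
            throw new IllegalStateException("No transition from " + current + " on " + event);
        }
        return target;
    }

    public static void main(String[] args) {
        System.out.println(next(State.Expunging, Event.OperationFailedToError));
    }
}
```

Without the Expunging → Error entry, the lookup in `next()` would throw for a failed expunge, which mirrors the stuck state the PR description reports.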

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| api/src/main/java/com/cloud/vm/VirtualMachine.java | Adds the new Expunging → Error FSM transition with a comment. |
| server/src/main/java/com/cloud/vm/UserVmManagerImpl.java | Adds transitionExpungingToError() helper and updates destroyVm() to call it on expunge failure. |
| server/src/test/java/com/cloud/vm/UserVmManagerImplTest.java | Adds four unit tests for transitionExpungingToError(). |


    }
    if (!expunged) {
        transitionExpungingToError(vm.getId());
        throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm);

Copilot AI Mar 5, 2026


The error message on line 3591 uses destroyedVm (the UserVm object's toString() representation, e.g. something like "VirtualMachineVO[id=N]"), whereas the error message at line 3587 uses vm.getUuid() to clearly identify the VM. Mixing the two within the same block is inconsistent and may produce less informative or unpredictable output for consumers of this exception. Both messages should use vm.getUuid() or a similar identifier.

Suggested change
- throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm);
+ throw new CloudRuntimeException("Failed to expunge VM " + vm.getUuid());

Comment on lines +3580 to +3592
if (expunge) {
    boolean expunged;
    try {
        expunged = expunge(vm);
    } catch (RuntimeException e) {
        logger.error("Failed to expunge VM [{}] due to: {}", vm, e.getMessage(), e);
        transitionExpungingToError(vm.getId());
        throw new CloudRuntimeException("Failed to expunge VM " + vm.getUuid() + " due to: " + e.getMessage(), e);
    }
    if (!expunged) {
        transitionExpungingToError(vm.getId());
        throw new CloudRuntimeException("Failed to expunge VM " + destroyedVm);
    }

Copilot AI Mar 5, 2026


The updated destroyVm(DestroyVMCmd) method now has two new failure paths: one where expunge(vm) returns false and one where it throws a RuntimeException. Both paths call transitionExpungingToError() and then throw a CloudRuntimeException. However, there are no tests in this PR covering these new code paths in destroyVm. The existing testDestroyVm test at line 3628 only covers the success case (where expunge returns true). Tests verifying that transitionExpungingToError is called and a CloudRuntimeException is thrown in both failure cases would improve reliability.
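
As a rough illustration of what such tests would check, the following self-contained sketch exercises both failure paths against a stand-in. Nothing here is the real CloudStack code or its test setup: `DestroyVmFailureSketch`, `Expunger`, `Manager`, and `failsAndMarksError()` are hypothetical names, `IllegalStateException` stands in for CloudRuntimeException, and a counter stands in for verifying the call to transitionExpungingToError. A real test would more likely use the mocking setup already present in UserVmManagerImplTest.

```java
public class DestroyVmFailureSketch {
    /** Hypothetical stand-in for the expunge step: returns false or throws on failure. */
    interface Expunger {
        boolean expunge(long vmId);
    }

    /** Hypothetical stand-in for the manager, mirroring the control flow in the diff. */
    static class Manager {
        private final Expunger expunger;
        int errorTransitions = 0; // counts calls to transitionExpungingToError

        Manager(Expunger expunger) {
            this.expunger = expunger;
        }

        void transitionExpungingToError(long vmId) {
            errorTransitions++;
        }

        void destroyVm(long vmId, boolean expunge) {
            if (!expunge) {
                return;
            }
            boolean expunged;
            try {
                expunged = expunger.expunge(vmId);
            } catch (RuntimeException e) {
                // Failure path 1: expunge throws.
                transitionExpungingToError(vmId);
                throw new IllegalStateException("Failed to expunge VM " + vmId + " due to: " + e.getMessage(), e);
            }
            if (!expunged) {
                // Failure path 2: expunge reports failure by returning false.
                transitionExpungingToError(vmId);
                throw new IllegalStateException("Failed to expunge VM " + vmId);
            }
        }
    }

    /** True when destroyVm throws and the Error transition was requested exactly once. */
    static boolean failsAndMarksError(Expunger expunger) {
        Manager manager = new Manager(expunger);
        try {
            manager.destroyVm(42L, true);
            return false; // no exception: not a failure path
        } catch (IllegalStateException expected) {
            return manager.errorTransitions == 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(failsAndMarksError(id -> false));
        System.out.println(failsAndMarksError(id -> { throw new RuntimeException("boom"); }));
    }
}
```

The same two assertions (exception thrown, transition requested once) are what a mock-based test of destroyVm would need to verify for each path.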

        }
    }
} catch (NoTransitionException e) {
    logger.warn("Failed to transition VM to Error state: {}", e.getMessage());

Copilot AI Mar 5, 2026


The warning log message at line 2593 doesn't include the VM's UUID or ID, so it is hard to tell which VM failed to transition. The NoTransitionException can only be thrown from within the if (vm != null && ...) check on line 2584, inside the try block that starts at line 2582, so a non-null vm exists whenever the exception occurs. However, vm is declared inside the try block, and in Java a variable declared there is not in scope in the catch block. To log vm.getUuid(), the declaration would have to move above the try; alternatively, the message can include the VM ID that the method receives as its argument.
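
The scoping point can be shown with a small, self-contained sketch. It is only an illustration under assumptions: `TransitionLogSketch`, the local `NoTransitionException` stand-in, and the `StateChanger` interface are hypothetical, and the method returns the message as a string instead of calling the real logger so that the result is easy to inspect.

```java
public class TransitionLogSketch {
    /** Hypothetical stand-in for the checked exception named in the diff. */
    static class NoTransitionException extends Exception {
        NoTransitionException(String message) {
            super(message);
        }
    }

    /** Hypothetical hook that performs the state change and may reject it. */
    interface StateChanger {
        void toError(long vmId) throws NoTransitionException;
    }

    /** Returns the warning that would be logged, or null when the transition succeeds. */
    static String transitionExpungingToError(long vmId, StateChanger changer) {
        try {
            // Any variable declared inside this block is out of scope in the catch below.
            changer.toError(vmId);
            return null;
        } catch (NoTransitionException e) {
            // vmId is a method parameter, so it is still in scope and can identify the VM.
            return "Failed to transition VM [" + vmId + "] to Error state: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(transitionExpungingToError(7L, id -> {
            throw new NoTransitionException("no path");
        }));
    }
}
```

Using the method parameter keeps the change local to the log statement; hoisting the VM lookup above the try would also work but touches more of the method.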

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17011

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-15565)


Projects

Status: In Progress

5 participants