KVM: Enable HA heartbeat on SharedMountPoint #12773

Draft
weizhouapache wants to merge 1 commit into apache:4.22 from weizhouapache:4.22-kvm-ha-on-shared-mount-point

@weizhouapache
Member

Description

This PR extends the KVM HA heartbeat to support SharedMountPoint storage pools.

Tested

  • Add a shared mount point pool
  • KVM hosts write heartbeats to the mount point
  • Deploy VMs
  • Set the global setting force.ha to true
  • Force shut down one of the KVM hosts
  • After 3 minutes, the VMs are started on other KVM hosts

Example of heartbeat files

[root@kvm1 ~]# ls -l /mnt/cloudstack*/KVMHA
/mnt/cloudstack1/KVMHA:
total 1
-rw-r--r--. 1 root root 11 Mar  9 12:28 hb-10.1.34.125
-rw-r--r--. 1 root root 11 Mar  9 12:09 hb-10.1.35.76

/mnt/cloudstack2/KVMHA:
total 1
-rw-r--r--. 1 root root 11 Mar  9 12:28 hb-10.1.34.125
-rw-r--r--. 1 root root 11 Mar  9 12:09 hb-10.1.35.76
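
As a rough illustration of what could produce the files listed above: the 11-byte size is consistent with a 10-digit epoch timestamp plus a newline, and the checker quoted later in this conversation compares the file contents against `date +%s`. The sketch below works under that assumption only. `write_heartbeat` and its arguments are hypothetical names, not taken from `kvmsmpheartbeat.sh`.

```shell
#!/bin/sh
# Hypothetical sketch: write one heartbeat file per host under <mount point>/KVMHA.
# Assumption: the file holds the current time in epoch seconds.
write_heartbeat() {
    mount_point=$1   # e.g. /mnt/cloudstack1
    host_ip=$2       # e.g. 10.1.34.125
    hb_dir="$mount_point/KVMHA"
    mkdir -p "$hb_dir" || return 1
    # Overwrite the file with the current timestamp on every beat.
    date +%s > "$hb_dir/hb-$host_ip"
}

# Demo against a scratch directory instead of a real mount point.
demo_dir=$(mktemp -d)
write_heartbeat "$demo_dir" "10.1.34.125"
cat "$demo_dir/KVMHA/hb-10.1.34.125"
```

Writing into every SharedMountPoint pool would then amount to calling the helper once per mount point.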

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@weizhouapache weizhouapache force-pushed the 4.22-kvm-ha-on-shared-mount-point branch from 296835a to 93b95aa March 9, 2026 12:47

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 5.55556% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.61%. Comparing base (c3d6a8c) to head (93b95aa).
⚠️ Report is 2 commits behind head on 4.22.

Files with missing lines | Patch % | Lines
...oud/hypervisor/kvm/storage/LibvirtStoragePool.java | 0.00% | 8 Missing ⚠️
...irtCheckVMActivityOnStoragePoolCommandWrapper.java | 0.00% | 2 Missing ⚠️
...m/cloud/hypervisor/kvm/storage/KVMStoragePool.java | 0.00% | 2 Missing ⚠️
.../hypervisor/kvm/storage/KVMStoragePoolManager.java | 0.00% | 2 Missing ⚠️
...e/driver/CloudStackPrimaryDataStoreDriverImpl.java | 0.00% | 2 Missing ⚠️
...om/cloud/hypervisor/kvm/resource/KVMHAMonitor.java | 50.00% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               4.22   #12773   +/-   ##
=========================================
  Coverage     17.61%   17.61%           
- Complexity    15661    15665    +4     
=========================================
  Files          5917     5917           
  Lines        531402   531414   +12     
  Branches      64971    64972    +1     
=========================================
+ Hits          93586    93596   +10     
- Misses       427262   427263    +1     
- Partials      10554    10555    +1     
Flag | Coverage Δ
uitests | 3.70% <ø> (ø)
unittests | 18.68% <5.55%> (+<0.01%) ⬆️

@weizhouapache
Member Author

@blueorangutan package

@weizhouapache weizhouapache requested a review from Copilot March 9, 2026 16:02

Copilot AI left a comment

Pull request overview

Enhances KVM HA heartbeat to support SharedMountPoint primary storage by introducing a new heartbeat script and extending HA support checks across KVM storage and monitor components.

Changes:

  • Added a new KVM “SharedMountPoint” heartbeat script (kvmsmpheartbeat.sh) that writes heartbeat files to a mountpoint.
  • Expanded HA-supported storage pool types to include SharedMountPoint in driver, monitor, and pool logic.
  • Added a setType hook to KVMStoragePool and set the pool type after creation / retrieval to ensure HA detection works.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

File | Description
scripts/vm/hypervisor/kvm/kvmsmpheartbeat.sh | New local mountpoint-based heartbeat writer/checker script for SharedMountPoint HA.
plugins/storage/volume/default/.../CloudStackPrimaryDataStoreDriverImpl.java | Allows HA support for SharedMountPoint in primary storage capability logic.
plugins/hypervisors/kvm/.../LibvirtStoragePool.java | Enables HA support for SharedMountPoint and selects the correct heartbeat script.
plugins/hypervisors/kvm/.../KVMStoragePoolManager.java | Sets pool type after creation; exposes getStorageAdaptor.
plugins/hypervisors/kvm/.../KVMStoragePool.java | Introduces a default setType method for backward compatibility.
plugins/hypervisors/kvm/.../LibvirtCheckVMActivityOnStoragePoolCommandWrapper.java | Ensures retrieved pool has its type set before HA checks.
plugins/hypervisors/kvm/.../KVMHAMonitor.java | Expands heartbeat monitoring to include SharedMountPoint pools.

Comment on lines +168 to +201
    if [ ! -f "$hbFile" ]; then
        # signal large difference if file missing
        return 999999
    fi
    now=$(date +%s)
    hb=$(cat "$hbFile" 2>/dev/null)
    if [ -z "$hb" ]; then
        return 999998
    fi
    diff=`expr $now - $hb 2>/dev/null`
    if [ $? -ne 0 ]
    then
        return 999997
    fi
    if [ -z "$interval" ]; then
        # if no interval provided, consider 0 as success
        if [ $diff -gt 0 ]; then
            return $diff
        else
            return 0
        fi
    fi
    if [ $diff -gt $interval ]
    then
        return $diff
    fi
    return 0
}

if [ "$rflag" == "1" ]
then
    check_hbLog
    diff=$?
    if [ $diff == 0 ]

Copilot AI Mar 9, 2026

return in bash only supports exit codes 0–255; values like 999999 will wrap modulo 256, so diff=$? will not contain the intended “seconds ago” value and can lead to incorrect DEAD/ALIVE reporting. Consider having check_hbLog print the computed diff (e.g., via echo) and use its return code only as success/failure, or store the diff in a global variable and return 0/1.

Suggested change
    hb_diff=0
    if [ ! -f "$hbFile" ]; then
        # signal large difference if file missing
        hb_diff=999999
        return 1
    fi
    now=$(date +%s)
    hb=$(cat "$hbFile" 2>/dev/null)
    if [ -z "$hb" ]; then
        hb_diff=999998
        return 1
    fi
    diff=`expr $now - $hb 2>/dev/null`
    if [ $? -ne 0 ]
    then
        hb_diff=999997
        return 1
    fi
    if [ -z "$interval" ]; then
        # if no interval provided, consider 0 as success
        if [ $diff -gt 0 ]; then
            hb_diff=$diff
            return 1
        else
            hb_diff=0
            return 0
        fi
    fi
    if [ $diff -gt $interval ]
    then
        hb_diff=$diff
        return 1
    fi
    hb_diff=0
    return 0
}
if [ "$rflag" == "1" ]
then
    check_hbLog
    status=$?
    diff="${hb_diff:-0}"
    if [ $status -eq 0 ]
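
The wrap-around described in this comment can be reproduced in isolation. The following is a minimal sketch, not code from the PR: it first shows what bash reports for `return 999999`, then illustrates the other option the comment mentions, printing the age on stdout and using the status only as an alive/stale flag. `heartbeat_age` and its three arguments are hypothetical names, and the timestamps are made-up values.

```shell
#!/bin/sh
# Hypothetical demo; none of these names come from the PR.

# 1) In bash, a function's return status keeps only the low 8 bits.
#    999999 = 3906 * 256 + 63, so the caller sees 63 instead of 999999.
wrapped=$(bash -c 'f() { return 999999; }; f; echo $?')
echo "return 999999 is seen by the caller as: $wrapped"

# 2) Alternative: print the age on stdout and use the status only
#    to signal "fresh" (0) or "stale" (1).
heartbeat_age() {
    now=$1; last=$2; max=$3
    age=$((now - last))
    echo "$age"
    [ "$age" -le "$max" ]
}

if age=$(heartbeat_age 1000700 1000000 60); then
    verdict=ALIVE
else
    verdict=DEAD
fi
echo "age=$age verdict=$verdict"
```

Either approach (stdout or a global such as `hb_diff` in the suggestion above) keeps the full "seconds ago" value intact for the log message.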

Copilot uses AI. Check for mistakes.
Comment on lines +201 to +206
if [ $diff == 0 ]
then
    echo "=====> ALIVE <====="
else
    echo "=====> Considering host as DEAD because last write on [$hbFile] was [$diff] seconds ago, but the max interval is [$interval] <======"
fi

Copilot AI Mar 9, 2026

This is doing a numeric comparison but uses the string operator == and unquoted $diff. Use numeric operators (-eq, -ne) and quote variables in [ tests (or switch to [[ ... ]]) to avoid mis-comparisons or test errors if the variable is empty/unset.
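
A small sketch of the quoting and operator advice, written as a standalone helper rather than as a patch to the script. `is_alive` is a hypothetical name, and the extra status 2 for empty or non-numeric input is an assumption added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: 0 when the heartbeat age is 0, 1 when it is a
# positive number, 2 when the value is empty or not a plain integer.
is_alive() {
    diff_seconds=${1-}
    case "$diff_seconds" in
        ''|*[!0-9]*) return 2 ;;   # reject empty and non-numeric input up front
    esac
    # Quoted operand plus -eq: a numeric test that cannot break on empty input.
    [ "$diff_seconds" -eq 0 ]
}

for value in 0 42 ""; do
    if is_alive "$value"; then
        echo "[$value] => ALIVE"
    else
        echo "[$value] => not alive (status $?)"
    fi
done
```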

Comment on lines +107 to +110
if [ $? -gt 0 ]
then
    return
fi

Copilot AI Mar 9, 2026

$? here reflects only the exit status of the last command in the pipeline (awk), not whether ps/grep matched anything. As written, this check won’t behave as intended and is effectively redundant with the later -z "$vmPids" guard. Consider removing this $? block, or use set -o pipefail (with care for script-wide impact) or replace the pipeline with a more direct PID lookup (e.g., pgrep) and check for empty output.

Suggested change (delete these lines)
if [ $? -gt 0 ]
then
    return
fi
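
The pipeline behaviour behind this comment can be checked with canned input. This is a minimal sketch, not the script's actual `ps` pipeline: the sample process lines and the `no-such-vm` pattern are invented, and only the variable name `vmPids` is taken from the comment above.

```shell
#!/bin/sh
# Hypothetical demo with canned ps-like input instead of a real process list.
sample='101 qemu-kvm -name i-2-10-VM
202 qemu-kvm -name i-2-11-VM'

# grep matches nothing here, yet without pipefail the status of the
# whole pipeline is the status of its last command (awk), which is 0.
vmPids=$(printf '%s\n' "$sample" | grep 'no-such-vm' | awk '{print $1}')
pipeline_status=$?
echo "pipeline status: $pipeline_status"

# Checking the captured output for emptiness is the reliable signal.
if [ -z "$vmPids" ]; then
    echo "no matching processes"
fi
```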

Comment on lines +124 to +130
mounts=$(cat /proc/mounts | grep "$MountPoint")
if [ $? -gt 0 ]
then
    # mount point not present; we don't remount in local-only script
    # nothing to do here; keep for compatibility with original flow
    :
else

Copilot AI Mar 9, 2026

grep "$MountPoint" can produce false positives when $MountPoint is a substring of another mount target (e.g., /mnt/cloudstack1 matching /mnt/cloudstack10). This can incorrectly treat the mount as present and trigger deleteVMs. Consider using a stricter mount check (e.g., matching the mount target field exactly or using findmnt).
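
One way to match the mount target field exactly is sketched below. It is an illustration under stated assumptions, not the PR's code: `is_mounted` is a hypothetical helper, the second argument exists only so the demo can run against a sample file, and the sample line mimics the `/proc/mounts` layout where the second field is the mount target.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if some line's 2nd field (the mount
# target) equals the requested path exactly.
is_mounted() {
    target=$1
    table=${2:-/proc/mounts}
    awk -v target="$target" '$2 == target { found = 1 } END { exit !found }' "$table"
}

# Sample table containing only /mnt/cloudstack10.
mounts_file=$(mktemp)
printf '%s\n' '/dev/sdb1 /mnt/cloudstack10 ext4 rw 0 0' > "$mounts_file"

# Substring match: reports a hit for /mnt/cloudstack1 (false positive).
if grep -q '/mnt/cloudstack1' "$mounts_file"; then
    echo "grep: /mnt/cloudstack1 looks mounted"
fi

# Exact field match: correctly reports that it is not mounted.
if ! is_mounted /mnt/cloudstack1 "$mounts_file"; then
    echo "exact match: /mnt/cloudstack1 is not mounted"
fi
```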

private final Map<String, StorageAdaptor> _storageMapper = new HashMap<String, StorageAdaptor>();

-    private StorageAdaptor getStorageAdaptor(StoragePoolType type) {
+    public StorageAdaptor getStorageAdaptor(StoragePoolType type) {

Copilot AI Mar 9, 2026

Changing this method from private to public expands the exposed API surface of KVMStoragePoolManager, which can make future refactors harder. If external access is required, consider narrowing visibility (package-private) or providing a more purpose-driven public method; if it’s only needed for tests, consider a test-oriented visibility approach rather than making it broadly public.

Suggested change
StorageAdaptor getStorageAdaptor(StoragePoolType type) {

public class KVMHAMonitor extends KVMHABase implements Runnable {

public static final List<StoragePoolType> STORAGE_POOL_TYPES_WITH_HA_SUPPORT = List.of(StoragePoolType.NetworkFilesystem, StoragePoolType.SharedMountPoint);

Copilot AI Mar 9, 2026

HA-supported pool types are now defined in multiple places (e.g., also in CloudStackPrimaryDataStoreDriverImpl), and LibvirtStoragePool depends on KVMHAMonitor just to reuse this constant. Consider centralizing this list in a small shared utility/helper in an appropriate common package for the KVM plugin (or a single authoritative location) to avoid duplication and reduce cross-class coupling.

Comment on lines +21 to +22
-i identifier (ignored for local-only heartbeat)
-p path (ignored for local-only heartbeat)

Copilot AI Mar 9, 2026

The help text says -i is “ignored for local-only heartbeat”, but the script hard-requires it and exits with status 1 without any message. Consider either (a) not requiring -i for this script, or (b) emitting a clear error (and/or calling help) explaining why it’s required for compatibility.

Suggested change
-i identifier (required for CLI compatibility; value ignored by local-only heartbeat)
-p path (required for CLI compatibility; value ignored by local-only heartbeat)

Comment on lines +70 to +75
# Match original kvmheartbeat.sh: require NfsSvrIP parameter for CLI compatibility
if [ -z "$NfsSvrIP" ]
then
    exit 1
fi

Copilot AI Mar 9, 2026

The help text says -i is “ignored for local-only heartbeat”, but the script hard-requires it and exits with status 1 without any message. Consider either (a) not requiring -i for this script, or (b) emitting a clear error (and/or calling help) explaining why it’s required for compatibility.

Suggested change (delete these lines)
# Match original kvmheartbeat.sh: require NfsSvrIP parameter for CLI compatibility
if [ -z "$NfsSvrIP" ]
then
    exit 1
fi
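
If the parameter stays required, option (b) from this comment could look roughly like the sketch below. `require_identifier` and the wording of the message are hypothetical; only the `NfsSvrIP` variable and the `-i` flag come from the quoted script.

```shell
#!/bin/sh
# Hypothetical sketch of option (b): fail with an explanation instead of a
# silent exit 1 when the -i value is missing.
require_identifier() {
    if [ -z "$1" ]; then
        echo "error: -i <identifier> is required for CLI compatibility; its value is ignored" >&2
        return 1
    fi
}

# Demo: simulate the script being called without -i.
NfsSvrIP=""
if ! require_identifier "$NfsSvrIP"; then
    echo "the script would exit 1 here"
fi
```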

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✖️ debian ✔️ suse15. SL-JID 17048

@weizhouapache
Member Author

@blueorangutan package

@blueorangutan

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17054

@weizhouapache
Member Author

@blueorangutan test

@blueorangutan

@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@weizhouapache weizhouapache added this to the 4.22.1 milestone Mar 10, 2026
@weizhouapache
Member Author

@sureshanaparti @rajujith
Added this to the 4.22.1 milestone; it is a small improvement, but it may benefit several users.

cc @DaanHoogland @NuxRo
