KVM: Enable HA heartbeat on ShareMountPoint#12773
KVM: Enable HA heartbeat on ShareMountPoint#12773weizhouapache wants to merge 1 commit intoapache:4.22from
Conversation
296835a to
93b95aa
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12773 +/- ##
=========================================
Coverage 17.61% 17.61%
- Complexity 15661 15665 +4
=========================================
Files 5917 5917
Lines 531402 531414 +12
Branches 64971 64972 +1
=========================================
+ Hits 93586 93596 +10
- Misses 427262 427263 +1
- Partials 10554 10555 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@blueoranguran package |
There was a problem hiding this comment.
Pull request overview
Enhances KVM HA heartbeat to support SharedMountPoint primary storage by introducing a new heartbeat script and extending HA support checks across KVM storage and monitor components.
Changes:
- Added a new KVM “SharedMountPoint” heartbeat script (
kvmsmpheartbeat.sh) that writes heartbeat files to a mountpoint. - Expanded HA-supported storage pool types to include
SharedMountPointin driver, monitor, and pool logic. - Added a
setTypehook toKVMStoragePooland set the pool type after creation / retrieval to ensure HA detection works.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/vm/hypervisor/kvm/kvmsmpheartbeat.sh | New local mountpoint-based heartbeat writer/checker script for SharedMountPoint HA. |
| plugins/storage/volume/default/.../CloudStackPrimaryDataStoreDriverImpl.java | Allows HA support for SharedMountPoint in primary storage capability logic. |
| plugins/hypervisors/kvm/.../LibvirtStoragePool.java | Enables HA support for SharedMountPoint and selects the correct heartbeat script. |
| plugins/hypervisors/kvm/.../KVMStoragePoolManager.java | Sets pool type after creation; exposes getStorageAdaptor. |
| plugins/hypervisors/kvm/.../KVMStoragePool.java | Introduces a default setType method for backward compatibility. |
| plugins/hypervisors/kvm/.../LibvirtCheckVMActivityOnStoragePoolCommandWrapper.java | Ensures retrieved pool has its type set before HA checks. |
| plugins/hypervisors/kvm/.../KVMHAMonitor.java | Expands heartbeat monitoring to include SharedMountPoint pools. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if [ ! -f "$hbFile" ]; then | ||
| # signal large difference if file missing | ||
| return 999999 | ||
| fi | ||
| now=$(date +%s) | ||
| hb=$(cat "$hbFile" 2>/dev/null) | ||
| if [ -z "$hb" ]; then | ||
| return 999998 | ||
| fi | ||
| diff=`expr $now - $hb 2>/dev/null` | ||
| if [ $? -ne 0 ] | ||
| then | ||
| return 999997 | ||
| fi | ||
| if [ -z "$interval" ]; then | ||
| # if no interval provided, consider 0 as success | ||
| if [ $diff -gt 0 ]; then | ||
| return $diff | ||
| else | ||
| return 0 | ||
| fi | ||
| fi | ||
| if [ $diff -gt $interval ] | ||
| then | ||
| return $diff | ||
| fi | ||
| return 0 | ||
| } | ||
|
|
||
| if [ "$rflag" == "1" ] | ||
| then | ||
| check_hbLog | ||
| diff=$? | ||
| if [ $diff == 0 ] |
There was a problem hiding this comment.
return in bash only supports exit codes 0–255; values like 999999 will wrap modulo 256, so diff=$? will not contain the intended “seconds ago” value and can lead to incorrect DEAD/ALIVE reporting. Consider having check_hbLog print the computed diff (e.g., via echo) and use its return code only as success/failure, or store the diff in a global variable and return 0/1.
| if [ ! -f "$hbFile" ]; then | |
| # signal large difference if file missing | |
| return 999999 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| return 999998 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| return 999997 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| return $diff | |
| else | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| return $diff | |
| fi | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| check_hbLog | |
| diff=$? | |
| if [ $diff == 0 ] | |
| hb_diff=0 | |
| if [ ! -f "$hbFile" ]; then | |
| # signal large difference if file missing | |
| hb_diff=999999 | |
| return 1 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| hb_diff=999998 | |
| return 1 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| hb_diff=999997 | |
| return 1 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| hb_diff=$diff | |
| return 1 | |
| else | |
| hb_diff=0 | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| hb_diff=$diff | |
| return 1 | |
| fi | |
| hb_diff=0 | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| check_hbLog | |
| status=$? | |
| diff="${hb_diff:-0}" | |
| if [ $status -eq 0 ] |
| if [ $diff == 0 ] | ||
| then | ||
| echo "=====> ALIVE <=====" | ||
| else | ||
| echo "=====> Considering host as DEAD because last write on [$hbFile] was [$diff] seconds ago, but the max interval is [$interval] <======" | ||
| fi |
There was a problem hiding this comment.
This is doing a numeric comparison but uses the string operator == and unquoted $diff. Use numeric operators (-eq, -ne) and quote variables in [ tests (or switch to [[ ... ]]) to avoid mis-comparisons or test errors if the variable is empty/unset.
| if [ $? -gt 0 ] | ||
| then | ||
| return | ||
| fi |
There was a problem hiding this comment.
$? here reflects only the exit status of the last command in the pipeline (awk), not whether ps/grep matched anything. As written, this check won’t behave as intended and is effectively redundant with the later -z "$vmPids" guard. Consider removing this $? block, or use set -o pipefail (with care for script-wide impact) or replace the pipeline with a more direct PID lookup (e.g., pgrep) and check for empty output.
| if [ $? -gt 0 ] | |
| then | |
| return | |
| fi |
| mounts=$(cat /proc/mounts | grep "$MountPoint") | ||
| if [ $? -gt 0 ] | ||
| then | ||
| # mount point not present — we don't remount in local-only script | ||
| # nothing to do here; keep for compatibility with original flow | ||
| : | ||
| else |
There was a problem hiding this comment.
grep "$MountPoint" can produce false positives when $MountPoint is a substring of another mount target (e.g., /mnt/cloudstack1 matching /mnt/cloudstack10). This can incorrectly treat the mount as present and trigger deleteVMs. Consider using a stricter mount check (e.g., matching the mount target field exactly or using findmnt).
| private final Map<String, StorageAdaptor> _storageMapper = new HashMap<String, StorageAdaptor>(); | ||
|
|
||
| private StorageAdaptor getStorageAdaptor(StoragePoolType type) { | ||
| public StorageAdaptor getStorageAdaptor(StoragePoolType type) { |
There was a problem hiding this comment.
Changing this method from private to public expands the exposed API surface of KVMStoragePoolManager, which can make future refactors harder. If external access is required, consider narrowing visibility (package-private) or providing a more purpose-driven public method; if it’s only needed for tests, consider a test-oriented visibility approach rather than making it broadly public.
| public StorageAdaptor getStorageAdaptor(StoragePoolType type) { | |
| StorageAdaptor getStorageAdaptor(StoragePoolType type) { |
|
|
||
| public class KVMHAMonitor extends KVMHABase implements Runnable { | ||
|
|
||
| public static final List<StoragePoolType> STORAGE_POOL_TYPES_WITH_HA_SUPPORT = List.of(StoragePoolType.NetworkFilesystem, StoragePoolType.SharedMountPoint); |
There was a problem hiding this comment.
HA-supported pool types are now defined in multiple places (e.g., also in CloudStackPrimaryDataStoreDriverImpl), and LibvirtStoragePool depends on KVMHAMonitor just to reuse this constant. Consider centralizing this list in a small shared utility/helper in an appropriate common package for the KVM plugin (or a single authoritative location) to avoid duplication and reduce cross-class coupling.
| -i identifier (ignored for local-only heartbeat) | ||
| -p path (ignored for local-only heartbeat) |
There was a problem hiding this comment.
The help text says -i is “ignored for local-only heartbeat”, but the script hard-requires it and exits with status 1 without any message. Consider either (a) not requiring -i for this script, or (b) emitting a clear error (and/or calling help) explaining why it’s required for compatibility.
| -i identifier (ignored for local-only heartbeat) | |
| -p path (ignored for local-only heartbeat) | |
| -i identifier (required for CLI compatibility; value ignored by local-only heartbeat) | |
| -p path (required for CLI compatibility; value ignored by local-only heartbeat) |
| # Match original kvmheartbeat.sh: require NfsSvrIP parameter for CLI compatibility | ||
| if [ -z "$NfsSvrIP" ] | ||
| then | ||
| exit 1 | ||
| fi | ||
|
|
There was a problem hiding this comment.
The help text says -i is “ignored for local-only heartbeat”, but the script hard-requires it and exits with status 1 without any message. Consider either (a) not requiring -i for this script, or (b) emitting a clear error (and/or calling help) explaining why it’s required for compatibility.
| # Match original kvmheartbeat.sh: require NfsSvrIP parameter for CLI compatibility | |
| if [ -z "$NfsSvrIP" ] | |
| then | |
| exit 1 | |
| fi |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✖️ debian ✔️ suse15. SL-JID 17048 |
|
@blueorangutan package |
|
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17054 |
|
@blueorangutan test |
|
@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
@sureshanaparti @rajujith |
Description
This PR improves KVM HA heartbeat to support SharedMountPoint
Tested
Example of heartbeat files
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?