From 6ade2c53e881debf54a21741e455500493c8a602 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 15:12:56 -0500 Subject: [PATCH 01/11] Add TSG: NcHostAgent unable to connect to ApiService after Solution Update Add troubleshooting guide for NcHostAgent connectivity failure to Network Controller ApiService following Azure Local solution updates. The issue is caused by missing NetworkControllerNodeNames registry key on Hyper-V hosts, which prevents certificate rotation from propagating to NC VMs. Mitigation steps include: - Validating and populating the registry key on all hosts - Verifying local admin permissions on NC VMs - Running Start-SdnServerCertificateRotation for short-term recovery Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- TSG/Networking/README.md | 1 + ...onnect-ApiService-After-Solution-Update.md | 170 ++++++++++++++++++ 2 files changed, 171 insertions(+) create mode 100644 TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md diff --git a/TSG/Networking/README.md b/TSG/Networking/README.md index 805ea6f2..00fa9064 100644 --- a/TSG/Networking/README.md +++ b/TSG/Networking/README.md @@ -22,6 +22,7 @@ For Network Environment Validator Resources, see [TSG/EnvironmentValidator/Netwo - [How To: SDN Layer 3 Gateway Configuration](SDN-Express/HowTo-SDNExpress-SDN-Layer3-Gateway-Configuration.md) - [Troubleshoot: Host Unreachable](SDN-Express/Troubleshoot-SDNExpress-HealthAlert-HostUnreachable.md) - [Troubleshoot: Outbound Connectivity Issues with NAT](SDN-Express/Troubleshoot-SDNExpress-Outbound-Connectivity-Issues-When-Using-Outbound-NAT.md) +- [Troubleshoot: NcHostAgent Unable to Connect to ApiService After Solution Update](SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md) - [Troubleshoot: Recreate Intent Without SR-IOV](SDN-Express/Troubleshoot-SDNExpress-Recreate-Intent-No-SRIOV.md) ### Diagnostics 
diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md new file mode 100644 index 00000000..cbf07370 --- /dev/null +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -0,0 +1,170 @@ +# NcHostAgent Unable to Connect to ApiService After Solution Update + + + + + + + + + + + + + + + + + + +
Component: SDN Express / Network Controller
Severity: Critical
Applicable Scenarios: Solution Update
Affected Versions: All versions
+ +## Overview + +After completing an Azure Local solution update, Network Controller (NC) loses the ability to manage Hyper-V hosts due to a certificate rotation failure. The solution update rotates host certificates, but the corresponding AzureStackCertificationAuthority certificate is not properly propagated to the Network Controller VMs. This causes the NcHostAgent on each host to fail authentication when connecting to the NC ApiService, effectively breaking SDN management across the cluster. + +This issue recurs after every subsequent solution update until the underlying root cause is addressed. + +## Symptoms + +**Common error messages:** + +The NcHostAgent service on the Hyper-V hosts reports connectivity failures to the Network Controller ApiService. + +``` +NcHostAgent unable to connect to ApiService +``` + +**Observable behaviors:** + +- Network Controller cannot manage Hyper-V hosts after a solution update +- SDN policies are no longer applied to tenant VMs +- Virtual network connectivity may be disrupted +- NcHostAgent service on the hosts fails to authenticate with Network Controller +- The issue reoccurs after each subsequent solution update + +## Root Cause + +During a solution update, the host certificates are rotated by ECE (Environment Configuration Engine). For the certificate rotation to also propagate the AzureStackCertificationAuthority certificate to the Network Controller VMs, the following registry key must exist and be populated on each Hyper-V host: + +``` +HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters\NetworkControllerNodeNames +``` + +This registry key is created by Windows Admin Center (WAC) during SDN deployment. In environments where WAC was not used (e.g., SdnExpress-based deployments or brownfield configurations), this registry key does not exist. When the key is missing, the secret rotation process silently skips the NC and SLB certificate injection and incorrectly reports success. 
+ +As a result, after the update the hosts present new certificates that the Network Controller does not trust, breaking the NcHostAgent-to-ApiService connection. + +## Resolution + +### Prerequisites + +- Administrative access to the Hyper-V host nodes +- Administrative access to the Network Controller VMs +- [SdnDiagnostics](https://github.com/microsoft/SdnDiagnostics/wiki) PowerShell module installed + +### Steps + +1. **Validate the registry key exists and is populated** + + On each Hyper-V host, verify that the `NetworkControllerNodeNames` registry key exists and contains the Network Controller VM names. + + ```powershell + # Check if the registry key exists and has a value + $regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" + $regValue = Get-ItemProperty -Path $regPath -Name "NetworkControllerNodeNames" -ErrorAction SilentlyContinue + + if ($null -eq $regValue) { + Write-Host "NetworkControllerNodeNames registry key is MISSING." -ForegroundColor Red + } + elseif ([string]::IsNullOrWhiteSpace($regValue.NetworkControllerNodeNames)) { + Write-Host "NetworkControllerNodeNames registry key EXISTS but is EMPTY." -ForegroundColor Yellow + } + else { + Write-Host "NetworkControllerNodeNames: $($regValue.NetworkControllerNodeNames)" -ForegroundColor Green + } + ``` + + If the key is missing or empty, populate it with the Network Controller VM names (comma-separated): + + ```powershell + # Replace ,, with your actual NC VM names + $ncNodeNames = ",," + Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name "NetworkControllerNodeNames" -Value $ncNodeNames + ``` + + > **Important:** Repeat this step on every Hyper-V host in the cluster. + +2. **Validate local administrator permissions on Network Controller VMs** + + The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. 
+ + ```powershell + # Run from a Hyper-V host against each NC VM + # Replace with your NC VM name + Invoke-Command -ComputerName -ScriptBlock { + $adminGroup = Get-LocalGroupMember -Group "Administrators" -ErrorAction SilentlyContinue + $adminGroup | Format-Table Name, ObjectClass, PrincipalSource + } + ``` + + If the cluster nodes are not listed as local administrators, add them: + + ```powershell + # Run on each NC VM + # Replace with your cluster node machine account + Invoke-Command -ComputerName -ScriptBlock { + Add-LocalGroupMember -Group "Administrators" -Member "" + } + ``` + + > **Important:** Repeat this for each Hyper-V host against every NC VM. + +3. **Run certificate rotation to restore connectivity (short-term mitigation)** + + Use `Start-SdnServerCertificateRotation` from the SdnDiagnostics module to manually trigger certificate rotation and restore the NcHostAgent-to-ApiService connection. + + ```powershell + # Import the SdnDiagnostics module if not already loaded + Import-Module SdnDiagnostics + + # Generate the SdnDiagnostics credential (NC admin credentials) + $credential = Get-Credential + + # Start the certificate rotation + Start-SdnServerCertificateRotation -Credential $credential + ``` + + > **Note:** This is a short-term mitigation. The issue will recur on the next solution update unless the registry key (Step 1) is properly configured on all hosts before the update. + +4. **Verify resolution** + + Confirm that the NcHostAgent service on each host can now communicate with the Network Controller. 
+ + ```powershell + # Check NcHostAgent service status on each host + Get-Service -Name NcHostAgent | Select-Object Name, Status + + # Verify Network Controller connectivity by querying server resources + $nchaParams = Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" + $ncUri = "https://$($nchaParams.PeerCertificateCName)" + Get-NetworkControllerServer -ConnectionUri $ncUri + ``` + + A successful response listing server resources confirms that connectivity has been restored. + +## Prevention + +To prevent this issue from recurring on future solution updates: + +- Ensure the `NetworkControllerNodeNames` registry key is populated on **all** Hyper-V hosts before performing a solution update +- Verify local administrator permissions on NC VMs are correctly configured before updates +- After any solution update, validate NcHostAgent connectivity to the Network Controller as part of post-update verification + +## Related Issues + +- [Troubleshoot: Host Not Connected to Controller](Troubleshoot-SDNExpress-HealthAlert-HostNotConnectedToController.md) +- [Troubleshoot: Host Unreachable](Troubleshoot-SDNExpress-HealthAlert-HostUnreachable.md) +- [SdnDiagnostics Wiki](https://github.com/microsoft/SdnDiagnostics/wiki) + +--- From 604c350b87dc3e1d65f1062971b91f0d0651eb07 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 15:26:07 -0500 Subject: [PATCH 02/11] Expand Root Cause section into three sub-causes Break the Root Cause into distinct sub-sections: 1. Missing NetworkControllerNodeNames registry key 2. HCI admin user not in Local Administrators on NC VMs 3. 
Code defect in certificate rotation logic (fix in progress) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- ...onnect-ApiService-After-Solution-Update.md | 38 ++++++++++++++----- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index cbf07370..4500fcc3 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -29,23 +29,32 @@ This issue recurs after every subsequent solution update until the underlying ro **Common error messages:** -The NcHostAgent service on the Hyper-V hosts reports connectivity failures to the Network Controller ApiService. +ConfigurationState reported for /servers/{resourceId} may show failure indicating: ``` -NcHostAgent unable to connect to ApiService +PolicyConfigurationFailure +``` + +If you run `Debug-SdnFabricInfrastructure`, you may see the following test reporting failure: + +``` +Test-SdnHostAgentConnectionStateToApiService +Description = "Network Controller Host Agent is not connected to the Network Controller API Service." +Impact = "Policy configuration may not be pushed to the Hyper-V host(s) if no southbound connectivity is available." ``` **Observable behaviors:** -- Network Controller cannot manage Hyper-V hosts after a solution update -- SDN policies are no longer applied to tenant VMs -- Virtual network connectivity may be disrupted -- NcHostAgent service on the hosts fails to authenticate with Network Controller -- The issue reoccurs after each subsequent solution update +- Network Controller is unable to program VFP policies to VMs, resulting. 
+- VM traffic for VMs, especially after live-migration may break. ## Root Cause -During a solution update, the host certificates are rotated by ECE (Environment Configuration Engine). For the certificate rotation to also propagate the AzureStackCertificationAuthority certificate to the Network Controller VMs, the following registry key must exist and be populated on each Hyper-V host: +During a solution update, the host certificates are rotated by ECE (Environment Configuration Engine). For the certificate rotation to also propagate the AzureStackCertificationAuthority certificate to the Network Controller VMs, several conditions must be met. A failure in any of the following causes the NcHostAgent-to-ApiService connection to break after the update. + +### 1. Missing NetworkControllerNodeNames Registry Key + +The certificate rotation process depends on the following registry key being present and populated on each Hyper-V host: ``` HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters\NetworkControllerNodeNames @@ -53,7 +62,18 @@ HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters\NetworkController This registry key is created by Windows Admin Center (WAC) during SDN deployment. In environments where WAC was not used (e.g., SdnExpress-based deployments or brownfield configurations), this registry key does not exist. When the key is missing, the secret rotation process silently skips the NC and SLB certificate injection and incorrectly reports success. -As a result, after the update the hosts present new certificates that the Network Controller does not trust, breaking the NcHostAgent-to-ApiService connection. +### 2. HCI Admin User Not Added to Local Administrators on NC VMs + +The certificate rotation process uses PowerShell remoting (`Invoke-Command`) from the Hyper-V hosts to the Network Controller VMs to install the updated certificates. 
If the cluster node machine accounts or the HCI admin user are not members of the Local Administrators group on the NC VMs, the remote commands fail and certificate propagation does not complete. + +### 3. Code Defect in Certificate Rotation Logic + +There is a known defect in the certificate rotation logic where: + +- The `Get-VM` lookup assumes VM names match the FQDN or NetBIOS name, which is not always true +- An `Invoke-Command` argument handling issue (extra comma) results in a null argument being passed, causing the rotation to be silently skipped + +> **Note:** A fix for these code defects is currently in development and will be addressed in a future update. ## Resolution From 08a1f7abceabbdaaa6863ac607035af7a9fcb8aa Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:05:37 -0500 Subject: [PATCH 03/11] save latest changes --- ...onnect-ApiService-After-Solution-Update.md | 179 +++++++----------- 1 file changed, 73 insertions(+), 106 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index 4500fcc3..c0c50fae 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -15,7 +15,7 @@ Affected Versions - All versions + 2506,2507,2508,2509,2510,2511,2512,2601,2602,2603,2604 @@ -60,131 +60,98 @@ The certificate rotation process depends on the following registry key being pre HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters\NetworkControllerNodeNames ``` -This registry key is created by Windows Admin Center (WAC) during SDN deployment. 
In environments where WAC was not used (e.g., SdnExpress-based deployments or brownfield configurations), this registry key does not exist. When the key is missing, the secret rotation process silently skips the NC and SLB certificate injection and incorrectly reports success. +This registry key is created by Windows Admin Center (WAC) and is created when you use WAC to manage Network Controller. In environments where WAC is not used this registry key does not exist. If this key is not present for SdnExpress deployments, we will not rotate the AzureStackCertificationAuthority. ### 2. HCI Admin User Not Added to Local Administrators on NC VMs -The certificate rotation process uses PowerShell remoting (`Invoke-Command`) from the Hyper-V hosts to the Network Controller VMs to install the updated certificates. If the cluster node machine accounts or the HCI admin user are not members of the Local Administrators group on the NC VMs, the remote commands fail and certificate propagation does not complete. +The certificate rotation process uses PowerShell remoting (`Invoke-Command`) from the Hyper-V hosts to the Network Controller VMs to install the updated certificates. If the HCI admin is not a member of the Local Administrators group on the NC VMs, the remote commands fail and certificate propagation does not complete. ### 3. Code Defect in Certificate Rotation Logic -There is a known defect in the certificate rotation logic where: - -- The `Get-VM` lookup assumes VM names match the FQDN or NetBIOS name, which is not always true -- An `Invoke-Command` argument handling issue (extra comma) results in a null argument being passed, causing the rotation to be silently skipped +There is a known defect in the certificate rotation logic: incorrect name validation causes the rotation to be skipped while the step is still marked as Success. > **Note:** A fix for these code defects is currently in development and will be addressed in a future update. 
## Resolution -### Prerequisites - -- Administrative access to the Hyper-V host nodes -- Administrative access to the Network Controller VMs -- [SdnDiagnostics](https://github.com/microsoft/SdnDiagnostics/wiki) PowerShell module installed - -### Steps - -1. **Validate the registry key exists and is populated** - - On each Hyper-V host, verify that the `NetworkControllerNodeNames` registry key exists and contains the Network Controller VM names. - - ```powershell - # Check if the registry key exists and has a value - $regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" - $regValue = Get-ItemProperty -Path $regPath -Name "NetworkControllerNodeNames" -ErrorAction SilentlyContinue - - if ($null -eq $regValue) { - Write-Host "NetworkControllerNodeNames registry key is MISSING." -ForegroundColor Red - } - elseif ([string]::IsNullOrWhiteSpace($regValue.NetworkControllerNodeNames)) { - Write-Host "NetworkControllerNodeNames registry key EXISTS but is EMPTY." -ForegroundColor Yellow - } - else { - Write-Host "NetworkControllerNodeNames: $($regValue.NetworkControllerNodeNames)" -ForegroundColor Green - } - ``` - - If the key is missing or empty, populate it with the Network Controller VM names (comma-separated): - - ```powershell - # Replace ,, with your actual NC VM names - $ncNodeNames = ",," - Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name "NetworkControllerNodeNames" -Value $ncNodeNames - ``` - - > **Important:** Repeat this step on every Hyper-V host in the cluster. - -2. **Validate local administrator permissions on Network Controller VMs** - - The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. 
- - ```powershell - # Run from a Hyper-V host against each NC VM - # Replace with your NC VM name - Invoke-Command -ComputerName -ScriptBlock { - $adminGroup = Get-LocalGroupMember -Group "Administrators" -ErrorAction SilentlyContinue - $adminGroup | Format-Table Name, ObjectClass, PrincipalSource - } - ``` +### Missing NetworkControllerNodeNames Registry Key - If the cluster nodes are not listed as local administrators, add them: +On each Hyper-V host, verify that the `NetworkControllerNodeNames` registry key exists and contains the Network Controller VM names. - ```powershell - # Run on each NC VM - # Replace with your cluster node machine account - Invoke-Command -ComputerName -ScriptBlock { - Add-LocalGroupMember -Group "Administrators" -Member "" - } - ``` +```powershell +# Check if the registry key exists and has a value +$regPath = "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" +$regValue = Get-ItemProperty -Path $regPath -Name "NetworkControllerNodeNames" -ErrorAction SilentlyContinue - > **Important:** Repeat this for each Hyper-V host against every NC VM. - -3. **Run certificate rotation to restore connectivity (short-term mitigation)** - - Use `Start-SdnServerCertificateRotation` from the SdnDiagnostics module to manually trigger certificate rotation and restore the NcHostAgent-to-ApiService connection. - - ```powershell - # Import the SdnDiagnostics module if not already loaded - Import-Module SdnDiagnostics - - # Generate the SdnDiagnostics credential (NC admin credentials) - $credential = Get-Credential - - # Start the certificate rotation - Start-SdnServerCertificateRotation -Credential $credential - ``` - - > **Note:** This is a short-term mitigation. The issue will recur on the next solution update unless the registry key (Step 1) is properly configured on all hosts before the update. - -4. **Verify resolution** - - Confirm that the NcHostAgent service on each host can now communicate with the Network Controller. 
+if ($null -eq $regValue) { + Write-Host "NetworkControllerNodeNames registry key is MISSING." -ForegroundColor Red +} +elseif ([string]::IsNullOrWhiteSpace($regValue.NetworkControllerNodeNames)) { + Write-Host "NetworkControllerNodeNames registry key EXISTS but is EMPTY." -ForegroundColor Yellow +} +else { + Write-Host "NetworkControllerNodeNames: $($regValue.NetworkControllerNodeNames)" -ForegroundColor Green +} +``` - ```powershell - # Check NcHostAgent service status on each host - Get-Service -Name NcHostAgent | Select-Object Name, Status +If the key is missing or empty, populate it with the Network Controller VM names (comma-separated): - # Verify Network Controller connectivity by querying server resources - $nchaParams = Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" - $ncUri = "https://$($nchaParams.PeerCertificateCName)" - Get-NetworkControllerServer -ConnectionUri $ncUri - ``` +```powershell +# Replace ,, with your actual NC VM names +$ncNodeNames = ",," +Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name "NetworkControllerNodeNames" -Value $ncNodeNames +``` - A successful response listing server resources confirms that connectivity has been restored. +> **Important:** Repeat this step on every Hyper-V host in the cluster. -## Prevention +### HCI Admin User Not Added to Local Administrators on NC VMs -To prevent this issue from recurring on future solution updates: +The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. 
-- Ensure the `NetworkControllerNodeNames` registry key is populated on **all** Hyper-V hosts before performing a solution update -- Verify local administrator permissions on NC VMs are correctly configured before updates -- After any solution update, validate NcHostAgent connectivity to the Network Controller as part of post-update verification +```powershell +# Run from a Hyper-V host against each NC VM +# Replace with your NC VM name +Invoke-Command -ComputerName -ScriptBlock { + $adminGroup = Get-LocalGroupMember -Group "Administrators" -ErrorAction SilentlyContinue + $adminGroup | Format-Table Name, ObjectClass, PrincipalSource +} +``` -## Related Issues +If the cluster nodes are not listed as local administrators, add them: -- [Troubleshoot: Host Not Connected to Controller](Troubleshoot-SDNExpress-HealthAlert-HostNotConnectedToController.md) -- [Troubleshoot: Host Unreachable](Troubleshoot-SDNExpress-HealthAlert-HostUnreachable.md) -- [SdnDiagnostics Wiki](https://github.com/microsoft/SdnDiagnostics/wiki) +```powershell +# Run on each NC VM +# Replace with your cluster node machine account +Invoke-Command -ComputerName -ScriptBlock { + Add-LocalGroupMember -Group "Administrators" -Member "" +} +``` ---- +> **Important:** Repeat this for each Hyper-V host against every NC VM. + +### Install AzureStackCertificationAuthority certificate to NC manually +This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. + +This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. + +1. Connect to a Hyper-V host and copy the .cer file to each NC node. 
+ ```powershell + $currentVersion = (Get-SolutionUpdateEnvironment -ErrorAction Stop).CurrentVersion + if ($($currentVersion.Minor) -ge 2604) { + $rootDir = 'C:\ProgramData\AzureEdge\CertificateStore\LocalMachine\Root' + } + else { + $rootDir = 'C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\AzureStackCertificateAuthority' + } + Copy-SdnFileToComputer -Path (Join-Path -Path $rootDir -ChildPath "AzureStackCertificationAuthority.cer") -Destination (Get-SdnWorkingDirectory) -ComputerName 'NC1','NC2','NC3' + ``` + +1. Install the certificate into trusted root store. + ```powershell + Invoke-SdnCommand -ComputerName 'NC1','NC2','NC3' -ScriptBlock { + $cert = Get-ChildItem -Path "$(Get-SdnWorkingDirectory)\AzureStackCertificationAuthority.cer" + Import-SdnCertificate -FilePath $cert.FullName -CertStore 'Cert:\LocalMachine\Root' + } + ``` + +Alternatively, if you have SdnDiagnostics version 4.2604 or later, you can use [Start-SdnServerCertificateRotation](https://learn.microsoft.com/en-us/azure/azure-local/manage/update-sdn-infrastructure-certificates) from the Hyper-V host, which automatically detects the AzureStackCertificationAuthority certificate and copies it to the Network Controller VMs. 
From ce3ca33a1ff39fcafd1a0aabbc33b54e79152293 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:16:37 -0500 Subject: [PATCH 04/11] add steps to find the deploy user --- ...able-To-Connect-ApiService-After-Solution-Update.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index c0c50fae..1ee10af5 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -62,9 +62,9 @@ HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters\NetworkController This registry key is created by Windows Admin Center (WAC) and is created when you use WAC to manage Network Controller. In environments where WAC is not used this registry key does not exist. If this key is not present for SdnExpress deployments, we will not rotate the AzureStackCertificationAuthority. -### 2. HCI Admin User Not Added to Local Administrators on NC VMs +### 2. HCI Deployment User Not Added to Local Administrators on NC VMs -The certificate rotation process uses PowerShell remoting (`Invoke-Command`) from the Hyper-V hosts to the Network Controller VMs to install the updated certificates. If the HCI admin is not a member of the Local Administrators group on the NC VMs, the remote commands fail and certificate propagation does not complete. +The certificate rotation process uses PowerShell remoting (`Invoke-Command`) from the Hyper-V hosts to the Network Controller VMs to install the updated certificates. 
If the HCI deployment user is not a member of the Local Administrators group on the NC VMs, the remote commands fail and certificate propagation does not complete. ### 3. Code Defect in Certificate Rotation Logic @@ -104,9 +104,9 @@ Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Para > **Important:** Repeat this step on every Hyper-V host in the cluster. -### HCI Admin User Not Added to Local Administrators on NC VMs +### HCI Deployment User Not Added to Local Administrators on NC VMs -The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. +The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. To determine your deployment user, leverage `Get-AzsSupportLcmDeploymentUserName` included with the [Support Diagnostics Tool](https://learn.microsoft.com/en-us/azure/azure-local/manage/support-tools). 
```powershell # Run from a Hyper-V host against each NC VM # Replace with your NC VM name @@ -123,7 +123,7 @@ If the cluster nodes are not listed as local administrators, add them: # Run on each NC VM # Replace with your cluster node machine account Invoke-Command -ComputerName -ScriptBlock { - Add-LocalGroupMember -Group "Administrators" -Member "" + Add-LocalGroupMember -Group "Administrators" -Member "" } ``` From a1eb93c7e58ad41aa2ae491679d76843b482cd3b Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:19:53 -0500 Subject: [PATCH 05/11] update behaviors --- ...ent-Unable-To-Connect-ApiService-After-Solution-Update.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index 1ee10af5..ab422726 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -45,8 +45,9 @@ Impact = "Policy configuration may not be pushed to the Hyper-V host(s) if no so **Observable behaviors:** -- Network Controller is unable to program VFP policies to VMs, resulting. -- VM traffic for VMs, especially after live-migration may break. +- Network Controller is unable to program VFP policies to VMs. +- Network connectivity issues for workloads. +- Network connectivity issues after performing live migration. 
## Root Cause From 3ab3bd0f7bb80945df1be56f7c0c9e598019cfd3 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:21:57 -0500 Subject: [PATCH 06/11] update naming --- ...tAgent-Unable-To-Connect-ApiService-After-Solution-Update.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index ab422726..393ff9f7 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -51,7 +51,7 @@ Impact = "Policy configuration may not be pushed to the Hyper-V host(s) if no so ## Root Cause -During a solution update, the host certificates are rotated by ECE (Environment Configuration Engine). For the certificate rotation to also propagate the AzureStackCertificationAuthority certificate to the Network Controller VMs, several conditions must be met. A failure in any of the following causes the NcHostAgent-to-ApiService connection to break after the update. +During a solution update, the host certificates are rotated automatically. For the certificate rotation to also propagate the AzureStackCertificationAuthority certificate to the Network Controller VMs, several conditions must be met. A failure in any of the following causes the NcHostAgent-to-ApiService connection to break after the update. ### 1. 
Missing NetworkControllerNodeNames Registry Key From e88c964255cbb4d0e060d88c77afc54e8e25bcd2 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:23:42 -0500 Subject: [PATCH 07/11] fix formatting --- ...onnect-ApiService-After-Solution-Update.md | 44 +++++++++---------- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index 393ff9f7..d1172829 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -131,28 +131,26 @@ Invoke-Command -ComputerName -ScriptBlock { > **Important:** Repeat this for each Hyper-V host against every NC VM. ### Install AzureStackCertificationAuthortity certificate to NC manually -This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. - -This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. - -1. Connect to a Hyper-V host and copy the .cer file to each NC node. 
- ```powershell - $currentVersion = (Get-SolutionUpdateEnvironment -ErrorAction Stop).CurrentVersion - if ($($currentVersion.Minor) -ge 2604) { - $rootDir = 'C:\ProgramData\AzureEdge\CertificateStore\LocalMachine\Root' - } - else { - $rootDir = 'C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\AzureStackCertificateAuthority' - } - Copy-SdnFileToComputer -Path (Join-Path -Path $rootDir -ChildPath "AzureStackCertificationAuthority.cer" -Destination (Get-SdnWorkingDirectory) -ComputerName 'NC1','NC2','NC3' - ``` - -1. Install the certificate into trusted root store. - ```powershell - Invoke-SdnCommand -ComputerName 'NC1','NC2','NC3' -ScriptBlock { - $cert = Get-ChildItem -Path "$(Get-SdnWorkingDirectory)\AzureStackCertificationAuthority.cer" - Import-SdnCertificate -FilePath $cert.FullName -CertStore 'Cert:\LocalMachine\Root' - } - ``` +This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. + +Connect to a Hyper-V host and copy the .cer file to each NC node. 
+ ```powershell + $currentVersion = (Get-SolutionUpdateEnvironment -ErrorAction Stop).CurrentVersion + if ($($currentVersion.Minor) -ge 2604) { + $rootDir = 'C:\ProgramData\AzureEdge\CertificateStore\LocalMachine\Root' + } + else { + $rootDir = 'C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\AzureStackCertificateAuthority' + } + Copy-SdnFileToComputer -Path (Join-Path -Path $rootDir -ChildPath "AzureStackCertificationAuthority.cer" -Destination (Get-SdnWorkingDirectory) -ComputerName 'NC1','NC2','NC3' + ``` + +Install the certificate into trusted root store. + ```powershell + Invoke-SdnCommand -ComputerName 'NC1','NC2','NC3' -ScriptBlock { + $cert = Get-ChildItem -Path "$(Get-SdnWorkingDirectory)\AzureStackCertificationAuthority.cer" + Import-SdnCertificate -FilePath $cert.FullName -CertStore 'Cert:\LocalMachine\Root' + } + ``` Alternatively, if you have SdnDiagnostics with version 4.2604 or later, you can leverage [Start-SdnServerCertificateRotation](https://learn.microsoft.com/en-us/azure/azure-local/manage/update-sdn-infrastructure-certificates) from the Hyper-V host which will automatically detect the AzureStackCertificationAuthority certificate and copy to Network Controller VMs. 
From d33826554ee24ce4b859b88362d68257ea18131c Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:24:11 -0500 Subject: [PATCH 08/11] Update TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- ...gent-Unable-To-Connect-ApiService-After-Solution-Update.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index d1172829..8c6dc67b 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -46,8 +46,8 @@ Impact = "Policy configuration may not be pushed to the Hyper-V host(s) if no so **Observable behaviors:** - Network Controller is unable to program VFP policies to VMs. -- Network connecitvity issues for workloads. -- Network connectivity issues, after performing live-migration. +- Network connectivity issues for workloads. +- Network connectivity issues after performing live migration. 
## Root Cause From 78ce4c55424c111bff4b6642ecf98b582099b992 Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:24:59 -0500 Subject: [PATCH 09/11] Update TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- ...tAgent-Unable-To-Connect-ApiService-After-Solution-Update.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index 8c6dc67b..7e222a01 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -130,7 +130,7 @@ Invoke-Command -ComputerName -ScriptBlock { > **Important:** Repeat this for each Hyper-V host against every NC VM. -### Install AzureStackCertificationAuthortity certificate to NC manually +### Install AzureStackCertificationAuthority certificate to NC manually This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. Connect to a Hyper-V host and copy the .cer file to each NC node. 
From 3ad6fda88d5453975ba9399e4b24116a71294f8b Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:29:21 -0500 Subject: [PATCH 10/11] apply copilot suggestions --- ...onnect-ApiService-After-Solution-Update.md | 22 ++++++++----------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index d1172829..73c2a786 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -107,31 +107,27 @@ Set-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Para ### HCI Deployment User Not Added to Local Administrators on NC VMs -The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the cluster nodes have local administrator permissions on each NC VM. To determine your deployment user, leverage `Get-AzsSupportLcmDeploymentUserName` included with the [Support Diagnostics Tool](https://learn.microsoft.com/en-us/azure/azure-local/manage/support-tools). +The certificate rotation process requires that the Hyper-V hosts can remotely manage the Network Controller VMs. Verify that the Azure Local Deployment User has local administrator permissions on each NC VM. To determine your deployment user, leverage `Get-AzsSupportLcmDeploymentUserName` included with the [Support Diagnostics Tool](https://learn.microsoft.com/en-us/azure/azure-local/manage/support-tools). 
```powershell -# Run from a Hyper-V host against each NC VM -# Replace with your NC VM name -Invoke-Command -ComputerName -ScriptBlock { +# Run this against all the NC VMs +Invoke-Command -ComputerName 'NC1','NC2','NC3' -ScriptBlock { $adminGroup = Get-LocalGroupMember -Group "Administrators" -ErrorAction SilentlyContinue $adminGroup | Format-Table Name, ObjectClass, PrincipalSource } ``` -If the cluster nodes are not listed as local administrators, add them: +If the deployment user is not listed as a local administrator, add it: ```powershell -# Run on each NC VM -# Replace with your cluster node machine account -Invoke-Command -ComputerName -ScriptBlock { - Add-LocalGroupMember -Group "Administrators" -Member "" +# Replace with your deployment user account identified +Invoke-Command -ComputerName 'NC1','NC2','NC3' -ScriptBlock { + Add-LocalGroupMember -Group "Administrators" -Member "" } ``` -> **Important:** Repeat this for each Hyper-V host against every NC VM. - ### Install AzureStackCertificationAuthortity certificate to NC manually -This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. +This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released.
This only needs to be executed from one of the Hyper-V hosts.This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. Connect to a Hyper-V host and copy the .cer file to each NC node. ```powershell @@ -142,7 +138,7 @@ Connect to a Hyper-V host and copy the .cer file to each NC node. else { $rootDir = 'C:\ClusterStorage\Infrastructure_1\Shares\SU1_Infrastructure_1\AzureStackCertificateAuthority' } - Copy-SdnFileToComputer -Path (Join-Path -Path $rootDir -ChildPath "AzureStackCertificationAuthority.cer" -Destination (Get-SdnWorkingDirectory) -ComputerName 'NC1','NC2','NC3' + Copy-SdnFileToComputer -Path (Join-Path -Path $rootDir -ChildPath "AzureStackCertificationAuthority.cer") -Destination (Get-SdnWorkingDirectory) -ComputerName 'NC1','NC2','NC3' ``` Install the certificate into trusted root store. 
From 8d076e74821e25bc80aa177d682c677f06fa64ce Mon Sep 17 00:00:00 2001 From: Adam Rudell Date: Fri, 17 Apr 2026 16:31:24 -0500 Subject: [PATCH 11/11] remove version from hyperlink --- ...tAgent-Unable-To-Connect-ApiService-After-Solution-Update.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md index 7e11be8d..594cc562 100644 --- a/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md +++ b/TSG/Networking/SDN-Express/Troubleshoot-SDNExpress-NcHostAgent-Unable-To-Connect-ApiService-After-Solution-Update.md @@ -127,7 +127,7 @@ Invoke-Command -ComputerName 'NC1','NC2','NC3' -ScriptBlock { ``` ### Install AzureStackCertificationAuthority certificate to NC manually -This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts.This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection?view=azloc-2603#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts. By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. +This is a short term mitigation that you should perform as part of your post solution update steps until a code fix is released. This only needs to be executed from one of the Hyper-V hosts. This requires [SdnDiagnostics](https://learn.microsoft.com/en-us/azure/azure-local/manage/sdn-log-collection#install-the-sdn-diagnostics-powershell-module-on-the-client-computer) to be installed on the Network Controller VMs as well as the Hyper-V hosts.
By default for Azure Local, SdnDiagnostics is included for the Hyper-V hosts. Connect to a Hyper-V host and copy the .cer file to each NC node. ```powershell