Fix intermittent installer hang on Windows #68870

Open
twangboy wants to merge 2 commits into saltstack:3006.x from twangboy:fix_nsis

twangboy commented Apr 1, 2026

What does this PR do?

Fix intermittent silent-mode installer hang on Windows; add NSIS test infrastructure

Upstream 3006.x worked around a known silent-mode installer hang with a wmic process kill — deprecated on Windows 11 and unreliable. This change fixes the root causes and replaces the workaround.

Salt-Minion-Setup.nsi:

  • Replace wmic kill in .onInstSuccess and un.onUninstSuccess with System::Call "kernel32::ExitProcess(i 0)" guarded by ${If} ${Silent}. ExitProcess bypasses the NSIS message loop entirely, avoiding the cross-thread deadlock between the exec thread and UI thread that caused the hang.
  • Replace nsExec::ExecToStack for service start with SimpleSC::StartService "salt-minion" "" 30. SimpleSC calls the SCM API inside the plugin DLL with no child process, pipe, or handle inheritance — eliminating the deadlock source.
  • Replace nsExec::ExecToStack for stop/remove in un.uninstallSalt with SimpleSC::StopService (wait_for_file_release=1, timeout=30) and SimpleSC::RemoveService. Remove the 6-second sleep and three-pass PowerShell kill loop that were compensating for the old approach.
  • Switch all nsExec::ExecToStack ssm.exe set calls in -Post and .onInstSuccess to nsExec::Exec (fire-and-forget); configuration commands need no stdout capture.
  • Add a taskkill call as a belt-and-suspenders fallback after SimpleSC stop/remove.
  • Guard SetAutoClose true in uninstallSalt with !ifdef __UNINSTALL__ so it does not fire during the upgrade path (.onInit calling uninstallSalt).
  • Add SetAutoClose true in .onInit for silent mode and un.onInit.
  • Push/Pop $R0 around Call uninstallSalt in .onInit: GetParent inside the function writes to $R0, clobbering the saved silent flag used to conditionally restore SetSilent normal afterward.
  • Fix typo "mnually" and add missing Quit after VCRedist error.
  • Remove InstallDirRegKey (unused).

conftest.py:

  • Replace subprocess.PIPE with subprocess.DEVNULL. NSIS Exec passes inherited pipe handles to child processes (nssm, salt-minion), which hold the write-end open indefinitely; proc.communicate() never sees EOF, causing the test to block forever.
  • Replace proc.communicate() with proc.wait() for the same reason.
  • Reduce run_command timeout from 300 to 60 seconds.
  • Add _kill_process_tree() to terminate the entire process hierarchy on timeout (not just the top-level installer process).
  • Return True/False from run_command and assert in install_salt so a force-kill fails the test — detecting hangs is the purpose of this suite.
  • Add _wait_for_processes() with SCM registry polling: waits for all PROCESSES to exit, INSTDIR to be deleted, and the salt-minion SCM key to disappear before the next iteration. Prevents ERROR_SERVICE_MARKED_FOR_DELETE races between iterations.
  • Fix duplicate Un_D.exe entry in PROCESSES (should be Un_E.exe).
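
The conftest.py changes above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function bodies, the use of taskkill /F /T for the tree kill, the POSIX fallback, and the generic wait_until helper are all assumptions. Only the names run_command and _kill_process_tree, the 60-second timeout, and the True/False return come from the description.

```python
import os
import signal
import subprocess
import sys
import time


def _kill_process_tree(proc):
    """Force-kill proc together with any processes it started."""
    if sys.platform == "win32":
        # Assumption: taskkill /T (tree) /F (force) is one way to do this.
        subprocess.run(
            ["taskkill", "/F", "/T", "/PID", str(proc.pid)],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    else:
        # POSIX fallback so the sketch also runs off Windows: the child
        # leads its own session, so its pid doubles as the group id.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    proc.wait()  # reap the top-level process


def run_command(cmd, timeout=60):
    """Return True if cmd exits by itself, False if it had to be killed."""
    proc = subprocess.Popen(
        cmd,
        # No pipes: nothing for long-lived grandchildren to hold open.
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=(sys.platform != "win32"),
    )
    try:
        # wait() only watches the process, so it cannot block on a pipe EOF.
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        _kill_process_tree(proc)
        return False
    return True


def wait_until(predicate, timeout=60, interval=0.5):
    """Poll predicate() until it is true; return False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A test could then assert on the result of run_command so that a forced kill fails the test, and call wait_until with a predicate such as "the install directory no longer exists" before starting the next iteration. A check for the salt-minion service key would follow the same polling pattern.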

Test stubs (new files):

  • stubs/daemon/: Go program that blocks until it receives SIGINT, used as a realistic salt-minion.exe stub that NSSM can manage as a service.
  • stubs/exit/: Go program that exits immediately with code 0, used as vcredist stub so ExecWait in the installer succeeds.

setup.ps1 / quick_setup.ps1:

  • Build stubs with go build -C <module-dir> so Go module resolution finds go.mod regardless of the caller's working directory.
  • Replace fake Set-Content vcredist binaries with real go build output.
  • Add Go 1.20+ prerequisite check: look in PATH, then fall back to C:\Program Files\Go and C:\Go; download and install silently from salt-windows-deps if missing.
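
The lookup order for the Go prerequisite can be illustrated as below. The setup scripts themselves are PowerShell; this Python sketch only shows the logic (PATH first, then the two fallback directories, then a version check) and is not the code in this PR. The function names, the bin subdirectory under each fallback root, and the version parsing are assumptions. The -C flag for go build was introduced in Go 1.20, which is consistent with the stated minimum.

```python
import re
import shutil

# Assumption: the go executable sits in a bin directory under each root.
FALLBACK_DIRS = (r"C:\Program Files\Go\bin", r"C:\Go\bin")


def find_go(fallback_dirs=FALLBACK_DIRS):
    """Return the path to the go executable, or None if it is not found."""
    found = shutil.which("go")  # 1. look in PATH
    if found:
        return found
    for directory in fallback_dirs:  # 2. try the known install locations
        found = shutil.which("go", path=directory)
        if found:
            return found
    return None  # 3. caller would download and install Go here


def parse_go_version(text):
    """Extract (major, minor) from `go version` output, or None."""
    match = re.search(r"\bgo(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def go_is_new_enough(text, minimum=(1, 20)):
    """True if `go version` output reports at least the minimum version."""
    version = parse_go_version(text)
    return version is not None and version >= minimum
```

Here find_go returning None corresponds to the "download and install silently" branch described above, which the sketch deliberately leaves out.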

install_nsis.ps1:

  • Update NSIS download URL from 3.10 to 3.11.
  • Add SimpleSC plugin install section (ANSI + Unicode DLLs from nsis-plugin-simplesc.zip and nsis-plugin-simplescu.zip).

test.ps1:

Previous Behavior

NSIS tests were failing but did not report the failure.
The installer would hang intermittently.

New Behavior

Test failures now bubble up.
The installer no longer hangs.

Merge requirements satisfied?

[NOTICE] Bug fixes or features added to Salt require tests.

Commits signed with GPG?

Yes

twangboy requested a review from a team as a code owner April 1, 2026 18:35
twangboy self-assigned this Apr 1, 2026
twangboy added the test:full Run the full test suite label Apr 1, 2026
twangboy added this to the Sulpher v3006.24 milestone Apr 1, 2026
twangboy added 2 commits April 1, 2026 15:01
Fix intermittent silent-mode installer hang on Windows; add NSIS test infrastructure

Upstream 3006.x worked around a known silent-mode installer hang with
a wmic process kill — deprecated on Windows 11 and unreliable. This
change fixes the root causes and replaces the workaround.

Salt-Minion-Setup.nsi:
- Replace wmic kill in .onInstSuccess and un.onUninstSuccess with
  System::Call "kernel32::ExitProcess(i 0)" guarded by ${If} ${Silent}.
  ExitProcess bypasses the NSIS message loop entirely, avoiding the
  cross-thread deadlock between the exec thread and UI thread that
  caused the hang.
- Replace nsExec::ExecToStack for service start with
  SimpleSC::StartService "salt-minion" "" 30. SimpleSC calls the SCM
  API inside the plugin DLL with no child process, pipe, or handle
  inheritance — eliminating the deadlock source.
- Replace nsExec::ExecToStack for stop/remove in un.uninstallSalt with
  SimpleSC::StopService (wait_for_file_release=1, timeout=30) and
  SimpleSC::RemoveService. Remove the 6-second sleep and three-pass
  PowerShell kill loop that were compensating for the old approach.
- Switch all nsExec::ExecToStack ssm.exe set calls in -Post and
  .onInstSuccess to nsExec::Exec (fire-and-forget); configuration
  commands need no stdout capture.
- Add taskkill belt-and-suspenders after SimpleSC stop/remove.
- Guard SetAutoClose true in uninstallSalt with !ifdef __UNINSTALL__
  so it does not fire during the upgrade path (.onInit calling
  uninstallSalt).
- Add SetAutoClose true in .onInit for silent mode and un.onInit.
- Push/Pop $R0 around Call uninstallSalt in .onInit: GetParent inside
  the function writes to $R0, clobbering the saved silent flag used
  to conditionally restore SetSilent normal afterward.
- Fix typo "mnually" and add missing Quit after VCRedist error.
- Remove InstallDirRegKey (unused).

conftest.py:
- Replace subprocess.PIPE with subprocess.DEVNULL. NSIS Exec passes
  inherited pipe handles to child processes (nssm, salt-minion), which
  hold the write-end open indefinitely; proc.communicate() never sees
  EOF, causing the test to block forever.
- Replace proc.communicate() with proc.wait() for the same reason.
- Reduce run_command timeout from 300 to 60 seconds.
- Add _kill_process_tree() to terminate the entire process hierarchy
  on timeout (not just the top-level installer process).
- Return True/False from run_command and assert in install_salt so
  a force-kill fails the test — detecting hangs is the purpose of
  this suite.
- Add _wait_for_processes() with SCM registry polling: waits for all
  PROCESSES to exit, INSTDIR to be deleted, and the salt-minion SCM
  key to disappear before the next iteration. Prevents
  ERROR_SERVICE_MARKED_FOR_DELETE races between iterations.
- Fix duplicate Un_D.exe entry in PROCESSES (should be Un_E.exe).

Test stubs (new files):
- stubs/daemon/: Go program that blocks on SIGINT, used as a
  realistic salt-minion.exe stub that NSSM can manage as a service.
- stubs/exit/: Go program that exits immediately with code 0, used
  as vcredist stub so ExecWait in the installer succeeds.

setup.ps1 / quick_setup.ps1:
- Build stubs with go build -C <module-dir> so Go module resolution
  finds go.mod regardless of the caller's working directory.
- Replace fake Set-Content vcredist binaries with real go build output.
- Add Go 1.20+ prerequisite check: look in PATH, then fall back to
  C:\Program Files\Go and C:\Go; download and install silently from
  salt-windows-deps if missing.

install_nsis.ps1:
- Update NSIS download URL from 3.10 to 3.11.
- Add SimpleSC plugin install section (ANSI + Unicode DLLs from
  nsis-plugin-simplesc.zip and nsis-plugin-simplescu.zip).