Apply hooks in install_cuda_and_libraries #165

Merged: ocaisa merged 2 commits into EESSI:main from casparvl:apply_hooks_to_cuda_installs on Feb 20, 2026


casparvl (Contributor) commented Feb 19, 2026

Make sure the hooks are applied in install_cuda_and_libraries, but skip the directory check, as we INTEND to install in an exceptional (i.e. non-accelerator) path.

This solves an issue I had running install_cuda_and_libraries.sh on an H100 system. The CUDA sanity check failed because the hooks that turn the 9.0a CC (compute capability) into 9.0 weren't being applied.

Originally, we skipped the hooks altogether. The real solution is to apply the hooks, but skip the one hook that checks the installation path.
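
To illustrate the idea described above, here is a minimal Python sketch, not the actual code from eb_hooks.py. All names (`normalize_compute_capability`, `check_install_path`, `run_hooks`, the `skip_path_check` flag, and the `accel/nvidia` path fragment) are hypothetical and chosen only for illustration. It shows hooks still running, so that a compute capability such as 9.0a is reduced to 9.0, while only the installation path check is bypassed.

```python
import re


def normalize_compute_capability(cc: str) -> str:
    """Drop a trailing architecture-specific letter suffix, e.g. '9.0a' -> '9.0'."""
    match = re.fullmatch(r"(\d+\.\d+)[a-z]?", cc.strip())
    if match is None:
        raise ValueError(f"unrecognised compute capability: {cc!r}")
    return match.group(1)


def check_install_path(installdir: str, accelerator_subdir: str = "accel/nvidia") -> None:
    """Hypothetical directory check: reject paths outside an accelerator prefix."""
    if accelerator_subdir not in installdir:
        raise RuntimeError(f"{installdir} is not an accelerator installation path")


def run_hooks(cc: str, installdir: str, skip_path_check: bool = False) -> str:
    """Apply every hook, optionally skipping only the installation path check."""
    normalized = normalize_compute_capability(cc)
    if not skip_path_check:
        check_install_path(installdir)
    return normalized
```

In this sketch, calling `run_hooks("9.0a", "/some/non-accelerator/path", skip_path_check=True)` still normalizes the compute capability, whereas the same call without the flag raises an error because of the path. Skipping every hook would avoid the path error too, but it would also lose the normalization, which is the situation the description above says caused the sanity check failure.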

…at the directory check is skipped, as we INTEND to install in an exceptional (i.e. non-accelerator) path
casparvl marked this pull request as ready for review February 19, 2026 21:55
ocaisa (Member) commented Feb 19, 2026

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx
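
For readers unfamiliar with the command format, the two lines above can be read as a `bot: build` prefix followed by space-separated `key:value` filters. The following Python sketch splits such a line into its filters. It is an assumption-based illustration only: `parse_bot_build_command` is a hypothetical helper, not the bot's own parser, and it handles just the shape of the commands shown in this conversation.

```python
def parse_bot_build_command(line: str) -> dict[str, str]:
    """Split a 'bot: build key:value ...' line into a dict of filters."""
    prefix = "bot: build"
    if not line.startswith(prefix):
        raise ValueError(f"not a build command: {line!r}")
    filters: dict[str, str] = {}
    for token in line[len(prefix):].split():
        # Split on the first colon only, so values may contain '=' or '/'.
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"malformed filter: {token!r}")
        filters[key] = value
    return filters


command = (
    "bot: build repo:eessi.io-2023.06-software "
    "instance:eessi-bot-deucalion for:arch=aarch64/a64fx"
)
parsed = parse_bot_build_command(command)
```

Here `parsed["repo"]` holds the target repository, `parsed["instance"]` the bot instance, and `parsed["for"]` the string `arch=aarch64/a64fx`.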

eessi-bot-deucalion (bot) commented Feb 19, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2023.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/974648

date | job status | comment
Feb 19 22:09:15 UTC 2026 | submitted | job id 974648 awaits release by job manager
Feb 19 22:09:38 UTC 2026 | released | job awaits launch by Slurm scheduler
Feb 19 22:10:41 UTC 2026 | running | job 974648 is running
Feb 19 22:20:08 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-974648.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-17715391760.tar.zst
size: 0 MiB (30242 bytes)
entries: 2
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2023.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/a64fx
2023.06/init/easybuild/eb_hooks.py
2023.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Feb 19 22:20:09 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 4/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ OK ] ( 5/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:a64fx+default
P: perf: 580.952 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:a64fx+default
P: perf: 526.343 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:a64fx+default
P: latency: 1.69 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:a64fx+default
P: latency: 1.71 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:a64fx+default
P: bandwidth: 8762.93 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:a64fx+default
P: bandwidth: 8708.05 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/10 test case(s) from 10 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-974648.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 20 19:51:43 UTC 2026 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-a64fx-17715391760.tar.zst to S3 bucket succeeded

eessi-bot-deucalion (bot) commented Feb 19, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2025.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/974650

date | job status | comment
Feb 19 22:09:22 UTC 2026 | submitted | job id 974650 awaits release by job manager
Feb 19 22:09:35 UTC 2026 | released | job awaits launch by Slurm scheduler
Feb 19 22:10:44 UTC 2026 | running | job 974650 is running
Feb 19 22:17:03 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-974650.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-a64fx-17715391230.tar.zst
size: 0 MiB (30246 bytes)
entries: 2
modules under 2025.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/a64fx
2025.06/init/easybuild/eb_hooks.py
2025.06/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
Feb 19 22:17:03 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:a64fx+default
P: latency: 0.89 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:a64fx+default
P: bandwidth: 7909.23 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 2/4 test case(s) from 4 check(s) (0 failure(s), 2 skipped, 0 aborted)
Details
✅ job output file slurm-974650.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 20 19:51:34 UTC 2026 | uploaded | transfer of eessi-2025.06-software-linux-aarch64-a64fx-17715391230.tar.zst to S3 bucket succeeded

ocaisa (Member) commented Feb 19, 2026

@casparvl This looks fine to me. Can you confirm here that it works in real life?

ocaisa (Member) left a comment

LGTM

ocaisa (Member) left a comment

The easystack file is not being picked up correctly by the script

ocaisa (Member) left a comment

LGTM

ocaisa (Member) commented Feb 20, 2026

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-deucalion for:arch=aarch64/a64fx

eessi-bot-deucalion (bot) commented Feb 20, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2023.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/979589

date | job status | comment
Feb 20 21:36:40 UTC 2026 | submitted | job id 979589 awaits release by job manager
Feb 20 21:37:09 UTC 2026 | released | job awaits launch by Slurm scheduler
Feb 20 21:38:11 UTC 2026 | running | job 979589 is running
Feb 20 21:47:56 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-979589.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2023.06-software-linux-aarch64-a64fx-17716236280.tar.zst
size: 0 MiB (26296 bytes)
entries: 1
modules under 2023.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2023.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2023.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2023.06/software/linux/aarch64/a64fx
2023.06/init/easybuild/eb_hooks.py
Feb 20 21:47:56 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] ( 1/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 2/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 3/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ SKIP ] ( 4/10) Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed
[ OK ] ( 5/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_node /aeb2d9df @BotBuildTests:a64fx+default
P: perf: 579.981 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 6/10) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_node /04ff9ece @BotBuildTests:a64fx+default
P: perf: 581.924 timesteps/s (r:0, l:None, u:None)
[ OK ] ( 7/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /15cad6c4 @BotBuildTests:a64fx+default
P: latency: 1.64 us (r:0, l:None, u:None)
[ OK ] ( 8/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /6672deda @BotBuildTests:a64fx+default
P: latency: 1.7 us (r:0, l:None, u:None)
[ OK ] ( 9/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_node /2a9a47b1 @BotBuildTests:a64fx+default
P: bandwidth: 8770.38 MB/s (r:0, l:None, u:None)
[ OK ] (10/10) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_node /1b24ab8e @BotBuildTests:a64fx+default
P: bandwidth: 8524.99 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 6/10 test case(s) from 10 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-979589.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 20 21:56:45 UTC 2026 | uploaded | transfer of eessi-2023.06-software-linux-aarch64-a64fx-17716236280.tar.zst to S3 bucket succeeded

eessi-bot-deucalion (bot) commented Feb 20, 2026

New job on instance eessi-bot-deucalion for repository eessi.io-2025.06-software
Building on: a64fx
Building for: aarch64/a64fx
Job dir: /home/eessibot/new-bot/jobs/2026.02/pr_165/979590

date | job status | comment
Feb 20 21:36:46 UTC 2026 | submitted | job id 979590 awaits release by job manager
Feb 20 21:37:06 UTC 2026 | released | job awaits launch by Slurm scheduler
Feb 20 21:38:14 UTC 2026 | running | job 979590 is running
Feb 20 21:43:42 UTC 2026 | finished | 😁 SUCCESS
Details
✅ job output file slurm-979590.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-a64fx-17716235670.tar.zst
size: 0 MiB (26302 bytes)
entries: 1
modules under 2025.06/software/linux/aarch64/a64fx/modules/all
no module files in tarball
software under 2025.06/software/linux/aarch64/a64fx/software
no software packages in tarball
reprod directories under 2025.06/software/linux/aarch64/a64fx/reprod
no reprod directories in tarball
other under 2025.06/software/linux/aarch64/a64fx
2025.06/init/easybuild/eb_hooks.py
Feb 20 21:43:42 UTC 2026 | test result | 😁 SUCCESS
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:a64fx+default [Skipping test: nodes in this partition only have 30720 MiB memory available (per node) according to the current ReFrame configuration, but 49152 MiB is needed]
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:a64fx+default
P: latency: 0.9 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:a64fx+default
P: bandwidth: 8222.33 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 2/4 test case(s) from 4 check(s) (0 failure(s), 2 skipped, 0 aborted)
Details
✅ job output file slurm-979590.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 20 21:56:54 UTC 2026 | uploaded | transfer of eessi-2025.06-software-linux-aarch64-a64fx-17716235670.tar.zst to S3 bucket succeeded

ocaisa merged commit 8178cfe into EESSI:main on Feb 20, 2026
76 of 84 checks passed