@overfelt overfelt commented Nov 18, 2025

Checklist

  • Documentation:
    • Design document has been generated and added to the docs
    • User's Guide has been updated
    • Developer's Guide has been updated
    • Documentation has been built locally and changes look as expected
  • Building
    • CMake build does not produce any new warnings from changes in this PR
  • Testing
    • A comment in the PR documents testing used to verify the changes including any tests that are added/modified/impacted.
    • CTest unit tests for new features have been added per the approved design.
    • Polaris tests for new features have been added per the approved design (and included in a test suite)
    • Unit tests have passed. Please provide a relevant CDash build entry for verification.
    • Polaris test suite has passed
    • Performance related PRs: Please include a relevant PACE experiment link documenting performance before and after.
  • Stealth Features
    • If any stealth features are included in the PR, please confirm that they have been documented.

@overfelt overfelt marked this pull request as draft November 18, 2025 23:00
@overfelt overfelt marked this pull request as ready for review November 19, 2025 13:18
@overfelt overfelt force-pushed the overfelt/HigherOrderTendency_Cleanup branch from 5ae40c2 to 1055c8f Compare December 2, 2025 20:21
overfelt commented Dec 2, 2025

Testing:
Unit tests were added for the new functions in the HorzOperators class:
(1) testsecondderivativeoncellDeterminePlanerPatchGeometry - Tests the second derivative on a planar patch of elements. The test builds a small patch whose geometry is defined from scratch with hard-coded coordinates, then compares the second derivative computed by the HorzOperators class against an analytic solution.
(2) testsecondderivativeoncellLeastSquaresFit - Tests the LeastSquaresFit function against an analytic solution.
(3) testsecondderivativeoncellDetermineSphericalPatchGeometry - Same as the testsecondderivativeoncellDeterminePlanerPatchGeometry unit test, but on a spherical mesh.
(4) testsecondderivativeoncellconstructor - A trivial unit test that checks only that an instance of the SecondDerivativeOnCell class can be constructed.
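
For readers unfamiliar with this style of test, the idea behind tests (1) and (2) can be sketched roughly as follows. This is a standalone illustration, not the HorzOperators code from this PR: the names `Point`, `fitQuadratic`, `analytic`, and `hexPatchSecondDerivs` are hypothetical, the patch coordinates are made up, and the fit uses plain normal equations rather than whatever solver the class actually uses. It fits a quadratic to samples on a small hard-coded planar patch and reads the second derivatives off the fitted coefficients.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Point {
   double X, Y;
};

struct SecondDerivs {
   double Dxx, Dxy, Dyy;
};

// Hypothetical helper (not Omega's API): least-squares fit of
// f(x,y) ~ c0 + c1*x + c2*y + c3*x^2 + c4*x*y + c5*y^2
// by solving the normal equations (A^T A) c = A^T f.
std::array<double, 6> fitQuadratic(const std::vector<Point> &Pts,
                                   const std::vector<double> &Vals) {
   assert(Pts.size() == Vals.size() && Pts.size() >= 6);
   constexpr int N = 6;
   double M[N][N + 1] = {}; // augmented normal-equation matrix
   for (std::size_t K = 0; K < Pts.size(); ++K) {
      const double X = Pts[K].X, Y = Pts[K].Y;
      const double Basis[N] = {1.0, X, Y, X * X, X * Y, Y * Y};
      for (int I = 0; I < N; ++I) {
         for (int J = 0; J < N; ++J)
            M[I][J] += Basis[I] * Basis[J];
         M[I][N] += Basis[I] * Vals[K];
      }
   }
   // Gaussian elimination with partial pivoting
   for (int Col = 0; Col < N; ++Col) {
      int Piv = Col;
      for (int R = Col + 1; R < N; ++R)
         if (std::fabs(M[R][Col]) > std::fabs(M[Piv][Col]))
            Piv = R;
      for (int J = 0; J <= N; ++J)
         std::swap(M[Col][J], M[Piv][J]);
      for (int R = Col + 1; R < N; ++R) {
         const double F = M[R][Col] / M[Col][Col];
         for (int J = Col; J <= N; ++J)
            M[R][J] -= F * M[Col][J];
      }
   }
   // Back substitution
   std::array<double, 6> C{};
   for (int I = N - 1; I >= 0; --I) {
      double S = M[I][N];
      for (int J = I + 1; J < N; ++J)
         S -= M[I][J] * C[J];
      C[I] = S / M[I][I];
   }
   return C;
}

// Analytic test function: d2f/dx2 = 6, d2f/dxdy = 0.5, d2f/dy2 = -4
double analytic(double X, double Y) {
   return 1.0 + 2.0 * X - Y + 3.0 * X * X + 0.5 * X * Y - 2.0 * Y * Y;
}

// Builds a made-up hexagon-like patch (center cell plus six neighbors),
// samples the analytic function, and returns the fitted second derivatives.
SecondDerivs hexPatchSecondDerivs() {
   const std::vector<Point> Pts = {
       {0.0, 0.0},  {1.0, 0.0},     {0.5, 0.866}, {-0.5, 0.866},
       {-1.0, 0.0}, {-0.5, -0.866}, {0.5, -0.866}};
   std::vector<double> Vals;
   for (const Point &P : Pts)
      Vals.push_back(analytic(P.X, P.Y));
   const std::array<double, 6> C = fitQuadratic(Pts, Vals);
   return {2.0 * C[3], C[4], 2.0 * C[5]};
}
```

Because the sampled function is itself quadratic, the fit should reproduce it to round-off, so a test can compare the returned values against 6, 0.5, and -4 with a tight tolerance. A spherical variant would presumably first project the patch onto a local tangent plane, which this sketch does not attempt.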

@overfelt overfelt closed this Dec 2, 2025
@overfelt overfelt reopened this Dec 2, 2025
@philipwjones

@overfelt This is failing both Horz Operators unit tests on Chrysalis and on Frontier (GPU only; CPU passes). Both failures are memory errors: an invalid pointer in free() on Chrysalis and an inaccessible memory space (for the array named XP) on Frontier GPU. So something isn't getting cleaned up correctly? I'll start reading through the code to review...

overfelt commented Dec 5, 2025

@philipwjones, I only ran on Perlmutter GPU. I'll try setting up and running on Chrysalis GPU. Thanks.

@mark-petersen
Collaborator

@overfelt, the tests also fail on perlmutter, both CPU and GPU, using the gnu compiler.

The following tests FAILED:
	 11 - HORZOPERATORS_PLANE_TEST (Failed)
	 12 - HORZOPERATORS_SPHERE_TEST (Failed)

Here are my instructions. Let me know if you can reproduce these errors. You will need to change to your own paths.

perlmutter CPU:

######### perlmutter CPU
#export CODEDIR=omega-develop
export CODEDIR=opr
export RUNDIR=test_omega_cpu

cd /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}
git submodule update --init --recursive components/omega/external/GSW-C components/omega/external/yaml-cpp externals/ekat externals/scorpio cime externals/cpptrace
cd components/omega/

module load cmake
mkdir -p ${PSCRATCH}/runs/$RUNDIR
cd ${PSCRATCH}/runs/$RUNDIR

rm -rf build
mkdir build
cd build

# compiler options are:
export compiler=gnu
#export compiler=nvidia # not working 250421

export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-cpu/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view

# nvidia or gnu compiler:
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=pm-cpu \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -DOMEGA_VECTOR_LENGTH=1 \
   -Wno-dev \
   -S /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega -B .
# note OMEGA_VECTOR_LENGTH=8 fails MPI tests on CPUs.
cd ${PSCRATCH}/runs/$RUNDIR/build
./omega_build.sh

# linking:
cd test
ln -isf /global/homes/m/mpeterse/meshes/omega/O*nc .
cp /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

# run test:
salloc --nodes 1 --qos interactive --time 01:00:00 --constraint cpu --account=m4572 # or e3sm
cd ${PSCRATCH}/runs/${RUNDIR}/build

./omega_ctest.sh

perlmutter GPU:

######### perlmutter GPU
salloc --nodes 4 --qos interactive --time 01:00:00 --constraint gpu --tasks-per-node=2 --gpus-per-task 1 --account=m4572_g # or e3sm_g

# perlmutter has nodes with either 40 or 80 gb of high bandwidth memory, and the system defaults to 40. You can ask for 80 gb nodes with the sbatch flag --constraint="gpu&hbm80gb"

export CODEDIR=opr
#export CODEDIR=omega-develop
export RUNDIR=test_omega_gpu
mkdir -p ${PSCRATCH}/runs/$RUNDIR
cd ${PSCRATCH}/runs/$RUNDIR

rm -rf build
mkdir build
cd build
module load cmake

# compiler options are:
export compiler=gnugpu
#export compiler=nvidiagpu

export PARMETIS_ROOT=/global/cfs/cdirs/e3sm/software/polaris/pm-gpu/spack/dev_polaris_0_9_0_${compiler}_mpich/var/spack/environments/dev_polaris_0_9_0_${compiler}_mpich/.spack-env/view
cmake \
   -DOMEGA_CIME_COMPILER=${compiler} \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=pm-gpu \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT}\
   -DOMEGA_BUILD_TEST=ON \
   -Wno-dev \
   -DOMEGA_MPI_ON_DEVICE:BOOL=OFF \
   -S /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega -B .
# needed for compiler bug: OMEGA_MPI_ON_DEVICE:BOOL=OFF. See https://github.com/E3SM-Project/Omega/issues/214
./omega_build.sh

# linking:
cd test
ln -isf /global/homes/m/mpeterse/meshes/omega/O*nc .
cp /global/homes/m/mpeterse/repos/E3SM/${CODEDIR}/components/omega/configs/Default.yml omega.yml

cd ..
./omega_ctest.sh

@overfelt overfelt force-pushed the overfelt/HigherOrderTendency_Cleanup branch from 35bf9b3 to 803b4a7 Compare December 9, 2025 20:38
@philipwjones

With the recent commit, this now passes CTests on Chrysalis and Frontier (cpu/gpu). Thanks @overfelt

@mark-petersen
Collaborator

This passes CPU and GPU tests on perlmutter. Due to the merge of #314, this now has some conflicts to be resolved. Please rebase on the current head, and we will proceed with the review after the break. Thanks!

@overfelt overfelt force-pushed the overfelt/HigherOrderTendency_Cleanup branch from 9ec4b1a to 5eb5850 Compare January 3, 2026 15:37