@DCaviedesV, @s-poll, @kvrigor, @jjokella
Test case: `EXP_ID="eur-12-iic" #"eur-11u"`
working_dir: `/p/scratch/cslts/gonzalez5/TSMP2/JUBE_repo/benchmark/jube`
While running this test case, I observed a lack of reproducibility between identical runs when using a binary built with `intel.psmpi`. Two runs performed with JUBE using the same configuration with `model_id = icon-eclm` produce different results. The same behaviour is observed for `model_id = icon-eclm-parflow`: repeated runs with identical settings (using JUBE) also do not produce identical results.
Below, I present the maximum absolute error for each variable, computed by comparing the outputs of two identical runs for `model_id=icon-eclm-parflow` and `model_id=icon-eclm`:

| Variable | `model_id=icon-eclm-parflow` max. abs. error | `model_id=icon-eclm` max. abs. error |
|---|---|---|
| **ICON** | | |
| w | 1.0482156277 | 1.1639280319 |
| theta_v | 5.6016540527 | 5.4000854492 |
| qv | 0.0038429732 | 0.0035402337 |
| shfl_s | 47.2150611877 | 131.6409301758 |
| lhfl_s | 60.2069587708 | 414.4877624512 |
| **eCLM** | | |
| TWS | 9.8613281250 | 10.4727783203 |
| H2OSOI | 0.0715191215 | 0.1653197557 |
| TSOI | 3.6534118652 | 5.7122192383 |
| TG | 3.6873474121 | 6.1456604004 |
| EFLX_LH_TOT | 192.7704467773 | 197.8242187500 |
| FSH | 148.3424072266 | 218.8023071289 |
| FSA | 271.3027954102 | 328.3160400391 |
| FSR | 131.1349182129 | 141.3157348633 |
| FIRA | 51.1596603394 | 75.3006668091 |
| Rnet | 220.1431274414 | 254.9673156738 |
| EFLX_SOIL_GRND | 108.2931365967 | 113.7866668701 |
| **ParFlow** | | |
| pressure | 3460.3571839130 | - |
| saturation | 0.2303589024 | - |
| evaptrans | 0.5114949301 | - |

We need some insight into whether the magnitude of these errors is acceptable and expected for this type of scenario.
ICON units: w = [m/s], theta_v = [K], qv = [kg/kg], shfl/lhfl = [W/m2]
eCLM variable units here
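For reference, the comparison behind the table can be sketched as below. This is a minimal illustration, not the script actually used to produce the numbers above: it assumes each output variable has already been read from the two runs and flattened into equal-length sequences, and the function names `max_abs_error` and `is_bitwise_reproducible` are hypothetical. Runs are treated as reproducible only if every difference is exactly zero, which is what the `gnu.openmpi`/`gnu.psmpi` builds reportedly achieve.

```python
"""Hypothetical sketch: per-variable maximum absolute difference between two runs."""
from typing import Dict, Sequence


def max_abs_error(run_a: Dict[str, Sequence[float]],
                  run_b: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Return max |a - b| for every variable present in both runs.

    Assumption: each variable was already flattened to a 1-D sequence of
    equal length in both runs (e.g. after reading the model output files).
    """
    errors: Dict[str, float] = {}
    for name in sorted(run_a.keys() & run_b.keys()):
        a, b = run_a[name], run_b[name]
        if len(a) != len(b):
            raise ValueError(f"{name}: size mismatch ({len(a)} vs {len(b)})")
        # Largest pointwise deviation; an empty field counts as identical.
        errors[name] = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return errors


def is_bitwise_reproducible(errors: Dict[str, float]) -> bool:
    """Identical runs are reproducible only if every difference is exactly zero."""
    return all(err == 0.0 for err in errors.values())


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the runs reported above.
    run1 = {"w": [0.10, -0.25, 0.40], "qv": [0.0050, 0.0061, 0.0072]}
    run2 = {"w": [0.10, -0.20, 0.40], "qv": [0.0050, 0.0061, 0.0072]}
    errs = max_abs_error(run1, run2)
    for var, err in errs.items():
        print(f"{var} error = {err:.10f}")
    print("reproducible:", is_bitwise_reproducible(errs))
```

In practice the input dictionaries would be filled from the ICON, eCLM, and ParFlow output files of each run; only the zero/non-zero distinction decides reproducibility, while the magnitudes are what this issue asks about.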
NOTE: Tests were carried out on JURECA-DC with Stages/2025, and binaries were built from the TSMP2 repo (master branch), e.g.:

```
build_tsmp2.sh --icon --eclm --env ./env/jsc.2025.gnu.openmpi
Setting model-id and component string...
Setting component source dir...
Submodule path 'models/icon': checked out 'ce5c8f8ba75d2e7db73e41cbb186d98ec34171c8'
Submodule path 'models/eCLM': checked out '4d567d2d68cac0fba977914b4a9c3ba199afd0ff'
Submodule path 'models/oasis3-mct': checked out '5253349d4ce15259fcd76e0443495c1ddb7788bb'
/p/project1/cslts/shared_data/CI_TSMP2/bin/master
/p/scratch/cslts/gonzalez5/TSMP2/JUBE_repo/benchmark/jube
```

NOTE1: This does not occur for `model_id = eclm-parflow` or for single-component runs (`eclm`, `icon`, or `parflow`).
NOTE2: In contrast, this non-reproducibility is not observed with executables built with `gnu.openmpi` or `gnu.psmpi`. For these builds, repeated runs with identical configurations are reproducible for both model_ids.
NOTE3: Try another test case.