Non-reproducibility when using icon + oasis and compiled with intel.psmpi

@DCaviedesV, @s-poll, @kvrigor, @jjokella 

**Test case**: [`EXP_ID="eur-12-iic" #"eur-11u"`](https://github.com/HPSCTerrSys/TSMP2_workflow-engine/tree/master)
**working_dir**:
```
/p/scratch/cslts/gonzalez5/TSMP2/JUBE_repo/benchmark/jube
```

While running this test case, I observed a lack of reproducibility between identical runs when using a binary built with `intel.psmpi`. Two runs performed with `JUBE` using the same configuration with `model_id = icon-eclm` produce different results. The same behaviour is observed for `model_id = icon-eclm-parflow`: repeated runs with identical settings (using `JUBE`) also produce different outputs.

Below, I present the maximum absolute error for each variable, computed by comparing the outputs of two identical runs, for `model_id=icon-eclm-parflow` and `model_id=icon-eclm`:

| Variable | `model_id=icon-eclm-parflow` max. abs. error | `model_id=icon-eclm` max. abs. error |
| --- | --- | --- |
| **ICON** | | |
| w | 1.0482156277 | 1.1639280319 |
| theta_v | 5.6016540527 | 5.4000854492 |
| qv | 0.0038429732 | 0.0035402337 |
| shfl_s | 47.2150611877 | 131.6409301758 |
| lhfl_s | 60.2069587708 | 414.4877624512 |
| **eCLM** | | |
| TWS | 9.8613281250 | 10.4727783203 |
| H2OSOI | 0.0715191215 | 0.1653197557 |
| TSOI | 3.6534118652 | 5.7122192383 |
| TG | 3.6873474121 | 6.1456604004 |
| EFLX_LH_TOT | 192.7704467773 | 197.8242187500 |
| FSH | 148.3424072266 | 218.8023071289 |
| FSA | 271.3027954102 | 328.3160400391 |
| FSR | 131.1349182129 | 141.3157348633 |
| FIRA | 51.1596603394 | 75.3006668091 |
| Rnet | 220.1431274414 | 254.9673156738 |
| EFLX_SOIL_GRND | 108.2931365967 | 113.7866668701 |
| **ParFlow** | | |
| pressure | 3460.3571839130 | - |
| saturation | 0.2303589024 | - |
| evaptrans | 0.5114949301 | - |
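For reference, a minimal sketch of the metric in the table, assuming it is the largest absolute difference over all values (grid points and output times) of a variable in the two runs. The helper name `max_abs_error` and the choice to skip NaN pairs (e.g. masked or fill values) are assumptions for illustration, not part of the TSMP2 workflow.

```python
import math
from typing import Sequence


def max_abs_error(run_a: Sequence[float], run_b: Sequence[float]) -> float:
    """Maximum absolute difference between two runs of the same variable.

    Hypothetical helper: assumes both runs were flattened to equally long
    sequences (all grid points and output times). Pairs where either
    value is NaN (e.g. masked or fill values) are skipped.
    """
    if len(run_a) != len(run_b):
        raise ValueError("runs must have the same number of values")
    diffs = [
        abs(a - b)
        for a, b in zip(run_a, run_b)
        if not (math.isnan(a) or math.isnan(b))
    ]
    return max(diffs, default=0.0)


# Bitwise-identical runs give exactly 0.0; anything else signals divergence.
print(max_abs_error([280.0, 281.5, float("nan")], [280.0, 281.5, float("nan")]))  # 0.0
print(max_abs_error([280.0, 281.5], [280.25, 281.0]))  # 0.5
```

In practice the inputs would be read from the two runs' output files (for example with `xarray` or `netCDF4`); that step is omitted here.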

We need some insight into whether the magnitude of these errors is acceptable and expected for this type of scenario.

ICON units: `w` [m/s], `theta_v` [K], `qv` [kg/kg], `shfl_s`/`lhfl_s` [W/m²]. eCLM variable units are listed [here](https://escomp.github.io/CTSM/release-clm5.0/users_guide/setting-up-and-running-a-case/master_list_file.html).
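Because the variables have very different units and magnitudes, one possible way to compare them is to divide the maximum absolute error by the value range of the field in the reference run. This is only a suggested sketch; the metric and the helper name `normalised_max_error` are hypothetical and not a criterion used by TSMP2. It assumes fill values have already been removed.

```python
import math
from typing import Sequence


def normalised_max_error(reference: Sequence[float], other: Sequence[float]) -> float:
    """Max. abs. difference divided by the value range of the reference run.

    Hypothetical metric for comparing errors across variables with
    different units. Assumes fill/NaN values were removed beforehand.
    """
    if len(reference) != len(other) or not reference:
        raise ValueError("runs must be non-empty and of equal length")
    max_err = max(abs(r - o) for r, o in zip(reference, other))
    spread = max(reference) - min(reference)
    if spread == 0.0:
        # Constant reference field: only an exact match counts as zero error.
        return 0.0 if max_err == 0.0 else math.inf
    return max_err / spread


# A 5 K difference in a field spanning 20 K is 25 % of its range.
print(normalised_max_error([290.0, 300.0, 310.0], [290.0, 305.0, 310.0]))  # 0.25
```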

**NOTE**: Tests were carried out on `JURECA-DC` with `Stages/2025`, and the binaries were built from the [TSMP2 repo](https://github.com/HPSCTerrSys/TSMP2) (master branch). For example, for the `gnu.openmpi` environment:

```
build_tsmp2.sh --icon --eclm --env ./env/jsc.2025.gnu.openmpi
Setting model-id and component string...
Setting component source dir...
Submodule path 'models/icon': checked out 'ce5c8f8ba75d2e7db73e41cbb186d98ec34171c8'
Submodule path 'models/eCLM': checked out '4d567d2d68cac0fba977914b4a9c3ba199afd0ff'
Submodule path 'models/oasis3-mct': checked out '5253349d4ce15259fcd76e0443495c1ddb7788bb'
```
- **Binaries_directory**:
```
/p/project1/cslts/shared_data/CI_TSMP2/bin/master
```

**NOTE1**: This does not occur for `model_id = eclm-parflow` or for single-component runs (`eclm`, `icon`, or `parflow`).

**NOTE2**: In contrast, this non-reproducibility is not observed when using executables built with `gnu.openmpi` or `gnu.psmpi`. For these builds, repeated runs with identical configurations are reproducible for both `model_id`s.
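One possible (unconfirmed) explanation for differences that appear only with some toolchains is that floating-point addition is not associative: if the order of operations changes between runs (for example through compiler optimisations, or a non-deterministic order in an MPI reduction or in the coupling), the last bits of a result change, and a chaotic atmospheric model can amplify such tiny differences over a simulation. The sketch below only demonstrates the round-off effect itself; it does not show that this is the cause here.

```python
import math
import random

# Floating-point addition is not associative: the grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Summing the same values in a different order (as a parallel reduction
# might) can also change the last bits of the total.
rng = random.Random(42)
values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-8, 8) for _ in range(10_000)]
print(sum(values) - sum(reversed(values)))

# math.fsum is correctly rounded, so its result does not depend on the order.
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```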

**NOTE3**: Still to do: repeat the comparison with another test case.
