Increase default PnetCDF header size to avoid I/O hangs #1386
This reduces the likelihood of header reallocation as metadata grows. Header reallocation is extremely expensive and can significantly degrade parallel I/O performance.
(Sorry to jump into the discussion.) You can also set argument FYI. When new data objects are added to an existing file and cause the file header section to grow, PnetCDF must move the data section to a higher file offset, which can be expensive, especially when the existing file is large. The application was most likely just running very slowly rather than hanging.
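The cost described in this comment can be illustrated with a rough sketch. It assumes a simplified, hypothetical file layout (a reserved header region followed directly by the data section) and is not PnetCDF's actual implementation; the function and parameter names are invented for illustration.

```python
def bytes_to_shift(file_size: int, header_extent: int, new_header_size: int) -> int:
    """Estimate how many bytes of data must move when the header grows.

    Simplified, hypothetical model: the file is a header region of
    `header_extent` bytes followed by the data section, which runs to
    `file_size`. This is not a PnetCDF API.
    """
    if not 0 <= header_extent <= file_size:
        raise ValueError("header_extent must lie within the file")
    if new_header_size <= header_extent:
        # The grown header still fits in the reserved region: nothing moves.
        return 0
    # The header outgrew its region, so the whole data section is relocated.
    return file_size - header_extent
```

Under this model, the cost is all or nothing: a header that grows by a single byte past its reserved region forces the entire data section to move, which is why the penalty scales with the size of the existing file.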
@wkliao Thanks very much for the suggestion to consider If we were to open an existing file for writing and call
@wkliao This issue only surfaced when running MPAS through MPAS-JEDI under very specific conditions, which made it difficult to track down. Is there a check you’re aware of that we could have put in place to make this easier to catch?
Let me use two PnetCDF terms to help explain.
Subtracting the two gives you the free space available in the header section.
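As a minimal sketch of that subtraction, assuming the two terms are the bytes the header currently uses and the bytes reserved for it (what PnetCDF, as far as I know, calls the header size and the header extent), the check could look like the following. The function names and inputs are hypothetical; in a real program the two numbers would come from PnetCDF queries rather than being passed in by hand.

```python
def header_free_space(used_bytes: int, reserved_bytes: int) -> int:
    """Free space left in the header region (reserved minus used).

    Both arguments are hypothetical stand-ins for values queried from the
    library; this helper only does the arithmetic.
    """
    if used_bytes > reserved_bytes:
        raise ValueError("used_bytes cannot exceed reserved_bytes")
    return reserved_bytes - used_bytes


def growth_fits(used_bytes: int, reserved_bytes: int, extra_bytes: int) -> bool:
    """Return True if `extra_bytes` of new metadata fit without moving data."""
    return extra_bytes <= header_free_space(used_bytes, reserved_bytes)
```

A check of this kind, run before new attributes or variables are added, is one possible way to flag a pending header reallocation early instead of discovering it as a slow or stalled write.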
In this case, PnetCDF will check if the file extent of the existing file is aligned with Please note that
Two PnetCDF APIs can be used to query the If the free space is sufficiently large, then just call
This PR increases the default Parallel NetCDF (PnetCDF) header size to 128 KB to reduce the likelihood of header reallocation during MPAS I/O. In certain situations, when MPAS overwrites existing string attributes or variables with larger values, the NetCDF header can grow beyond its preallocated padding, requiring PnetCDF to reallocate the header during ncmpi_enddef, which can lead to an I/O hang. This behavior was identified as the root cause of the hang reported in MPAS-Workflow issue #384. By increasing the default header size, this PR decreases the likelihood that header reallocation is required when string attributes or variables are overwritten with larger values. Preliminary testing of the calculation that previously triggered the hang indicates that this change resolves the issue without impacting calculation results or I/O performance.

Fixes #1385
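To get a feel for how much headroom a 128 KB default leaves, here is a rough sketch. It assumes the setting acts as a floor on the reserved header region, which is an assumption made for illustration; the mechanism this PR actually uses is not shown here, and the names below are hypothetical.

```python
DEFAULT_HEADER_BYTES = 128 * 1024  # 128 KB, the default described in this PR


def header_headroom(used_bytes: int, minimum: int = DEFAULT_HEADER_BYTES) -> int:
    """Bytes the header can grow by before it outgrows its reserved region.

    Assumption: the reserved region is at least `minimum` bytes, and exactly
    the used size when the header is already larger than that.
    """
    reserved = max(used_bytes, minimum)
    return reserved - used_bytes
```

Under this assumption, a header that uses 20,000 bytes has 111,072 bytes of slack for longer string attributes, while a header that already exceeds 128 KB has none, so the larger default lowers the chance of reallocation without ruling it out.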