Skip to content

Parallel HDF5: Try fixing strict collective requirements of HDF5 >= 2.0#1862

Draft
franzpoeschel wants to merge 98 commits intoopenPMD:devfrom
franzpoeschel:fix-parallel-hdf5
Draft

Parallel HDF5: Try fixing strict collective requirements of HDF5 >= 2.0#1862
franzpoeschel wants to merge 98 commits intoopenPMD:devfrom
franzpoeschel:fix-parallel-hdf5

Conversation

@franzpoeschel
Copy link
Copy Markdown
Contributor

@franzpoeschel franzpoeschel commented Mar 11, 2026

It seems that HDF5 has become quite a bit pickier about metadata definitions in parallel setups with versions 2.0 and 2.1, leading to hangups.
Earlier, it was enough to define them consistently across ranks, now we apparently have to keep the exact same order of operations.

This is bad for the Span API which runs internal flushes for structure setup.

  • No longer flush inside the Span API, instead flush already at resetDataset()
  • Try if we can only enqueue operations and run them later, should be fine for the ordering
  • Ensure that only a select type of operation runs in structural setup flushes
  • Remove flushParticlesPath and flushMeshesPath functions, these unnecessarily leaked attribute flushes into the structure setup
  • Fix tests...
  • Manually go through tests and check if any of them silently swallows some errors, the chance is high
  • Guard setDirty calls
  • There are some commits deactivating a handful of tests, deal with them
  • Decide how to go forward with the changes inside resetDataset(). Best idea: add the new logic to a new API call commitStructuralSetup() or so.

while (!(*m_handler).m_work.empty())
{
IOTask &i = (*m_handler).m_work.front();
// verifyFlushType(i.operation, l);

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
{
template <typename T, typename RecordType>
PostProcessConvertedAttributeImpl<T, RecordType>::
PostProcessConvertedAttributeImpl(RecordType record_in, handler_t reader_in)

Check notice

Code scanning / CodeQL

Large object passed by value Note

This parameter of type
Mesh
is 112 bytes - consider passing a const pointer/reference instead.
Comment on lines +698 to +701
REQUIRE(r.numAttributes() == 0);
// TODO: unitSI
// REQUIRE(r["x"].numAttributes() == 0);
// REQUIRE(r["y"].numAttributes() == 0);

Check notice

Code scanning / CodeQL

Commented-out code Note test

This comment appears to contain commented-out code.
Comment on lines +128 to +131
// if (access::write(IOHandler()->m_frontendAccess))
// {
// populateDefaultMetadata();
// }

Check notice

Code scanning / CodeQL

Commented-out code Note

This comment appears to contain commented-out code.
}
} // namespace

std::future<void> AbstractIOHandlerImpl::flush(FlushLevel l)

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 431 lines.
Copy link
Copy Markdown

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function openPMD_parallel_bug_1655_bp5_writer_hangup is unreachable (
CATCH2_INTERNAL_TEST_24
must be removed at the same time)
Static function openPMD_parallel_bug_1655_bp5_writer_hangup is unreachable (
autoRegistrar25
must be removed at the same time)

Check notice

Code scanning / CodeQL

Unused static function Note test

Static function CATCH2_INTERNAL_TEST_24 is unreachable (
autoRegistrar25
must be removed at the same time)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants