Enable checkpoint restart for OutHDF5#198

Open
The9Cat wants to merge 8 commits into devel from RestartFixHDF5

The9Cat (Member) commented Feb 3, 2026

The issue

On restart, OutHDF5 looks for enumerated snapshot files or directories regardless of the checkpt setting.

The fix

Bypass the enumeration check if checkpt=true and use the existing checkpoint file or directory instead.

Copilot AI (Contributor) left a comment

Pull request overview

Enables OutHDF5 to restart correctly in checkpoint mode by skipping snapshot enumeration and instead targeting the existing checkpoint file/directory.

Changes:

  • Use checkpoint_<runtag> (file or directory) as the restart target when checkpt=true.
  • Skip the snapshot file/directory enumeration loop when checkpt=true.
  • Avoid broadcasting nbeg in checkpoint mode since snapshot indexing is not used.
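
The restart-target choice described above can be sketched as follows. This is a minimal illustration, not the code in src/OutHDF5.cc: the helper name restartFileName and its parameter list are hypothetical, and it only covers the file organization (the ".1" suffix and the five-digit zero-padded index are taken from the snippet quoted in this review).

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not part of OutHDF5): builds the file name that a
// restart should look for when output uses the file organization.
// Assumes outdir already ends with a path separator.
std::string restartFileName(const std::string& outdir,
                            const std::string& filename,
                            const std::string& runtag,
                            bool checkpt, int nbeg)
{
  std::ostringstream fname;

  if (checkpt) {
    // Checkpoint mode: a single, non-enumerated target; nbeg is ignored
    fname << outdir << "checkpoint_" << runtag << ".1";
  } else {
    // Snapshot mode: the target depends on the snapshot index nbeg
    fname << outdir << filename << "_"
          << std::setw(5) << std::setfill('0') << nbeg << ".1";
  }

  return fname.str();
}
```

Under these assumptions, restartFileName("out/", "snap", "run1", true, 7) yields out/checkpoint_run1.1 regardless of the index, while the same call with checkpt=false yields out/snap_00007.1. Because the checkpoint name does not depend on nbeg, there is nothing to enumerate or broadcast in that mode.
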
Comments suppressed due to low confidence (1)

src/OutHDF5.cc:208

  • The runtime_error messages thrown here say Component::initialize error, but this code is in OutHDF5::initialize(). With the new checkpoint-path logic, these errors can be triggered in checkpoint mode as well, so the message becomes more misleading. Update the prefix to OutHDF5::initialize (and consider including whether this was snapshot vs checkpoint mode).
	if (not std::filesystem::is_directory(dir_path)) {
	  throw std::runtime_error("Component::initialize error: you specified directory organization of output but the directory " + dir_path.string() + " does not exist");
	}
      }
      else {
	std::ostringstream fname;

	if (chkpt)
	  fname << outdir
		<<  "checkpoint_" << runtag << ".1";
	else
	  fname << outdir
		<<  filename << "_" << setw(5) << setfill('0') << nbeg
		<< ".1";

	std::filesystem::path file_path = fname.str();

	if (not std::filesystem::is_regular_file(file_path)) {
	  throw std::runtime_error("Component::initialize error: you specified file organization of output but the file " + fname.str() + " does not exist");
	}
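
One possible shape for the suggested message change is sketched below. The helper requireRestartFile is hypothetical and is not taken from the PR; it only illustrates using the OutHDF5::initialize prefix and naming the mode in the error text.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of OutHDF5): checks that the restart file
// exists and reports the owning routine and the restart mode if it does not.
void requireRestartFile(const std::filesystem::path& file_path, bool checkpt)
{
  if (!std::filesystem::is_regular_file(file_path)) {
    const std::string mode = checkpt ? "checkpoint" : "snapshot";
    throw std::runtime_error("OutHDF5::initialize error (" + mode +
                             " mode): you specified file organization of "
                             "output but the file " + file_path.string() +
                             " does not exist");
  }
}
```

The same pattern would apply to the directory branch by swapping is_regular_file for is_directory and adjusting the wording.
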

Copilot AI commented Feb 4, 2026

@The9Cat I've opened a new pull request, #200, to work on those changes. Once the pull request is ready, I'll request review from you.

The9Cat and others added 5 commits February 4, 2026 09:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Co-authored-by: The9Cat <25960766+The9Cat@users.noreply.github.com>
Fix checkpoint restart file check for single-process runs

2 participants