Always perform indexing in an atomic transactional manner #681

Merged
jviotti merged 3 commits into main from atomic-index
Feb 27, 2026
Conversation

@jviotti
Member

@jviotti jviotti commented Feb 26, 2026

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti jviotti marked this pull request as ready for review February 27, 2026 13:52
@augmentcode
augmentcode bot commented Feb 27, 2026

🤖 Augment PR Summary

Summary: Make one index output writes transactional by building into a staging directory and atomically swapping it into place, so failed runs don’t corrupt an existing index.

Changes:

  • Link the index binary with sourcemeta::core::io and include core I/O utilities.
  • Resolve the final output path up front, create a sibling staging directory (.sourcemeta-one-*), and optionally seed it from the existing output via hard links.
  • Commit results with atomic_directory_replace and emit an explicit “Committing” log line.
  • After a successful commit, remove stale staging entries left behind by previously crashed runs.
  • Add CLI tests for atomic-failure behavior, stale staging cleanup, and --configuration not triggering cleanup.
  • Update existing CLI golden-output tests to normalize staging directory names and expect the new commit log.

Technical Notes: Staging is created as a sibling of the final output to keep operations on the same filesystem volume for hardlinking/atomic renames.

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

        entry.path().filename().string().starts_with(".sourcemeta-one-")) {
      std::cerr << "Removing stale staging directory: "
                << entry.path().string() << "\n";
      std::filesystem::remove_all(entry.path());

This deletes every directory in the output parent matching .sourcemeta-one-*, which can remove another concurrently-running index job’s staging directory (or any unrelated directory with that prefix) and lead to unexpected data loss/failures. Also note this runs even on early-return paths (e.g., --configuration), so it can have destructive side effects even when not actually indexing.

Severity: medium

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/index/index.cc">

<violation number="1" location="src/index/index.cc:115">
P2: The cleanup loop indiscriminately removes any ".sourcemeta-one-*" directory in the parent folder. Concurrent runs use the same prefix, so one run can delete another run’s active staging directory, corrupting or aborting its indexing output. Consider a safer stale-detection strategy (lock/age check) or avoid deleting directories that may belong to active runs.</violation>
</file>
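
For reference, the age check mentioned in the violation above might look like the following. This is a sketch only: `find_stale_staging`, `STAGING_PREFIX`, and the `max_age` parameter are hypothetical and do not appear in the PR. It also assumes that a directory's last write time is a usable activity signal, which holds only while entries are being added or removed directly inside that directory.

```cpp
#include <chrono>
#include <filesystem>
#include <string>
#include <vector>

// Prefix as quoted in the review comments
const std::string STAGING_PREFIX{".sourcemeta-one-"};

// Hypothetical helper: list sibling staging entries that have not been
// modified for at least `max_age`, skipping the caller's own staging path.
inline auto find_stale_staging(const std::filesystem::path &parent,
                               const std::filesystem::path &own_staging,
                               const std::chrono::hours max_age)
    -> std::vector<std::filesystem::path> {
  std::vector<std::filesystem::path> stale;
  const auto now{std::filesystem::file_time_type::clock::now()};

  for (const auto &entry : std::filesystem::directory_iterator{parent}) {
    const auto name{entry.path().filename().string()};
    const bool is_staging{
        name.compare(0, STAGING_PREFIX.size(), STAGING_PREFIX) == 0};
    if (!is_staging || entry.path() == own_staging) {
      continue;
    }

    // Only entries that have been idle for long enough count as stale
    if (now - entry.last_write_time() >= max_age) {
      stale.push_back(entry.path());
    }
  }

  return stale;
}
```

A caller would then remove only the returned paths, for example with `std::filesystem::remove_all`. The threshold has to exceed the longest plausible indexing run, otherwise a slow concurrent run can still be deleted.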

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti
Member Author

jviotti commented Feb 27, 2026

@cubic-dev-ai review

@jviotti
Member Author

jviotti commented Feb 27, 2026

augment review

@cubic-dev-ai

cubic-dev-ai bot commented Feb 27, 2026

@cubic-dev-ai review

@jviotti I have started the AI code review. It will take a few minutes to complete.

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

sourcemeta::core::atomic_directory_replace(final_output_path, staging.path());

// Clean up stale staging entries from crashed previous runs. We do this
// after committing so that we never interfere with a concurrent run

Even though this cleanup runs after the commit, it still removes any sibling entry starting with .sourcemeta-one-, which can delete a concurrently-running index job’s staging directory in the same parent and make that run fail (the entry.path() != staging.path() check won’t protect other staging dirs).

Severity: medium

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 9 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/index/index.cc">

<violation number="1" location="src/index/index.cc:640">
P1: The stale-staging cleanup deletes any `.sourcemeta-one-*` directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.</violation>
</file>

// after committing so that we never interfere with a concurrent run
for (const auto &entry :
std::filesystem::directory_iterator{final_output_path.parent_path()}) {
if (entry.path() != staging.path() &&

@cubic-dev-ai cubic-dev-ai bot Feb 27, 2026

P1: The stale-staging cleanup deletes any .sourcemeta-one-* directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.
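
The lock file option could be sketched as below, assuming a POSIX system where `flock` is available. Everything here is hypothetical and not taken from the PR: the `StagingLock` class, the `lock_path_for` naming scheme (a sibling file named after the staging directory), and the `is_stale` probe. The idea is that every run holds an exclusive advisory lock for as long as it uses its staging directory, and the kernel releases that lock when the process exits or crashes.

```cpp
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

#include <filesystem>
#include <string>

// Hypothetical RAII guard around a non-blocking exclusive flock(2)
class StagingLock {
public:
  explicit StagingLock(const std::filesystem::path &lock_path)
      : descriptor_{
            ::open(lock_path.c_str(), O_RDWR | O_CREAT | O_CLOEXEC, 0644)} {
    if (descriptor_ >= 0 && ::flock(descriptor_, LOCK_EX | LOCK_NB) != 0) {
      // Another open file description already holds the lock
      ::close(descriptor_);
      descriptor_ = -1;
    }
  }

  ~StagingLock() {
    if (descriptor_ >= 0) {
      ::close(descriptor_); // Closing the descriptor releases the lock
    }
  }

  StagingLock(const StagingLock &) = delete;
  auto operator=(const StagingLock &) -> StagingLock & = delete;

  auto held() const -> bool { return descriptor_ >= 0; }

private:
  int descriptor_;
};

// Hypothetical naming scheme: the lock lives next to the staging directory
// so that it is not moved into the final output on commit
inline auto lock_path_for(const std::filesystem::path &staging)
    -> std::filesystem::path {
  return std::filesystem::path{staging.string() + ".lock"};
}

// A staging directory counts as stale only if nobody holds its lock. The
// probe lock is released again when this function returns.
inline auto is_stale(const std::filesystem::path &staging) -> bool {
  const StagingLock probe{lock_path_for(staging)};
  return probe.held();
}
```

A run would construct a `StagingLock` right after creating its staging directory and keep it alive until the commit finishes, and the cleanup loop would delete only entries for which `is_stale` returns true. A small race remains between creating the directory and taking the lock, so a real implementation would need to acquire the lock before the directory becomes visible, or combine this with an age check.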

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/index.cc, line 640:

<comment>The stale-staging cleanup deletes any `.sourcemeta-one-*` directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.</comment>

<file context>
@@ -607,6 +629,26 @@ static auto index_main(const std::string_view &program,
+  // after committing so that we never interfere with a concurrent run
+  for (const auto &entry :
+       std::filesystem::directory_iterator{final_output_path.parent_path()}) {
+    if (entry.path() != staging.path() &&
+        entry.path().filename().string().starts_with(STAGING_PATH_PREFIX)) {
+      std::cerr << "Removing stale staging entry: " << entry.path().string()
</file context>

Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
@jviotti jviotti merged commit 4a3a237 into main Feb 27, 2026
6 checks passed
@jviotti jviotti deleted the atomic-index branch February 27, 2026 14:50