Always perform indexing in an atomic transactional manner#681
Conversation
302f631 to
c65190c
Compare
Signed-off-by: Juan Cruz Viotti <jv@jviotti.com>
c65190c to
5607dd0
Compare
🤖 Augment PR SummarySummary: Make Changes:
Technical Notes: Staging is created as a sibling of the final output to keep operations on the same filesystem volume for hardlinking/atomic renames. 🤖 Was this summary useful? React with 👍 or 👎 |
src/index/index.cc
Outdated
| entry.path().filename().string().starts_with(".sourcemeta-one-")) { | ||
| std::cerr << "Removing stale staging directory: " | ||
| << entry.path().string() << "\n"; | ||
| std::filesystem::remove_all(entry.path()); |
There was a problem hiding this comment.
This deletes every directory in the output parent matching .sourcemeta-one-*, which can remove another concurrently-running index job’s staging directory (or any unrelated directory with that prefix) and lead to unexpected data loss/failures. Also note this runs even on early-return paths (e.g., --configuration), so it can have destructive side effects even when not actually indexing.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
1 issue found across 4 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index/index.cc">
<violation number="1" location="src/index/index.cc:115">
P2: The cleanup loop indiscriminately removes any ".sourcemeta-one-*" directory in the parent folder. Concurrent runs use the same prefix, so one run can delete another run’s active staging directory, corrupting or aborting its indexing output. Consider a safer stale-detection strategy (lock/age check) or avoid deleting directories that may belong to active runs.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
6642fdb to
7185391
Compare
|
@cubic-dev-ai review |
|
augment review |
@jviotti I have started the AI code review. It will take a few minutes to complete. |
| sourcemeta::core::atomic_directory_replace(final_output_path, staging.path()); | ||
|
|
||
| // Clean up stale staging entries from crashed previous runs. We do this | ||
| // after committing so that we never interfere with a concurrent run |
There was a problem hiding this comment.
Even though this cleanup runs after the commit, it still removes any sibling entry starting with .sourcemeta-one-, which can delete a concurrently-running index job’s staging directory in the same parent and make that run fail (the entry.path() != staging.path() check won’t protect other staging dirs).
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
1 issue found across 9 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/index/index.cc">
<violation number="1" location="src/index/index.cc:640">
P1: The stale-staging cleanup deletes any `.sourcemeta-one-*` directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| // after committing so that we never interfere with a concurrent run | ||
| for (const auto &entry : | ||
| std::filesystem::directory_iterator{final_output_path.parent_path()}) { | ||
| if (entry.path() != staging.path() && |
There was a problem hiding this comment.
P1: The stale-staging cleanup deletes any .sourcemeta-one-* directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/index/index.cc, line 640:
<comment>The stale-staging cleanup deletes any `.sourcemeta-one-*` directory in the parent path, which can remove an active staging directory from a concurrent run. That risks failing or corrupting another index operation. Add a real staleness check (e.g., lock file or age threshold) before deleting.</comment>
<file context>
@@ -607,6 +629,26 @@ static auto index_main(const std::string_view &program,
+ // after committing so that we never interfere with a concurrent run
+ for (const auto &entry :
+ std::filesystem::directory_iterator{final_output_path.parent_path()}) {
+ if (entry.path() != staging.path() &&
+ entry.path().filename().string().starts_with(STAGING_PATH_PREFIX)) {
+ std::cerr << "Removing stale staging entry: " << entry.path().string()
</file context>
Signed-off-by: Juan Cruz Viotti jv@jviotti.com