node: Avoid duplicate key constraints when changing chain shard#6199
node: Avoid duplicate key constraints when changing chain shard#6199erayack wants to merge 1 commit intographprotocol:masterfrom
Conversation
|
This pull request hasn't had any activity for the last 90 days. If there's no more activity over the course of the next 14 days, it will automatically be closed. |
|
We really need to review this, this shouldn't have been closed |
c140bbc to
ed4e528
Compare
- Implement robust backup naming system to avoid conflicts - Add support for reusing existing backups when appropriate - Preserve previous backups with unique names - Add comprehensive logging for backup operations Fixes graphprotocol#6196
ed4e528 to
c5439e8
Compare
|
Hi @lutter, I rebased PR #6199 onto current master and resolved the conflict in node/src/manager/commands/chain.rs. The PR is now mergeable. I kept the original fix intent, but ported it to the current async chain-store code, preserved the safe chain-store creation ordering, simplified the backup-name selection flow, and added focused tests for backup-name formatting. |
|
|
||
| // Update the current chain name to chain-old | ||
| update_chain_name(conn, &chain_name, &new_name).await?; | ||
| let previous_backup_final_name = if let Some(backup) = existing_backup.as_ref() { |
There was a problem hiding this comment.
Your code will hit the same constraint in the following case:
mainnet on shard_a
mainnet-old on shard_b
Then running:
graphman chains change-shard mainnet shard_b
- Line 326 computes the next free backup name, e.g.
mainnet-old-1. - Line 327 renames
mainnet-old->mainnet-old-1. - Line 330 then tries to rename
mainnet-old-1->mainnet.
At that point the original mainnet row still exists, so the rename to mainnet violates the unique constraint on chains.name. The current mainnet -> mainnet-old-{next} rename only happens later on lines 339-340.
More generally, I’m not sure preserving multiple old caches is the right approach here. Moving a chain cache between shards should be a rare maintenance operation, and old caches are only useful as an immediate rollback. Reusing an existing backup on the target shard could make sense for an immediate rollback, although in other cases it may be too outdated. For a sequence like A -> B -> C, keeping the old cache on A does not seem useful, so probably dropping that previous backup before preserving the current B cache as the new mainnet-old would make the behavior simpler and avoid having multiple stale caches around the shards.
There was a problem hiding this comment.
I updated the flow to keep only one rollback backup and to order the reuse case as mainnet-old -> temp, mainnet -> mainnet-old, temp -> mainnet so the unique constraint cannot fire.
One thing I’m less certain about is policy: when an older mainnet-old exists and we are not reusing it, do you want that old backup dropped only after the new change-shard succeeds, or should it be deleted earlier as part of simplifying the behavior? Also, are you okay with updating the in-memory BlockStore mapping explicitly during these renames, or would you prefer a simpler approach even if it leaves cleanup to a reload?
There was a problem hiding this comment.
Sorry for the late reply, I was contemplating what we really want to happen here, so we don't make you go back and forth. What i'm thinking is:
- Add a check if
{chain}-oldalready exists and:- If its shard matches the target shard, prompt the user that it will re-use that backup (they can decline and remove it with
chain removeif they want) - If target shard is a different shard, print hint to use
chain remove.
- If its shard matches the target shard, prompt the user that it will re-use that backup (they can decline and remove it with
The point is to give more control to the indexers rather than doing stuff for then in the background
Description
This PR fixes issue #6196 by implementing a robust backup naming system that prevents duplicate key constraint violations when reverting chain shard changes.
Problem
When changing a chain's shard using
graphman chain change-shard, the existing chain is renamed to<chain>-old. However, if a subsequent attempt is made to revert the chain's shard back to the original shard, the operation fails due to a unique constraint violation because<chain>-oldalready exists in the database.Solution
next_backup_name()function that generates unique backup names by appending numeric suffixes when conflicts existChanges
ChainSwapOutcomestruct to track the results of chain swap operationsnext_backup_name()function for conflict-free backup namingchange_block_cache_shard()function with robust backup handling logicTesting
The fix handles the following scenarios:
<chain>-old)Fixes #6196