Add Crr Cascade capabilities to backbeat crr replication #2747

Open
SylvainSenechal wants to merge 1 commit into development/9.5 from improvement/BB-767

Conversation

SylvainSenechal commented Jun 3, 2026
Contributor

Issue: BB-767

Related PRs :
Arsenal : scality/Arsenal#2628
Cloudserver : scality/cloudserver#6179
CloudserverClient : scality/cloudserverclient#24
S3utils : scality/s3utils#395

bert-e commented Jun 3, 2026
Contributor

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
claude Bot commented Jun 3, 2026

  • package.json:54 — @scality/cloudserverclient uses a local file path (file:../cloudserverclient/...). Must be changed to a proper registry or git-pinned reference before merge.
    - ReplicateObject.js:6 — checkCrrCascadeEvent and getMicroVersionId() do not appear to exist in arsenal 8.3.9. Arsenal version bump likely needed.
    - ReplicateObject.js:743 — Any 409 from destination putMetadata is assumed to be cascade-stale and marked COMPLETED. Consider using a more specific signal to avoid silently skipping replication if 409 is returned for other reasons.

    Review by Claude Code

@SylvainSenechal SylvainSenechal marked this pull request as ready for review June 3, 2026 16:15
Comment thread extensions/replication/tasks/ReplicateObject.js

Contributor Author

I think we can add functional tests instead of just these, but I am waiting for Arsenal/cloudserver to be merged first, as that will make these tests easier to write (functional tests in backbeat rely on a cloudserver image).

Contributor

Keeping the unit tests is good; shouldn't the functional tests just be an addition?

Comment thread package.json Outdated
codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.30%. Comparing base (fa0f64c) to head (5c4ed70).

Additional details and impacted files

Files with missing lines Coverage Δ
extensions/replication/tasks/ReplicateObject.js 92.94% <100.00%> (+0.65%) ⬆️

... and 3 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 81.58% <ø> (ø)
Ingestion 70.13% <ø> (ø)
Lifecycle 79.06% <ø> (ø)
Oplog Populator 85.83% <ø> (ø)
Replication 62.09% <100.00%> (+0.53%) ⬆️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.5    #2747      +/-   ##
===================================================
+ Coverage            75.23%   75.30%   +0.07%     
===================================================
  Files                  200      200              
  Lines                13764    13799      +35     
===================================================
+ Hits                 10355    10391      +36     
+ Misses                3399     3398       -1     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.11% <0.00%> (-0.03%) ⬇️
api:routes 8.88% <0.00%> (-0.03%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 10.97% <0.00%> (+<0.01%) ⬆️
ingestion 12.33% <0.00%> (-0.04%) ⬇️
lib 7.81% <0.00%> (-0.01%) ⬇️
lifecycle 18.84% <0.00%> (-0.06%) ⬇️
notification 1.01% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.14% <0.00%> (-0.01%) ⬇️
replication 18.79% <44.44%> (+0.02%) ⬆️
unit 53.62% <80.00%> (+0.43%) ⬆️

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
claude Bot commented Jun 3, 2026

  • package.json:54 — @scality/cloudserverclient uses a local file: path that will break CI and other developers. Must be changed to a proper npm or git-tag reference before merge.
    - extensions/replication/tasks/ReplicateObject.js:33-35 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete, partAlreadyAtDest) are plain {} singletons that pass through BackbeatTask.retry(), which mutates errors. Consider Object.freeze() to prevent accidental mutation.

    The cascade logic itself (putData/putMetadata detection, allPartsAlreadyAtDest propagation through the waterfall, _handleReplicationOutcome treating cascade signals as COMPLETED) is sound. The retry wrapper correctly passes cascade signals through without retrying (no retryable property). Tests cover the key scenarios well.

    Review by Claude Code
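
To make the Object.freeze() suggestion above concrete, here is a minimal sketch. The sentinel names come from the review comment; `annotateError` is a hypothetical stand-in for any wrapper that writes to the error it receives, and is not BackbeatTask.retry().

```javascript
// Sentinel names come from the review comment; freezing them is the suggestion.
const cascadeLoopDetected = Object.freeze({});
const cascadeDataComplete = Object.freeze({});

// Hypothetical stand-in for any wrapper that annotates the error it receives.
// It is NOT BackbeatTask.retry(); it only shows what a mutation attempt does.
function annotateError(err) {
    'use strict'; // in strict mode, writing to a frozen object throws
    try {
        err.attempts = 1;
    } catch (e) {
        return { mutated: false, reason: e.constructor.name };
    }
    return { mutated: true };
}

const frozenOutcome = annotateError(cascadeLoopDetected);
const plainOutcome = annotateError({});

console.log(frozenOutcome); // the mutation attempt is rejected
console.log(plainOutcome);  // a plain object silently accepts the new property
console.log(cascadeLoopDetected !== cascadeDataComplete); // identity checks still work
```

Freezing keeps identity comparisons such as `err === cascadeLoopDetected` intact, while any accidental write either throws (strict mode) or is ignored (sloppy mode).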

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
claude Bot commented Jun 5, 2026

  • ReplicateObject.js:602 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete) are passed as err to callbacks. Verify that BackbeatTask._retry won't retry on these truthy-but-non-error sentinels, which would defeat the loop/stale detection.
    - ReplicateObject.js:433 — On error path in _getAndPutData, destLocations may contain partial results. The sentinel objects ({}) pass the filter and could reach _deleteOrphans with undefined keys. Likely harmless due to downstream filtering, but worth a defensive check.
    - ReplicateObject.js:1014 — _processQueueEntryRetryFull now passes allPartsAlreadyAtDest as mdOnly to _putMetadata, changing retry-full semantics from always-full to conditionally-metadata-only. Confirm this is intentional.
    - package.json:57 — Arsenal pinned to a raw commit hash instead of a tag. Should be updated to a tag once the Arsenal PR merges.
    - package.json:54 — scality-cloudserverclient-v1.0.9.tgz vendored as a binary blob in git. Prefer a registry or git-tag reference.
    - ReplicateObject.js:573 — Trailing whitespace.

    Review by Claude Code

Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
- "arsenal": "git+https://github.com/scality/arsenal#8.3.9",
+ "arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a",

Arsenal is pinned to a raw commit hash (2c429ab...) instead of a semantic version tag. Per project conventions, git-based deps (arsenal, vaultclient, etc.) should pin to tags (e.g. #8.x.y). Commit hashes are opaque — it's unclear which features/fixes are included, and there's no semver contract. This also makes it harder for reviewers and operators to reason about what changed.

— Claude Code

Comment thread package.json Outdated
@SylvainSenechal SylvainSenechal requested review from a team, benzekrimaha and maeldonn June 8, 2026 15:47
@scality scality deleted a comment from bert-e Jun 9, 2026
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@scality scality deleted a comment from claude Bot Jun 22, 2026
@scality scality deleted a comment from claude Bot Jun 22, 2026
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
VersioningRequired: true,
RequestUids: log.getSerializedUids(),
});
const putCommand = attachExpectContinueMiddleware(

Contributor

Can you create the follow-up ticket @SylvainSenechal ?

err, sourceEntry, destEntry, kafkaEntry, log, done));
}

_handleReplicationOutcome(err, sourceEntry, destEntry, kafkaEntry,

Contributor

This section has quite a bit of nested conditional logic and duplicate checks that make it hard to read and maintain.

Could we flatten this using guard clauses (early returns) and abstract the err.XYZ || err.name === 'XYZ' checks into a helper function? It would drastically reduce the cognitive load of this function. Let me know if you want to pair on it!
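
As a rough illustration of the suggestion above, here is a sketch of a helper that hides the `err.XYZ || err.name === 'XYZ'` check, combined with guard clauses. `isErrorNamed` and `classifyOutcome` are hypothetical names, and the 'SKIPPED' and 'FAILED' labels are only illustrative; this is not the actual `_handleReplicationOutcome` logic.

```javascript
// Hypothetical helper: matches both error shapes mentioned in the review,
// a boolean flag (err.ObjNotFound) or a name (err.name === 'ObjNotFound').
function isErrorNamed(err, name) {
    return Boolean(err) && (err[name] === true || err.name === name);
}

// Hypothetical classifier written with guard clauses (early returns).
// 'SKIPPED' and 'FAILED' are illustrative labels, not Backbeat statuses.
function classifyOutcome(err) {
    if (!err) {
        return 'COMPLETED';
    }
    if (isErrorNamed(err, 'ObjNotFound')) {
        return 'SKIPPED';
    }
    return 'FAILED';
}

console.log(classifyOutcome(null));
console.log(classifyOutcome({ ObjNotFound: true }));
console.log(classifyOutcome({ name: 'ObjNotFound' }));
console.log(classifyOutcome(new Error('boom')));
```

Each condition returns immediately, so no branch depends on state set by an earlier one, and the two error shapes are handled in a single place.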

Contributor Author

I agree it's trash code, and the diff is hard to read with the last else.

I just changed it and tried something that I didn't want to do at first, but I think it's fine: for each condition, directly publish/return, without having to do it at the end of the function. I believe it's quite readable this way.

Contributor Author

It could maybe still use some refactoring, although the errors don't all have the same form. I think the diff is reasonable here.

this._getAndPutPart(sourceEntry, destEntry, part, log, done);
- }, (err, destLocations) => {
+ }, (err, partResults) => {
+ const destLocations = (partResults || [])

Contributor

Could you extract this change into its own commit so you can explain the 'why' in the commit description ?

Contributor Author

For now I added some comments to clarify this a bit

Comment thread extensions/replication/tasks/ReplicateObject.js
@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-767 branch 3 times, most recently from fd87ff8 to d1bf122 Compare June 26, 2026 08:49
@SylvainSenechal SylvainSenechal requested a review from maeldonn June 26, 2026 08:53

@francoisferrand francoisferrand left a comment

Contributor

A few cleanups, but most importantly I am kind of worried about how to handle the conflict in the MPU case specifically...

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment on lines +470 to +472
// If all parts were already at destination, destLocations is [].
// Sending [] without metadata-only mode would wipe the object's
// location field at the destination, so force metadata-only.

Contributor

Should we not pass partResults anyway? For deleteOrphans we do need to skip the parts already at the destination, but for the continuation we need to pass all parts from the destination, so that we can construct the metadata of the MPU object?

The issue I am thinking of is a race to create the object: two sources each replicate half of the parts, and eventually one of them must still write the metadata after filling it with the actual part "ids".

→ if all parts are at the destination, maybe we can indeed skip metadata creation (or is it best to try anyway? This is cheap compared with uploading all the parts...)
→ if some parts are missing, we must still try to write the metadata, I guess (contrary to simple objects, the case of a "partial" MPU is not the same!)

Contributor Author

Thinking about this.
DeleteOrphans is fine as it is.

I think the big problem we have (although it may not be an expected product scenario) is
4 locations with a mix of cascade and multi-destination replication, using MPU:

a -> b
a -> c

b -> d
c -> d

b and c will write to d at the same time, so for 10 parts, say, half of them will be written from b and the other half from c.
Then, for the parts already written by the other location, the current location will get a partResults list in which some elements are partAlreadyAtDest errors and the others are proper locations.
If we pass the whole unfiltered partResults down the call chain, we will get corrupted location data because of the errors in partResults.

I feel like a correct way to handle this would be:
When doing an MPU, as soon as I see a partAlreadyAtDest error, I stop, because it means another location is already doing its replication, so I will let it do it fully.

But it seems like this could get racy when both sources detect a partAlreadyAtDest -_-

Contributor Author

And thinking about it some more, I think what we would want is:

When doing a putData, even when there is a 409, return the existing location for that part to backbeat.
Then, regardless of whether we had a collision on putData or not, at the end we have the data locations for all parts, and backbeat can send putMetadata with all part locations. So in our scenario, both locations b and c will call putMetadata, and one of the calls will fail, but that's alright.

cc @maeldonn

Contributor

> When doing a putData, even when there is a 409, return the existing location for that part to backbeat

409 is not "per part": the 409 means "there is already an object" (i.e. metadata); it is not related to a single part.
Each part upload is completely transparent.

> When doing an mpu, as soon as I see a partAlreadyAtDest error, I stop because it means another location is already doing its replication, so I will let it do it fully.

Agreed (though to be precise: once we detect it, it means the metadata was written, so the other location has already done the replication).

We also need to delete all uploaded parts, since each replication source uploads its own parts independently until either success or conflict. On conflict, all parts must be removed.

b and c may both write parts (putData) in parallel, but they do so completely independently: the only moment when they interact (i.e. when a conflict is detectable) is once the metadata has been written by either one → from that point the "remaining" putData calls will fail, and the loser must remove its "own" parts (all the successful putData calls it made) and skip the putMetadata...


This also means there is likely a follow-up here, to redesign MPU upload both for safety (avoid orphans) and optimization (detect conflicts earlier and/or allow reusing parts already uploaded).
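
The behaviour described above can be sketched with a toy in-memory model. Everything here (`FakeDestination`, `Replicator`, the `loc-N` ids) is hypothetical and only mirrors the rules stated in this comment: part uploads are independent, a conflict becomes visible only once some source has written the metadata, and the loser removes its own parts and skips putMetadata. Whether the loser should still send a metadata-only update is left out.

```javascript
// Toy in-memory destination, assumed for illustration only.
class FakeDestination {
    constructor() {
        this.parts = new Map(); // location id -> owner that uploaded it
        this.metadata = null;   // written once, by the first putMetadata
        this.nextId = 0;
    }
    putData(owner) {
        if (this.metadata) {
            return { conflict: true }; // an object already exists
        }
        const location = `loc-${this.nextId++}`;
        this.parts.set(location, owner);
        return { conflict: false, location };
    }
    putMetadata(owner, locations) {
        if (this.metadata) {
            return { conflict: true };
        }
        this.metadata = { owner, locations: locations.slice() };
        return { conflict: false };
    }
    deleteParts(locations) {
        locations.forEach(location => this.parts.delete(location));
    }
}

// One replication step per call, so two sources can be interleaved by hand.
class Replicator {
    constructor(name, dest, partCount) {
        Object.assign(this, { name, dest, partCount });
        this.uploaded = [];
        this.state = 'uploading';
    }
    step() {
        if (this.state !== 'uploading') {
            return this.state;
        }
        const res = this.uploaded.length < this.partCount
            ? this.dest.putData(this.name)
            : this.dest.putMetadata(this.name, this.uploaded);
        if (res.conflict) {
            // Loser: drop every part it uploaded and skip putMetadata.
            this.dest.deleteParts(this.uploaded);
            this.uploaded = [];
            this.state = 'lost';
        } else if (res.location) {
            this.uploaded.push(res.location);
        } else {
            this.state = 'won';
        }
        return this.state;
    }
}

const dest = new FakeDestination();
const b = new Replicator('b', dest, 3);
const c = new Replicator('c', dest, 3);
// b is faster: it uploads 3 parts and writes the metadata while c has 2 parts.
[b, c, b, b, c, b].forEach(source => source.step());
c.step(); // c's next putData now hits the conflict
console.log(b.state, c.state, [...dest.parts.keys()]);
```

In this interleaving only the winner's parts remain at the destination, which is the "no orphans" property the follow-up is meant to guarantee.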

Contributor Author

Oh OK, I didn't understand that the 409 would only happen after the putMetadata has been done, but yes, it makes sense: that is when the data gets linked to the metadata.

In that case, our problem might become much easier, since what happens now is:

  • One of the 2 locations finishes its replication first, with multiple putData and 1 putMetadata.
  • The second one is slower, and as soon as the first one has done the putMetadata, it starts receiving "partAlreadyAtDestination". That second one eventually also tries a putMetadata, but because the metadata is already there, it gets a MicroVersionIdAlreadyStoredException, and the error triggers a deleteOrphans.

So I think the only follow-up we may want, which is not even fully related to this feature, is to stop execution as soon as we receive one "partAlreadyAtDest".

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
if (!collisionErr.microVersionId) {
log.info('cascade putData: data at destination, ' +
'no microVersionId, proceeding with putMetadata', logMeta);
return { err: null, result: partAlreadyAtDest };

Contributor

can this really happen?
→ if we get a collision, it means we have the "newer" cloudserver code, which returns the microVersionId : the field should always be set
→ so this case should either be an error (like decoding), or just the usual path of compareMicroVersionId() (since microVersionId is not set when "creating" object)

Contributor Author

Here microVersionId is the microVersionId at the destination, so it could be undefined.
I think it's unlikely, but consider a scenario where replication was already set up before cascade: we already have a replica but no microVersionId.
This is quite defensive, but decode can crash if we don't check this.

Keeping this open because there is still discussion in cloudserver about microVersionId being set when creating the object.

if (err.ObjNotFound || err.name === 'ObjNotFound') {
return cbOnce(err);
}
if (err instanceof MicroVersionIdAlreadyStoredException) {

Contributor

don't we have a single exception in the latest API, leaving the caller to check microVersionID to identify if this is a "loop" or "stale" ?

Contributor Author

I built it this way because I prefer to have backbeat just check the error instead of rerunning the whole microVersionId comparison that cloudserver already did.

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread tests/unit/replication/ReplicateObject.spec.js Outdated

@francoisferrand francoisferrand left a comment

Contributor

Multiple topics are still open in scality/cloudserver#6179, regarding the "empty" versionId, the shape of the errors returned, ... → best to settle these before reviewing.

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-767 branch 2 times, most recently from 83c6e9f to 9da41bb Compare July 2, 2026 14:07
@@ -685,14 +744,14 @@ class ReplicateObject extends BackbeatTask {
return cbOnce(err);
}
log.error('an error occurred when putting metadata to S3',

MicroVersionIdAlreadyStoredException and StaleMicroVersionIdException from putMetadata are expected cascade signals, not errors. They fall through to this log.error because the catch block only early-returns for ObjNotFound. In production cascade scenarios, every loop/stale detection at the metadata level will emit a misleading error log.

Add early returns before the error log, mirroring the ObjNotFound pattern:

if (err instanceof MicroVersionIdAlreadyStoredException ||
    err instanceof StaleMicroVersionIdException) {
    return cbOnce(err);
}

this._getAndPutPart(sourceEntry, destEntry, part, log, done);
- }, (err, destLocations) => {
+ }, (err, partResults) => {
+ // partAlreadyAtDest signals data already at dest (cascade putData 409);

Contributor

The partAlreadyAtDest name is not correct: we cannot detect that a "part" is already at the destination, only that there is already an object (i.e. a document in mongo with the specified key and versionId).
→ it should really be objectAlreadyAtDest

...and it changes the logic a bit: when it happens (even on a single part), all successful parts must be deleted.

Suggested change
// partAlreadyAtDest signals data already at dest (cascade putData 409);
const results = partResults || [];
const destLocations = results.filter(result => result && result !== partAlreadyAtDest);
if (err) {
    return this._deleteOrphans(destEntry, destLocations, log, () => cb(err));
}
if (destLocations.length !== results.length) {
    // object already exists: release all parts, then check if metadata needs to be updated
    return this._deleteOrphans(destEntry, destLocations, log, () =>
        cb(null, destLocations, true));
}
// no conflict on any part: keep the uploaded parts and continue with a full write
return cb(null, destLocations, false);

i.e. we cannot make a partial write of object data. To create the metadata, all parts must be known: so all of them must have been written successfully. If there is a "conflict" on any single part, it means another replicant has finished uploading parts and created the metadata document : so we must drop all the parts we wrote, and just update the object metadata if needed....

Contributor

also please create followups to

  • ignore (other) error if a single part has a conflict
  • abort other parts upload as soon as we identify a conflict (i.e. imagine a 1000 parts object: if we have a conflict on first part, no point trying the other parts)
  • consider changing the putPartData protocol to create actual MPU parts (which can be garbage-collected by lifecycle after transfer is aborted), for extra safety (not strictly related to CRR, but best to track it)
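
The second follow-up above (stop uploading as soon as a conflict is seen) could look roughly like this. It is a sketch under assumptions: `uploadPartsUntilConflict` and `uploadPart` are hypothetical names, `objectAlreadyAtDest` follows the rename proposed earlier in this thread, and uploads already in flight are allowed to finish rather than being cancelled.

```javascript
// Hypothetical sentinel, named after the rename proposed in the review.
const objectAlreadyAtDest = Object.freeze({});

// Uploads parts with bounded concurrency and stops scheduling new parts as
// soon as one upload reports a conflict. `uploadPart` is a hypothetical async
// function resolving to a location, or to objectAlreadyAtDest on conflict.
async function uploadPartsUntilConflict(parts, uploadPart, concurrency = 2) {
    const locations = [];
    let conflict = false;
    let next = 0;

    async function worker() {
        while (!conflict && next < parts.length) {
            const part = parts[next++];
            const result = await uploadPart(part);
            if (result === objectAlreadyAtDest) {
                conflict = true; // other workers stop at their next iteration
            } else {
                locations.push(result);
            }
        }
    }

    await Promise.all(Array.from({ length: concurrency }, () => worker()));
    // On conflict, `locations` lists the parts the caller still has to delete.
    return { conflict, locations };
}

// Usage: a 1000-part object whose first part conflicts.
const demoParts = Array.from({ length: 1000 }, (_, i) => i);
const demoAttempted = [];
uploadPartsUntilConflict(demoParts, async part => {
    demoAttempted.push(part);
    return part === 0 ? objectAlreadyAtDest : `loc-${part}`;
}, 1).then(outcome => {
    console.log(outcome.conflict, outcome.locations.length, demoAttempted.length);
});
```

With a concurrency of 1, only the first part is attempted; with a higher concurrency, the parts already in flight complete and are returned in `locations` so that they can be cleaned up.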

});
}

_resolveVersionIdCollision(collisionErr, sourceEntry, destEntry, log) {

Contributor

This whole function is not correct in the putData (parts) case: if we receive VersionIdCollision, it means there is already an object with this versionId.

  • We must not create or replace the metadata with the new data (data is immutable)
  • We must immediately delete whatever we already uploaded
  • The microVersionId comparison can be used only after this, to decide whether we need to proceed with the metadata update (e.g. not the location, but the other fields: tags, ...) or can skip it

→ here the function is used to compute partAlreadyAtDest, which would make _getAndPutData silently "hide" the error if there is already some data at the destination BUT the metadata we are trying to replicate is newer.
→ _getAndPutData() should probably return the max/last microVersionId it received (in case of conflict), or nothing if it wrote all data successfully (i.e. no conflict)

(the logic here is what should happen on a microVersionId conflict on metadata, not here on VersionIdConflict)

Contributor

also in that case the x-scal-replication-content sent to cloudserver MUST not be DATA+METADATA anymore, but be "downgraded" to METADATA (to keep the existing data)

Comment on lines +1019 to +1037
if (err instanceof MicroVersionIdAlreadyStoredException) {
log.info('replication completed via cascade loop: ' +
'object already at destination with the same revision',
{ entry: sourceEntry.getLogInfo() });
this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });
return done(null, { committable: false });
}
if (err instanceof StaleMicroVersionIdException) {
log.info('replication completed: destination already holds ' +
'this version with a newer revision',
{ entry: sourceEntry.getLogInfo() });
this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });
return done(null, { committable: false });
}
if (!err) {
log.debug('replication succeeded for object, publishing ' +
'replication status as COMPLETED',
{ entry: sourceEntry.getLogInfo() });
this._publishReplicationStatus(
sourceEntry, 'COMPLETED', { kafkaEntry, log });
this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });

Contributor

we do exactly the same in all 3 branches : do we need 2 errors and different logs?

if (!err || err instanceof ...) {
    log.debug('replication succeeded for object, publishing ' +
                'replication status as COMPLETED',
                { entry: sourceEntry.getLogInfo(), err });
    this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });
}

@francoisferrand francoisferrand left a comment

Contributor

The handling of MPU is not correct: each "replicant" must write its own object fully; there is no situation where 2 "sources" each replicate part of the data (nor a way for them to identify the data they have written). Each source creates the metadata document with the parts it uploaded, so it MUST successfully upload all part data and MUST NOT have a conflict on any part.

bert-e commented Jul 3, 2026
Contributor

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers

The following reviewers are expecting changes from the author, or must review again:

Comment on lines +1019 to +1032
if (err instanceof MicroVersionIdAlreadyStoredException) {
log.info('replication completed via cascade loop: ' +
'object already at destination with the same revision',
{ entry: sourceEntry.getLogInfo() });
this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });
return done(null, { committable: false });
}
if (err instanceof StaleMicroVersionIdException) {
log.info('replication completed: destination already holds ' +
'this version with a newer revision',
{ entry: sourceEntry.getLogInfo() });
this._publishReplicationStatus(sourceEntry, 'COMPLETED', { kafkaEntry, log });
return done(null, { committable: false });
}

@francoisferrand francoisferrand Jul 3, 2026

Contributor

there is a risk of creating orphans here, please create a followup:

  1. new object was created → try to replicate DATA+META
  2. putDataParts succeeds (no conflict)
  3. before we could put metadata, another site was able to putData (same as us, not metadata → impossible to detect conflict) and putMetadata
  4. putMetadata will thus fail with MicroVersionIdAlreadyStoredException/StaleMicroVersionIdException

→ in that case, data created in 2. must be deleted, i.e. call _deleteOrphans. _deleteOrphans MUST NOT be called if this was a META-only update / if we did not call putDataParts
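
The rule in the last paragraph can be written down as a small decision function. This is only a sketch: the two classes below are local stand-ins for the exceptions named in this thread (the real ones come from the client library), and `shouldDeleteOrphans` and `wroteData` are hypothetical names.

```javascript
// Local stand-ins for the exception classes named in the review.
class MicroVersionIdAlreadyStoredException extends Error {}
class StaleMicroVersionIdException extends Error {}

// Hypothetical decision helper for the cascade-signal case only.
// wroteData: true if putDataParts ran for this entry (DATA+META replication),
// false for a metadata-only update.
function shouldDeleteOrphans(err, wroteData) {
    const isCascadeSignal = err instanceof MicroVersionIdAlreadyStoredException
        || err instanceof StaleMicroVersionIdException;
    return isCascadeSignal && wroteData;
}

console.log(shouldDeleteOrphans(new StaleMicroVersionIdException(), true));
console.log(shouldDeleteOrphans(new StaleMicroVersionIdException(), false));
console.log(shouldDeleteOrphans(null, true));
```

Tracking `wroteData` explicitly matters because a metadata-only update has no parts of its own: deleting "orphans" in that case would remove data that belongs to the object already at the destination.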

4 participants