Process master entry on ObjectRestore:Retry#2768

Draft
maeldonn wants to merge 1 commit into development/9.1 from bugfix/BB-794/requeue-restore

@maeldonn

For null-version objects (master entry with versionId and isNull: true, typical of buckets that have had versioning Suspended at any point), a requeueRestore Kafka message without objectVersion updates the master metadata as expected but never produces a cold-restore-req message. The lifecycle queue populator's master-skip rule drops the resulting oplog event, so the retry never reaches DMF.

The skip rule was designed for the normal restore flow (s3:ObjectRestore:Post), where the populator receives events on both the master and the version entry and must dedup. The retrigger restore flow (s3:ObjectRestore:Retry) only updates the master entry, so the master event is the only event the populator will ever get for that retry.

This PR exempts s3:ObjectRestore:Retry from the master-skip rule. The normal s3:ObjectRestore:Post path is unchanged.
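
The rule and the exemption can be sketched as a pair of predicates. This is a minimal illustration, not the code in LifecycleQueuePopulator.js: `isMasterKey` and `isVersionedObject` below are simplified stand-ins, and both the `\0` key separator and the "has a versionId" test are assumptions made for the example.

```javascript
'use strict';

// Hypothetical stand-ins for illustration only. The real helpers used by
// the populator may be implemented differently.
const VERSION_ID_SEPARATOR = '\0'; // assumption: version keys embed a separator

function isMasterKey(key) {
    return !key.includes(VERSION_ID_SEPARATOR);
}

function isVersionedObject(value) {
    // assumption: an entry counts as versioned when it carries a versionId
    return value.versionId !== undefined;
}

// Skip rule before this change: every versioned master entry is skipped,
// on the expectation that the version entry will be processed instead.
function shouldSkipBefore(entry, value) {
    return isVersionedObject(value) && isMasterKey(entry.key);
}

// Skip rule after this change: same as before, except that events whose
// operation is s3:ObjectRestore:Retry are never skipped.
function shouldSkipAfter(entry, value, operation) {
    return shouldSkipBefore(entry, value)
        && operation !== 's3:ObjectRestore:Retry';
}

// A null-version master entry, as described above.
const entry = { key: 'photo.jpg' };
const value = { versionId: 'example-version-id', isNull: true };

console.log(shouldSkipBefore(entry, value));                         // true
console.log(shouldSkipAfter(entry, value, 's3:ObjectRestore:Retry')); // false
console.log(shouldSkipAfter(entry, value, 's3:ObjectRestore:Post'));  // true
```

Under these assumptions, the Retry event on the master entry is the only case whose outcome changes. Version entries and masters without a versionId were never skipped, and Post events on versioned masters are still skipped.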

The current workaround is to pass the encoded internal VersionId returned by head-object in objectVersion, which routes the update through the version-entry path and avoids the skip rule. That requires the caller to know an internal versionId on objects that are otherwise non-versioned from the API's perspective. With this fix, omitting objectVersion works as expected on null-version objects, matching what the S3 API surface implies.

Issue: BB-794

The populator's master-skip rule was designed for the normal restore
flow. On that flow, the populator receives events on both the master
and the version entry, so skipping the master avoids producing two
`cold-restore-req` messages for the same object. The retrigger restore
flow only touches the master entry though, so the master oplog event
is the only event we will ever get. For null-version objects (master
with `isNull: true`), the existing skip rule silently dropped that
event and no restore request reached DMF.

Exempt `s3:ObjectRestore:Retry` from the skip rule. The master event
is the only one to come, so we must produce on it. Normal restore
(`s3:ObjectRestore:Post`) is unchanged.

Issue: BB-794
@maeldonn maeldonn requested review from a team, benzekrimaha and delthas June 24, 2026 06:31
@bert-e

bert-e commented Jun 24, 2026

Hello maeldonn,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options

| name | description | privileged | authored |
|---|---|---|---|
| /after_pull_request | Wait for the given pull request id to be merged before continuing with the current one. | | |
| /bypass_author_approval | Bypass the pull request author's approval | | |
| /bypass_build_status | Bypass the build and test status | | |
| /bypass_commit_size | Bypass the check on the size of the changeset TBA | | |
| /bypass_incompatible_branch | Bypass the check on the source branch prefix | | |
| /bypass_jira_check | Bypass the Jira issue check | | |
| /bypass_peer_approval | Bypass the pull request peers' approval | | |
| /bypass_leader_approval | Bypass the pull request leaders' approval | | |
| /approve | Instruct Bert-E that the author has approved the pull request. | | ✍️ |
| /create_pull_requests | Allow the creation of integration pull requests. | | |
| /create_integration_branches | Allow the creation of integration branches. | | |
| /no_octopus | Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead | | |
| /unanimity | Change review acceptance criteria from one reviewer at least to all reviewers | | |
| /wait | Instruct Bert-E not to run until further notice. | | |

Available commands

| name | description | privileged |
|---|---|---|
| /help | Print Bert-E's manual in the pull request. | |
| /status | Print Bert-E's current status in the pull request TBA | |
| /clear | Remove all comments from Bert-E from the history TBA | |
| /retry | Re-start a fresh build TBA | |
| /build | Re-start a fresh build TBA | |
| /force_reset | Delete integration branches & pull requests, and restart merge process from the beginning. | |
| /reset | Try to remove integration branches unless there are commits on them which do not appear on the source branch. | |

Status report is not available.

@bert-e

bert-e commented Jun 24, 2026

Incorrect fix version

The Fix Version/s in issue BB-794 contains:

  • None

Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:

  • 9.1.13
  • 9.2.8
  • 9.3.6
  • 9.4.1
  • 9.5.0

Please check the Fix Version/s of BB-794, or the target
branch of this pull request.

@codecov

codecov Bot commented Jun 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.79%. Comparing base (2172d4c) to head (aa6e5be).

Additional details and impacted files

| Files with missing lines | Coverage Δ |
|---|---|
| extensions/lifecycle/LifecycleQueuePopulator.js | 80.19% <100.00%> (+0.96%) ⬆️ |

... and 3 files with indirect coverage changes

| Components | Coverage Δ |
|---|---|
| Bucket Notification | 80.22% <ø> (ø) |
| Core Library | 80.77% <ø> (-0.58%) ⬇️ |
| Ingestion | 70.33% <ø> (ø) |
| Lifecycle | 79.12% <100.00%> (+0.03%) ⬆️ |
| Oplog Populator | 85.06% <ø> (ø) |
| Replication | 61.18% <ø> (ø) |
| Bucket Scanner | 85.76% <ø> (ø) |

@@                 Coverage Diff                 @@
##           development/9.1    #2768      +/-   ##
===================================================
- Coverage            75.02%   74.79%   -0.23%     
===================================================
  Files                  200      200              
  Lines                13541    13541              
===================================================
- Hits                 10159    10128      -31     
- Misses                3372     3403      +31     
  Partials                10       10              

| Flag | Coverage Δ |
|---|---|
| api:retry | 9.31% <0.00%> (ø) |
| api:routes | 9.12% <0.00%> (ø) |
| bucket-scanner | 85.76% <ø> (ø) |
| ft_test:queuepopulator | 9.12% <0.00%> (-1.28%) ⬇️ |
| ingestion | 12.66% <0.00%> (ø) |
| lib | 7.92% <0.00%> (ø) |
| lifecycle | 18.93% <0.00%> (+0.01%) ⬆️ |
| notification | 1.03% <0.00%> (ø) |
| replication | 18.93% <0.00%> (ø) |
| unit | 51.05% <100.00%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

@maeldonn maeldonn changed the title Bugfix/bb 794/requeue restore Process master entry on ObjectRestore:Retry Jun 24, 2026
// the non-master entry will be processed. The retrigger restore flow
// (originOp 's3:ObjectRestore:Retry') only updates the master entry, so
// the master event is the only one we will ever get — do not skip it.
if (this._isVersionedObject(value) && isMasterKey(entry.key) && operation !== 's3:ObjectRestore:Retry') {

Introducing this asymmetry seems very weird: AFAIK we are able to restore both "master-only" null versions (i.e. objects created before versioning was enabled) and "version-suspended" versions, so _isVersionedObject() seems to do its job and prevents duplicate handling of restores while still processing the master document when needed.

Generally, lifecycle (cold transition & restore) works for these objects (tested in Zenko I think, to be confirmed), so we need to understand exactly what Sorbet passes in the message, so that we can properly "forge" the message (and maybe add missing checks to ensure a "bad" message does not make things worse).

"only updates the master entry"

  • if there is only a master, this should (already) be covered by _isVersionedObject?
  • if there are both a master and a version document, the code somehow ends up creating a desynchronization between master and version, which is a separate but critical issue

@maeldonn maeldonn marked this pull request as draft June 26, 2026 09:39
3 participants