[opt](cloud) Check object existence before explicit object deletion#62973

Open
wyxxxcat wants to merge 4 commits into apache:master from wyxxxcat:head_before_delete

@wyxxxcat
Collaborator

What problem does this PR solve?

S3 DeleteObject is idempotent and can return success even when the target object does not exist. As a result, explicit file deletion in the cloud recycler could silently ignore missing objects.
By default, explicit object deletion now verifies that the object exists before deleting it, so missing files are surfaced as errors instead of being treated as successful deletes.
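
The head-before-delete idea can be sketched roughly as follows. This is a minimal illustration against an in-memory stand-in, not the code in this PR: `FakeBucket`, `delete_checked`, and the `Status` enum are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical status codes; the real Doris status types are not shown here.
enum class Status { OK, NOT_FOUND };

// In-memory stand-in for an S3-compatible bucket (illustrative only).
class FakeBucket {
public:
    void put(const std::string& key) { keys_.insert(key); }

    // HEAD: reports whether the key currently exists.
    bool head(const std::string& key) const { return keys_.count(key) > 0; }

    // DELETE: idempotent, succeeds whether or not the key exists.
    Status remove(const std::string& key) {
        keys_.erase(key);
        return Status::OK;
    }

private:
    std::unordered_set<std::string> keys_;
};

// Explicit delete that checks existence first and surfaces a missing object.
Status delete_checked(FakeBucket& bucket, const std::string& key) {
    assert(!key.empty());
    if (!bucket.head(key)) {
        return Status::NOT_FOUND;  // a plain DELETE would have reported OK here
    }
    return bucket.remove(key);
}
```

In this sketch, a plain `remove` on a missing key reports `Status::OK`, while `delete_checked` reports `Status::NOT_FOUND`. Note that against a real object store the HEAD and the DELETE are two separate requests, so the pair is not atomic.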

What changed?

  • Added an object existence check before S3ObjClient::delete_object and AzureObjClient::delete_object.
  • Updated S3Accessor::delete_file to propagate NOT_FOUND instead of treating it as success.
  • Added ObjClientOptions::check_exists_before_delete so explicit batch file deletes can request existence validation.
  • Added a cloud config option, enable_delete_file_check_object_exists, defaulting to true, so the extra HEAD request can be disabled when needed.
  • Kept recursive/prefix cleanup behavior idempotent by default to avoid breaking retry and concurrent recycler cleanup flows.
  • Added a DCHECK for the case where the pre-delete HEAD check fails.
  • Added unit coverage for the S3 delete existence-check config behavior.
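
One way the per-call option and the global config could combine is sketched below. Only the names `check_exists_before_delete` and `enable_delete_file_check_object_exists` come from this PR; `FakeStore`, `Config`, the method signatures, and the choice to reject the whole batch before deleting anything are assumptions made for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Status { OK, NOT_FOUND };  // hypothetical status type

// Stand-in for the cloud config flag named in the PR (default: true).
struct Config {
    bool enable_delete_file_check_object_exists = true;
};

// Field name taken from the PR; the struct layout is an assumption.
struct ObjClientOptions {
    bool check_exists_before_delete = false;
};

// In-memory stand-in for an object store client (illustrative only).
class FakeStore {
public:
    void put(const std::string& key) { keys_.insert(key); }
    std::size_t size() const { return keys_.size(); }

    // Explicit batch delete: strict only when both the option and the config agree.
    Status delete_objects(const std::vector<std::string>& keys,
                          const ObjClientOptions& opts, const Config& cfg) {
        if (opts.check_exists_before_delete &&
            cfg.enable_delete_file_check_object_exists) {
            for (const auto& k : keys) {
                if (keys_.count(k) == 0) return Status::NOT_FOUND;
            }
        }
        for (const auto& k : keys) keys_.erase(k);
        return Status::OK;
    }

    // Prefix cleanup stays idempotent: deleting nothing is still a success.
    Status delete_prefix(const std::string& prefix) {
        auto it = keys_.lower_bound(prefix);
        while (it != keys_.end() && it->compare(0, prefix.size(), prefix) == 0) {
            it = keys_.erase(it);
        }
        return Status::OK;
    }

private:
    std::set<std::string> keys_;
};
```

Keeping `delete_prefix` unconditional mirrors the stated goal of not breaking retries and concurrent recycler cleanup, where a second pass over an already-empty prefix should still succeed.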

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: S3-compatible object deletes can succeed even when the target object does not exist, so recycler file deletion could silently ignore missing files. Check object existence in the object storage client before explicit file deletion and propagate NOT_FOUND to the accessor.

### Release note

None

### Check List (For Author)

- Test: No need to test (requested to only change code and not write or run tests)
- Behavior changed: Yes (explicit file deletion now returns an error when the object is missing)
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: The object existence check before explicit file deletion should be configurable because it adds an extra HEAD request. Add a default-on cloud config and keep the strict delete behavior enabled by default.

### Release note

None

### Check List (For Author)

- Test: No need to test (requested to only change code and not write or run tests)
- Behavior changed: No (default behavior remains checking object existence before explicit file deletion)
- Does this need documentation: No

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: The configurable object existence check before S3 object deletion needs unit coverage to verify the HEAD request is issued when enabled and skipped when disabled.

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./run-cloud-ut.sh --run --filter=s3_accessor_client_test:S3ObjClientTest.DeleteObjectCheckExistsConfigTest
- Behavior changed: No
- Does this need documentation: No
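
The property this test targets (HEAD issued when the check is enabled, skipped when disabled) can be illustrated with a counting fake. This is a framework-free sketch, not the actual `DeleteObjectCheckExistsConfigTest`; `CountingClient` and `delete_object` are hypothetical names.

```cpp
#include <string>

// Fake client that records how many HEAD and DELETE requests it receives.
struct CountingClient {
    int head_calls = 0;
    int delete_calls = 0;
    bool object_exists = true;  // what the fake HEAD reports

    bool head(const std::string& /*key*/) {
        ++head_calls;
        return object_exists;
    }
    void remove(const std::string& /*key*/) { ++delete_calls; }
};

// Returns false only when the check is enabled and the object is missing.
bool delete_object(CountingClient& client, const std::string& key,
                   bool check_exists) {
    if (check_exists && !client.head(key)) {
        return false;  // no DELETE is sent for a missing object
    }
    client.remove(key);
    return true;
}
```

With the check enabled, one HEAD precedes each DELETE; with it disabled, the HEAD counter stays at zero and the delete succeeds even if the object is absent.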
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?
