
Improve dataset upload retries for completed and incomplete files #5938

Description

@carloea2

Task Summary

Improve dataset upload retry behavior so batch uploads distinguish between incomplete multipart uploads and files that already exist in the dataset.

Expected behavior by case:

  • An active multipart upload session exists for the same path: prompt the user to resume or restart the incomplete upload.
  • A file with the same path and size already exists in the committed or staged dataset files: prompt the user to upload it again or skip it.
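The two cases above can be sketched as a per-file classification step. This is a minimal Python sketch, assuming hypothetical inputs: a set of paths that have active multipart sessions, and a mapping from path to size for files already present in the committed or staged dataset. The names `UploadCase` and `classify_upload` are illustrative only and are not part of the project's code.

```python
from enum import Enum


class UploadCase(Enum):
    """Hypothetical categories for one file in a retry batch."""

    INCOMPLETE = "incomplete"                  # active multipart session: resume or restart
    POSSIBLE_DUPLICATE = "possible_duplicate"  # same path and size exists: upload again or skip
    NEW = "new"                                # no match: upload normally


def classify_upload(
    path: str,
    size: int,
    active_session_paths: set[str],
    existing_sizes: dict[str, int],
) -> UploadCase:
    # Active multipart sessions are checked first, so an interrupted upload
    # is offered for resume even if a file with the same path already exists.
    if path in active_session_paths:
        return UploadCase.INCOMPLETE
    # A path-and-size match only suggests the file was uploaded before;
    # it does not prove the contents are identical.
    if existing_sizes.get(path) == size:
        return UploadCase.POSSIBLE_DUPLICATE
    return UploadCase.NEW
```

Checking the session first mirrors the ordering the task asks of the frontend: an incomplete upload is the more specific situation, so it takes priority over a possible duplicate.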

The completed-file prompt should use cautious wording because matching by path and size does not prove byte-for-byte equality.
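To illustrate why a path-and-size match is only a hint, the Python snippet below compares two byte strings of equal length but different contents. The size check passes, while a SHA-256 digest comparison tells them apart. The digest check is shown only to make the point and is not something the task requires. `prompt_for_match` is a hypothetical example of cautious wording.

```python
import hashlib


def sizes_match(a: bytes, b: bytes) -> bool:
    # The weak signal the task relies on (together with the path).
    return len(a) == len(b)


def contents_match(a: bytes, b: bytes) -> bool:
    # Comparing digests detects differences that a size check cannot.
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def prompt_for_match(path: str, size: int) -> str:
    # Hypothetical cautious wording: the file "may" be the same, not "is".
    return (
        f"A file at '{path}' with the same size ({size} bytes) already exists "
        "in this dataset. It may be the same file. Upload again or skip?"
    )


stored = b"id,value\n1,10\n"
retry = b"id,value\n1,99\n"
print(sizes_match(stored, retry))     # True
print(contents_match(stored, retry))  # False
```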

Implementation should include:

  • A dataset-scoped backend check that takes candidate upload paths and sizes and reports which ones match existing committed or staged files.
  • Frontend logic that checks for active multipart sessions first, then for matching existing files.
  • Support for mixed retry batches where one file resumes and another file can be skipped.
  • Tests for multipart resume behavior, completed-file skip behavior, backend committed/staged matches, and invalid or unauthorized requests.
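The backend check and a mixed retry batch can be sketched as follows. This is a minimal Python sketch that relies on hypothetical in-memory stand-ins: `committed` and `staged` map file paths to sizes, and `can_write` stands in for whatever access check the backend actually performs. The function names, exception types, and validation rules (non-empty relative paths, non-negative sizes) are assumptions for illustration, not the project's API.

```python
from dataclasses import dataclass


class InvalidRequest(ValueError):
    """Hypothetical error for malformed candidate lists."""


class Unauthorized(PermissionError):
    """Hypothetical error for users without write access to the dataset."""


@dataclass(frozen=True)
class Candidate:
    path: str
    size: int


def find_existing_matches(
    candidates: list[Candidate],
    committed: dict[str, int],
    staged: dict[str, int],
    can_write: bool,
) -> dict[str, str]:
    """Return {path: "staged" | "committed"} for candidates whose path and size match."""
    if not can_write:
        raise Unauthorized("user cannot upload to this dataset")
    matches: dict[str, str] = {}
    for c in candidates:
        if not c.path or c.path.startswith("/") or c.size < 0:
            raise InvalidRequest(f"invalid candidate: {c!r}")
        # Assumption: a staged entry supersedes the committed version of the
        # same path, so only the staged size is compared when one exists.
        # Staged deletions are not modeled here.
        if c.path in staged:
            source, existing = "staged", staged[c.path]
        elif c.path in committed:
            source, existing = "committed", committed[c.path]
        else:
            continue
        if existing == c.size:
            matches[c.path] = source
    return matches


def plan_batch(
    candidates: list[Candidate],
    active_session_paths: set[str],
    matches: dict[str, str],
    skip_paths: set[str],
) -> dict[str, str]:
    """Pick an action per file; skip_paths holds the matches the user chose to skip."""
    plan: dict[str, str] = {}
    for c in candidates:
        if c.path in active_session_paths:
            plan[c.path] = "resume"  # a "restart" choice is not modeled here
        elif c.path in matches and c.path in skip_paths:
            plan[c.path] = "skip"
        else:
            plan[c.path] = "upload"
    return plan


batch = [Candidate("data/a.csv", 2048), Candidate("data/b.csv", 1024)]
found = find_existing_matches(batch, committed={"data/b.csv": 1024}, staged={}, can_write=True)
plan = plan_batch(batch, {"data/a.csv"}, found, skip_paths={"data/b.csv"})
print(found)  # {'data/b.csv': 'committed'}
print(plan)   # {'data/a.csv': 'resume', 'data/b.csv': 'skip'}
```

The usage at the end is the mixed batch from the task: one file resumes its multipart session while the other, matched by path and size in the committed files, is skipped. The authorization check runs before validation, so an unauthorized caller learns nothing about which paths exist.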

Related discussion: #5744
Related PR: #5929

Task Type

  • Refactor / Cleanup
  • DevOps / Deployment / CI
  • Testing / QA
  • Documentation
  • Performance
  • Other

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No fields configured for Task.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests