fix(#593): write integrity for obj_store_lib — opt-in verification, s3dlio 0.9.106#41
Merged
Merged
Conversation
Addresses silent write-corruption reported in mlcommons/storage#593. Data written during datagen or checkpointing could be silently truncated; the write appeared to succeed but the stored object was shorter than expected. This commit wires in the two-layer retry/verification architecture introduced in s3dlio 0.9.104. ## obj_store_lib.py — multipart upload retry (Python layer) ObjStoreLibStorage._mpu_upload_with_retry() is the Python-side retry wrapper for MultipartUploadWriter (used for objects at or above S3DLIO_MULTIPART_THRESHOLD_MB, default 16 MiB): - On RuntimeError from writer.write() or writer.close(), calls writer.abort() to free the in-progress upload slot on the server, then sleeps S3DLIO_MPU_RETRY_DELAY_S seconds and retries with a fresh MultipartUploadWriter. - After S3DLIO_MPU_MAX_RETRIES total attempts, raises RuntimeError chained to the original exception so the root cause is not lost. - Logs a WARNING on each retry and ERROR on final failure. ## obj_store_lib.py — single-part PUT retry (Rust layer, transparent) For objects below the multipart threshold, put_data() calls self._s3dlio.put_bytes(id, payload). As of s3dlio 0.9.104, put_bytes() / put_bytes_async() internally run put_verified_with_retry: after every PUT it issues a HEAD to verify the stored byte count, and retries automatically (up to S3DLIO_PUT_MAX_RETRIES, default 3) if there is a mismatch. No Python-layer retry is needed or added for this path — the Rust layer handles it transparently. ## obj_store_lib.py — environment variable documentation Class-level constants block fully annotated with # comments explaining every write-path environment variable, its type, default, allowable values, and the interaction between the two retry layers: S3DLIO_MULTIPART_THRESHOLD_MB (int ≥ 0 MiB, default 16) S3DLIO_MPU_MAX_RETRIES (int ≥ 1, default 3) S3DLIO_MPU_RETRY_DELAY_S (float ≥ 0 s, default 5) S3DLIO_PUT_MAX_RETRIES (int ≥ 1, default 3, Rust layer) S3DLIO_PUT_RETRY_DELAY_MS (int ≥ 0 ms, default 1000, Rust layer) _mpu_upload_with_retry() docstring expanded to NumPy-style with full Parameters, Retry policy, and Raises sections. ## pyproject.toml — bump minimum s3dlio version to 0.9.104 s3dlio>=0.9.102 → s3dlio>=0.9.104 0.9.104 is the first release that includes put_verified_with_retry (single-part integrity) and the multipart __exit__ error-propagation and stored-size verification fixes. Pinning below this version would silently omit the Rust-layer half of the write-integrity guarantee. ## README.md Added two bullet points under "Storage Backends" and "Correctness Fixes" summarising the write-integrity changes for storage#593. ## Tests 84 passed, 1 skipped (dftracer skip is pre-existing) via: uv run pytest tests/test_fast_ci.py
Companion to s3dlio v0.9.106 (russfellows/s3dlio#147), which changed the storage#593 write-verification behavior introduced in v0.9.104 from always-on to opt-in, default false. This was a deliberate follow-up: v0.9.104's always-on HEAD-after-write verification added a round-trip to every object write, which is a real throughput cost for benchmark workloads writing many small objects during datagen. No other S3 client library does this by default. ## dlio_benchmark/storage/obj_store_lib.py Documented the two new opt-in flags (both read entirely inside the s3dlio Rust library — DLIO does not read them itself, but they directly determine whether _mpu_upload_with_retry() below ever has anything to retry): S3DLIO_PUT_VERIFY (bool, default false) — single-part put_bytes() S3DLIO_MPU_PUT_VERIFY (bool, default false) — multipart MultipartUploadWriter Clarified in both the class-level comment block and the _mpu_upload_with_retry() docstring that S3DLIO_MPU_MAX_RETRIES / S3DLIO_MPU_RETRY_DELAY_S are only meaningful when S3DLIO_MPU_PUT_VERIFY=true — without it, the retry loop only ever fires on a genuine API error (e.g. a failed UploadPart), never on silent truncation, since nothing detects it. ## pyproject.toml s3dlio>=0.9.104 -> s3dlio>=0.9.106. Local [tool.uv.sources] wheel override commented back out — resolves from PyPI now that v0.9.106 is published. ## tests/test_write_verification_593.py (new) Live integration tests (opt-in via DLIO_OBJECT_STORAGE_TESTS=1, matching the existing tests/test_s3dlio_object_store.py convention; NOT part of the tests/test_fast_ci.py checkin suite) confirming, through the real installed s3dlio wheel rather than mocks: - test_put_bytes_default_off_no_verification / test_put_bytes_opt_in_runs_verification: single-part PUT correctly skips or runs HEAD verification based on S3DLIO_PUT_VERIFY. - test_mpu_upload_with_retry_default_off_no_verification / test_mpu_upload_with_retry_opt_in_runs_verification: multipart upload correctly skips or runs HEAD verification based on S3DLIO_MPU_PUT_VERIFY, exercising DLIO's actual ObjStoreLibStorage._mpu_upload_with_retry method (via a minimal stand-in for self, avoiding the full Hydra/ConfigArguments bootstrap) rather than a synthetic reimplementation. - test_mpu_upload_with_retry_retries_on_genuine_error: confirms DLIO's own retry/abort logic still works correctly on a genuine API error (zero parts written), independent of the verify flag. Verification state is observed via s3dlio.init_logging("debug") captured with pytest's capfd fixture (fd-level capture is required: s3dlio's tracing subscriber is native Rust code writing directly to the OS stdout file descriptor, bypassing Python's sys.stdout object) plus an independent stat() call the test performs itself, not by trusting s3dlio's internal state. Verified twice: once against s3dlio 0.9.106 installed from a local wheel, once against s3dlio 0.9.106 installed from PyPI (this commit's state) — 5/5 passing both times, and tests/test_fast_ci.py 84 passed/1 skipped (pre-existing, unrelated skip) against the PyPI wheel.
6 tasks
FileSystemGuy
approved these changes
Jul 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Addresses silent write corruption reported in mlcommons/storage#593, and wires in the follow-up fix that makes the resulting HEAD-verification behavior opt-in rather than mandatory overhead for every write.
Supersedes #40 (same underlying work, retargeted from
fix/593-write-integrity-s3dlio-0.9.104after s3dlio's write-verification defaults changed upstream).Background
MultipartUploadWriter.__exit__silently discarding errors, and no size verification afterCompleteMultipartUpload. It made HEAD-after-write verification always-on for both single-part PUT and multipart upload.This PR brings DLIO in line with s3dlio 0.9.106's opt-in model.
What changed
dlio_benchmark/storage/obj_store_lib.py_mpu_upload_with_retry()— multipart retry wrapper (unchanged logic, now documented against the opt-in model)Wraps every
MultipartUploadWritersession with automatic retry onRuntimeError:writer.abort()to free the in-progress upload slot (abort errors are swallowed so they don't mask the original failure).WARNING, sleepsS3DLIO_MPU_RETRY_DELAY_S, retries with a freshMultipartUploadWriter.S3DLIO_MPU_MAX_RETRIESattempts, raisesRuntimeErrorchained to the original exception.Important: this retry loop only catches storage#593-style silent truncation when
S3DLIO_MPU_PUT_VERIFY=trueis also set (a new s3dlio-side flag, defaultfalse). Without it, the loop still works correctly — it just never has a size-mismatch error to retry, since s3dlio doesn't issue the HEAD that would detect one.Single-part path — no DLIO-side change needed
put_data()still callsself._s3dlio.put_bytes(id, payload)unchanged. As of s3dlio 0.9.106,put_bytes()'s HEAD-verify-retry behavior is gated behindS3DLIO_PUT_VERIFY(defaultfalse), entirely inside the Rust library — transparent to DLIO either way.Environment variable documentation
The class-level constants block documents all five write-path variables, their defaults, and — critically — that
S3DLIO_MPU_MAX_RETRIES/S3DLIO_MPU_RETRY_DELAY_Sare only meaningful whenS3DLIO_MPU_PUT_VERIFY=true:S3DLIO_MULTIPART_THRESHOLD_MB16S3DLIO_MPU_PUT_VERIFYfalseS3DLIO_MPU_MAX_RETRIES3S3DLIO_MPU_RETRY_DELAY_S5S3DLIO_PUT_VERIFYfalseS3DLIO_PUT_MAX_RETRIES3S3DLIO_PUT_RETRY_DELAY_MS1000pyproject.tomls3dlio>=0.9.104→s3dlio>=0.9.106, resolved from PyPI.tests/test_write_verification_593.py(new)Live integration tests (opt-in via
DLIO_OBJECT_STORAGE_TESTS=1, matching the existingtests/test_s3dlio_object_store.pyconvention — not part of thetests/test_fast_ci.pycheckin suite) that confirm, through the real installed s3dlio wheel:S3DLIO_PUT_VERIFY/S3DLIO_MPU_PUT_VERIFYunset): no HEAD verification runs for either write path, and writes still land correctly.=true): HEAD verification runs for the corresponding path._mpu_upload_with_retryretry/abort logic works correctly on a genuine API error, independent of the verify flag.Verification state is observed via
s3dlio.init_logging("debug")captured with pytest'scapfdfixture (fd-level capture is required — s3dlio's tracing subscriber writes directly to the OS stdout file descriptor) plus an independentstat()call the test performs itself.README.mdNew Storage Backends / Correctness Fixes bullets summarizing the write-integrity and opt-in-verification changes (from the prior commit on this branch).
Test results
Verified twice: once against s3dlio 0.9.106 from a local wheel, once against s3dlio 0.9.106 installed from PyPI (this PR's final state).
The 1 skip (
test_dftracer_core) is pre-existing and unrelated.Checklist
s3dlio>=0.9.106inpyproject.toml, resolved from PyPI (no local wheel override)_mpu_upload_with_retry()docstring clarifies verify-flag dependencymlcommons/DLIO_local_changes, notargonne-lcf/dlio_benchmark