fix(#593): write integrity for obj_store_lib — opt-in verification, s3dlio 0.9.106#41

FileSystemGuy merged 2 commits into main from fix/593-write-integrity-s3dlio-0.9.106 on Jul 2, 2026

@russfellows

Summary

Addresses silent write corruption reported in mlcommons/storage#593, and wires in the follow-up fix that makes the resulting HEAD-verification behavior opt-in rather than mandatory overhead for every write.

Supersedes #40 (same underlying work, retargeted from fix/593-write-integrity-s3dlio-0.9.104 after s3dlio's write-verification defaults changed upstream).

Background

  • s3dlio 0.9.104 (PR #145) fixed two silent-corruption bugs: MultipartUploadWriter.__exit__ silently discarding errors, and no size verification after CompleteMultipartUpload. It made HEAD-after-write verification always-on for both single-part PUT and multipart upload.
  • s3dlio 0.9.106 (PR #147) changed that verification to opt-in, default off. The always-on HEAD check added a round-trip to every object write, which is a real throughput cost for datagen workloads writing many small objects, and it was more aggressive than the defaults of other S3 client libraries.

This PR brings DLIO in line with s3dlio 0.9.106's opt-in model.

What changed

dlio_benchmark/storage/obj_store_lib.py

_mpu_upload_with_retry() — multipart retry wrapper (unchanged logic, now documented against the opt-in model)

Wraps every MultipartUploadWriter session with automatic retry on RuntimeError:

  1. On failure, calls writer.abort() to free the in-progress upload slot (abort errors are swallowed so they don't mask the original failure).
  2. Logs a WARNING, sleeps S3DLIO_MPU_RETRY_DELAY_S, retries with a fresh MultipartUploadWriter.
  3. After S3DLIO_MPU_MAX_RETRIES attempts, raises RuntimeError chained to the original exception.

Important: this retry loop only catches storage#593-style silent truncation when S3DLIO_MPU_PUT_VERIFY=true is also set (a new s3dlio-side flag, default false). Without it, the loop still works correctly — it just never has a size-mismatch error to retry, since s3dlio doesn't issue the HEAD that would detect one.
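
The retry policy in steps 1 to 3 can be sketched roughly as follows. This is a minimal illustration, not the code in obj_store_lib.py: `FlakyWriter`, `upload_with_retry`, and their parameters are hypothetical stand-ins, and the real MultipartUploadWriter interface may differ.

```python
import logging
import time

log = logging.getLogger("mpu_retry_sketch")


class FlakyWriter:
    """Hypothetical stand-in for a multipart writer that fails a set number of times."""

    failures_left = 2  # class-level so that each fresh writer sees the remaining count

    def __init__(self):
        self.aborted = False

    def write(self, data: bytes) -> None:
        if FlakyWriter.failures_left > 0:
            FlakyWriter.failures_left -= 1
            raise RuntimeError("simulated UploadPart failure")

    def close(self) -> None:
        pass

    def abort(self) -> None:
        self.aborted = True


def upload_with_retry(make_writer, data, max_retries=3, delay_s=0.0):
    """Attempt the upload up to max_retries times; return the attempt that succeeded."""
    last_exc = None
    for attempt in range(1, max_retries + 1):
        writer = make_writer()  # fresh writer for every attempt
        try:
            writer.write(data)
            writer.close()
            return attempt
        except RuntimeError as exc:
            last_exc = exc
            try:
                writer.abort()  # free the in-progress upload
            except Exception:
                pass  # abort errors must not mask the original failure
            if attempt < max_retries:
                log.warning("attempt %d/%d failed: %s", attempt, max_retries, exc)
                time.sleep(delay_s)
    log.error("upload failed after %d attempts", max_retries)
    raise RuntimeError(f"upload failed after {max_retries} attempts") from last_exc


# Two simulated failures, so the third attempt succeeds.
attempts_used = upload_with_retry(FlakyWriter, b"payload", max_retries=3, delay_s=0.0)
```

Chaining with `from last_exc` keeps the original error reachable through `__cause__`, which matches the "chained to the original exception" behavior described in step 3.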

Single-part path — no DLIO-side change needed

put_data() still calls self._s3dlio.put_bytes(id, payload) unchanged. As of s3dlio 0.9.106, put_bytes()'s HEAD-verify-retry behavior is gated behind S3DLIO_PUT_VERIFY (default false), entirely inside the Rust library — transparent to DLIO either way.

Environment variable documentation

The class-level constants block documents all seven write-path variables and their defaults. Critically, it also notes that S3DLIO_MPU_MAX_RETRIES/S3DLIO_MPU_RETRY_DELAY_S only protect against silent truncation when S3DLIO_MPU_PUT_VERIFY=true:

| Variable | Layer | Default | What it controls |
| --- | --- | --- | --- |
| S3DLIO_MULTIPART_THRESHOLD_MB | DLIO (Python) | 16 | MiB above which multipart is used |
| S3DLIO_MPU_PUT_VERIFY | s3dlio (Rust) | false | Opt-in: HEAD-verify after multipart complete |
| S3DLIO_MPU_MAX_RETRIES | DLIO (Python) | 3 | Total multipart attempts (only bites when verify is on) |
| S3DLIO_MPU_RETRY_DELAY_S | DLIO (Python) | 5 | Seconds between multipart retries |
| S3DLIO_PUT_VERIFY | s3dlio (Rust) | false | Opt-in: HEAD-verify after single-part PUT |
| S3DLIO_PUT_MAX_RETRIES | s3dlio (Rust) | 3 | Total single-part attempts (only bites when verify is on) |
| S3DLIO_PUT_RETRY_DELAY_MS | s3dlio (Rust) | 1000 | Milliseconds between single-part retries |

pyproject.toml

s3dlio>=0.9.104 → s3dlio>=0.9.106, resolved from PyPI.

tests/test_write_verification_593.py (new)

Live integration tests (opt-in via DLIO_OBJECT_STORAGE_TESTS=1, matching the existing tests/test_s3dlio_object_store.py convention — not part of the tests/test_fast_ci.py checkin suite) that confirm, through the real installed s3dlio wheel:

  • Default (S3DLIO_PUT_VERIFY/S3DLIO_MPU_PUT_VERIFY unset): no HEAD verification runs for either write path, and writes still land correctly.
  • Opt-in (=true): HEAD verification runs for the corresponding path.
  • DLIO's own _mpu_upload_with_retry retry/abort logic works correctly on a genuine API error, independent of the verify flag.

Verification state is observed via s3dlio.init_logging("debug") captured with pytest's capfd fixture (fd-level capture is required — s3dlio's tracing subscriber writes directly to the OS stdout file descriptor) plus an independent stat() call the test performs itself.
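
To illustrate why fd-level capture matters here, the sketch below writes directly to file descriptor 1 (as native code would) and shows that a Python-level redirect of sys.stdout misses it, while a descriptor-level redirect sees it. The helper names and the log line are invented for this example; in the actual tests pytest's capfd fixture performs the descriptor-level capture.

```python
import contextlib
import io
import os
import sys
import tempfile


def native_style_log(message: str) -> None:
    """Stand-in for native code: write straight to OS file descriptor 1."""
    os.write(1, message.encode())


def capture_fd1(func) -> str:
    """Run func() with fd 1 redirected to a temp file and return what was written."""
    sys.stdout.flush()
    saved_fd = os.dup(1)  # remember the real stdout descriptor
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)  # point fd 1 at the temp file
        try:
            func()
        finally:
            os.dup2(saved_fd, 1)  # restore the real stdout
            os.close(saved_fd)
        tmp.seek(0)
        return tmp.read().decode()


py_buffer = io.StringIO()


def run_with_python_level_capture() -> None:
    # redirect_stdout swaps the sys.stdout object only; fd 1 is untouched.
    with contextlib.redirect_stdout(py_buffer):
        native_style_log("example native log line\n")


fd_output = capture_fd1(run_with_python_level_capture)
python_level_output = py_buffer.getvalue()
```

After this runs, `fd_output` holds the log line and `python_level_output` is empty, which is the gap that capfd closes and capsys would not.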

README.md

New Storage Backends / Correctness Fixes bullets summarizing the write-integrity and opt-in-verification changes (from the prior commit on this branch).

Test results

DLIO_OBJECT_STORAGE_TESTS=1 BUCKET=mlp-s3dlio uv run pytest tests/test_write_verification_593.py -v
5 passed

Verified twice: once against s3dlio 0.9.106 from a local wheel, once against s3dlio 0.9.106 installed from PyPI (this PR's final state).

uv run pytest tests/test_fast_ci.py
84 passed, 1 skipped in 17.77s

The 1 skip (test_dftracer_core) is pre-existing and unrelated.

Checklist

  • s3dlio>=0.9.106 in pyproject.toml, resolved from PyPI (no local wheel override)
  • All seven write-path environment variables documented, including the opt-in/default-off interaction
  • _mpu_upload_with_retry() docstring clarifies verify-flag dependency
  • Fast CI: 84 passed, 1 skipped (pre-existing)
  • Live write-verification tests: 5/5 passed against the real PyPI wheel
  • PR targets mlcommons/DLIO_local_changes, not argonne-lcf/dlio_benchmark

Addresses silent write-corruption reported in mlcommons/storage#593.
Data written during datagen or checkpointing could be silently truncated;
the write appeared to succeed but the stored object was shorter than
expected.  This commit wires in the two-layer retry/verification
architecture introduced in s3dlio 0.9.104.

## obj_store_lib.py — multipart upload retry (Python layer)

ObjStoreLibStorage._mpu_upload_with_retry() is the Python-side retry
wrapper for MultipartUploadWriter (used for objects at or above
S3DLIO_MULTIPART_THRESHOLD_MB, default 16 MiB):

- On RuntimeError from writer.write() or writer.close(), calls
  writer.abort() to free the in-progress upload slot on the server,
  then sleeps S3DLIO_MPU_RETRY_DELAY_S seconds and retries with a
  fresh MultipartUploadWriter.
- After S3DLIO_MPU_MAX_RETRIES total attempts, raises RuntimeError
  chained to the original exception so the root cause is not lost.
- Logs a WARNING on each retry and ERROR on final failure.

## obj_store_lib.py — single-part PUT retry (Rust layer, transparent)

For objects below the multipart threshold, put_data() calls
self._s3dlio.put_bytes(id, payload).  As of s3dlio 0.9.104,
put_bytes() / put_bytes_async() internally run put_verified_with_retry:
after every PUT it issues a HEAD to verify the stored byte count, and
retries automatically (up to S3DLIO_PUT_MAX_RETRIES, default 3) if
there is a mismatch.  No Python-layer retry is needed or added for
this path — the Rust layer handles it transparently.
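
The PUT-then-HEAD idea described above can be sketched in Python for illustration. The real logic lives in s3dlio's Rust code and is not shown in this PR, so everything here (`TruncatingStore`, `put_verified_with_retry_sketch`, the parameter names) is a hypothetical model rather than the library's implementation.

```python
import time


class TruncatingStore:
    """Hypothetical in-memory object store that silently truncates its first N writes."""

    def __init__(self, bad_writes: int):
        self.objects = {}
        self.bad_writes = bad_writes
        self.put_calls = 0

    def put(self, key: str, payload: bytes) -> None:
        self.put_calls += 1
        if self.bad_writes > 0:
            self.bad_writes -= 1
            payload = payload[: len(payload) // 2]  # truncated, but no error raised
        self.objects[key] = payload

    def head_size(self, key: str) -> int:
        """Model of a HEAD request: report the stored byte count."""
        return len(self.objects[key])


def put_verified_with_retry_sketch(store, key, payload, max_retries=3, delay_ms=0):
    """PUT, then compare the stored size with the payload size; retry on mismatch."""
    stored = -1
    for attempt in range(1, max_retries + 1):
        store.put(key, payload)
        stored = store.head_size(key)
        if stored == len(payload):
            return attempt
        if attempt < max_retries:
            time.sleep(delay_ms / 1000)
    raise RuntimeError(
        f"size mismatch for {key!r} after {max_retries} attempts: "
        f"stored {stored}, expected {len(payload)}"
    )


store = TruncatingStore(bad_writes=1)
attempts = put_verified_with_retry_sketch(store, "obj/0001", b"x" * 1024)
```

Each verified write costs one extra size lookup per attempt, which is the round-trip overhead that motivated making the check opt-in in the later release.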

## obj_store_lib.py — environment variable documentation

Class-level constants block fully annotated with # comments explaining
every write-path environment variable, its type, default, allowable
values, and the interaction between the two retry layers:

  S3DLIO_MULTIPART_THRESHOLD_MB  (int ≥ 0 MiB, default 16)
  S3DLIO_MPU_MAX_RETRIES         (int ≥ 1, default 3)
  S3DLIO_MPU_RETRY_DELAY_S       (float ≥ 0 s, default 5)
  S3DLIO_PUT_MAX_RETRIES         (int ≥ 1, default 3, Rust layer)
  S3DLIO_PUT_RETRY_DELAY_MS      (int ≥ 0 ms, default 1000, Rust layer)
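
As a rough sketch of how the three Python-layer variables could be read with the defaults and lower bounds listed above (the two Rust-layer variables are read inside s3dlio and are left out). The helper names are hypothetical and the actual parsing in obj_store_lib.py may differ.

```python
import os


def env_int(name, default, minimum, env=os.environ):
    """Read an integer setting, falling back to default when unset or blank."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


def env_float(name, default, minimum, env=os.environ):
    """Read a float setting, falling back to default when unset or blank."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = float(raw)
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


def load_write_path_settings(env=os.environ):
    """Collect the DLIO-side write-path settings into one dict."""
    threshold_mb = env_int("S3DLIO_MULTIPART_THRESHOLD_MB", 16, 0, env)
    return {
        "multipart_threshold_bytes": threshold_mb * 1024 * 1024,
        "mpu_max_retries": env_int("S3DLIO_MPU_MAX_RETRIES", 3, 1, env),
        "mpu_retry_delay_s": env_float("S3DLIO_MPU_RETRY_DELAY_S", 5.0, 0.0, env),
    }


# Passing a plain dict instead of os.environ keeps the example deterministic.
settings = load_write_path_settings({"S3DLIO_MPU_MAX_RETRIES": "5"})
```
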

_mpu_upload_with_retry() docstring expanded to NumPy-style with full
Parameters, Retry policy, and Raises sections.

## pyproject.toml — bump minimum s3dlio version to 0.9.104

  s3dlio>=0.9.102  →  s3dlio>=0.9.104

0.9.104 is the first release that includes put_verified_with_retry
(single-part integrity) and the multipart __exit__ error-propagation
and stored-size verification fixes.  Pinning below this version would
silently omit the Rust-layer half of the write-integrity guarantee.

## README.md

Added two bullet points under "Storage Backends" and "Correctness
Fixes" summarising the write-integrity changes for storage#593.

## Tests

84 passed, 1 skipped (dftracer skip is pre-existing) via:
  uv run pytest tests/test_fast_ci.py

Companion to s3dlio v0.9.106 (russfellows/s3dlio#147), which changed the
storage#593 write-verification behavior introduced in v0.9.104 from
always-on to opt-in, default false. This was a deliberate follow-up:
v0.9.104's always-on HEAD-after-write verification added a round-trip to
every object write, which is a real throughput cost for benchmark
workloads writing many small objects during datagen. No other S3 client
library does this by default.

## dlio_benchmark/storage/obj_store_lib.py

Documented the two new opt-in flags (both read entirely inside the s3dlio
Rust library — DLIO does not read them itself, but they directly determine
whether _mpu_upload_with_retry() below ever has anything to retry):

  S3DLIO_PUT_VERIFY      (bool, default false) — single-part put_bytes()
  S3DLIO_MPU_PUT_VERIFY  (bool, default false) — multipart MultipartUploadWriter

Clarified in both the class-level comment block and the
_mpu_upload_with_retry() docstring that S3DLIO_MPU_MAX_RETRIES /
S3DLIO_MPU_RETRY_DELAY_S are only meaningful when S3DLIO_MPU_PUT_VERIFY=true
— without it, the retry loop only ever fires on a genuine API error (e.g. a
failed UploadPart), never on silent truncation, since nothing detects it.
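
The dependency between the verify flag and the retry loop can be illustrated with a toy model. Here the flag is a constructor argument on a hypothetical `TruncatingWriter`; in the real stack it is the S3DLIO_MPU_PUT_VERIFY environment variable read inside s3dlio, and the writer API may look different.

```python
class TruncatingWriter:
    """Hypothetical writer whose backend silently stores only half of each write."""

    def __init__(self, verify: bool):
        self.verify = verify
        self.sent = 0
        self.stored = 0

    def write(self, data: bytes) -> None:
        self.sent += len(data)
        self.stored += len(data) // 2  # silent truncation, no exception

    def close(self) -> None:
        # Only a verifying close compares sizes, so only it can surface the problem.
        if self.verify and self.stored != self.sent:
            raise RuntimeError(f"stored {self.stored} bytes, expected {self.sent}")


def count_attempts(verify, max_retries=3):
    """Return (attempts made, whether the upload reported success)."""
    for attempt in range(1, max_retries + 1):
        writer = TruncatingWriter(verify)
        try:
            writer.write(b"x" * 100)
            writer.close()
            return attempt, True
        except RuntimeError:
            continue  # a real wrapper would abort and sleep here
    return max_retries, False


results = {verify: count_attempts(verify) for verify in (False, True)}
```

With verification off the loop reports success on the first attempt even though the object is short; with verification on, every attempt raises and the retries are exhausted.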

## pyproject.toml

s3dlio>=0.9.104 -> s3dlio>=0.9.106. Local [tool.uv.sources] wheel override
commented back out — resolves from PyPI now that v0.9.106 is published.

## tests/test_write_verification_593.py (new)

Live integration tests (opt-in via DLIO_OBJECT_STORAGE_TESTS=1, matching
the existing tests/test_s3dlio_object_store.py convention; NOT part of the
tests/test_fast_ci.py checkin suite) confirming, through the real installed
s3dlio wheel rather than mocks:

- test_put_bytes_default_off_no_verification /
  test_put_bytes_opt_in_runs_verification: single-part PUT correctly skips
  or runs HEAD verification based on S3DLIO_PUT_VERIFY.
- test_mpu_upload_with_retry_default_off_no_verification /
  test_mpu_upload_with_retry_opt_in_runs_verification: multipart upload
  correctly skips or runs HEAD verification based on S3DLIO_MPU_PUT_VERIFY,
  exercising DLIO's actual ObjStoreLibStorage._mpu_upload_with_retry method
  (via a minimal stand-in for self, avoiding the full Hydra/ConfigArguments
  bootstrap) rather than a synthetic reimplementation.
- test_mpu_upload_with_retry_retries_on_genuine_error: confirms DLIO's own
  retry/abort logic still works correctly on a genuine API error (zero
  parts written), independent of the verify flag.
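
The "minimal stand-in for self" technique mentioned above can be sketched like this: call the method through the class and pass a lightweight object that carries only the attributes the method touches. `Storage`, `FakeClient`, and the attribute names are hypothetical, and unlike the PR's tests (which use the real s3dlio wheel) this sketch uses a fake client so it can run anywhere.

```python
from types import SimpleNamespace


class Storage:
    """Hypothetical class whose constructor needs a heavy configuration bootstrap."""

    def __init__(self):
        raise RuntimeError("requires full configuration bootstrap")

    def _upload_with_retry(self, key, data):
        """Retry the upload until it succeeds or max_retries is reached."""
        for attempt in range(1, self.max_retries + 1):
            try:
                self.client.put(key, data)
                return attempt
            except RuntimeError:
                if attempt == self.max_retries:
                    raise


class FakeClient:
    """Fails a fixed number of times, then stores the object."""

    def __init__(self, failures: int):
        self.failures = failures
        self.stored = {}

    def put(self, key, data):
        if self.failures > 0:
            self.failures -= 1
            raise RuntimeError("simulated API error")
        self.stored[key] = data


# The stand-in provides only what the method reads from self.
stand_in = SimpleNamespace(max_retries=3, client=FakeClient(failures=1))

# Calling through the class skips __init__ while still running the real method body.
attempts = Storage._upload_with_retry(stand_in, "k", b"v")
```

The benefit is that the test exercises the production method rather than a copy of it; the cost is that the stand-in must be kept in sync with whatever attributes the method reads.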

Verification state is observed via s3dlio.init_logging("debug") captured
with pytest's capfd fixture (fd-level capture is required: s3dlio's tracing
subscriber is native Rust code writing directly to the OS stdout file
descriptor, bypassing Python's sys.stdout object) plus an independent
stat() call the test performs itself, not by trusting s3dlio's internal
state.

Verified twice: once against s3dlio 0.9.106 installed from a local wheel,
once against s3dlio 0.9.106 installed from PyPI (this commit's state) — 5/5
passing both times, and tests/test_fast_ci.py 84 passed/1 skipped
(pre-existing, unrelated skip) against the PyPI wheel.

FileSystemGuy merged commit f6796ed into main on Jul 2, 2026
7 checks passed
FileSystemGuy deleted the fix/593-write-integrity-s3dlio-0.9.106 branch on July 2, 2026 03:47