fix(reader): O_DIRECT path triggers on uri_scheme=direct (mlcommons/storage#567)#39
Merged
russfellows merged 1 commit on Jun 29, 2026
Conversation
_LocalFSIterableMixin._localfs_init previously selected the s3dlio
O_DIRECT prefetch path only when storage_options.storage_library was
exactly "direct". But storage_type=direct_fs (the configuration
mlcommons/storage reaches via --o-direct) is *required* by
utils/config.py to set storage_library=s3dlio, NOT "direct". The two
gates disagreed: no value of storage_library satisfied both. The
buffered path was taken, plain open() was handed a direct://... URI,
and every local NPZ / NPY / JPEG run crashed in the warmup batch
with FileNotFoundError.
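The contradiction can be illustrated with a minimal sketch. The helper names `config_accepts`, `old_gate`, and `buffered_read` are hypothetical stand-ins written for this illustration, not the actual dlio_benchmark functions, and `storage_options` is assumed to behave like a dict. The demo path is deliberately nonexistent, and the `FileNotFoundError` outcome assumes a POSIX filesystem.

```python
# Hypothetical stand-ins for the two disagreeing gates; not the real code.

def config_accepts(storage_type, storage_library):
    """Mimics the validation rule: direct_fs requires storage_library == 's3dlio'."""
    if storage_type == "direct_fs":
        return storage_library == "s3dlio"
    return True


def old_gate(storage_options):
    """Pre-fix gate: O_DIRECT only when storage_library is exactly 'direct'."""
    return storage_options.get("storage_library") == "direct"


# No candidate value passes validation AND selects the O_DIRECT path.
candidates = ["s3dlio", "direct", None]
satisfying = [
    lib for lib in candidates
    if config_accepts("direct_fs", lib) and old_gate({"storage_library": lib})
]


def buffered_read(uri):
    """Buffered fallback: plain open() treats the URI as an ordinary path."""
    with open(uri, "rb") as f:
        return f.read()


failure = None
try:
    # open() knows nothing about URI schemes, so 'direct:' is read as a
    # relative directory name that does not exist.
    buffered_read("direct:///nonexistent-dir-for-demo/foo.npz")
except FileNotFoundError as exc:
    failure = type(exc).__name__
```

Here `satisfying` ends up empty, which is why the buffered path was always taken for `direct_fs` runs.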
Fix: select the O_DIRECT path when EITHER
- storage_options.uri_scheme == "direct" (canonical signal —
matches what direct_fs validation produces), OR
- storage_options.storage_library == "direct" (legacy single-knob
form, kept for backward compatibility).
uri_scheme is the semantically honest gate: it asks "is this a
direct:// URI?" — which is precisely what _prefetch_direct requires.
storage_library is a coarser library-selector that has to mean
something different to S3 callers (where storage_library=s3dlio +
uri_scheme=s3 is correct and must NOT trigger local O_DIRECT).
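A minimal sketch of the combined gate described above, assuming `storage_options` is a dict-like mapping that may be missing. The function name `select_direct_path` is hypothetical; only the two keys and the "direct" / "s3dlio" values come from the description.

```python
def select_direct_path(storage_options=None):
    """Return True when the O_DIRECT prefetch path should be used.

    Sketch only: either signal is sufficient.
    """
    opts = storage_options or {}
    # Canonical signal: the URI really is a direct:// URI.
    by_scheme = opts.get("uri_scheme") == "direct"
    # Legacy single-knob form, kept for backward compatibility.
    by_library = opts.get("storage_library") == "direct"
    return by_scheme or by_library


# The storage#567 shape now reaches the O_DIRECT path ...
fixed_shape = select_direct_path(
    {"storage_library": "s3dlio", "uri_scheme": "direct"}
)
# ... while a real S3 configuration does not.
s3_shape = select_direct_path(
    {"storage_library": "s3dlio", "uri_scheme": "s3"}
)
```

Keying the decision on `uri_scheme` keeps `storage_library` free to act purely as a library selector for S3 callers.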
Regression tests in tests/test_direct_fs_iterable_mixin_gate.py lock:
- the exact storage#567 config shape (storage_library=s3dlio +
uri_scheme=direct) reaches the O_DIRECT path,
- the legacy single-knob form still works,
- storage_library=s3dlio + uri_scheme in {s3, file} does NOT
spuriously trigger local O_DIRECT,
- bookkeeping counters are seeded on every code path.
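The cases listed above lend themselves to a table-driven check. The following is a sketch, not the contents of tests/test_direct_fs_iterable_mixin_gate.py: `MixinStandIn` is a hypothetical stand-in for the mixin, the counter names `_files_read` and `_bytes_read` are invented for illustration, and the `None` / empty-dict rows are an assumed extra guard.

```python
class MixinStandIn:
    """Hypothetical stand-in; only _localfs_init and _use_direct are real names."""

    def _localfs_init(self, storage_options=None):
        opts = storage_options or {}
        # Seed bookkeeping counters before branching so that both the
        # O_DIRECT and the buffered path find them initialised.
        self._files_read = 0   # invented counter name
        self._bytes_read = 0   # invented counter name
        self._use_direct = (
            opts.get("uri_scheme") == "direct"
            or opts.get("storage_library") == "direct"
        )


# (storage_options, expected _use_direct)
CASES = [
    ({"storage_library": "s3dlio", "uri_scheme": "direct"}, True),   # storage#567 shape
    ({"storage_library": "direct"}, True),                           # legacy single knob
    ({"storage_library": "s3dlio", "uri_scheme": "s3"}, False),      # S3 guard
    ({"storage_library": "s3dlio", "uri_scheme": "file"}, False),    # file:// guard
    (None, False),                                                   # assumed extra guard
    ({}, False),                                                     # assumed extra guard
]


def run_cases():
    """Run every row and return the observed _use_direct values."""
    observed = []
    for opts, expected in CASES:
        mixin = MixinStandIn()
        mixin._localfs_init(opts)
        assert mixin._use_direct is expected, opts
        assert mixin._files_read == 0 and mixin._bytes_read == 0, opts
        observed.append(mixin._use_direct)
    return observed


results = run_cases()
```

With pytest available, the same table could feed `pytest.mark.parametrize` so that each row reports as its own test.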
Resolves mlcommons/storage#567.
Author
@russfellows You've been very productive today! Could you please do the needful here too?
russfellows
approved these changes
Jun 29, 2026
russfellows
left a comment
Seems correct, I guess now that we have someone to test O_DIRECT we should be able to make progress. Fingers crossed.
Summary
Fixes a `storage_library` sentinel mismatch that made `--o-direct` / `storage_type=direct_fs` crash on every local NPZ / NPY / JPEG workload (UNet3D, ResNet, …) in the warmup batch with `FileNotFoundError: 'direct:///mnt/…/foo.npz'`.
- `dlio_benchmark/utils/config.py` validates that `storage_type=direct_fs` requires `storage_options.storage_library == "s3dlio"`.
- `dlio_benchmark/reader/_local_fs_iterable_mixin.py` previously selected the s3dlio O_DIRECT prefetch path only when `storage_options.storage_library == "direct"`.
No value of `storage_library` satisfied both, so `_use_direct` was always `False` for `direct_fs` runs. The buffered path was taken, plain `open()` was handed a `direct://…` URI, and the run crashed.
The fix
Select the O_DIRECT path when EITHER:
- `storage_options.uri_scheme == "direct"`: canonical signal, matches what `direct_fs` validation produces (and what mlpstorage's `--o-direct` emits).
- `storage_options.storage_library == "direct"`: legacy single-knob form, preserved for backward compatibility.
`uri_scheme` is the semantically honest gate: it directly asks "is this a `direct://` URI?", which is exactly the precondition `_prefetch_direct` needs. `storage_library` is a coarser library-selector that must keep meaning something different for real S3 callers, where `storage_library=s3dlio` + `uri_scheme=s3` is correct and must NOT trigger local O_DIRECT.
Test plan
- `tests/test_direct_fs_iterable_mixin_gate.py`: 13 tests, all pass:
  - Exact `storage#567` config shape (`storage_library=s3dlio` + `uri_scheme=direct`) → `_use_direct=True`.
  - `uri_scheme=direct` alone is sufficient.
  - `storage_library=direct` alone still works.
  - `storage_library=s3dlio` + `uri_scheme in {s3, file}` does NOT spuriously trigger local O_DIRECT (S3 / file:// regression guards).
  - … `storage_options` → buffered fallback, no crash.
- `tests/test_fast_ci.py`: 84 passed, 1 skipped.
- `test_data_generator_improvements.py`, `test_odirect_preflight.py`, `test_issue_regressions.py`, `test_skip_listing_config.py`: 47 passed.
- … `mlcommons/storage#567` should now reach the workload instead of crashing in warmup.
Linked
- `mlcommons/storage#567`: the original report, with full FileNotFoundError trace and the s3dlio / config.py vs. mixin disagreement diagnosed.
- … `mlcommons/storage`'s `pyproject.toml` should be bumped (the file already has a comment block for tracking DLIO PR history at `pyproject.toml:131-145`).
Notes for reviewers
- … `_use_direct=True` before still does. Only the previously-broken `direct_fs` shape becomes reachable.
- … (`_localfs_init`) updated to describe both signals and explain why `uri_scheme` is the canonical one.
- … `mlcommons/storage@6553c0e`. Open for early review meanwhile.