fix(#538): wire StorageType.DIRECT_FS for --o-direct local O_DIRECT I/O #38
Merged
Conversation
Introduces a clean distinction between:

  storage_type=local_fs: standard POSIX I/O (unchanged)
  storage_type=direct_fs: O_DIRECT local FS via s3dlio direct:// URI
  storage_type=s3: S3 object storage (always a bucket, never a path)

Previously, --o-direct in mlp-storage injected storage_type=s3 to route through ObjStoreLibStorage, which is conceptually wrong: s3 always means an S3 bucket. A local filesystem path as storage_root with storage_type=s3 is now explicitly rejected with a clear error message.

Changes:
- storage_factory.py: DIRECT_FS → ObjStoreLibStorage (not FileStorage), so s3dlio handles all reads/writes with the direct:// URI scheme
- torch_data_loader.py: add DIRECT_FS to _s3_types so the s3dlio iterable reader is selected for training data loading under --o-direct
- config.py: validate DIRECT_FS (requires s3dlio library); validate that storage_type=s3 is never combined with a local path in storage_root; auto-size write_threads as I/O-bound for DIRECT_FS (same as S3)
- obj_store_lib.py: preflight auto-creates checkpoint model subdirectory if the parent exists and is writable (first run has no checkpoint yet)
- test_odirect_preflight.py: update missing-dir tests to use two-level deep paths; add test confirming auto-create behavior

Fixes: mlcommons/storage#538

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
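The config.py validation rules described above can be illustrated with a minimal sketch. This is not the code from the PR: the `validate_storage` and `looks_like_local_path` names, the local-path heuristic, and the choice of `ValueError` are all assumptions made for illustration; only the three storage type values and the s3dlio requirement come from the description.

```python
from enum import Enum
from typing import Optional


class StorageType(Enum):
    LOCAL_FS = "local_fs"    # standard POSIX I/O
    DIRECT_FS = "direct_fs"  # O_DIRECT local FS via s3dlio
    S3 = "s3"                # S3 object storage (a bucket, never a path)


def looks_like_local_path(storage_root: str) -> bool:
    # Hypothetical heuristic: treat anything that starts like a
    # filesystem path as local. The real check may differ.
    return storage_root.startswith(("/", "./", "../", "~"))


def validate_storage(storage_type: StorageType,
                     storage_root: str,
                     storage_library: Optional[str] = None) -> None:
    """Raise ValueError for the two invalid combinations named in the PR."""
    if storage_type is StorageType.S3 and looks_like_local_path(storage_root):
        raise ValueError(
            f"storage_type=s3 requires a bucket, but storage_root="
            f"{storage_root!r} looks like a local path; "
            f"use storage_type=direct_fs or local_fs instead"
        )
    if storage_type is StorageType.DIRECT_FS and storage_library != "s3dlio":
        raise ValueError("storage_type=direct_fs requires storage_library=s3dlio")
```

Under these assumptions, `validate_storage(StorageType.S3, "/mnt/data")` raises, while `validate_storage(StorageType.DIRECT_FS, "/mnt/data", "s3dlio")` returns without error.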
Author
@FileSystemGuy, can you check this update, then approve and merge if you're OK with it? This SHOULD fix the issue with --o-direct attempting to use object storage in some cases. The "--o-direct" flag is now an OPTIONAL feature flag for "--file". It will fail if used with "--object"; there are other checks and test conditions as well. This needs to go in first so that we can then merge the mlp-storage PR.
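As a rough illustration of the flag rule in the comment above, here is a sketch using `argparse`. It is not the mlp-storage CLI: the parser layout, the `resolve_storage_type` helper, the `ValueError`, and the default when neither `--file` nor `--object` is given are all assumptions; only the three flag names and the resulting storage type values come from this PR.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real CLI has many more options.
    parser = argparse.ArgumentParser(prog="odirect-flag-sketch")
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--file", action="store_true")
    target.add_argument("--object", action="store_true")
    parser.add_argument("--o-direct", action="store_true")
    return parser


def resolve_storage_type(argv: List[str]) -> str:
    """Map the flags to a storage_type string, rejecting --o-direct without --file."""
    args = build_parser().parse_args(argv)
    if args.o_direct and not args.file:
        raise ValueError("--o-direct is only valid together with --file")
    if args.object:
        return "s3"
    # Assumption: file storage is the default when no target flag is given.
    return "direct_fs" if args.o_direct else "local_fs"
```

With this sketch, `resolve_storage_type(["--file", "--o-direct"])` yields `direct_fs`, and `resolve_storage_type(["--object", "--o-direct"])` raises instead of silently routing to S3.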
FileSystemGuy
approved these changes
Jun 26, 2026
Summary
Fix #538 (companion to mlcommons/storage PR): wire `StorageType.DIRECT_FS` through DLIO so `--o-direct --file` uses local O_DIRECT I/O via s3dlio's `direct://` URI scheme rather than hitting an S3 endpoint.

- `StorageFactory`: `DIRECT_FS` → `ObjStoreLibStorage` (was wrongly `FileStorage`)
- `torch_data_loader._s3_types`: added `DIRECT_FS` so the s3dlio async reader is used during training
- `config.py`: validation that `direct_fs` requires `storage_library=s3dlio`; validation that `storage_type=s3` with a local `storage_root` path is rejected with a clear error
- `config.py`: `write_threads` auto-sizing treats `DIRECT_FS` as I/O-bound (same as S3)
- `obj_store_lib._preflight()`: auto-creates a missing checkpoint subdirectory when its parent directory exists and is writable (first run has no checkpoint dir yet)

New tests (DLIO)

- `test_creates_missing_dir_when_parent_exists`

Two existing preflight tests were also updated to use two-level-deep missing paths so they still raise `ValueError` after the auto-create logic was added.

Test plan

- `pytest tests/test_fast_ci.py tests/test_odirect_preflight.py`: 92/92 pass

🤖 Generated with Claude Code
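The preflight auto-create rule in the summary, and the reason the older tests moved to two-level-deep paths, can be sketched as follows. This is an illustrative stand-in and not the body of `obj_store_lib._preflight()`: the function name `preflight_checkpoint_dir` and the error text are assumptions; the behavior (create one missing level if the parent exists and is writable, otherwise raise `ValueError`) follows the description.

```python
import os


def preflight_checkpoint_dir(path: str) -> None:
    """Ensure a checkpoint directory exists, creating at most one missing level."""
    if os.path.isdir(path):
        return  # Nothing to do: the directory is already present.

    parent = os.path.dirname(os.path.abspath(path))
    if os.path.isdir(parent) and os.access(parent, os.W_OK):
        # First run: no checkpoint subdirectory yet, but the parent is usable.
        os.mkdir(path)
        return

    # Parent is missing or not writable, so the path is likely a
    # misconfiguration rather than a first run.
    raise ValueError(
        f"checkpoint directory {path!r} does not exist and its parent "
        f"{parent!r} is missing or not writable"
    )
```

A path such as `<existing>/model` is created on demand, whereas `<existing>/missing/model` still fails because `os.mkdir` is never asked to create intermediate directories. That is why a test that expects `ValueError` needs a path with two missing levels.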