Support raw datastore-path folders as command inputs#16
Closed
corcra wants to merge 1 commit into
Closed
Conversation
Add a datastore-path branch to `_get_data_assets` so inputs accept `alias=datastore/folder` and mount/download `azureml://datastores/...` URIs directly, symmetric with the existing output behaviour. This removes the need to pre-register a data asset for transient datastore folders. - Add `_is_alias_datastore_path_string` predicate (pure string check, since `_extract_alias_datastore_path` exits rather than raises). - Extract shared `_datastore_uri` helper, used by inputs and outputs. - Document the new form in the --download/--mount CLI help. - Add tests for the predicate and the input routing. Closes #14 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
📖 Docs preview: https://smokeshow.helpmanual.io/6j2f3o0w0m384m3r2224/ |
Contributor
There was a problem hiding this comment.
Pull request overview
Adds symmetric support for using raw datastore folder paths as command inputs (in addition to existing output support), allowing users to mount/download alias=datastore/folder/... without registering a data asset.
Changes:
- Add
_is_alias_datastore_path_stringplus a new datastore-path routing branch in_get_data_assetsto constructazureml://datastores/<ds>/paths/<folder>inputs. - Factor shared datastore URI formatting into
_datastore_uri, used by both inputs and outputs. - Update CLI help and documentation, and add tests to validate routing and ensure
ml_client.data.getis bypassed for datastore paths.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/submit_aml/data.py |
Adds datastore-path detection and input construction; extracts _datastore_uri and reuses it for outputs. |
tests/test_data.py |
Adds predicate tests and input-routing tests (including “no AML lookup for datastore paths”). |
src/submit_aml/__main__.py |
Updates --download/--mount help text to document alias=datastore/folder input form. |
docs/examples.md |
Adds CLI + Python examples demonstrating mounting a raw datastore folder as an input. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Collaborator
|
Closing in favour of #17 as discussed offline. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #14
Summary
Inputs previously accepted only registered data assets (
alias=name[:version]) and prior-job output dirs (alias=job_dir:<job_id>:<path>). Outputs already supported raw datastore folders (alias=datastore/folder). This adds the symmetric input branch so jobs can mount/download an existing datastore folder directly, without pre-registering a data asset.Changes
data.py: add_is_alias_datastore_path_stringpredicate and a datastore-path branch in_get_data_assetsthat builds anazureml://datastores/<ds>/paths/<folder>Input. Branch order isjob_dir:→ datastore-path → data-asset.data.py: extract shared_datastore_urihelper, now used by both inputs and outputs.__main__.py: document thealias=datastore/folderform in--download/--mounthelp.docs/examples.md: add CLI + Python examples for the new input form.tests/test_data.py: parametrized predicate tests; input-routing tests asserting the resulting URI/mode and thatml_client.data.getis not called for datastore paths but is forname:version.Design notes
/in the RHS reliably signals a datastore path (AML data-asset names can't contain/);name:versionandjob_dir:are matched without relying on/._extract_alias_datastore_pathcallssys.exit(1)rather than raising, so it can't be wrapped in try/except.name:version(unchanged).Verification
ruffclean,tyclean,80 passed.Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com