
Support raw datastore-path folders as command inputs #16

Closed
corcra wants to merge 1 commit into main from corcra/datastore-path-inputs


@corcra (Member) commented Jun 9, 2026

🤖 This PR was authored by an AI agent (GitHub Copilot CLI, model Claude Opus 4.8) working with @corcra. Please review accordingly.

Closes #14

Summary

Inputs previously accepted only registered data assets (alias=name[:version]) and prior-job output dirs (alias=job_dir:<job_id>:<path>). Outputs already supported raw datastore folders (alias=datastore/folder). This adds the symmetric input branch so jobs can mount/download an existing datastore folder directly, without pre-registering a data asset.
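The three input forms can be told apart from the right-hand side of the spec alone. The sketch below is a hypothetical illustration, not the code in src/submit_aml/data.py: `split_alias` and `classify_input` are made-up names. It applies the branch order described in this PR (job_dir: first, then datastore path, then data asset). job_dir: has to come first because its <path> part may itself contain a /.

```python
# Hypothetical sketch: names are illustrative, not the PR's helpers.
JOB_DIR_PREFIX = "job_dir:"


def split_alias(spec: str) -> tuple[str, str]:
    """Split an 'alias=rhs' spec into (alias, rhs)."""
    alias, sep, rhs = spec.partition("=")
    if not sep or not alias or not rhs:
        raise ValueError(f"expected alias=value, got {spec!r}")
    return alias, rhs


def classify_input(spec: str) -> str:
    """Name the input form a spec uses, checked in the PR's branch order."""
    _, rhs = split_alias(spec)
    if rhs.startswith(JOB_DIR_PREFIX):
        return "job_dir"  # alias=job_dir:<job_id>:<path>
    if "/" in rhs:
        return "datastore_path"  # alias=datastore/folder
    return "data_asset"  # alias=name[:version]


for spec in (
    "model=job_dir:abc123:outputs/model",
    "raw=workspaceblobstore/datasets/raw",
    "ds=my-dataset:3",
):
    print(spec, "->", classify_input(spec))
```

Checking job_dir: before the / test is what keeps a prior-job path such as outputs/model from being misread as a datastore folder.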

Changes

  • data.py: add _is_alias_datastore_path_string predicate and a datastore-path branch in _get_data_assets that builds an azureml://datastores/<ds>/paths/<folder> Input. Branch order is job_dir: → datastore-path → data-asset.
  • data.py: extract shared _datastore_uri helper, now used by both inputs and outputs.
  • __main__.py: document the alias=datastore/folder form in --download/--mount help.
  • docs/examples.md: add CLI + Python examples for the new input form.
  • tests/test_data.py: parametrized predicate tests, plus input-routing tests asserting the resulting URI/mode and that ml_client.data.get is not called for datastore paths but is called for name:version.
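A minimal sketch of the new input branch and the shared URI helper, with a mock-based check in the spirit of the routing tests listed above. `datastore_uri` and `resolve_input` are hypothetical stand-ins for `_datastore_uri` and the relevant branches of `_get_data_assets`. The real code builds an AML Input with a mount or download mode rather than returning a bare string, and it also handles job_dir: specs, which this sketch omits. The `ml_client.data.get(name=..., version=...)` and `label="latest"` calls follow the azure-ai-ml `DataOperations.get` signature, but how the PR resolves an unversioned name is an assumption.

```python
from unittest.mock import MagicMock


def datastore_uri(datastore: str, folder: str) -> str:
    """Format an AML datastore folder URI (stand-in for _datastore_uri)."""
    return f"azureml://datastores/{datastore}/paths/{folder}"


def resolve_input(spec: str, ml_client) -> tuple[str, str]:
    """Resolve an 'alias=datastore/folder' or 'alias=name[:version]' spec.

    Returns (alias, path). job_dir: specs are out of scope for this sketch.
    """
    alias, _, rhs = spec.partition("=")
    if "/" in rhs:
        # Datastore path: build the URI directly, no AML lookup.
        datastore, _, folder = rhs.partition("/")
        return alias, datastore_uri(datastore, folder)
    # Registered data asset: ask the workspace for it.
    name, _, version = rhs.partition(":")
    if version:
        asset = ml_client.data.get(name=name, version=version)
    else:
        asset = ml_client.data.get(name=name, label="latest")
    return alias, asset.path


# Usage with a mocked client; "stub-path" is a placeholder value.
client = MagicMock()
client.data.get.return_value.path = "stub-path"

alias, path = resolve_input("raw=workspaceblobstore/datasets/raw", client)
assert path == "azureml://datastores/workspaceblobstore/paths/datasets/raw"
client.data.get.assert_not_called()  # no AML lookup for datastore paths

alias, path = resolve_input("ds=my-dataset:3", client)
client.data.get.assert_called_once_with(name="my-dataset", version="3")
```

Factoring the URI format into one helper means inputs and outputs cannot drift apart in how they spell the azureml://datastores/... path.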

Design notes

  • Disambiguation: once job_dir: specs have been ruled out, a / in the RHS reliably signals a datastore path (AML data-asset names can't contain /); name:version and job_dir: are matched without relying on /.
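  • The predicate is a pure string check on purpose: _extract_alias_datastore_path calls sys.exit(1) rather than raising a regular exception, and the resulting SystemExit is not caught by except Exception, so the parser can't be used as a probe inside an ordinary try/except.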
  • Tradeoff (opt-in): a raw datastore path is mutable and unversioned, so runs record no data-asset version/lineage — identical to how the output datastore form already behaves. Users wanting versioning keep using name:version (unchanged).
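Why the predicate has to be a pure string check: `sys.exit(1)` raises `SystemExit`, which derives from `BaseException` rather than `Exception`, so a probe written as `try: ... except Exception:` lets it escape and terminates the CLI. In the sketch below, `parse_or_exit` is a hypothetical stand-in for `_extract_alias_datastore_path`, and `is_datastore_path` is an illustrative pure predicate, not the PR's `_is_alias_datastore_path_string`.

```python
import sys


def parse_or_exit(rhs: str) -> tuple[str, str]:
    """Hypothetical parser that, like the real one, exits on bad input."""
    datastore, sep, folder = rhs.partition("/")
    if not sep or not datastore or not folder:
        print(f"Invalid datastore path: {rhs!r}", file=sys.stderr)
        sys.exit(1)
    return datastore, folder


def probe_with_try(rhs: str) -> bool:
    """Anti-pattern: SystemExit is not an Exception, so it escapes here."""
    try:
        parse_or_exit(rhs)
        return True
    except Exception:
        return False


def is_datastore_path(rhs: str) -> bool:
    """Pure string check: safe on any right-hand side, never exits."""
    datastore, sep, folder = rhs.partition("/")
    return bool(sep and datastore and folder)


print(is_datastore_path("workspaceblobstore/datasets/raw"))  # True
print(is_datastore_path("my-dataset:3"))  # False
try:
    probe_with_try("my-dataset:3")
except SystemExit as exc:
    print("probe would have terminated the program with exit code", exc.code)
```

Keeping the predicate side-effect free lets the routing code test every branch cheaply and leaves the exit-on-error behaviour to the parser that runs only after the branch is chosen.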

Verification

ruff clean, ty clean, 80 passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add a datastore-path branch to `_get_data_assets` so inputs accept
`alias=datastore/folder` and mount/download `azureml://datastores/...`
URIs directly, symmetric with the existing output behaviour. This
removes the need to pre-register a data asset for transient datastore
folders.

- Add `_is_alias_datastore_path_string` predicate (pure string check,
  since `_extract_alias_datastore_path` exits rather than raises).
- Extract shared `_datastore_uri` helper, used by inputs and outputs.
- Document the new form in the --download/--mount CLI help.
- Add tests for the predicate and the input routing.

Closes #14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot commented Jun 9, 2026


Copilot AI left a comment


Pull request overview

Adds symmetric support for using raw datastore folder paths as command inputs (in addition to existing output support), allowing users to mount/download alias=datastore/folder/... without registering a data asset.

Changes:

  • Add _is_alias_datastore_path_string plus a new datastore-path routing branch in _get_data_assets to construct azureml://datastores/<ds>/paths/<folder> inputs.
  • Factor shared datastore URI formatting into _datastore_uri, used by both inputs and outputs.
  • Update CLI help and documentation, and add tests to validate routing and ensure ml_client.data.get is bypassed for datastore paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/submit_aml/data.py | Adds datastore-path detection and input construction; extracts _datastore_uri and reuses it for outputs. |
| tests/test_data.py | Adds predicate tests and input-routing tests (including “no AML lookup for datastore paths”). |
| src/submit_aml/__main__.py | Updates --download/--mount help text to document the alias=datastore/folder input form. |
| docs/examples.md | Adds CLI + Python examples demonstrating mounting a raw datastore folder as an input. |


@fepegar (Collaborator) commented Jun 9, 2026

Closing in favour of #17 as discussed offline.

@fepegar closed this Jun 9, 2026


Development

Successfully merging this pull request may close these issues.

Support raw datastore-path folders as command inputs (symmetric with outputs)

3 participants