Skip to content

Add explicit per-source data flags; deprecate --mount/--download/--output#17

Merged
fepegar merged 16 commits into
mainfrom
fepegar/explicit-data-flags
Jun 23, 2026
Merged

Add explicit per-source data flags; deprecate --mount/--download/--output#17
fepegar merged 16 commits into
mainfrom
fepegar/explicit-data-flags

Conversation

@fepegar

@fepegar fepegar commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

[Generated by a coding agent]


Summary

Closes #14.

Instead of #14's proposed string-sniffing heuristic (detecting a datastore path by the presence of /), this makes the source type explicit in the flag name and deprecates the ambiguous flags. The plumbing for all three input source types and both output target types now lives behind one set of parsers.

New flags (in the Data help panel, repeatable, long-name only)

Flag Value Source / target
--mount-asset / --download-asset alias=name[:version] registered data asset
--mount-datastore / --download-datastore alias=datastore/folder raw datastore folder (new, #14)
--mount-job / --download-job alias=<job_id>:<path> previous job output
--output-datastore alias=datastore/folder write to datastore folder
--output-asset alias=name[:version] register outputs as a data asset (new)

Note --mount-job / --download-job drop the now-redundant job_dir: prefix (the flag name already says it's a job).

Deprecations

--mount / --download / --output (and the Python datasets_mount / datasets_download / datasets_output parameters) still work: each legacy value is classified and routed to the new builders, so there is a single parsing implementation with no duplicated logic. Every legacy value emits a source-specific deprecation message that names the exact replacement and echoes the corrected invocation, e.g. --mount my_alias=data_asset produces "Replace '--mount my_alias=data_asset' with '--mount-asset my_alias=data_asset'."

The two audiences are kept separate. CLI users get a log line that mentions only flags; Python API callers get a real DeprecationWarning that mentions only the replacement parameters (e.g. "Pass ['ref=...'] to 'mount_datastore' instead."), with a stacklevel that blames the caller.

A comment by _warn_legacy_input / _warn_legacy_output in data.py documents the phased, semver-aligned removal plan: the legacy flags stay visible (with the [DEPRECATED] marker) and warn throughout 1.x, then are removed in the next major version (2.0.0).

Alias uniqueness

Every alias becomes a single ${{inputs.<alias>}} / ${{outputs.<alias>}} reference, so reusing one (across modes, source types, or flags) is a user error rather than something to silently resolve. build_command_inputs / build_command_outputs now raise a clear ValueError naming the duplicated alias.

Implementation

  • data.py_datastore_uri helper; per-source input builders (_input_from_asset/_input_from_datastore/_input_from_job) and output builders (_output_from_datastore/_output_from_asset); legacy classifier + source-specific deprecation warnings (_warn_legacy_input/_warn_legacy_output, emitting both a CLI log line and a Python DeprecationWarning); _assign_unique alias-collision guard; rewritten build_command_inputs / build_command_outputs.
  • aml.py — new submit_to_aml parameters, passed through.
  • __main__.py — eight new options; deprecated help text on the old three.
  • docs/examples.md — migrated examples + datastore-input and output-asset examples + deprecation note.

Tests

tests/test_data.py covers the per-source builders (datastore/job assert the client is not called; asset asserts it is), the new job value format, output-asset name/version/type, legacy routing, the source-specific deprecation messages (CLI flags vs a Python DeprecationWarning naming the new parameters), and duplicate-alias errors across modes, source types, legacy flags, and outputs.

Verification

  • uv run tox green (pytest, ty, ruff lint + format).
  • uv run submit-aml --help groups the new options under Data.
  • uv run submit-aml --dry-run with a mix of new and deprecated flags shows the constructed azureml:// paths and the source-specific deprecation warnings; reusing an alias under two flags fails fast with a clear ValueError.

…tput

Replace the string-format inference for command inputs/outputs with explicit
per-source CLI flags, so the source type is stated in the flag name instead of
guessed from the value:

  --mount-asset / --download-asset      alias=name[:version]
  --mount-datastore / --download-datastore  alias=datastore/folder
  --mount-job / --download-job          alias=<job_id>:<path>
  --output-datastore                    alias=datastore/folder
  --output-asset                        alias=name[:version]

This adds raw datastore folders as inputs (issue #14) and registering outputs
as data assets, symmetric across mount/download/output.

The old --mount/--download/--output flags (and their Python equivalents) are
deprecated: they still work but emit a warning and are classified and routed to
the new builders through a single set of parsers.

Closes #14

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Comment thread src/submit_aml/data.py Outdated
Comment thread tests/test_data.py
Comment thread src/submit_aml/data.py Outdated
- Legacy --mount/--download values that use the new "job_id:path" form (a colon
  before the first slash) now route to the job-output builder instead of being
  misread as a datastore path.
- The data-asset lookup failure message now reports "latest" when no version was
  given, matching the label used for the lookup.
- Add tests for new-syntax job classification/routing and the error message.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/__main__.py Outdated
Comment thread src/submit_aml/__main__.py
The deprecated --mount/--download flags route datastore-folder values via
legacy parsing, so their help text now mentions the 'alias=datastore/folder'
form alongside the dataset and job-output forms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/data.py
Comment thread src/submit_aml/__main__.py Outdated
…th help

- The new --mount-job/--download-job flags now reject values that still carry
  the legacy "job_dir:" prefix with a clear error, instead of silently building
  an invalid ExperimentRun/dcid.job_dir/... URI. (Legacy --mount/--download
  still accept and translate the prefix.)
- Reworded the job help/example to make clear the path may point at any run
  artifact, not only files under outputs/.
- Added a test for the rejected legacy prefix.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/data.py Outdated
Comment thread src/submit_aml/data.py Outdated
The deprecation warnings now reference both the CLI flag and the equivalent
Python parameter (e.g. "--mount (datasets_mount)"), so the message is accurate
whether the deprecated path was reached via the CLI or the submit_to_aml API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Add a comment by _warn_deprecated outlining the phased, semver-aligned
removal of the legacy --mount/--download/--output flags (keep visible and
warning through 1.x, remove in 2.0.0).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Previously --mount/--download/--output emitted one generic warning per
flag listing all replacements. Now each legacy value is classified and
gets a tailored message naming the exact replacement flag and echoing the
corrected invocation, e.g. '--mount my_alias=data_asset' ->
'--mount-asset my_alias=data_asset'. Job values additionally show the
translated form (the redundant job_dir: prefix dropped).

Replace _warn_deprecated with _warn_legacy_input/_warn_legacy_output;
thread the flag base through _legacy_input; warn per value. Add tests
asserting the per-source suggested flag and value.
build_command_inputs builds a single dict where later writes win. The new
explicit/legacy flags processed mounts before downloads, so a duplicate
alias resolved to download mode -- reversing main's {**downloads, **mounts}
precedence where mount won. Process downloads first and mounts second (and
legacy after explicit) so mount keeps precedence. Add regression tests.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

An alias maps to a single ${{inputs.<alias>}} / ${{outputs.<alias>}}
reference, so reusing one (across modes, source types, or flags) is a user
error. Previously the colliding entry silently overwrote the earlier one.
Add _assign_unique and route all input/output insertions through it, raising
a clear ValueError naming the alias. This supersedes the earlier
mount-over-download precedence (Copilot suggested duplicate-alias validation
as the alternative). Add tests for cross-mode, same-mode, legacy, and output
collisions.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/data.py Outdated
Comment thread src/submit_aml/__main__.py Outdated
… help

- Removal-plan comment named non-existent add_inputs/add_outputs; point it
  at build_command_inputs/build_command_outputs and the legacy_* kwargs.
- The deprecated --download help only showed the legacy job_dir: syntax;
  document the preferred 'alias=<job_id>:<path>' form (still accepting the
  old one).

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/data.py Outdated
Comment thread src/submit_aml/data.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment thread src/submit_aml/__main__.py Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment thread src/submit_aml/data.py
Comment thread src/submit_aml/data.py
- ty failed because dict is invariant, so passing TypeInputsDict /
  dict[str, Output] to _assign_unique(mapping: dict[str, object]) was
  rejected. Make _assign_unique generic over a TypeVar.
- pytest failed under CI's coloured rich output: the deprecation-message
  tests asserted on captured stdout, where ANSI codes and line wrapping
  split the matched substrings. Assert on the raw message passed to
  logger.warning (patched) instead, via a _deprecation_log helper.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

- The 'CLI sees only flags / API sees only params' inline comments were
  misleading: both messages are always emitted. Reword to say the log line
  is phrased for CLI users and the DeprecationWarning for API users.
- --download help implied the bare 'alias=<job_id>:<path>' job form works on
  the legacy flag, but it only routes as a job when <path> has a '/'. Steer
  users to --download-job (or the legacy job_dir: form) and note the caveat.
- build_command_inputs/outputs became keyword-only, breaking pre-existing
  positional callers (build_command_inputs(client, downloads, mounts);
  build_command_outputs(uploads)). Keep legacy_download/legacy_mount and
  legacy_output as leading positional params (new flags stay keyword-only).
  Add positional-compat regression tests.
@fepegar

fepegar commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator Author

@corcra are you happy with the new interface? Would you like to give it a try?

@fepegar fepegar merged commit 880c293 into main Jun 23, 2026
15 checks passed
@fepegar fepegar deleted the fepegar/explicit-data-flags branch June 23, 2026 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support raw datastore-path folders as command inputs (symmetric with outputs)

2 participants