Add explicit per-source data flags; deprecate --mount/--download/--output#17
Merged
Conversation
…tput Replace the string-format inference for command inputs/outputs with explicit per-source CLI flags, so the source type is stated in the flag name instead of guessed from the value: --mount-asset / --download-asset alias=name[:version] --mount-datastore / --download-datastore alias=datastore/folder --mount-job / --download-job alias=<job_id>:<path> --output-datastore alias=datastore/folder --output-asset alias=name[:version] This adds raw datastore folders as inputs (issue #14) and registering outputs as data assets, symmetric across mount/download/output. The old --mount/--download/--output flags (and their Python equivalents) are deprecated: they still work but emit a warning and are classified and routed to the new builders through a single set of parsers. Closes #14 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
📖 Docs preview: https://smokeshow.helpmanual.io/451x4i196c2554093s5q/ |
- Legacy --mount/--download values that use the new "job_id:path" form (a colon before the first slash) now route to the job-output builder instead of being misread as a datastore path. - The data-asset lookup failure message now reports "latest" when no version was given, matching the label used for the lookup. - Add tests for new-syntax job classification/routing and the error message. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deprecated --mount/--download flags route datastore-folder values via legacy parsing, so their help text now mentions the 'alias=datastore/folder' form alongside the dataset and job-output forms. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…th help - The new --mount-job/--download-job flags now reject values that still carry the legacy "job_dir:" prefix with a clear error, instead of silently building an invalid ExperimentRun/dcid.job_dir/... URI. (Legacy --mount/--download still accept and translate the prefix.) - Reworded the job help/example to make clear the path may point at any run artifact, not only files under outputs/. - Added a test for the rejected legacy prefix. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The deprecation warnings now reference both the CLI flag and the equivalent Python parameter (e.g. "--mount (datasets_mount)"), so the message is accurate whether the deprecated path was reached via the CLI or the submit_to_aml API. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add a comment by _warn_deprecated outlining the phased, semver-aligned removal of the legacy --mount/--download/--output flags (keep visible and warning through 1.x, remove in 2.0.0).
Previously --mount/--download/--output emitted one generic warning per flag listing all replacements. Now each legacy value is classified and gets a tailored message naming the exact replacement flag and echoing the corrected invocation, e.g. '--mount my_alias=data_asset' -> '--mount-asset my_alias=data_asset'. Job values additionally show the translated form (the redundant job_dir: prefix dropped). Replace _warn_deprecated with _warn_legacy_input/_warn_legacy_output; thread the flag base through _legacy_input; warn per value. Add tests asserting the per-source suggested flag and value.
build_command_inputs builds a single dict where later writes win. The new
explicit/legacy flags processed mounts before downloads, so a duplicate
alias resolved to download mode -- reversing main's {**downloads, **mounts}
precedence where mount won. Process downloads first and mounts second (and
legacy after explicit) so mount keeps precedence. Add regression tests.
An alias maps to a single ${{inputs.<alias>}} / ${{outputs.<alias>}}
reference, so reusing one (across modes, source types, or flags) is a user
error. Previously the colliding entry silently overwrote the earlier one.
Add _assign_unique and route all input/output insertions through it, raising
a clear ValueError naming the alias. This supersedes the earlier
mount-over-download precedence (Copilot suggested duplicate-alias validation
as the alternative). Add tests for cross-mode, same-mode, legacy, and output
collisions.
… help - Removal-plan comment named non-existent add_inputs/add_outputs; point it at build_command_inputs/build_command_outputs and the legacy_* kwargs. - The deprecated --download help only showed the legacy job_dir: syntax; document the preferred 'alias=<job_id>:<path>' form (still accepting the old one).
- ty failed because dict is invariant, so passing TypeInputsDict / dict[str, Output] to _assign_unique(mapping: dict[str, object]) was rejected. Make _assign_unique generic over a TypeVar. - pytest failed under CI's coloured rich output: the deprecation-message tests asserted on captured stdout, where ANSI codes and line wrapping split the matched substrings. Assert on the raw message passed to logger.warning (patched) instead, via a _deprecation_log helper.
- The 'CLI sees only flags / API sees only params' inline comments were misleading: both messages are always emitted. Reword to say the log line is phrased for CLI users and the DeprecationWarning for API users. - --download help implied the bare 'alias=<job_id>:<path>' job form works on the legacy flag, but it only routes as a job when <path> has a '/'. Steer users to --download-job (or the legacy job_dir: form) and note the caveat. - build_command_inputs/outputs became keyword-only, breaking pre-existing positional callers (build_command_inputs(client, downloads, mounts); build_command_outputs(uploads)). Keep legacy_download/legacy_mount and legacy_output as leading positional params (new flags stay keyword-only). Add positional-compat regression tests.
Collaborator
Author
|
@corcra are you happy with the new interface? Would you like to give it a try? |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
[Generated by a coding agent]
Summary
Closes #14.
Instead of #14's proposed string-sniffing heuristic (detecting a datastore path by the presence of
/), this makes the source type explicit in the flag name and deprecates the ambiguous flags. The plumbing for all three input source types and both output target types now lives behind one set of parsers.New flags (in the Data help panel, repeatable, long-name only)
--mount-asset/--download-assetalias=name[:version]--mount-datastore/--download-datastorealias=datastore/folder--mount-job/--download-jobalias=<job_id>:<path>--output-datastorealias=datastore/folder--output-assetalias=name[:version]Note
--mount-job/--download-jobdrop the now-redundantjob_dir:prefix (the flag name already says it's a job).Deprecations
--mount/--download/--output(and the Pythondatasets_mount/datasets_download/datasets_outputparameters) still work: each legacy value is classified and routed to the new builders, so there is a single parsing implementation with no duplicated logic. Every legacy value emits a source-specific deprecation message that names the exact replacement and echoes the corrected invocation, e.g.--mount my_alias=data_assetproduces "Replace '--mount my_alias=data_asset' with '--mount-asset my_alias=data_asset'."The two audiences are kept separate. CLI users get a log line that mentions only flags; Python API callers get a real
DeprecationWarningthat mentions only the replacement parameters (e.g. "Pass ['ref=...'] to 'mount_datastore' instead."), with astacklevelthat blames the caller.A comment by
_warn_legacy_input/_warn_legacy_outputindata.pydocuments the phased, semver-aligned removal plan: the legacy flags stay visible (with the[DEPRECATED]marker) and warn throughout1.x, then are removed in the next major version (2.0.0).Alias uniqueness
Every alias becomes a single
${{inputs.<alias>}}/${{outputs.<alias>}}reference, so reusing one (across modes, source types, or flags) is a user error rather than something to silently resolve.build_command_inputs/build_command_outputsnow raise a clearValueErrornaming the duplicated alias.Implementation
data.py—_datastore_urihelper; per-source input builders (_input_from_asset/_input_from_datastore/_input_from_job) and output builders (_output_from_datastore/_output_from_asset); legacy classifier + source-specific deprecation warnings (_warn_legacy_input/_warn_legacy_output, emitting both a CLI log line and a PythonDeprecationWarning);_assign_uniquealias-collision guard; rewrittenbuild_command_inputs/build_command_outputs.aml.py— newsubmit_to_amlparameters, passed through.__main__.py— eight new options; deprecated help text on the old three.docs/examples.md— migrated examples + datastore-input and output-asset examples + deprecation note.Tests
tests/test_data.pycovers the per-source builders (datastore/job assert the client is not called; asset asserts it is), the new job value format, output-asset name/version/type, legacy routing, the source-specific deprecation messages (CLI flags vs a PythonDeprecationWarningnaming the new parameters), and duplicate-alias errors across modes, source types, legacy flags, and outputs.Verification
uv run toxgreen (pytest, ty, ruff lint + format).uv run submit-aml --helpgroups the new options under Data.uv run submit-aml --dry-runwith a mix of new and deprecated flags shows the constructedazureml://paths and the source-specific deprecation warnings; reusing an alias under two flags fails fast with a clearValueError.