60 changes: 46 additions & 14 deletions docs/examples.md
@@ -168,71 +168,104 @@ Run a grid sweep over hyperparameters:

Datasets are passed to the job as
[`Input`](https://learn.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.input?view=azure-python)
objects.
objects. There is one flag per source type (registered data asset, datastore
folder, or previous job output), in either `--mount-*` or `--download-*` form.

=== "CLI"

Mount a dataset:
Mount or download a registered data asset:

```bash
submit-aml \
--script train.py \
--mount "data=MY-DATASET:2"
--mount-asset "data=MY-DATASET:2"
```

Download a dataset:
```bash
submit-aml \
--script train.py \
--download-asset "data=MY-DATASET"
```

Mount a folder directly from a datastore (no data-asset registration
required):

```bash
submit-aml \
--script train.py \
--download "data=MY-DATASET"
--mount-datastore "ref=mystore/exports/reference"
```

Use outputs from a previous job:
Use the outputs of a previous job:

```bash
submit-aml \
--script evaluate.py \
--mount "checkpoint=job_dir:my-training-job:models/best.pth"
--mount-job "checkpoint=my-training-job:models/best.pth"
```

=== "Python"

```python
submit_to_aml(
script_path="train.py",
datasets_mount=["data=MY-DATASET:2"],
mount_asset=["data=MY-DATASET:2"],
)

# Or download instead of mount
submit_to_aml(
script_path="train.py",
datasets_download=["data=MY-DATASET"],
download_asset=["data=MY-DATASET"],
)

# Mount a datastore folder directly
submit_to_aml(
script_path="train.py",
mount_datastore=["ref=mystore/exports/reference"],
)

# Use outputs from a previous job
submit_to_aml(
script_path="evaluate.py",
datasets_mount=["checkpoint=job_dir:my-training-job:models/best.pth"],
mount_job=["checkpoint=my-training-job:models/best.pth"],
)
```

Configure an output datastore:
!!! note "Deprecated flags"

The `--mount`, `--download` and `--output` flags (and their
`datasets_mount`, `datasets_download` and `datasets_output` Python
equivalents) are deprecated in favour of the explicit per-source flags
above. They still work but emit a deprecation warning.

Write outputs to a datastore folder, or register them as a data asset:

=== "CLI"

```bash
submit-aml \
--script train.py \
--output "results=mydatastore/experiment-outputs"
--output-datastore "results=mydatastore/experiment-outputs"
```

```bash
submit-aml \
--script train.py \
--output-asset "results=my-experiment-results"
```

=== "Python"

```python
submit_to_aml(
script_path="train.py",
datasets_output=["results=mydatastore/experiment-outputs"],
output_datastore=["results=mydatastore/experiment-outputs"],
)

# Or register the outputs as a data asset
submit_to_aml(
script_path="train.py",
output_asset=["results=my-experiment-results"],
)
```
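
For context on how the aliases above reach user code: the option help in this PR says an alias can be referenced as `${{inputs.<alias>}}` or `${{outputs.<alias>}}`. Below is a minimal sketch of a `train.py` that consumes the `data` and `results` aliases, assuming the resolved placeholders are handed to the script as `--data-dir` and `--results-dir` arguments. Those argument names and the manifest file are hypothetical and are not part of `submit-aml`.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    # Hypothetical argument names; the job command is assumed to fill them
    # with the resolved ${{inputs.data}} and ${{outputs.results}} paths.
    parser = argparse.ArgumentParser(description="Toy training entry point")
    parser.add_argument("--data-dir", type=Path, required=True)
    parser.add_argument("--results-dir", type=Path, required=True)
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)

    # A mounted or downloaded input looks like an ordinary local folder.
    files = sorted(p for p in args.data_dir.rglob("*") if p.is_file())

    # Anything written under the output folder is what gets persisted.
    args.results_dir.mkdir(parents=True, exist_ok=True)
    manifest = args.results_dir / "manifest.txt"
    manifest.write_text(
        "\n".join(p.relative_to(args.data_dir).as_posix() for p in files)
    )
    return len(files)
```

Calling `main()` without arguments parses `sys.argv`, so the same function works when the file is run as a script.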

@@ -404,4 +437,3 @@ Submit and wait for the job to complete, streaming logs:
```python
submit_to_aml(script_path="train.py", wait_for_completion=True)
```

130 changes: 118 additions & 12 deletions src/submit_aml/__main__.py
@@ -141,13 +141,18 @@ def submit(
"--download",
"-d",
help=(
"Azure ML dataset or job output folder to download. To download an Azure ML"
" dataset, the argument should take the form: alias, name and version"
" of the dataset; for example: 'vindr_dir=VINDR-CXR-V2:1'."
" If the version is omitted, the last one will be used."
" To download the output folder of a previous job, the argument should take"
" the form 'alias=job_dir:<job_id>:<path/in/job/outputs>'; for example:"
" 'checkpoint=job_dir:crusty_hat_43s6lmvb25:outputs/checkpoint-10000'."
"[DEPRECATED] Use --download-asset, --download-datastore or"
" --download-job instead. Azure ML dataset, datastore folder or job"
" output folder to download. To download an Azure ML dataset, the"
" argument should take the form: alias, name and version of the"
" dataset; for example: 'vindr_dir=VINDR-CXR-V2:1'. If the version is"
" omitted, the last one will be used. To download a datastore folder,"
" use 'alias=datastore/folder'. To download the output folder of a"
" previous job, prefer --download-job; on this deprecated flag use"
" the 'alias=job_dir:<job_id>:<path>' form, for example"
" 'checkpoint=job_dir:crusty_hat_43s6lmvb25:outputs/checkpoint-10000'"
" (the bare 'alias=<job_id>:<path>' form is only recognised as a job"
" when <path> contains a '/', otherwise it is read as a data asset)."
" The alias can be used to pass input datasets to the script, e.g.,"
r" '${{inputs.vindr_dir}}' or '${{inputs.checkpoint}}'."
" This option can be used multiple times."
@@ -159,21 +164,88 @@ def submit(
"--mount",
"-m",
help=(
"Azure ML dataset or job output folder to mount."
" For an Azure ML dataset, the alias, name and version should be provided"
" while for a job output folder, the alias, job ID and path in the job"
"[DEPRECATED] Use --mount-asset, --mount-datastore or --mount-job"
" instead. Azure ML dataset, datastore folder or job output folder to"
" mount. For an Azure ML dataset, the alias, name and version should be"
" provided; for a datastore folder, use 'alias=datastore/folder'; while"
" for a job output folder, the alias, job ID and path in the job"
" outputs should be provided. See the --download option for more"
" information."
),
rich_help_panel=PANEL_DATA,
),
mount_asset: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--mount-asset",
help=(
"Registered Azure ML data asset to mount, expressed as"
' "alias=name[:version]". For example: "vindr_dir=VINDR-CXR-V2:1".'
" If the version is omitted, the latest one is used."
r" Pass it to the script with '${{inputs.vindr_dir}}'."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
download_asset: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--download-asset",
help=(
"Registered Azure ML data asset to download. Same format as"
" --mount-asset. This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
mount_datastore: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--mount-datastore",
help=(
"Datastore folder to mount, expressed as"
' "alias=datastore/path/to/folder".'
' For example: "ref=mystore/exports/reference".'
r" Pass it to the script with '${{inputs.ref}}'."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
download_datastore: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--download-datastore",
help=(
"Datastore folder to download. Same format as --mount-datastore."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
mount_job: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--mount-job",
help=(
"Output of a previous job to mount, expressed as"
' "alias=<job_id>:<path/in/run/artifacts>". The path may point at any'
" run artifact, not just files under outputs/."
' For example: "checkpoint=crusty_hat_43s6lmvb25:models/best.pth".'
r" Pass it to the script with '${{inputs.checkpoint}}'."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
download_job: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--download-job",
help=(
"Output of a previous job to download. Same format as"
" --mount-job. This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
output: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--output",
"-o",
help=(
"Alias, datastore and path to folder into which outputs will be written,"
' expressed as "alias=datastore/path/to/dir".'
"[DEPRECATED] Use --output-datastore or --output-asset instead."
" Alias, datastore and path to folder into which outputs will be"
' written, expressed as "alias=datastore/path/to/dir".'
' For example: "out_dir=mydatastore/my_dataset".'
" The alias can be used to pass outputs to the script, e.g.,"
r' "${{outputs.out_dir}}".'
@@ -182,6 +254,32 @@ def submit(
),
rich_help_panel=PANEL_DATA,
),
output_datastore: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--output-datastore",
help=(
"Datastore folder into which outputs will be written, expressed as"
' "alias=datastore/path/to/dir".'
' For example: "out_dir=mydatastore/my_dataset".'
r" Pass it to the script with '${{outputs.out_dir}}'."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
output_asset: Optional[List[str]] = typer.Option( # noqa: UP006, UP007
None,
"--output-asset",
help=(
"Register the outputs as an Azure ML data asset, expressed as"
' "alias=name[:version]". For example: "out_dir=my-results".'
" The blobs are written to the workspace's default datastore and"
" registered as a data asset; if the version is omitted, Azure ML"
" auto-increments it."
r" Pass it to the script with '${{outputs.out_dir}}'."
" This option can be used multiple times."
),
rich_help_panel=PANEL_DATA,
),
command_prefix: str = typer.Option(
get_default("command_prefix"),
help="Prefix to prepend to the command. For example, `uv run`.",
@@ -408,6 +506,14 @@ def submit(
datasets_download=datasets_download,
datasets_mount=datasets_mount,
datasets_output=output,
mount_asset=mount_asset,
download_asset=download_asset,
mount_datastore=mount_datastore,
download_datastore=download_datastore,
mount_job=mount_job,
download_job=download_job,
output_datastore=output_datastore,
output_asset=output_asset,
debug=debug,
dependency_groups=dependency_groups,
description=description,
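
The three value formats named in the new option help (`alias=name[:version]`, `alias=datastore/path/to/folder` and `alias=<job_id>:<path>`) can be illustrated with a small parsing sketch. This is not the parser added by this PR: the dataclasses and `parse_*` helpers below are hypothetical names, and the error handling is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AssetSpec:
    alias: str
    name: str
    version: str | None  # None stands for "use the latest version"


@dataclass(frozen=True)
class DatastoreSpec:
    alias: str
    datastore: str
    path: str


@dataclass(frozen=True)
class JobSpec:
    alias: str
    job_id: str
    path: str


def _split_alias(spec: str) -> tuple[str, str]:
    # Every format starts with "alias=": split on the first "=" only.
    alias, sep, value = spec.partition("=")
    if not sep or not alias or not value:
        raise ValueError(f"Expected 'alias=value', got {spec!r}")
    return alias, value


def parse_asset(spec: str) -> AssetSpec:
    # "alias=name[:version]"
    alias, value = _split_alias(spec)
    name, _, version = value.partition(":")
    return AssetSpec(alias, name, version or None)


def parse_datastore(spec: str) -> DatastoreSpec:
    # "alias=datastore/path/to/folder": first path segment is the datastore.
    alias, value = _split_alias(spec)
    datastore, sep, path = value.partition("/")
    if not sep or not path:
        raise ValueError(f"Expected 'alias=datastore/path', got {spec!r}")
    return DatastoreSpec(alias, datastore, path)


def parse_job(spec: str) -> JobSpec:
    # "alias=<job_id>:<path>"
    alias, value = _split_alias(spec)
    job_id, sep, path = value.partition(":")
    if not sep or not path:
        raise ValueError(f"Expected 'alias=<job_id>:<path>', got {spec!r}")
    return JobSpec(alias, job_id, path)
```

Having one parser per flag means no guessing is needed about which source type a value refers to, which appears to be the motivation for the per-source flags.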
26 changes: 24 additions & 2 deletions src/submit_aml/aml.py
@@ -345,6 +345,14 @@ def submit_to_aml(
datasets_download: TypeOptionalStrList = None,
datasets_mount: TypeOptionalStrList = None,
datasets_output: TypeOptionalStrList = None,
mount_asset: TypeOptionalStrList = None,
download_asset: TypeOptionalStrList = None,
mount_datastore: TypeOptionalStrList = None,
download_datastore: TypeOptionalStrList = None,
mount_job: TypeOptionalStrList = None,
download_job: TypeOptionalStrList = None,
output_datastore: TypeOptionalStrList = None,
output_asset: TypeOptionalStrList = None,
debug: bool = False,
dependency_groups: list[str] | None = None,
description: str | None = None,
@@ -512,8 +520,22 @@ def submit_to_aml(
add_service_for_tensorboard(services, tensorboard_dir)

# Data
inputs = build_command_inputs(ml_client, datasets_download, datasets_mount)
outputs = build_command_outputs(datasets_output)
inputs = build_command_inputs(
ml_client,
mount_asset=mount_asset,
download_asset=download_asset,
mount_datastore=mount_datastore,
download_datastore=download_datastore,
mount_job=mount_job,
download_job=download_job,
legacy_mount=datasets_mount,
legacy_download=datasets_download,
)
outputs = build_command_outputs(
output_datastore=output_datastore,
output_asset=output_asset,
legacy_output=datasets_output,
)

# Sweep jobs
is_sweep = sweep_inputs is not None and len(sweep_inputs) > 0
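
The `legacy_*` arguments above carry values from the deprecated flags, whose source type has to be inferred from the value itself. The rules stated in the `--download` help can be sketched as follows. This is an illustration only: `classify_legacy_input` and `warn_legacy` are hypothetical helpers, and the way a datastore folder is told apart from a data asset (a `/` with no `:`) is an assumption, since the help text does not spell it out.

```python
import warnings


def classify_legacy_input(spec: str) -> str:
    """Return 'job', 'datastore' or 'asset' for a legacy --mount/--download value."""
    _, _, value = spec.partition("=")

    # Explicit legacy job form: 'alias=job_dir:<job_id>:<path>'.
    if value.startswith("job_dir:"):
        return "job"

    head, sep, tail = value.partition(":")

    # Bare 'alias=<job_id>:<path>' counts as a job only if <path> has a '/'.
    if sep and "/" in tail:
        return "job"

    # Assumption: 'alias=datastore/folder' has a '/' and no ':'.
    if not sep and "/" in head:
        return "datastore"

    # Everything else is read as 'alias=name[:version]'.
    return "asset"


def warn_legacy(flag: str, replacement: str) -> None:
    # The docs say the old flags still work but emit a deprecation warning.
    warnings.warn(
        f"{flag} is deprecated; use {replacement} instead.",
        DeprecationWarning,
        stacklevel=2,
    )
```

Note that under these rules a value such as `checkpoint=my-job:best.pth` is read as a data asset named `my-job` with version `best.pth`, which is the ambiguity the help text warns about and that `--mount-job` / `--download-job` avoid.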