launcher: package as modelopt_launcher; mcp: call console script directly#1766
launcher: package as modelopt_launcher; mcp: call console script directly#1766ChenhanYu wants to merge 2 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughConverts Changesmodelopt_launcher package formalization and MCP bridge integration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 6✅ Passed checks (6 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/mcp/modelopt_mcp/bridge.py (1)
554-560:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle missing
modelopt-launcherbinary as a structured failure.After switching to the console script, these subprocess calls no longer catch
FileNotFoundError. Ifmodelopt-launcheris not onPATH, the MCP flow raises and can take down the request path instead of returning anok=falsediagnostic.Suggested patch
+def _launcher_not_found(argv: list[str]) -> dict: + return { + "ok": False, + "reason": "launcher_not_installed", + "diagnostic": ( + "`modelopt-launcher` was not found on PATH. " + "Install modelopt-launcher (or fix the environment) and retry." + ), + "argv": argv, + } @@ - proc = subprocess.Popen( # nosec B603 - argv, - env=child_env, - stdout=subprocess.DEVNULL, - stderr=subprocess.DEVNULL, - start_new_session=True, - ) + try: + proc = subprocess.Popen( + argv, + env=child_env, + stdout=subprocess.DEVNULL, + stderr=subprocess.DEVNULL, + start_new_session=True, + ) + except FileNotFoundError: + return _launcher_not_found(argv) @@ - except subprocess.TimeoutExpired as e: + except FileNotFoundError: + return _launcher_not_found(argv) + except subprocess.TimeoutExpired as e: @@ - except subprocess.TimeoutExpired as e: + except FileNotFoundError: + return _launcher_not_found(argv) + except subprocess.TimeoutExpired as e:Also applies to: 580-587, 749-756
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/mcp/modelopt_mcp/bridge.py` around lines 554 - 560, The subprocess.Popen calls that execute modelopt-launcher (at lines 554-560, 580-587, and 749-756) do not catch FileNotFoundError when the binary is missing from PATH, causing the process to crash instead of returning a structured failure. Wrap each of these subprocess.Popen calls in a try-except block that catches FileNotFoundError and handles it by returning a diagnostic response with ok=false rather than allowing the exception to propagate and crash the request handler.
🧹 Nitpick comments (1)
tools/launcher/modelopt_launcher/__init__.py (1)
23-29: ⚡ Quick winDefine explicit exports in package
__init__.py.
PACKAGE_DIRis part of the package contract, but this module does not declare__all__. Please make exports explicit so the public API is curated.Suggested patch
+__all__ = ["PACKAGE_DIR"] + import os as _os @@ PACKAGE_DIR: str = _os.path.dirname(_os.path.abspath(__file__))As per coding guidelines: “Each module declares its public surface with
__all__ = [...]at the top of the file.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/modelopt_launcher/__init__.py` around lines 23 - 29, The module is missing an explicit `__all__` declaration to curate the public API. Add an `__all__` list to the module that includes the `PACKAGE_DIR` variable as an explicit public export. This should be declared near the top of the file after the imports and comments, making it clear that `PACKAGE_DIR` is part of the package's public contract.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/launcher/launch.py`:
- Around line 63-70: The variable _mo_symlink is only initialized inside the
_has_modelopt_src conditional block, but it is dereferenced later in the code
(around line 119 and related lines 118-121) without being guarded by the same
condition. In installed mode where _has_modelopt_src is False, this causes a
NameError when --clean is used. Guard all references to _mo_symlink (and related
cleanup operations at lines 118-121) with an if _has_modelopt_src check to
ensure the variable exists before being accessed, or initialize _mo_symlink to
None before the conditional and check for None before dereferencing it.
In `@tools/launcher/modules/Megatron-LM`:
- Line 1: The Megatron-LM submodule in tools/launcher/modules/Megatron-LM is
pinned to commit 35d5c653e38a0b5b3772627a7454b059c7bca932 which does not exist
in the upstream repository. Verify the correct commit hash by checking the
upstream Megatron-LM repository for a valid commit that matches the intended
version or functionality needed, then update the submodule pin with the correct
commit hash to ensure the submodule can be properly cloned and initialized.
---
Outside diff comments:
In `@tools/mcp/modelopt_mcp/bridge.py`:
- Around line 554-560: The subprocess.Popen calls that execute modelopt-launcher
(at lines 554-560, 580-587, and 749-756) do not catch FileNotFoundError when the
binary is missing from PATH, causing the process to crash instead of returning a
structured failure. Wrap each of these subprocess.Popen calls in a try-except
block that catches FileNotFoundError and handles it by returning a diagnostic
response with ok=false rather than allowing the exception to propagate and crash
the request handler.
---
Nitpick comments:
In `@tools/launcher/modelopt_launcher/__init__.py`:
- Around line 23-29: The module is missing an explicit `__all__` declaration to
curate the public API. Add an `__all__` list to the module that includes the
`PACKAGE_DIR` variable as an explicit public export. This should be declared
near the top of the file after the imports and comments, making it clear that
`PACKAGE_DIR` is part of the package's public contract.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9a9ab2ce-afb4-4c6e-96be-ee31b982aad5
📒 Files selected for processing (86)
.agents/skills/eagle3-new-model/SKILL.md.agents/skills/ptq/references/launcher-guide.mdtools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yamltools/launcher/launch.pytools/launcher/modelopt_launcher/.gitignoretools/launcher/modelopt_launcher/__init__.pytools/launcher/modelopt_launcher/common/check_regression.pytools/launcher/modelopt_launcher/common/eagle3/dump_offline_data.shtools/launcher/modelopt_launcher/common/eagle3/dump_offline_data_hf.shtools/launcher/modelopt_launcher/common/eagle3/dump_offline_data_vllm.shtools/launcher/modelopt_launcher/common/eagle3/hf_ptq.shtools/launcher/modelopt_launcher/common/eagle3/make_dataset.shtools/launcher/modelopt_launcher/common/eagle3/train_eagle.shtools/launcher/modelopt_launcher/common/eagle3/train_eagle_streaming.shtools/launcher/modelopt_launcher/common/hf/ptq.shtools/launcher/modelopt_launcher/common/megatron_bridge/import/import.shtools/launcher/modelopt_launcher/common/megatron_lm/export/export.shtools/launcher/modelopt_launcher/common/megatron_lm/quantize/quantize.shtools/launcher/modelopt_launcher/common/megatron_lm/quantize/task.pytools/launcher/modelopt_launcher/common/query.pytools/launcher/modelopt_launcher/common/service_utils.shtools/launcher/modelopt_launcher/common/specdec/ar_eval_mtbench.shtools/launcher/modelopt_launcher/common/specdec/dflash_online_training.shtools/launcher/modelopt_launcher/common/specdec/vllm_smoke_test.shtools/launcher/modelopt_launcher/common/specdec_bench/quick_check.shtools/launcher/modelopt_launcher/common/specdec_bench/run.shtools/launcher/modelopt_launcher/common/specdec_bench/upload_to_s3.shtools/launcher/modelopt_launcher/common/tensorrt_llm/eval.shtools/launcher/modelopt_launcher/common/tensorrt_llm/extra_llm_api_options.yamltools/launcher/modelopt_launcher/common/tensorrt_llm/query.shtools/launcher/modelopt_launcher/common/vllm/gpqa_sample.jsonltools/launcher/modelopt_launcher/common/vllm/query.shtools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/accelerate_fsdp2_hybrid.yamltools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/chat_template_train.jinjatools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/hf_offline_dflash.yamltools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/hf_online_dflash.yamltools/launcher/modelopt_launcher/examples/MiniMax/MiniMax-M2.7-DFlash/specdec_bench.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/chat_template_train.jinjatools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/hf_offline_dflash.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-0.6B/hf_online_dflash.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_dflash_multi_node.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/hf_streaming_eagle3_multi_node.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-30B-A3B/megatron_lm_ptq.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/chat_template_train.jinjatools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/eagle3_quick_check.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_eagle3_dryrun.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3_ptq.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_online_dflash.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_online_eagle3.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_ptq.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_dflash.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_dflash_multi_node.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/hf_streaming_eagle3_multi_node.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3-8B/megatron_lm_ptq_local.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/chat_template_train.jinjatools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/hf_online_dflash.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/query_specdec_dataset.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_dflash_vllm.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_mtp.yamltools/launcher/modelopt_launcher/examples/Qwen/Qwen3.5-4B/specdec_bench_mtp_vllm.yamltools/launcher/modelopt_launcher/examples/google/gemma-4-E4B-it/specdec_bench_mtp_vllm.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_dflash_dryrun.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_eagle3_dryrun.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_offline_dflash.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash_multi_node.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3_multi_node.yamltools/launcher/modelopt_launcher/examples/moonshotai/Kimi-K2.5/specdec_bench.yamltools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yamltools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_bridge_import.yamltools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/megatron_lm_ptq.yamltools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16/specdec_bench_dflash_vllm.yamltools/launcher/modelopt_launcher/examples/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16/megatron_lm_ptq.yamltools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/chat_template_train.jinjatools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/hf_streaming_dflash_multi_node.yamltools/launcher/modelopt_launcher/examples/openai/gpt-oss-20b/hf_streaming_eagle3_multi_node.yamltools/launcher/modules/Megatron-LMtools/launcher/pyproject.tomltools/mcp/modelopt_mcp/bridge.pytools/mcp/pyproject.tomltools/mcp/tests/test_bridge.py
💤 Files with no reviewable changes (2)
- tools/launcher/examples/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16/nemo_evaluator.yaml
- tools/mcp/tests/test_bridge.py
| @@ -1 +1 @@ | |||
| Subproject commit c69697d0510198acddf124694dcfd5057021092f | |||
| Subproject commit 35d5c653e38a0b5b3772627a7454b059c7bca932 | |||
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
commit="35d5c653e38a0b5b3772627a7454b059c7bca932"
submodule_path="tools/launcher/modules/Megatron-LM"
echo "Checking submodule URL from .gitmodules..."
url="$(git config --file .gitmodules --get "submodule.${submodule_path}.url")"
test -n "${url}"
echo "Confirming pinned commit exists on the configured remote..."
git ls-remote "${url}" "${commit}" | grep -F "${commit}"
echo "If the submodule is already initialized locally, verify launcher-required paths..."
if git -C "${submodule_path}" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
git -C "${submodule_path}" cat-file -e "${commit}^{commit}"
for required_path in \
"megatron" \
"examples" \
"examples/post_training/modelopt/quantize.sh" \
"examples/post_training/modelopt/mmlu.sh" \
"examples/post_training/modelopt/export.sh"
do
git -C "${submodule_path}" ls-tree -r --name-only "${commit}" -- "${required_path}" | grep -q .
echo "OK: ${required_path}"
done
else
echo "Submodule is not initialized in this checkout; initialize it in CI/dev and run the path checks above."
fiRepository: NVIDIA/Model-Optimizer
Length of output: 169
The pinned Megatron-LM commit does not exist in the upstream repository.
The commit 35d5c653e38a0b5b3772627a7454b059c7bca932 was not found when querying the remote Megatron-LM repository. Cloning or initializing this submodule will fail. Verify the correct commit hash and update the pin before merge.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/modules/Megatron-LM` at line 1, The Megatron-LM submodule in
tools/launcher/modules/Megatron-LM is pinned to commit
35d5c653e38a0b5b3772627a7454b059c7bca932 which does not exist in the upstream
repository. Verify the correct commit hash by checking the upstream Megatron-LM
repository for a valid commit that matches the intended version or functionality
needed, then update the submodule pin with the correct commit hash to ensure the
submodule can be properly cloned and initialized.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1766 +/- ##
==========================================
+ Coverage 74.28% 74.31% +0.03%
==========================================
Files 511 511
Lines 56360 56360
==========================================
+ Hits 41867 41884 +17
+ Misses 14493 14476 -17
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…ctly
tools/launcher/__init__.py gains PACKAGE_DIR so the directory is an
importable package named modelopt_launcher (via package-dir mapping in
pyproject.toml). common/ and examples/ stay in place — no file moves.
pyproject.toml adds:
- packages = ["modelopt_launcher"] + package-dir = {"modelopt_launcher": "."}
- package-data for common/**/* and examples/**/*.{yaml,jinja}
- console_scripts: modelopt-launcher = "modelopt_launcher.launch:main"
launch.py:
- imports from modelopt_launcher.{core,slurm_config} (works in both
uv run and installed console-script modes via editable install)
- adds _has_modelopt_src flag: skips packaging modelopt source when
running as an installed console script (cluster container already
has modelopt pre-installed)
- adds main() entry point
tools/mcp/modelopt_mcp/bridge.py:
- deletes _find_launcher_dir() (75-line filesystem walker) and
_launcher_dir_not_found_response(); 7 corresponding tests removed
- _find_launcher_examples_dir() simplified to 2 strategies:
env override → import modelopt_launcher
- submit_job_impl and _submit_job_dry_run switch from
["uv", "run", "launch.py"] with cwd=launcher_dir to
["modelopt-launcher"] with no cwd required
- _resolve_experiment_dir drops the _find_launcher_dir() fallback
- read_cluster_artifact_impl drops cwd= from nemo subprocess
tools/mcp/pyproject.toml declares modelopt-launcher as a proper
dependency (dev: editable path ../launcher; published: PyPI).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
6320002 to
00b2c13
Compare
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tools/launcher/__init__.py (1)
16-20: ⚡ Quick winDeclare
PACKAGE_DIRin__all__.
PACKAGE_DIRis now a public package contract, but this__init__.pydoes not define an explicit public surface.Suggested patch
import os as _os +__all__ = ["PACKAGE_DIR"] + PACKAGE_DIR: str = _os.path.dirname(_os.path.abspath(__file__))As per coding guidelines, "
__init__.py: Define the public API with__all__at the top of each module."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/__init__.py` around lines 16 - 20, The `__init__.py` file is missing an explicit `__all__` declaration to define the public API surface. Add an `__all__` list after the module docstring and include the `PACKAGE_DIR` variable in it to explicitly declare it as a public export of the package. This makes the public contract clear to users and follows the project's coding guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tools/launcher/launch.py`:
- Around line 60-61: The issue is that the code checks for the ModelOpt source
at a symlink path (tools/launcher/modules/Model-Optimizer/modelopt) in the
_modelopt_src and _has_modelopt_src variables on lines 60-61, but this symlink
only exists after setup. In a clean dev checkout, this check will always fail,
causing _has_modelopt_src to be false and preventing proper ModelOpt
configuration in the include patterns (lines 79-89) and MODELOPT_SRC_PATH
assignment (line 96). Instead of checking the symlink destination path, detect
the actual source checkout location first by checking for an alternative path
where the ModelOpt source code would exist in the repository before the symlink
is created, then use that result for the subsequent logic.
---
Nitpick comments:
In `@tools/launcher/__init__.py`:
- Around line 16-20: The `__init__.py` file is missing an explicit `__all__`
declaration to define the public API surface. Add an `__all__` list after the
module docstring and include the `PACKAGE_DIR` variable in it to explicitly
declare it as a public export of the package. This makes the public contract
clear to users and follows the project's coding guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9ffddbe8-88e5-403b-89e2-cdbcf5d6e575
📒 Files selected for processing (7)
tools/launcher/.gitignoretools/launcher/__init__.pytools/launcher/launch.pytools/launcher/pyproject.tomltools/mcp/modelopt_mcp/bridge.pytools/mcp/pyproject.tomltools/mcp/tests/test_bridge.py
💤 Files with no reviewable changes (1)
- tools/mcp/tests/test_bridge.py
✅ Files skipped from review due to trivial changes (1)
- tools/launcher/.gitignore
🚧 Files skipped from review as they are similar to previous changes (2)
- tools/mcp/pyproject.toml
- tools/mcp/modelopt_mcp/bridge.py
|
/claude review |
There was a problem hiding this comment.
Claude review summary
Scope: This PR touches only tools/ (launcher packaging + MCP bridge integration) — no modelopt/ source, examples/, or model-optimization code. Reviewed all 7 changed files; traced the modelopt_launcher.* import switch through core.py/slurm_config.py, the _has_modelopt_src gating in launch.py, and the bridge's subprocess + env-propagation changes.
Findings by severity: CRITICAL: 1 · IMPORTANT: 0 · SUGGESTION: 0
Most impactful finding
- [CRITICAL]
_mo_symlinkNameError on the installed--cleanpath (tools/launcher/launch.py:117). The variable is now only assigned inside theif _has_modelopt_src:block, but thecleanbranch references it unconditionally. In the installed console-script mode (modelopt-launcher, the feature this PR adds),_has_modelopt_srcisFalse, somodelopt-launcher --clean ...crashes with NameError rather than running. Inline comment has a suggested gate.
Assessment
The core refactor is sound: the modules/Model-Optimizer symlink is git-tracked, so _has_modelopt_src resolves True on dev checkouts (no chicken-and-egg with symlink auto-creation), and the dev-vs-installed packager split is coherent. Test-side bare imports (from core import ...) still work via conftest.py's sys.path insertion, unaffected by the launch.py package-qualified imports. The bridge simplification (dropping _find_launcher_dir/cwd walk-up in favor of import modelopt_launcher) is a clean reduction, and NEMORUN_HOME is still pinned via child_env so the cwd removal does not break experiment-dir resolution.
Risk: low-to-moderate, isolated to the one crash on the --clean flag in installed mode. Recommend fixing that gate before merge.
…__, launcher not found - launch.py: detect dev checkout via MODELOPT_ROOT/modelopt (actual dir) instead of the symlink path which doesn't exist in a clean checkout - launch.py: initialize _mo_symlink = None before the conditional; guard --clean to raise ValueError instead of NameError in installed mode - __init__.py: add __all__ = ["PACKAGE_DIR"] per coding guidelines - bridge.py: add _launcher_not_installed() helper; catch FileNotFoundError on all three modelopt-launcher subprocess call sites (Docker Popen, Slurm run, dry-run run) and return structured ok=false instead of crash Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Chenhan Yu <chenhany@nvidia.com>
Summary
tools/launcher/__init__.pygainsPACKAGE_DIR, making the directory an importable package namedmodelopt_launchervia apackage-dirmapping inpyproject.toml. No file moves —common/andexamples/stay exactly where they are.pyproject.tomladds thepackages/package-dir/package-datadeclarations and amodelopt-launcherconsole script entry point.launch.pyswitches imports tomodelopt_launcher.{core,slurm_config}(works in bothuv run launch.pyvia editable install and the installed console script), adds_has_modelopt_srcto skip packaging modelopt source when running installed (cluster container already has it), and addsmain().bridge.pydeletes the 75-line_find_launcher_dir()filesystem walker and_launcher_dir_not_found_response(); simplifies_find_launcher_examples_dir()to 2 strategies (env override →import modelopt_launcher); switchessubmit_jobsubprocesses from["uv", "run", "launch.py"]withcwd=launcher_dirto["modelopt-launcher"]with nocwd.tools/mcp/pyproject.tomldeclaresmodelopt-launcheras a proper dependency (dev: editable../launcher; published: PyPI), replacing a lengthy comment explaining why it could not be declared.Test plan
cd tools/launcher && uv run python3 -m pytest tests/ -v— all 65 tests passcd tools/mcp && uv run python3 -m pytest tests/ -v— 38 tests passuv run launch.py --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml --dryrun --yes -v— dry-run resolves correctlymodelopt-launcher --yaml examples/Qwen/Qwen3-8B/megatron_lm_ptq.yaml --dryrun --yes— console script works afterpip install -e tools/launchercd tools/mcp && uvx modelopt-mcp— MCP server starts without FileNotFoundError🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
modelopt-launcherCLI command as the standardized entry point for launching optimization jobs.Bug Fixes
Chores
Tests