feat(deployments): scaffold plugin API and registry (AIRCORE-755)#280
feat(deployments): scaffold plugin API and registry (AIRCORE-755)#280tylersbray wants to merge 3 commits into
Conversation
Add nemo-deployments-plugin as the substrate-agnostic contract layer for AIRCORE-755: DeploymentConfig/Deployment/Volume entities, v1 CRUD routes, controller status endpoints, DeploymentBackend ABC, and named executor registry. Plugin starts with zero backends; reconciler and docker/k8s backends land in follow-on tickets. Signed-off-by: Tyler Bray <tbray@nvidia.com>
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds a NeMo Deployments plugin scaffold: types and entity schemas, API v1 CRUD endpoints (deployment-configs, deployments, volumes, status), backend abstraction and executor registry, prerequisite-cycle validation, service wiring, README and comprehensive unit tests. ChangesNeMo Deployments Plugin
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
plugins/nemo-deployments/README.md (1)
3-4: 💤 Low valueClarify "deployment lifecycle" with concrete scope.
Replace the vague term "deployment lifecycle" with specific capabilities: e.g., "Create, read, update, and delete deployment configurations and active deployments; manage volumes; track deployment status."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/nemo-deployments/README.md` around lines 3 - 4, Update the README description to replace the vague phrase "deployment lifecycle" with a concrete list of supported capabilities: state that the plugin provides create/read/update/delete (CRUD) for deployment configurations and active deployments, volume management, deployment status tracking, entity schemas, CRUD APIs, a DeploymentBackend abstract base class, and an executor registry; ensure the one-line summary in the README (top paragraph) explicitly enumerates these capabilities so readers immediately see the plugin's scope.Source: Coding guidelines
plugins/nemo-deployments/src/nemo_deployments_plugin/entities.py (1)
243-247: ⚡ Quick winNo-op validator is misleading.
This
@model_validatordoes not validate anything and always returnsself, so it implies an invariant that is not enforced. Convert this to a plain code comment/doc note, or implement an actual check where bothDeploymentand itsDeploymentConfig.restart_policyare available.Suggested cleanup
- `@model_validator`(mode="after") - def _document_status_restart_policy_consistency(self) -> Deployment: - """Document restart_policy vs terminal status expectations for the reconciler.""" - # Reconciler enforces: Never → SUCCEEDED terminal; Always/OnFailure → READY while running. - return self + # Restart-policy/terminal-status consistency is enforced by the reconciler + # when DeploymentConfig is resolved for this Deployment.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/nemo-deployments/src/nemo_deployments_plugin/entities.py` around lines 243 - 247, The _document_status_restart_policy_consistency method is a no-op model_validator that misleads readers; either remove the `@model_validator` and convert the docstring into a plain class/module comment, or implement an actual validation that accesses self.status and self.deployment_config.restart_policy (or DeploymentConfig.restart_policy) and raises a ValueError/ValidationError when the pair is inconsistent (e.g., restart_policy == "Never" requires terminal SUCCEEDED state, restart_policy in {"Always","OnFailure"} requires non-terminal/READY while running). Update or remove the decorator accordingly and keep the descriptive text explaining the reconciler expectations.plugins/nemo-deployments/tests/unit/test_registry.py (1)
35-53: ⚡ Quick winUse concrete return types in
_StubBackendinstead ofAny.These methods can return concrete
LogResult/VolumeStatusUpdatetypes directly; keepingAnyhere weakens contract checking in tests.As per coding guidelines, "Always prefer concrete type hints over string-based ones; do not import types under TYPE_CHECKING, instead prefer regular imports when possible."
Proposed diff
from typing import Any import pytest -from nemo_deployments_plugin.backends.abc import BackendStatusUpdate, DeploymentBackend +from nemo_deployments_plugin.backends.abc import ( + BackendStatusUpdate, + DeploymentBackend, + LogResult, + VolumeStatusUpdate, +) @@ - async def get_logs(self, **kwargs: Any) -> Any: - from nemo_deployments_plugin.backends.abc import LogResult - + async def get_logs(self, **kwargs: Any) -> LogResult: return LogResult(lines=[]) @@ - async def create_volume(self, **kwargs: Any) -> Any: - from nemo_deployments_plugin.backends.abc import VolumeStatusUpdate - + async def create_volume(self, **kwargs: Any) -> VolumeStatusUpdate: return VolumeStatusUpdate(status="PENDING") @@ - async def read_volume_status(self, **kwargs: Any) -> Any: - from nemo_deployments_plugin.backends.abc import VolumeStatusUpdate - + async def read_volume_status(self, **kwargs: Any) -> VolumeStatusUpdate: return VolumeStatusUpdate(status="BOUND") @@ - async def delete_volume(self, workspace: str, name: str) -> Any: - from nemo_deployments_plugin.backends.abc import VolumeStatusUpdate - + async def delete_volume(self, workspace: str, name: str) -> VolumeStatusUpdate: return VolumeStatusUpdate(status="RELEASED")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@plugins/nemo-deployments/tests/unit/test_registry.py` around lines 35 - 53, Update _StubBackend methods to use concrete return types instead of Any: change signatures of get_logs to return nemo_deployments_plugin.backends.abc.LogResult and create_volume/read_volume_status/delete_volume to return nemo_deployments_plugin.backends.abc.VolumeStatusUpdate; add regular imports for LogResult and VolumeStatusUpdate at top of the test file (do not use TYPE_CHECKING or string types) and return the concrete instances as already constructed in the method bodies so the type hints match the returned objects.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/nemo-deployments/pyproject.toml`:
- Around line 7-11: The dependencies list in pyproject.toml for the
nemo-deployments plugin is missing fastapi, causing runtime ModuleNotFoundError
when plugin API modules import it; update the [dependencies] block (the
dependencies = [...] list in pyproject.toml for the nemo-deployments package) to
include "fastapi" (optionally with a suitable version constraint, e.g. >=0.95)
so fastapi is installed at runtime alongside "nemo-platform",
"nemo-platform-plugin", and "pydantic>=2.10.6".
In `@plugins/nemo-deployments/README.md`:
- Around line 1-5: Add a top-level "Prerequisites" section before the existing
"# NeMo Deployments Plugin" content that lists required software and setup steps
(e.g., supported Python version, required package manager such as uv, NeMo
Platform runtime availability/configuration, any required environment variables
or credentials, and optional tooling like Docker/uv CLI if applicable); update
README.md to include this section title "Prerequisites" and concise bullet
points so readers know what must be installed or configured before continuing.
- Around line 15-18: Add a "Next Steps" section at the end of
plugins/nemo-deployments/README.md titled "Next Steps" that provides cross-links
to the plugin's API reference, the backend implementation guide (reference
AIRCORE-756, AIRCORE-757, AIRCORE-758), and any other related plugin
documentation; place it immediately after the setup/test commands shown (the uv
sync / pytest block), include one-line descriptions for each link (what the
reader will find there), and ensure links are relative or to the canonical docs
site consistent with other plugin READMEs.
- Around line 1-18: The README mixes reference, how-to, and explanatory content;
split it into a single-quadrant page or reorganize with progressive disclosure.
Choose either: (A) convert this README into a HOW-TO "Set up and test the
plugin" that keeps the uv sync/pytest commands and a brief 30s summary, then
move the API base path and architecture/Backend/DeploymentBackend ABC details
into separate REFERENCE and EXPLANATION pages; or (B) expand this README into
layered sections (Layer 1: 30s summary of purpose/value and intended audience;
Layer 2: 3–5min core concepts and mention of DeploymentBackend and executor
registry; Layer 3: 10min+ API endpoints with the
`/apis/deployments/v1/workspaces/{workspace}/...` example and cross-workspace
sentinel `-`; Layer 4: links to separate reference pages for full API spec and
schema). Update headings accordingly (e.g., "How to set up and test",
"Concepts", "API reference", "Further reading") and move details about the
DeploymentBackend ABC and executor registry into dedicated files or sections to
avoid mixing quadrants.
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/dependencies.py`:
- Around line 17-24: The code currently lets X-NMP-Principal-On-Behalf-Of
override X-NMP-Principal-Id via _effective_principal_id, allowing spoofing;
change require_service_principal to read the principal directly from
request.headers.get(_PRINCIPAL_ID_HEADER, "") (or alter _effective_principal_id
to not consult _ON_BEHALF_OF_HEADER) and perform the startswith("service:")
check against that trustworthy value (use the symbols _PRINCIPAL_ID_HEADER,
_ON_BEHALF_OF_HEADER, _effective_principal_id, and require_service_principal to
locate and update the logic).
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/deployments.py`:
- Around line 140-143: The update call to entity_client.update(deployment) can
raise NemoEntityConflictError during concurrent writes; modify the try/except
around that call to also catch NemoEntityConflictError (in addition to
NemoEntityNotFoundError) and return an HTTP 409 conflict response (matching the
status-update endpoints) with a concise message including the deployment name,
while keeping the existing logger.info for NemoEntityNotFoundError; locate the
entity_client.update(deployment) call and add the new except
NemoEntityConflictError branch to produce the 409 response.
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/backends/registry.py`:
- Around line 56-62: The registry construction loop in the classmethod that
builds executors is not failure-atomic: if one backend instantiation
(classes[spec.backend](sdk, spec.config)) raises, previously-created backend
instances remain running. Wrap the executor creation loop in a try/except, and
on any exception iterate over the already-created executors (executors.values())
and call their shutdown/close method (e.g., shutdown() or close(), whichever the
backend implements) inside its own try/except to swallow secondary errors, then
re-raise the original exception; keep the existing UnknownBackendTypeError check
and final return of cls(executors, default_executor=default_executor).
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/config.py`:
- Around line 32-33: Add a validation step to the config model that defines
port_range_start and port_range_end to enforce 1 <= port_range_start <= 65535, 1
<= port_range_end <= 65535 and port_range_start <= port_range_end at load time;
implement this as a Pydantic validator (e.g., a `@root_validator` or two
`@validator` methods) on the config class that contains the port_range_start and
port_range_end fields so it raises a clear ValueError when bounds or ordering
are invalid, preventing downstream runtime failures.
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/schema.py`:
- Line 77: The restart_policy field in the schema currently uses a loose type
(restart_policy: str | None = None); change it to use the RestartPolicy enum
type (e.g., restart_policy: RestartPolicy | None = None) so only valid policies
are accepted; import or reference the RestartPolicy enum (or class) used
elsewhere in the repo and update any related validation or type annotations in
the same module (schema.py) to use RestartPolicy instead of raw str.
In `@plugins/nemo-deployments/tests/unit/test_api_volumes.py`:
- Around line 32-39: The test should assert the object passed into
mock_entity_client.create to ensure the route sets Volume.status="PENDING"
before persisting; in test_create_volume_201 capture the awaited call to
mock_entity_client.create (e.g., inspect mock_entity_client.create.await_args or
use assert_awaited_once and read call args) and assert the passed volume object
has status == "PENDING" (and optionally name/size match the request) rather than
only asserting the mocked return value.
---
Nitpick comments:
In `@plugins/nemo-deployments/README.md`:
- Around line 3-4: Update the README description to replace the vague phrase
"deployment lifecycle" with a concrete list of supported capabilities: state
that the plugin provides create/read/update/delete (CRUD) for deployment
configurations and active deployments, volume management, deployment status
tracking, entity schemas, CRUD APIs, a DeploymentBackend abstract base class,
and an executor registry; ensure the one-line summary in the README (top
paragraph) explicitly enumerates these capabilities so readers immediately see
the plugin's scope.
In `@plugins/nemo-deployments/src/nemo_deployments_plugin/entities.py`:
- Around line 243-247: The _document_status_restart_policy_consistency method is
a no-op model_validator that misleads readers; either remove the
`@model_validator` and convert the docstring into a plain class/module comment, or
implement an actual validation that accesses self.status and
self.deployment_config.restart_policy (or DeploymentConfig.restart_policy) and
raises a ValueError/ValidationError when the pair is inconsistent (e.g.,
restart_policy == "Never" requires terminal SUCCEEDED state, restart_policy in
{"Always","OnFailure"} requires non-terminal/READY while running). Update or
remove the decorator accordingly and keep the descriptive text explaining the
reconciler expectations.
In `@plugins/nemo-deployments/tests/unit/test_registry.py`:
- Around line 35-53: Update _StubBackend methods to use concrete return types
instead of Any: change signatures of get_logs to return
nemo_deployments_plugin.backends.abc.LogResult and
create_volume/read_volume_status/delete_volume to return
nemo_deployments_plugin.backends.abc.VolumeStatusUpdate; add regular imports for
LogResult and VolumeStatusUpdate at top of the test file (do not use
TYPE_CHECKING or string types) and return the concrete instances as already
constructed in the method bodies so the type hints match the returned objects.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 13375584-28cd-4917-9bf5-7df481d974cb
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (25)
plugins/nemo-deployments/README.mdplugins/nemo-deployments/pyproject.tomlplugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/dependencies.pyplugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/deployment_configs.pyplugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/deployments.pyplugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/status.pyplugins/nemo-deployments/src/nemo_deployments_plugin/api/v1/volumes.pyplugins/nemo-deployments/src/nemo_deployments_plugin/backends/abc.pyplugins/nemo-deployments/src/nemo_deployments_plugin/backends/registry.pyplugins/nemo-deployments/src/nemo_deployments_plugin/config.pyplugins/nemo-deployments/src/nemo_deployments_plugin/constants.pyplugins/nemo-deployments/src/nemo_deployments_plugin/entities.pyplugins/nemo-deployments/src/nemo_deployments_plugin/schema.pyplugins/nemo-deployments/src/nemo_deployments_plugin/service.pyplugins/nemo-deployments/src/nemo_deployments_plugin/validation.pyplugins/nemo-deployments/tests/unit/helpers.pyplugins/nemo-deployments/tests/unit/test_api_deployment_configs.pyplugins/nemo-deployments/tests/unit/test_api_deployments.pyplugins/nemo-deployments/tests/unit/test_api_status.pyplugins/nemo-deployments/tests/unit/test_api_volumes.pyplugins/nemo-deployments/tests/unit/test_entities.pyplugins/nemo-deployments/tests/unit/test_prerequisite_validation.pyplugins/nemo-deployments/tests/unit/test_registry.pyplugins/nemo-deployments/tests/unit/test_service_startup.pypyproject.toml
| # NeMo Deployments Plugin | ||
|
|
||
| Substrate-agnostic deployment lifecycle for the NeMo Platform. This plugin provides | ||
| entity schemas, CRUD APIs, a `DeploymentBackend` ABC, and an executor registry. | ||
| Backend implementations and the reconcile controller are delivered in follow-on tickets. | ||
|
|
||
| ## API base path | ||
|
|
||
| `/apis/deployments/v1/workspaces/{workspace}/...` | ||
|
|
||
| Cross-workspace bulk queries use the entity-store sentinel workspace ``-``: | ||
|
|
||
| ``GET /apis/deployments/v1/workspaces/-/deployments?status_in=pending,starting`` | ||
|
|
||
| ```bash | ||
| uv sync | ||
| uv run pytest plugins/nemo-deployments/tests/unit -v | ||
| ``` |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
README violates Diataxis structure and progressive-disclosure guidelines.
This page mixes three Diataxis quadrants (REFERENCE for API path, HOW-TO for commands, EXPLANATION for plugin overview) into 18 lines. Per guidelines, "Each documentation page should fit ONE Diataxis quadrant; do not mix tutorials with reference tables or how-tos with architecture explanations."
Restructure by either narrowing to one quadrant (e.g., HOW-TO: "Set up and test the plugin") and moving API/architecture details to separate pages, or expanding into a proper multi-section structure with progressive disclosure: Layer 1 (30s: what/who/value) → Layer 2 (3-5min: core concepts and architecture) → Layer 3 (10min+: API endpoints and advanced features) → Layer 4 (separate reference pages for complete specs).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/nemo-deployments/README.md` around lines 1 - 18, The README mixes
reference, how-to, and explanatory content; split it into a single-quadrant page
or reorganize with progressive disclosure. Choose either: (A) convert this
README into a HOW-TO "Set up and test the plugin" that keeps the uv sync/pytest
commands and a brief 30s summary, then move the API base path and
architecture/Backend/DeploymentBackend ABC details into separate REFERENCE and
EXPLANATION pages; or (B) expand this README into layered sections (Layer 1: 30s
summary of purpose/value and intended audience; Layer 2: 3–5min core concepts
and mention of DeploymentBackend and executor registry; Layer 3: 10min+ API
endpoints with the `/apis/deployments/v1/workspaces/{workspace}/...` example and
cross-workspace sentinel `-`; Layer 4: links to separate reference pages for
full API spec and schema). Update headings accordingly (e.g., "How to set up and
test", "Concepts", "API reference", "Further reading") and move details about
the DeploymentBackend ABC and executor registry into dedicated files or sections
to avoid mixing quadrants.
Source: Coding guidelines
| def test_create_volume_201(client: TestClient, mock_entity_client: AsyncMock) -> None: | ||
| mock_entity_client.create.return_value = make_volume() | ||
| resp = client.post( | ||
| "/apis/deployments/v1/workspaces/default/volumes", | ||
| json={"name": "vol1", "size": "5Gi"}, | ||
| ) | ||
| assert resp.status_code == 201 | ||
| assert resp.json()["status"] == "PENDING" |
There was a problem hiding this comment.
Assert the object passed to create, not just the mocked response.
This test can still pass if the route stops setting Volume.status="PENDING" before persistence. Assert the awaited create(...) argument to lock the contract.
Suggested patch
def test_create_volume_201(client: TestClient, mock_entity_client: AsyncMock) -> None:
mock_entity_client.create.return_value = make_volume()
resp = client.post(
"/apis/deployments/v1/workspaces/default/volumes",
json={"name": "vol1", "size": "5Gi"},
)
assert resp.status_code == 201
assert resp.json()["status"] == "PENDING"
+ mock_entity_client.create.assert_awaited_once()
+ created = mock_entity_client.create.await_args.args[0]
+ assert created.status == "PENDING"
+ assert created.workspace == "default"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_create_volume_201(client: TestClient, mock_entity_client: AsyncMock) -> None: | |
| mock_entity_client.create.return_value = make_volume() | |
| resp = client.post( | |
| "/apis/deployments/v1/workspaces/default/volumes", | |
| json={"name": "vol1", "size": "5Gi"}, | |
| ) | |
| assert resp.status_code == 201 | |
| assert resp.json()["status"] == "PENDING" | |
| def test_create_volume_201(client: TestClient, mock_entity_client: AsyncMock) -> None: | |
| mock_entity_client.create.return_value = make_volume() | |
| resp = client.post( | |
| "/apis/deployments/v1/workspaces/default/volumes", | |
| json={"name": "vol1", "size": "5Gi"}, | |
| ) | |
| assert resp.status_code == 201 | |
| assert resp.json()["status"] == "PENDING" | |
| mock_entity_client.create.assert_awaited_once() | |
| created = mock_entity_client.create.await_args.args[0] | |
| assert created.status == "PENDING" | |
| assert created.workspace == "default" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@plugins/nemo-deployments/tests/unit/test_api_volumes.py` around lines 32 -
39, The test should assert the object passed into mock_entity_client.create to
ensure the route sets Volume.status="PENDING" before persisting; in
test_create_volume_201 capture the awaited call to mock_entity_client.create
(e.g., inspect mock_entity_client.create.await_args or use assert_awaited_once
and read call args) and assert the passed volume object has status == "PENDING"
(and optionally name/size match the request) rather than only asserting the
mocked return value.
Rename backends/abc.py to base.py and extract shared types to break the stdlib abc shadowing cycle. Fix auth to use Principal-Id only, add registry rollback and port-range validation, and wire pytest pythonpath for helpers. Signed-off-by: Tyler Bray <tbray@nvidia.com>
Use import abc to avoid self-import false positive and fail backends via init() hook so DeploymentBackend.__init__ is exercised in tests. Signed-off-by: Tyler Bray <tbray@nvidia.com>
Introduce DeploymentsController with deployment/volume reconcilers, prerequisite gating, drift recovery, and orphan cleanup on top of the 755 plugin scaffold. Stacks on PR #280 (AIRCORE-755). AIRCORE-758 Signed-off-by: Tyler Bray <tbray@nvidia.com>
benmccown
left a comment
There was a problem hiding this comment.
Mostly minor stuff that an agent should be able to address without too much issue. Nice work so far.
| Substrate-agnostic deployment lifecycle for the NeMo Platform. This plugin provides | ||
| entity schemas, CRUD APIs, a `DeploymentBackend` ABC, and an executor registry. | ||
|
|
||
| **Scope (this ticket):** scaffold only — entity types, v1 CRUD routes, backend contract, |
There was a problem hiding this comment.
Nit: We can probs drop everything from this Scope line through to the end of the file, save for the Tests section.
After the work is done and backends are ready, I'm sure an agent can cook up a good README that's helpful and doesn't have transient info in it.
|
|
||
| class CreateDeploymentRequest(BaseModel): | ||
| name: str | ||
| deployment_config_name: str |
There was a problem hiding this comment.
You might have lifted this one to one from the RFC, but I'm wondering aloud, should we just go with deployment_config so that we could in theory support workspace/name or name (assume same workspace)? Maybe there's a world where we want to support deploying a deployment config from workspace A into workspace B? Just keeps our options open.
| endpoints: list[Endpoint] = Field(default_factory=list) | ||
| exit_code: int | None = None | ||
| error_details: dict[str, Any] | None = None | ||
| status_history: list[StatusEvent] | None = None |
There was a problem hiding this comment.
I'm not sure we need status_history on the API request? The entity itself should keep a status_history field for us, but it shouldn't be on the requester to preserve/maintain that history field.
| class UpdateDeploymentStatusRequest(BaseModel): | ||
| status: DeploymentStatus | ||
| status_message: str = "" | ||
| endpoints: list[Endpoint] = Field(default_factory=list) |
There was a problem hiding this comment.
Does endpoints need to exist on this request object? Wouldn't it be something on the top level Deployment entity?
| prerequisites: list[Prerequisite] = Field(default_factory=list) | ||
| drift_recovery: DriftRecoveryPolicy = Field(default_factory=DriftRecoveryPolicy, alias="driftRecovery") | ||
| labels: dict[str, str] = Field(default_factory=dict) | ||
| backend_config: DeploymentBackendConfig = Field(default_factory=DeploymentBackendConfig, alias="backendConfig") |
There was a problem hiding this comment.
This is straight from the RFC, but now I'm that guy and thinking about one alternative out loud:
Wdyt about executor_config? I hadn't thought about the "Executor (instance) vs. Service Backend (class)" disambiguation much when writing the RFC. The class could become DeploymentBackendConfig -> DeploymentExecutorConfig. I feel like it's potentially clearer?
No strong feelings here, we can keep it as is as well.
| volumes, | ||
| ) | ||
|
|
||
| prefix = "/v1/workspaces/{workspace}" |
There was a problem hiding this comment.
This (and all API routes anywhere in the PR) should be /v2/ and not /v1/.
| @@ -0,0 +1,31 @@ | |||
| # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | |||
There was a problem hiding this comment.
Nit: I think these types could all just go in schemas.py but it doesn't really matter, tomatoe/tomatoe.
| default=None, | ||
| description="Fallback executor when Deployment.executor is unset.", | ||
| ) | ||
| port_range_start: int = Field(default=9000, description="Default Docker port range start.") |
There was a problem hiding this comment.
Shouldn't the port range start/end go in the docker service/executor config? Cause k8s executors won't use these, right?
| from pydantic import BaseModel, Field | ||
|
|
||
|
|
||
| class CreateDeploymentConfigRequest(BaseModel): |
There was a problem hiding this comment.
Doesn't have to be in this PR, but we should probs think through Update APIs at some point (which fields are mutable, maybe implement a naive form of entity versioning copying the established pattern from models service).
| entity_client: NemoEntitiesClient = Depends(get_entity_client), | ||
| ) -> None: | ||
| try: | ||
| await entity_client.delete(DeploymentConfig, name=name, workspace=workspace) |
There was a problem hiding this comment.
Could we ensure that there are no existing Deployments that reference the config and 4xx with a helpful message if so? Same goes for Volumes (if they're referenced by deployments).
Summary
Scaffolds
nemo-deployments-pluginas the substrate-agnostic contract layer for AIRCORE-755:DeploymentConfig,Deployment,Volumeplus RFC supporting types/apis/deployments/v1/workspaces/{workspace}/...status_inlist filter; controller-onlyPUT .../status(service-principal gated viaX-NMP-Principal-Id)DeploymentBackendABC +BackendStatusUpdate/VolumeStatusUpdateNo substrate backends or reconcile controller in this PR — those are AIRCORE-756/757/758.
Test plan
uv run pytest plugins/nemo-deployments/tests/unit -v(31 tests)uv run ruff check plugins/nemo-deploymentsuv run ty check plugins/nemo-deploymentsno-nmp-common-in-plugins)nemo services run→ CRUD against/apis/deployments/v1/...(follow-on if desired)Summary by CodeRabbit
New Features
Documentation
Tests