Add instances option to target specific fleet instances #3925
fededagos wants to merge 13 commits into
Introduce an `instances` run profile option that pins a run to specific existing fleet instances (nodes). Each value matches an instance by its name (e.g. `my-fleet-0`) or by its hostname/IP address. When set, `filter_instances` keeps only matching instances and the job assignment phase never provisions new capacity to satisfy a node selector, terminating with a no-capacity error instead.
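A minimal sketch of the matching rule described above. It assumes a simplified instance record with only a `name` and an optional `hostname`. The `Instance` class, `matches`, and `filter_by_selector` are hypothetical illustrations, not dstack's `InstanceModel` or `filter_instances`:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Instance:
    # Hypothetical stand-in for a fleet instance; only matching fields are modeled.
    name: str
    hostname: Optional[str] = None


def matches(instance: Instance, value: str) -> bool:
    # A selector value matches either the instance name or its hostname/IP.
    return value == instance.name or (
        instance.hostname is not None and value == instance.hostname
    )


def filter_by_selector(
    instances: List[Instance], selector: Optional[Sequence[str]]
) -> List[Instance]:
    # No selector: every instance stays a candidate.
    if selector is None:
        return list(instances)
    # Allow-list semantics: keep instances matching at least one value.
    return [i for i in instances if any(matches(i, v) for v in selector)]


fleet = [Instance("my-fleet-0", "10.0.0.5"), Instance("my-fleet-1", "10.0.0.6")]
print([i.name for i in filter_by_selector(fleet, ["my-fleet-0"])])  # ['my-fleet-0']
print([i.name for i in filter_by_selector(fleet, ["10.0.0.6"])])  # ['my-fleet-1']
```

The IP addresses are placeholders. Note that an empty selector keeps no instances, which matches allow-list semantics.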
Reject runs that target fewer instances than the number of nodes they require; the error is surfaced during planning via `validate_run_spec_and_set_defaults`. Exclude new-capacity backend offers from the run plan when `instances` is set, since they are never provisioned and would otherwise mislead the `dstack apply`/`dstack offer` output.
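The plan-side exclusion can be sketched as a simple filter. This assumes a hypothetical `Offer` record whose `instance_name` is set for an existing fleet instance and `None` for new capacity; `Offer` and `plan_offers` are illustrative names, not dstack APIs:

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Offer:
    # Hypothetical plan entry: `instance_name` is set for an existing fleet
    # instance and None for new capacity offered by a backend.
    backend: str
    price: float
    instance_name: Optional[str] = None


def plan_offers(offers: List[Offer], instances: Optional[Sequence[str]]) -> List[Offer]:
    # Without `instances`, both existing and new-capacity offers are shown.
    if instances is None:
        return list(offers)
    # With `instances`, new capacity is never provisioned, so drop it from the
    # plan. Existing offers are assumed to be already filtered by the selector.
    return [o for o in offers if o.instance_name is not None]


offers = [Offer("aws", 1.2), Offer("ssh", 0.0, "my-fleet-0")]
print([o.backend for o in plan_offers(offers, None)])  # ['aws', 'ssh']
print([o.backend for o in plan_offers(offers, ["my-fleet-0"])])  # ['ssh']
```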
Add a 'Targeting specific instances' section to the shared fleets snippet (dev environments, tasks, services) and a corresponding tip in the protips guide.
Handle an explicit empty `instances` list consistently across the assignment gate, plan output, and instance filtering by checking `is not None` instead of truthiness, so an empty list targets existing instances only (rather than silently allowing new-capacity provisioning and showing unusable offers). Add regression tests ensuring the instance selector is applied on the multinode and shared-instances filter paths.
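Why truthiness is the wrong check: an empty list is falsy in Python, so `not instances` treats `[]` exactly like an omitted option. The two hypothetical helpers below contrast the buggy and fixed gates:

```python
from typing import List, Optional


def allows_new_capacity_truthy(instances: Optional[List[str]]) -> bool:
    # Buggy check: `[]` is falsy, so an empty list behaves like an unset option.
    return not instances


def allows_new_capacity(instances: Optional[List[str]]) -> bool:
    # Fixed check: only an omitted option (None) permits new capacity.
    return instances is None


for value in (None, [], ["my-fleet-0"]):
    print(value, allows_new_capacity_truthy(value), allows_new_capacity(value))
```

For `[]`, the truthiness version returns `True` (new capacity allowed) while the `is None` version returns `False`, which is the behavior the fix aims for.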
```python
) -> List[InstanceModel]:
    fleet_load = joinedload(InstanceModel.fleet)
    if load_fleet_project:
        fleet_load = fleet_load.joinedload(FleetModel.project)
```
Added `.load_only(ProjectModel.name)` to the fleet project loads.
```python
instances = run_spec.merged_profile.instances
if instances is not None:
    nodes_required_num = get_nodes_required_num(run_spec)
    if len(instances) < nodes_required_num:
        raise ServerClientError(
            f"`instances` specifies {len(instances)} instance(s)"
            f" but the run requires {nodes_required_num} nodes."
            " Specify at least as many instances as nodes."
        )
```
Even if there are fewer instances than `nodes_required_num`, they may still be able to accommodate the run if they have enough blocks.
There are a few other places in the PR that appear not to take blocks into consideration (search for `required_instance_offers`).
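The reviewer's point can be illustrated with a hypothetical helper, `replicas_that_fit`, assuming every replica needs the same number of blocks and must fit on a single instance:

```python
from typing import Sequence


def replicas_that_fit(idle_blocks: Sequence[int], blocks_per_replica: int) -> int:
    # Each replica needs `blocks_per_replica` idle blocks on a single instance,
    # so an instance with n idle blocks can host n // blocks_per_replica replicas.
    return sum(n // blocks_per_replica for n in idle_blocks)


idle_blocks = [4]  # one targeted instance, split into 4 idle blocks
replicas = 2
print(len(idle_blocks) < replicas)  # True: a plain instance count rejects the run
print(replicas_that_fit(idle_blocks, 2) >= replicas)  # True: both replicas fit
```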
Right about services: replicas can pack onto one instance with enough idle blocks, so the up-front check now applies only to multinode tasks, where each node takes a whole instance (`min_blocks = total_blocks` for multinode in `get_shared_instances_with_offers`, and multinode skips instances with any busy block). For the same reason, the `required_instance_offers` comparisons should be blocks-safe: `len(jobs_to_provision) > 1` only happens in the multinode master path, and each instance yields at most one offer (the block-size loop breaks on the first match), so the offer count equals the number of distinct usable instances. Let me know if there's a path I'm missing.
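A rough sketch of the reasoning in this reply, under simplifying assumptions: each instance is a hypothetical `(idle blocks, total blocks)` pair, `block_sizes` is an assumed list of candidate block counts tried in order, and `ServerClientError` is a local stand-in. None of these helpers are dstack's actual functions:

```python
from typing import List, Optional, Sequence, Tuple


class ServerClientError(Exception):
    """Local stand-in for dstack's ServerClientError."""


def validate_instances(
    instances: Optional[Sequence[str]], nodes_required: int, multinode: bool
) -> None:
    # Only multinode tasks need one whole instance per node; service replicas
    # can share idle blocks on one instance, so they are not rejected up front.
    if instances is None or not multinode:
        return
    if len(instances) < nodes_required:
        raise ServerClientError(
            f"`instances` specifies {len(instances)} instance(s)"
            f" but the run requires {nodes_required} nodes."
        )


def instance_offer(
    idle: int, total: int, multinode: bool, block_sizes: Sequence[int]
) -> Optional[int]:
    # Returns the number of blocks offered by one instance, or None.
    if multinode:
        # Multinode needs the whole instance: any busy block disqualifies it.
        return total if idle == total else None
    for size in block_sizes:
        if size <= idle:
            return size  # first match wins, so at most one offer per instance
    return None


def usable_offers(
    instances: List[Tuple[int, int]], multinode: bool, block_sizes: Sequence[int]
) -> List[int]:
    # One entry per usable instance, so len(...) counts distinct instances.
    offers = (instance_offer(i, t, multinode, block_sizes) for i, t in instances)
    return [o for o in offers if o is not None]


fleet = [(4, 4), (2, 4), (0, 4)]  # (idle blocks, total blocks) per instance
print(usable_offers(fleet, multinode=True, block_sizes=[4]))  # [4]
print(usable_offers(fleet, multinode=False, block_sizes=[2, 1]))  # [2, 2]
```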
Adds an `instances` option to run configurations (dev environments, tasks, services) that restricts a run to specific existing fleet instances.

Syntax
Long forms:
Short form for matching by instance name:
The `fleet` form also supports `<project name>/<fleet name>` for fleets from another project.

Behavior
- `instances` has allow-list semantics: a run is placed only on a matching existing instance.
- When `instances` is set, `dstack` never provisions new instances to satisfy the run.
- `retry` can be used to wait for a selected busy instance to free up.
- New-capacity offers are excluded from the plan when `instances` is set because they cannot satisfy the selector.

Implementation
- `ProfileParams`: `name`, `hostname`, and `fleet` + `instance`, while preserving the string shorthand as an instance-name selector.
- `instances` for older client/server compatibility paths.

Docs
Updated the shared fleet-management snippet and protips guide. The docs promote the explicit syntax first and keep the short instance-name syntax in a collapsible section.
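The explicit and short selector forms described above can be modeled roughly as follows. The field names `name`, `hostname`, `fleet`, and `instance` come from this description; the `InstanceSelector` class, `parse_selector`, `split_fleet`, and the type of `instance` are assumptions for illustration, not the actual `ProfileParams` schema:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union

Raw = Union[str, Dict[str, Union[str, int]]]


@dataclass(frozen=True)
class InstanceSelector:
    # Hypothetical model of one `instances` entry; exactly one form is used.
    name: Optional[str] = None
    hostname: Optional[str] = None
    fleet: Optional[str] = None  # "<fleet name>" or "<project name>/<fleet name>"
    instance: Optional[Union[str, int]] = None  # assumed type; paired with `fleet`


def parse_selector(raw: Raw) -> InstanceSelector:
    # The string shorthand selects an instance by name.
    if isinstance(raw, str):
        return InstanceSelector(name=raw)
    keys = set(raw)
    if keys == {"name"}:
        return InstanceSelector(name=str(raw["name"]))
    if keys == {"hostname"}:
        return InstanceSelector(hostname=str(raw["hostname"]))
    if keys == {"fleet", "instance"}:
        return InstanceSelector(fleet=str(raw["fleet"]), instance=raw["instance"])
    raise ValueError(f"unsupported instance selector: {raw!r}")


def split_fleet(fleet: str) -> Tuple[Optional[str], str]:
    # "<project name>/<fleet name>" refers to a fleet from another project.
    project, sep, name = fleet.partition("/")
    return (project, name) if sep else (None, fleet)


print(parse_selector("my-fleet-0"))
print(split_fleet("other-project/gpu-fleet"))  # ('other-project', 'gpu-fleet')
```

Rejecting mixed keys (for example, `name` together with `hostname`) keeps each entry unambiguous; whether the real schema does the same is not stated here.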
Testing
- `uv run ruff check .`
- `uv run pyright -p .`
- `uv run pytest`: 2607 passed, 1055 skipped
- `dstack server`: `fleets` plus all four `instances` syntaxes completed successfully.
- `instances` syntaxes completed successfully.

AI Assistance
This PR includes AI-assisted changes. The original PR noted Claude Code assistance; follow-up schema, implementation review, tests, docs, and E2E verification were assisted by Codex.