Support router as replica with pipelines #3721

Merged
Bihan merged 16 commits into dstackai:master from Bihan:support_router_replica_with_pipelines
Apr 15, 2026

Collaborator

@Bihan Bihan commented Mar 31, 2026

The design document for this PR is here.

Bihan force-pushed the support_router_replica_with_pipelines branch from 2fe5e14 to bafd2d9 on April 1, 2026 07:22
Bihan requested review from jvstme and r4victor on April 7, 2026 10:33


class ServiceRouterWorkerSyncFetcher(Fetcher[ServiceRouterWorkerSyncPipelineItem]):
    @sentry_utils.instrument_named_task("pipeline_tasks.ServiceRouterWorkerSyncFetcher.fetch")
Collaborator

I recently added @sentry_utils.instrument_pipeline_task – use it to avoid hardcoding the pipeline_tasks prefix.
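
To illustrate the suggestion, here is a minimal sketch of how a decorator could derive the task name from the decorated function instead of taking a hardcoded string. The name `instrument_pipeline_task_sketch`, the `PIPELINE_TASKS_PREFIX` constant, and the `task_name` attribute are hypothetical; the real `sentry_utils.instrument_pipeline_task` signature is not shown in this thread and may differ.

```python
import functools
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for the shared prefix; not taken from dstack.
PIPELINE_TASKS_PREFIX = "pipeline_tasks"


def instrument_pipeline_task_sketch(
    func: Callable[..., Awaitable[T]],
) -> Callable[..., Awaitable[T]]:
    """Illustrative decorator: build the task name from the function's qualified name."""
    task_name = f"{PIPELINE_TASKS_PREFIX}.{func.__qualname__}"

    @functools.wraps(func)
    async def wrapper(*args, **kwargs) -> T:
        # A real implementation would open a tracing transaction named task_name here.
        return await func(*args, **kwargs)

    # Exposed only so the derived name can be inspected in this sketch.
    wrapper.task_name = task_name
    return wrapper


class ExampleFetcher:
    @instrument_pipeline_task_sketch
    async def fetch(self) -> list:
        return []
```

With this shape the prefix lives in one place, and renaming a class or method updates the task name automatically.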

Comment on lines +201 to +205
run_model = sync_row.run
if run_model is None:
    await session.delete(sync_row)
    await session.commit()
    return
Collaborator

How can run_model be None here?

Collaborator Author

I was guarding against the run row being hard-deleted, in which case sync_row.run would become None. If that is not possible, we can delete this block.

Collaborator

But you defined run_id as non-optional with ondelete="CASCADE" - how could that be possible?

Collaborator Author

You are right. I will delete this block.
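
The reviewer's point can be checked with a small, self-contained sketch using the standard-library `sqlite3` module. The table and column names are hypothetical stand-ins for the run and sync-row tables, not dstack's actual schema. With a NOT NULL foreign key declared ON DELETE CASCADE, hard-deleting the parent removes the child row as well, so application code never observes a child row whose parent is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE runs (id TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE router_worker_sync (
        id TEXT PRIMARY KEY,
        run_id TEXT NOT NULL REFERENCES runs(id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO runs VALUES ('run-1')")
conn.execute("INSERT INTO router_worker_sync VALUES ('sync-1', 'run-1')")

# Hard-deleting the parent run cascades to the dependent sync row.
conn.execute("DELETE FROM runs WHERE id = 'run-1'")
remaining = conn.execute("SELECT COUNT(*) FROM router_worker_sync").fetchone()[0]
print(remaining)
```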

Comment on lines +220 to +227
    .options(
        selectinload(RunModel.project),
        selectinload(RunModel.jobs).selectinload(JobModel.project),
        selectinload(RunModel.jobs)
        .selectinload(JobModel.instance)
        .selectinload(InstanceModel.project),
    )
)
Collaborator

This is potentially a very inefficient select – a run can have thousands of job submissions. Select only the jobs that the processing needs, i.e. only the router replica job. Also, every selectinload will be a separate query here – not sure if it's justified. joinedload may be better suited for a one-to-one relationship. Also, try to avoid loading all of the models' columns and use load_only to select only the necessary ones.

Collaborator Author

@Bihan Bihan Apr 8, 2026

Please check whether the proposed query below addresses the concerns:

  1. Avoid loading thousands of job submissions: RunModel.jobs is no longer loaded unconditionally. selectinload(RunModel.jobs.and_(...)) restricts the loaded jobs to RUNNING, registered replicas, which are the only ones sync_router_workers_for_run_model() can use (router job selection and worker list building both ignore non-running or unregistered jobs).

  2. selectinload is intentional: RunModel.jobs is a one-to-many collection; using joinedload would duplicate the RunModel row per job.

  3. joinedload for one-to-one/many-to-one: RunModel.project, JobModel.project, JobModel.instance, and InstanceModel.project are loaded with joinedload because these are scalar relationships from run, job, and instance.

  4. Use load_only: this limits the loaded columns to those required by sync_router_workers_for_run_model(run_for_sync) and _get_service_replica_client(job_model).

res = await session.execute(
    select(RunModel)
    .where(RunModel.id == item.run_id)
    .options(
        load_only(RunModel.id, RunModel.run_spec),
        selectinload(
            RunModel.jobs.and_(
                JobModel.status == JobStatus.RUNNING,
                JobModel.registered == true(),
            )
        )
        .load_only(
            JobModel.id,
            JobModel.status,
            JobModel.registered,
            JobModel.job_spec_data,
            JobModel.job_provisioning_data,
            JobModel.job_runtime_data,
        )
        .options(
            joinedload(JobModel.project).load_only(ProjectModel.id, ProjectModel.ssh_private_key),
            joinedload(JobModel.instance)
            .load_only(InstanceModel.id, InstanceModel.remote_connection_info)
            .joinedload(InstanceModel.project)
            .load_only(ProjectModel.id, ProjectModel.ssh_private_key),
        ),
    )
)
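
As a rough illustration of point 2, the sketch below uses the standard-library `sqlite3` module with a hypothetical two-table schema (not dstack's models) to compare the row shapes of the two strategies. A JOIN, which is what joinedload emits, repeats the parent columns once per matching child, while the IN-style second query used by selectinload keeps a single parent row and fetches only the filtered children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, run_id INTEGER, status TEXT);
    INSERT INTO runs VALUES (1, 'service');
    INSERT INTO jobs VALUES (10, 1, 'running'), (11, 1, 'running'), (12, 1, 'done');
    """
)

# JOIN-style loading: the run columns are repeated for every matching job.
joined_rows = conn.execute(
    "SELECT runs.id, runs.name, jobs.id FROM runs "
    "LEFT JOIN jobs ON jobs.run_id = runs.id AND jobs.status = 'running'"
).fetchall()

# IN-style loading: one row for the run, then a second query
# that fetches only the jobs matching the extra criteria.
run_rows = conn.execute("SELECT id, name FROM runs WHERE id = 1").fetchall()
job_rows = conn.execute(
    "SELECT id FROM jobs WHERE run_id IN (1) AND status = 'running'"
).fetchall()

print(len(joined_rows), len(run_rows), len(job_rows))
```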

Collaborator

looks good, at least at a glance

Comment on lines +105 to +112
router_jobs = [
    j
    for j in run_model.jobs
    if job_belongs_to_group(j, group_name) and j.status == JobStatus.RUNNING
]
if not router_jobs or not is_replica_registered(router_jobs):
    return None
return router_jobs[0]
Collaborator

Can there be multiple router jobs? If so, how does that work?

Collaborator Author

For the first iteration, I suggest restricting the router replica group to count: 1 via configuration validation. The current sync logic effectively assumes a single active router job. We can extend this later to support multiple router replicas for HA.

Collaborator

it's worth a comment!
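
A minimal sketch of the proposed validation, assuming a simplified replica group shape. `ReplicaGroupSketch` and `validate_router_replica_groups` are hypothetical names used for illustration only and do not reflect dstack's actual configuration models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicaGroupSketch:
    name: str
    count: int
    router: Optional[dict] = None  # a non-None value marks the group as the router group


def validate_router_replica_groups(groups: List[ReplicaGroupSketch]) -> None:
    """Reject configurations where a router replica group has a count other than 1."""
    for group in groups:
        if group.router is not None and group.count != 1:
            # The sync logic assumes a single active router job,
            # so more than one router replica is rejected up front.
            raise ValueError(f"Router replica group {group.name!r} must have count: 1")
```

Rejecting the configuration at validation time keeps the `router_jobs[0]` selection unambiguous until multiple router replicas are supported.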

Comment on lines +98 to +107
def run_spec_has_router_replica_group(run_spec: RunSpec) -> bool:
    if run_spec.configuration.type != "service":
        return False
    cfg = run_spec.configuration
    if not isinstance(cfg, ServiceConfiguration):
        return False
    return any(g.router is not None for g in cfg.replica_groups)


async def ensure_service_router_worker_sync_row(
Collaborator

Why put these router-specific functions at the top of the runs services module?

Collaborator Author

I kept them there because they are used by the run lifecycle. Should I move them to src/dstack/_internal/server/services/router_worker_sync.py?

Collaborator

I mean at least they should not be at the top of the file.

Collaborator Author

Got it.

Comment thread src/dstack/_internal/server/services/runs/__init__.py
Comment on lines +112 to +120
if not run_spec_has_router_replica_group(run_spec):
    return
res = await session.execute(
    select(ServiceRouterWorkerSyncModel.id).where(
        ServiceRouterWorkerSyncModel.run_id == run_model.id
    )
)
if res.scalar_one_or_none() is not None:
    return
Collaborator

How can it be that ServiceRouterWorkerSyncModel already exists for a run if ensure_service_router_worker_sync_row is called only on run submit?

    return
run_model = sync_row.run
if run_model is None:
    await session.delete(sync_row)
Collaborator

We generally use soft deletes in the dstack server for easier debugging and to keep historical data. Assuming there will be very few ServiceRouterWorkerSyncModel rows (one per service replica router), I'd also soft-delete it for consistency.
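
A minimal sketch of the soft-delete pattern, using the standard-library `sqlite3` module. The table name and the `deleted` column are assumptions for illustration and are not taken from dstack's schema. The row is flagged rather than removed, so it stays available for debugging while regular queries filter it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE router_worker_sync ("
    "id TEXT PRIMARY KEY, deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO router_worker_sync (id) VALUES ('sync-1')")

# Soft delete: flag the row instead of issuing a DELETE.
conn.execute("UPDATE router_worker_sync SET deleted = 1 WHERE id = 'sync-1'")

# Regular processing only sees rows that are not flagged.
active = conn.execute(
    "SELECT COUNT(*) FROM router_worker_sync WHERE deleted = 0"
).fetchone()[0]
# The row itself is still present for later inspection.
total = conn.execute("SELECT COUNT(*) FROM router_worker_sync").fetchone()[0]
print(active, total)
```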

)


class ServiceRouterWorkerSyncModel(PipelineModelMixin, BaseModel):
Collaborator

Let's put it somewhere near the end of the file so that "core" models come first.

@@ -0,0 +1,49 @@
"""SSH-tunneled async HTTP client to a job's service port (same path as probes)."""
Collaborator

put this file in jobs services?

@@ -0,0 +1,345 @@
"""Reconcile SGLang router /workers with dstack's registered worker replicas (async, SSH-tunneled)."""
Collaborator

put this file in runs services

Collaborator

@r4victor r4victor left a comment

Did a quick review of the pipeline code. Haven't looked into the worker sync logic.

Bihan force-pushed the support_router_replica_with_pipelines branch from e155d17 to 7b268cb on April 9, 2026 10:36
Comment thread src/dstack/_internal/server/services/job_replica_http_client.py Outdated
Comment thread src/dstack/_internal/core/models/configurations.py Outdated
Comment thread src/dstack/_internal/server/services/runs/router_worker_sync.py
Comment on lines +39 to +45
async def _stream_response_body_bytes(resp: Response, max_bytes: int) -> bytes:
    buf = bytearray()
    async for chunk in resp.aiter_bytes():
        buf.extend(chunk)
        if len(buf) > max_bytes:
            raise _ResponseTooLargeError()
    return bytes(buf)
Collaborator

(nit) We have the join_byte_stream_checked function that appears to do the same thing

Comment thread src/dstack/_internal/proxy/gateway/services/registry.py Outdated
Comment thread src/dstack/_internal/core/models/configurations.py
Comment thread src/dstack/_internal/proxy/gateway/services/registry.py
Comment thread src/dstack/_internal/proxy/lib/services/service_connection.py Outdated
Comment thread src/dstack/_internal/server/services/runs/router_worker_sync.py Outdated
Comment thread src/dstack/_internal/server/services/runs/router_worker_sync.py Outdated
Comment thread src/dstack/_internal/server/services/runs/router_worker_sync.py Outdated
Comment thread src/dstack/_internal/server/services/runs/router_worker_sync.py Outdated
Comment thread src/dstack/_internal/proxy/gateway/services/registry.py
Bihan force-pushed the support_router_replica_with_pipelines branch from 3bc04df to 8fe01e5 on April 13, 2026 07:33
Bihan changed the title from "[Draft PR] Support router as replica with pipelines" to "Support router as replica with pipelines" on Apr 14, 2026
Comment thread docs/blog/posts/pd-disaggregation.md
Comment thread docs/docs/concepts/gateways.md Outdated
Comment thread examples/inference/sglang/README.md Outdated
Comment thread examples/inference/sglang/README.md Outdated
fleets: [pd-disagg]

-# Custom probe is required for PD disaggregation
+# Custom probe is required for PD disaggregation.
Collaborator

(nit) By the way, is it still required? I thought sync_router_workers_for_run_model can gracefully handle the router or workers not being ready, and perform the registration eventually, once they become ready

Collaborator Author

Yes, this is still required. The probe queries /v1/chat/completions in order to register the job, but the router fails to serve /v1/chat/completions until workers are registered. Meanwhile, the router-worker sync pipeline only considers RUNNING jobs that also have registered=True.
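
The filtering described above can be sketched as a small planning step. This is an assumption about how such a reconciliation is commonly structured, not the code from this PR; `JobSketch` and `plan_worker_sync` are hypothetical names. Only jobs that are running and registered count as desired workers, and the plan is the set difference against what the router currently reports.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class JobSketch:
    worker_url: str
    status: str
    registered: bool


def plan_worker_sync(
    jobs: Iterable[JobSketch], current_router_workers: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Return (workers to add, workers to remove) for the router."""
    # Jobs that are not running or not yet registered are ignored entirely.
    desired = {j.worker_url for j in jobs if j.status == "running" and j.registered}
    to_add = desired - current_router_workers
    to_remove = current_router_workers - desired
    return to_add, to_remove
```

Under this model, a router job that never becomes registered never reaches the planning step, which is why the probe configuration matters.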

Collaborator

Oh, I see, so our default probe is the problem. But I assume it's possible to work around it by either setting probes: [], or not setting model. If that's the case, a custom probe is more of a recommendation, not a strict requirement.

Anyways, I think we were going to improve the UX here by introducing a different default probe for services with the SGLang router. Not in this PR, of course.

Comment thread src/dstack/_internal/core/models/configurations.py
Comment thread src/dstack/_internal/server/services/proxy/services/service_proxy.py Outdated
Comment thread src/dstack/_internal/server/services/proxy/services/service_proxy.py Outdated
Collaborator

@jvstme jvstme left a comment

Looks good to me overall, but the following may require more attention:

  • Forbid unsupported in-place updates (thread)
  • Fix the path whitelist in the in-server proxy (thread)

These may be uncommon cases, but they are security-related, so I would prefer to
address them before merging, or at least before the release

Comment on lines 218 to +222
set_processed_update_map_fields(early_cleanup_update_map)
set_unlock_update_map_fields(early_cleanup_update_map)
now = get_current_datetime()
resolve_now_placeholders(early_cleanup_update_map, now=now)
await session.execute(
update(ServiceRouterWorkerSyncModel)
.where(
ServiceRouterWorkerSyncModel.id == item.id,
ServiceRouterWorkerSyncModel.lock_token == item.lock_token,
)
.values(**early_cleanup_update_map)
await _update_sync_row_or_log_lock_token_changed(
Collaborator

(nit) Identical set_processed_update_map_fields, set_unlock_update_map_fields, and resolve_now_placeholders calls are also repeated in three places in this method. It's worth moving them inside _update_sync_row_or_log_lock_token_changed

Comment thread src/dstack/_internal/server/services/services/__init__.py Outdated
Comment thread src/dstack/_internal/server/services/proxy/services/service_proxy.py Outdated
Collaborator Author

Bihan commented Apr 15, 2026

Looks good to me overall, but the following may require more attention:

  • Forbid unsupported in-place updates (thread)
  • Fix the path whitelist in the in-server proxy (thread)

These may be uncommon cases, but they are security-related, so I would prefer to address them before merging, or at least before the release

@jvstme Done

@Bihan Bihan merged commit 46ec81f into dstackai:master Apr 15, 2026
28 checks passed
3 participants