[Bug]: Scale delay cooldowns don't reset when autoscaler intent changes during the delay period #3824

@megheaiulian

Description

Steps to reproduce

  1. Deploy a service with replicas: 0..1 and scaling: { metric: rps, target: 0.5, scale_down_delay: 900 } (a full configuration sketch follows this list)
  2. Send a request to trigger scale-up (0→1)
  3. Use the service actively for 14 minutes (RPS > 0, requests completing regularly)
  4. Pause for 1 minute (RPS drops to 0)
  5. Observe: service scales down at T=15min, even though it was actively used for 14 of those 15 minutes
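
For reference, the configuration in step 1 corresponds to a service definition along these lines. Only replicas and the scaling block are taken from this report; name, commands, and port are illustrative placeholders:

type: service
name: bursty-demo          # illustrative
commands:
  - python app.py          # illustrative
port: 8000                 # illustrative
replicas: 0..1
scaling:
  metric: rps
  target: 0.5
  scale_down_delay: 900    # seconds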

Expected behaviour

scale_down_delay should protect the instance for the specified duration after the autoscaler first decides to scale down — and reset if traffic returns during that window. Instead, the cooldown is calculated from last_scaled_at (the timestamp of the last applied scale event), not from when the autoscaler's intent changed.

The bug

In src/dstack/_internal/server/services/services/autoscalers.py, the RPSAutoscaler uses last_scaled_at to enforce cooldowns:

if (now - last_scaled_at).total_seconds() < self.scale_down_delay:
    # too early to scale down, wait for the delay
    return current_desired_count

last_scaled_at is only updated when a scale event is actually applied (the cooldown expires and the scale happens). It is not updated when the autoscaler's desired count changes (intent changes) during the cooldown period.

This means:

  • Active traffic during the cooldown does not reset last_scaled_at
  • For replicas: 0..1, last_scaled_at is set once at initial scale-up (0→1) and never updated, because desired_count == current_count (1 == 1) produces no scale event
  • Scale-down can happen immediately after the cooldown expires, regardless of activity during the cooldown

Timeline demonstrating the bug

T=0:    Scale up 0→1, last_scaled_at = T0
T=1m:   RPS=0.02, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=5m:   RPS=0.03, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=10m:  RPS=0.01, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=14m:  User pauses, RPS drops to 0
T=15m:  (now - T0) >= scale_down_delay (900s), RPS=0 → scale down happens
        → User was active for 14 of 15 minutes, but got scaled down after a 1-minute pause
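
This timeline can be reproduced outside the server with a short, self-contained sketch of the current cooldown check (a simplified stand-in for RPSAutoscaler, modelling only the scale-down branch; the single-replica setup mirrors the report):

from datetime import datetime, timedelta

SCALE_DOWN_DELAY = 900  # seconds

def scale(current, desired, last_scaled_at, now):
    # Current behaviour: the cooldown clock is last_scaled_at
    if desired < current and (now - last_scaled_at).total_seconds() < SCALE_DOWN_DELAY:
        return current  # too early to scale down, wait for the delay
    return desired

t0 = datetime(2024, 1, 1)
last_scaled_at = t0  # set once at the initial 0 -> 1 scale-up
replicas = 1

for minute, rps in [(1, 0.02), (5, 0.03), (10, 0.01), (14, 0.0), (15, 0.0)]:
    now = t0 + timedelta(minutes=minute)
    desired = 1 if rps > 0 else 0
    new = scale(replicas, desired, last_scaled_at, now)
    if new != replicas:
        last_scaled_at = now  # only refreshed when a scale event is applied
        replicas = new
    print(f"T={minute:>2}m  rps={rps:<4}  desired={desired}  replicas={replicas}")

Active traffic at T=1m, 5m, and 10m never refreshes last_scaled_at, so the replica is dropped at T=15m after a single 1-minute pause.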

Proposed fix

Replace last_scaled_at with last_count_change: track when the autoscaler's intent changes, i.e. when the newly computed desired count diverges from (or returns to) the current count, not when a scale event is applied:

  • When new_desired_count first rises above current_desired_count: update last_count_change (scale-up intent)
  • When new_desired_count first drops below current_desired_count: update last_count_change (scale-down intent)
  • Use last_count_change instead of last_scaled_at for the cooldown calculations

Note that last_count_change must be refreshed only at the transition, not on every evaluation while the counts still differ; otherwise an ongoing scale-down intent would keep resetting its own cooldown and the delay would never expire.

This ensures:

  1. The cooldown starts from when the autoscaler first decided to scale down, not from the last applied scale event
  2. If traffic returns during the cooldown (desired_count goes back up), the cooldown resets because the autoscaler's intent changed
  3. This works symmetrically for both scale_up_delay and scale_down_delay

# Proposed logic. last_count_change is refreshed by the caller on each
# intent change, as described above.
if new_desired_count > current_desired_count:
    if current_desired_count == 0:
        return new_desired_count  # immediate scale-up from zero
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_up_delay:
        return current_desired_count  # scale-up cooldown still running
    return new_desired_count
elif new_desired_count < current_desired_count:
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_down_delay:
        return current_desired_count  # scale-down cooldown still running
    return new_desired_count
return new_desired_count  # desired count unchanged
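
Replaying an extended version of the same timeline against this logic shows the intended behaviour. This is a sketch under the assumption that the caller refreshes last_count_change whenever the newly computed desired count differs from the previous evaluation's, i.e. on an intent change:

from datetime import datetime, timedelta

SCALE_DOWN_DELAY = 900  # seconds

def scale(current, desired, last_count_change, now):
    # Proposed behaviour: the cooldown clock is last_count_change
    if desired < current and last_count_change is not None \
            and (now - last_count_change).total_seconds() < SCALE_DOWN_DELAY:
        return current  # scale-down intent is younger than the delay
    return desired

t0 = datetime(2024, 1, 1)
replicas, prev_desired, last_count_change = 1, 1, None

for minute, rps in [(1, 0.02), (10, 0.01), (14, 0.0), (15, 0.0),
                    (16, 0.05), (20, 0.0), (36, 0.0)]:
    now = t0 + timedelta(minutes=minute)
    desired = 1 if rps > 0 else 0
    if desired != prev_desired:
        last_count_change = now  # intent changed: restart the cooldown clock
        prev_desired = desired
    replicas = scale(replicas, desired, last_count_change, now)
    print(f"T={minute:>2}m  desired={desired}  replicas={replicas}")

The replica now survives the pause at T=14-15m, the cooldown resets when traffic returns at T=16m, and scale-down only happens at T=36m, a full 16 minutes after the last scale-down intent at T=20m.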

Impact

This bug makes RPS-based autoscaling unreliable for interactive workloads (LLM serving, chatbots, etc.), where traffic is bursty with pauses between active periods. The cooldown gives a false sense of protection: it does not actually guarantee N seconds of inactivity before scaling down.

dstack version

0.19.x (current master)
