Steps to reproduce
- Deploy a service with replicas: 0..1 and scaling: { metric: rps, target: 0.5, scale_down_delay: 900 }
- Send a request to trigger scale-up (0→1)
- Use the service actively for 14 minutes (RPS > 0, requests completing regularly)
- Pause for 1 minute (RPS drops to 0)
- Observe: service scales down at T=15min, even though it was actively used for 14 of those 15 minutes
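For reference, the traffic pattern above can be driven with a small script along these lines (the endpoint, request path, and 30-second cadence are illustrative assumptions, not part of the report):

# Hypothetical load driver for the repro; replace SERVICE_URL with the deployed service endpoint.
import time
import urllib.request

SERVICE_URL = "https://<gateway-domain>/"  # assumption: the service's public URL

def hit(url: str) -> None:
    try:
        urllib.request.urlopen(url, timeout=10).read()
    except Exception as exc:
        print(f"request failed: {exc}")  # keep driving traffic even if a single request fails

start = time.monotonic()
while time.monotonic() - start < 14 * 60:  # ~14 minutes of steady, low-rate traffic (RPS > 0)
    hit(SERVICE_URL)
    time.sleep(30)                         # ~0.03 RPS, matching the timeline below

time.sleep(60)  # 1-minute pause: RPS drops to 0, and the replica is scaled down at ~T=15min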
Expected behaviour
scale_down_delay should protect the instance for the specified duration after the autoscaler first decides to scale down — and reset if traffic returns during that window. Instead, the cooldown is calculated from last_scaled_at (the timestamp of the last applied scale event), not from when the autoscaler's intent changed.
The bug
In src/dstack/_internal/server/services/services/autoscalers.py, the RPSAutoscaler uses last_scaled_at to enforce cooldowns:
if (now - last_scaled_at).total_seconds() < self.scale_down_delay:
    # too early to scale down, wait for the delay
    return current_desired_count
last_scaled_at is only updated when a scale event is actually applied (the cooldown expires and the scale happens). It is not updated when the autoscaler's desired count changes (intent changes) during the cooldown period.
This means:
- Active traffic during the cooldown does not reset last_scaled_at
- For replicas: 0..1, last_scaled_at is set once at initial scale-up (0→1) and never updated, because desired_count == current_count (1 == 1) produces no scale event
- Scale-down can happen immediately after the cooldown expires, regardless of activity during the cooldown
Timeline demonstrating the bug
T=0: Scale up 0→1, last_scaled_at = T0
T=1m: RPS=0.02, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=5m: RPS=0.03, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=10m: RPS=0.01, desired=1, current=1 → no scale event → last_scaled_at stays T0
T=14m: User pauses, RPS drops to 0
T=15m: (now - T0) >= scale_down_delay (900s), RPS=0 → scale-down happens
→ User was active for 14 of 15 minutes, but got scaled down after a 1-minute pause
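The same sequence can be reproduced with a few lines of standalone Python that mimic the current last_scaled_at-based check (a simplified stand-in with illustrative names, not the actual autoscaler code):

# Simplified stand-in for the current cooldown logic (seconds-based timestamps for brevity).
SCALE_DOWN_DELAY = 900  # seconds

def autoscale(now, rps, current_count, last_scaled_at):
    desired = 1 if rps > 0 else 0             # toy policy for replicas: 0..1
    if desired == current_count:
        return current_count, last_scaled_at  # no scale event -> last_scaled_at is not refreshed
    if desired < current_count and now - last_scaled_at < SCALE_DOWN_DELAY:
        return current_count, last_scaled_at  # cooldown measured from the last applied event
    return desired, now                       # scale event applied -> last_scaled_at updated

count, last_scaled_at = 1, 0                  # T=0: scaled up 0 -> 1
for t, rps in [(60, 0.02), (300, 0.03), (600, 0.01), (900, 0.0)]:
    count, last_scaled_at = autoscale(t, rps, count, last_scaled_at)
    print(f"T={t // 60}m rps={rps} count={count}")
# Prints count=1 at T=1m, 5m, 10m, then count=0 at T=15m: scaled down after a 1-minute pause.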
Proposed fix
Replace last_scaled_at with last_count_change — track when the autoscaler's desired count changes from the current count, not when a scale event is applied:
- When desired_count > current_desired_count: update last_count_change (scale-up intent)
- When desired_count < current_desired_count: update last_count_change (scale-down intent)
- Use last_count_change instead of last_scaled_at for cooldown calculations
This ensures:
- The cooldown starts from when the autoscaler first decided to scale down, not from the last applied scale event
- If traffic returns during the cooldown (desired_count goes back up), the cooldown resets because the autoscaler's intent changed
- This works symmetrically for both scale_up_delay and scale_down_delay
# Proposed logic
if new_desired_count > current_desired_count:
    if current_desired_count == 0:
        return new_desired_count  # immediate scale-up from zero
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_up_delay:
        return current_desired_count
    return new_desired_count
elif new_desired_count < current_desired_count:
    if last_count_change is not None and (now - last_count_change).total_seconds() < self.scale_down_delay:
        return current_desired_count
    return new_desired_count
return new_desired_count
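For completeness, here is one way the last_count_change bookkeeping around that logic could look, together with a replay of the timeline above (names, the seconds-based timestamps, the 300s scale_up_delay default, and the reset-to-None convention are illustrative, not a drop-in patch for autoscalers.py):

# Sketch of the proposed intent-based cooldown (illustrative state handling, not dstack's API).
from dataclasses import dataclass
from typing import Optional

@dataclass
class State:
    current_desired_count: int
    last_count_change: Optional[float] = None  # when the desired count first diverged from current

def scale(state: State, new_desired_count: int, now: float,
          scale_up_delay: float = 300.0, scale_down_delay: float = 900.0) -> int:
    if new_desired_count == state.current_desired_count:
        state.last_count_change = None          # intent matches reality again -> cooldown resets
        return state.current_desired_count
    if state.last_count_change is None:
        state.last_count_change = now           # intent changed -> start the clock
    if state.current_desired_count == 0 and new_desired_count > 0:
        pass                                    # immediate scale-up from zero
    else:
        delay = scale_up_delay if new_desired_count > state.current_desired_count else scale_down_delay
        if now - state.last_count_change < delay:
            return state.current_desired_count  # within the delay window -> hold
    state.current_desired_count = new_desired_count
    state.last_count_change = None              # scale applied -> clear the pending intent
    return new_desired_count

s = State(current_desired_count=1)
# Replay: pause at T=14m, brief traffic at T=14.5m (hypothetical), pause again at T=15m.
for t, desired in [(60, 1), (300, 1), (600, 1), (840, 0), (870, 1), (900, 0)]:
    print(f"T={t}s desired={desired} -> count={scale(s, desired, t)}")
# The replica stays at 1 throughout: the scale-down clock restarts whenever traffic returns.

With the same timeline but no traffic after T=14m, this sketch would still hold the replica until roughly T=29m (900 seconds after the intent first changed at T=14m), which matches the expected behaviour described above.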
Impact
This bug makes RPS-based autoscaling unreliable for interactive workloads (LLM serving, chat bots, etc.) where traffic is bursty with pauses between active periods. The cooldown provides a false sense of protection — it doesn't actually guarantee N seconds of inactivity before scaling down.
dstack version
0.19.x (current master)