Discord gateway silently dies after WS disconnect — no reconnect loop #790

@chaodu-agent

Problem

When the Discord WebSocket gateway disconnects (Discord-side restart, network blip, heartbeat timeout), client.start() returns Ok(()) and the Discord adapter permanently stops receiving events. The container remains "healthy" because the HTTP healthcheck only probes the admin API, not the WS gateway state.

Observed behavior:

  • docker ps shows Up X hours (healthy)
  • Zero dispatch logs after disconnect
  • Bot does not respond to any messages
  • Manual docker restart is the only recovery

Environment: ARM64 Ubuntu 24.04, openab 0.8.3-beta.6/7, ~28hr uptime before occurrence.

Root Cause

In src/main.rs, the Discord client runs as:

info!("discord bot running");
match client.start().await {
    Err(e) => return Err(e.into()),
    // A clean exit falls through here: nothing restarts the gateway.
    Ok(_) => {}
}

When serenity's internal reconnect fails and client.start() returns Ok(()), there is no outer retry loop — the adapter silently exits while the container keeps running.

Proposed Fix

  1. Reconnect loop with backoff — wrap client.start() in a retry loop that rebuilds the client on clean exit or transient errors, with exponential backoff and INFO-level logging for each attempt.

  2. Healthcheck gateway state — expose last-heartbeat-ack timestamp; mark unhealthy if no WS activity for >90s so Docker can auto-restart.

  3. Log gateway lifecycle — emit INFO on disconnect/reconnect attempts so operators can detect issues without waiting hours.
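A minimal, std-only sketch of how items 1 and 2 could fit together. Everything named here is hypothetical and not taken from the openab codebase: `backoff_delay`, `supervise`, and `gateway_is_stale` are illustrative helpers, `run_once` stands in for "rebuild the client and await `client.start()`", `sleep` stands in for an async sleep, and `eprintln!` stands in for the `info!` logging the adapter already uses. The 1s base and 60s cap are assumed values. A real implementation would be async and would likely loop indefinitely rather than stop at `max_attempts`.

```rust
use std::time::{Duration, Instant};

/// Exponential backoff: `base * 2^attempt`, capped at `max`.
/// Saturating arithmetic keeps large attempt counts from overflowing.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Hypothetical supervisor: runs one gateway session at a time and
/// reconnects after both clean exits (`Ok`) and errors (`Err`).
/// Bounded by `max_attempts` only so the sketch terminates.
/// Returns the number of sessions that were run.
fn supervise<F, S>(mut run_once: F, mut sleep: S, max_attempts: u32) -> u32
where
    F: FnMut() -> Result<(), String>,
    S: FnMut(Duration),
{
    let base = Duration::from_secs(1); // assumed starting delay
    let max = Duration::from_secs(60); // assumed cap
    let mut attempt = 0;
    while attempt < max_attempts {
        // Log every session end so operators can see the gateway lifecycle.
        match run_once() {
            Ok(()) => eprintln!("gateway session ended cleanly; reconnecting"),
            Err(e) => eprintln!("gateway session failed: {e}; reconnecting"),
        }
        sleep(backoff_delay(attempt, base, max));
        attempt += 1;
    }
    attempt
}

/// Healthcheck predicate: true when the last heartbeat ack is older
/// than `threshold` (the issue proposes 90s).
fn gateway_is_stale(last_ack: Instant, now: Instant, threshold: Duration) -> bool {
    now.saturating_duration_since(last_ack) > threshold
}
```

In this sketch the admin API healthcheck would call `gateway_is_stale` with the stored last-ack timestamp and report unhealthy when it returns true. Treating a clean `Ok(())` exit the same as an error is the key design choice, since that is the path that currently ends the adapter silently.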

Workaround

Run an external watchdog that triggers docker restart when the bot stops responding. This is purely reactive and not a substitute for self-healing.
