Problem
When the Discord WebSocket gateway disconnects (Discord-side restart, network blip, heartbeat timeout), client.start() returns Ok(()) and the Discord adapter permanently stops receiving events. The container remains "healthy" because the HTTP healthcheck only probes the admin API, not the WS gateway state.
Observed behavior:
- docker ps shows Up X hours (healthy)
- Zero dispatch logs after disconnect
- Bot does not respond to any messages
- Manual docker restart is the only recovery
Environment: ARM64 Ubuntu 24.04, openab 0.8.3-beta.6/7, ~28hr uptime before occurrence.
Root Cause
In src/main.rs, the Discord client runs as:
info!("discord bot running");
match client.start().await {
    Err(e) => return Err(e.into()),
    Ok(_) => {}
}
When serenity's internal reconnect fails and client.start() returns Ok(()), there is no outer retry loop — the adapter silently exits while the container keeps running.
Proposed Fix
- Reconnect loop with backoff: wrap client.start() in a retry loop that rebuilds the client on clean exit or transient errors, with exponential backoff and INFO-level logging for each attempt.
- Healthcheck gateway state: expose the last-heartbeat-ack timestamp; mark the container unhealthy if there is no WS activity for >90s so Docker can auto-restart.
- Log gateway lifecycle: emit INFO on disconnect/reconnect attempts so operators can detect issues without waiting hours.
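
A minimal sketch of the first two items, using only the standard library. It is an illustration under stated assumptions, not openab's actual code: the names backoff_delay and GatewayHealth are hypothetical, as are the 1s base and 60s cap. The commented loop assumes a tokio runtime and a hypothetical build_client() helper. Only the 90s silence threshold comes from the proposal above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Delay before reconnect attempt `attempt` (0-based):
/// base * 2^attempt, capped at `max`. Hypothetical helper.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // checked_shl yields None once the shift reaches 32 bits; saturate instead.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}

/// Hypothetical shared state holding the time (in seconds) of the last
/// heartbeat ACK, so the HTTP healthcheck can report gateway staleness.
pub struct GatewayHealth {
    last_ack_secs: AtomicU64,
}

impl GatewayHealth {
    pub fn new(now_secs: u64) -> Self {
        Self { last_ack_secs: AtomicU64::new(now_secs) }
    }

    /// Call whenever the gateway reports a heartbeat ACK.
    pub fn record_ack(&self, now_secs: u64) {
        self.last_ack_secs.store(now_secs, Ordering::Relaxed);
    }

    /// True while the last ACK is no older than `max_silence`.
    pub fn is_healthy(&self, now_secs: u64, max_silence: Duration) -> bool {
        let last = self.last_ack_secs.load(Ordering::Relaxed);
        now_secs.saturating_sub(last) <= max_silence.as_secs()
    }
}

// Possible shape of the outer loop in src/main.rs (not compiled here;
// `build_client` is a hypothetical helper that rebuilds the client):
//
// let mut attempt: u32 = 0;
// loop {
//     let mut client = build_client().await?;
//     match client.start().await {
//         Ok(()) => info!("discord gateway exited cleanly; reconnecting"),
//         Err(e) => info!("discord gateway error: {e}; reconnecting"),
//     }
//     let delay = backoff_delay(attempt, Duration::from_secs(1), Duration::from_secs(60));
//     info!("reconnect attempt {} in {:?}", attempt + 1, delay);
//     tokio::time::sleep(delay).await;
//     attempt = attempt.saturating_add(1);
// }
```

With these assumed values, the delays run 1s, 2s, 4s and so on up to the 60s cap. A real loop would likely reset attempt after a connection has stayed up for some time, and might add jitter. The healthcheck handler would return a failing status when is_healthy(now, Duration::from_secs(90)) is false.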
Workaround
External watchdog that triggers docker restart when the bot stops responding. This is purely reactive and not a substitute for self-healing.