
Discord adapter: latency UX improvements (perception + proactive context management) #795

@RayKuo-Mantis

Description


Context

Running openab-claude:0.8.3-beta.7 + PR #791 swap on GB10 ARM64 host
(meta + mentor instances, ~28h uptime, normal daily-driver usage with Discord
adapter).

Observed behavior

Two distinct latency signals from openab::dispatch logs over a full day of
real usage:

  Metric                                       Range            Comment
  wait_ms (OpenAB queue wait)                  300–500 ms       Healthy across all turns
  agent_dispatch_ms (inner agent processing)   2 sec – 395 sec  Wide variance

Breakdown by usage pattern:

  • Idle chat: 2–5 sec
  • Active sessions with tool chains: 30–100 sec common
  • Multi-tool reasoning-heavy turns: 100–400 sec
  • Peak observed today: 395 sec (~6.6 min) on one turn

Root cause investigation

After issuing /compact inside the Claude session, agent_dispatch_ms
immediately dropped back to the 2–5 sec normal range. So:

  • ~80% of "Discord feels slow" is Claude CLI session jsonl context bloat:
    the full history is sent on every Anthropic API call, so per-turn cost
    grows linearly with session size
  • ~10% is ACP / Discord chunked-send overhead (json-rpc wrap + Discord API
    message chunking for long replies)
  • ~10% is perception (no streaming visibility — user sees nothing for
    30+ sec, which feels indistinguishable from "stuck")

OpenAB itself is not the latency root cause — wait_ms is consistently
healthy. The dominant factor is internal Claude/Anthropic processing, which
OpenAB only measures.

That said, OpenAB sits between the user and Claude CLI — it is the only layer
that can mitigate the user-facing experience of these long turns.

Suggested improvements

1. Typing indicator / partial output during long dispatch

Currently the Discord adapter waits for the agent to fully complete its turn
before pushing the reply. For 30+ sec turns this looks like the bot died.

  • Maintain Discord typing indicator while dispatch is in flight, OR
  • Stream partial output as Claude produces it (if ACP supports incremental
    chunks)
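A minimal sketch of the typing-indicator option, assuming an async Discord channel object with a `trigger_typing()` coroutine (discord.py naming; Discord's typing state expires after roughly 10 seconds, so it must be re-triggered on an interval). `dispatch` here is a stand-in for the actual OpenAB agent call, not the real adapter API:

```python
import asyncio

# Sketch only: keep the "Bot is typing..." indicator alive while a long
# dispatch runs. `channel.trigger_typing()` and `dispatch` are assumptions
# standing in for the real adapter internals.

async def dispatch_with_typing(channel, dispatch, interval: float = 8.0):
    async def keepalive():
        while True:
            await channel.trigger_typing()  # refresh the typing indicator
            await asyncio.sleep(interval)   # re-trigger before it expires

    task = asyncio.create_task(keepalive())
    try:
        return await dispatch()  # the slow turn completes as before
    finally:
        task.cancel()  # stop the indicator once the reply is ready
```

The keepalive runs concurrently with the dispatch and is cancelled the moment the reply is available, so a 2 sec turn and a 300 sec turn get the same treatment with no extra state.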

2. Auto progress hint when agent_dispatch_ms is unusually long

When a turn exceeds a threshold (e.g. > 60 sec):

  • Auto-send a short ⏳ still processing... message to keep channel alive
  • Optionally include a hint like session context is X% full — consider /compact
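The threshold watchdog can be sketched the same way, assuming an async `send` callable for posting to the channel (hypothetical, not the real adapter API). The hint task only ever fires if the dispatch is still running when the threshold passes:

```python
import asyncio

# Sketch: post a progress hint if the turn exceeds `threshold` seconds.
# `send` and `dispatch` are illustrative stand-ins.

async def dispatch_with_hint(send, dispatch, threshold: float = 60.0):
    async def hint():
        await asyncio.sleep(threshold)
        await send("⏳ still processing...")  # keep the channel alive

    task = asyncio.create_task(hint())
    try:
        return await dispatch()
    finally:
        task.cancel()  # fast turns never see the hint
```

The same task would be the natural place to format the optional "session context is X% full" hint before sending.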

3. Expose session size to user

  • Slash command or /status showing current jsonl size + last agent_dispatch_ms
  • Lets the user see context-bloat accumulating and proactively /compact
    before turns slow down
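A `/status` reply could be as simple as the following sketch, assuming the adapter knows the session's jsonl path and keeps the last `agent_dispatch_ms` around (both names illustrative):

```python
from pathlib import Path

# Sketch of a /status reply line. The jsonl path and last-turn metric are
# assumed to be available to the adapter already.

def status_line(jsonl_path: str, last_dispatch_ms: float) -> str:
    size_mb = Path(jsonl_path).stat().st_size / 1_000_000
    return (f"session log: {size_mb:.1f} MB · "
            f"last agent_dispatch_ms: {last_dispatch_ms:.0f} ms")
```

Seeing the MB figure creep up alongside slowing turns is exactly the signal a user needs to decide to /compact early.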

4. Auto-compact (most impactful, but biggest change)

OpenAB is an agent layer with full ownership — it can do proactive context
management that the upstream Claude Code CLI itself does not (Anthropic's
auto-compact only triggers near hard context limit, not proactively).

Proposed mechanism:

  • Monitor signals: any of —
    • jsonl file size exceeds threshold (e.g. 10 MB)
    • rolling average agent_dispatch_ms over last N turns exceeds threshold (e.g. 30 sec)
    • time since last /compact exceeds T hours
  • Trigger: before delivering the next user message to the agent, OpenAB
    injects a synthetic /compact dispatch
  • Transparent to user: just a maintenance turn, not visible in Discord
  • Result: user never has to think about compacting, session stays in
    fresh-context regime indefinitely

This is structurally the kind of thing only an agent wrapper can do —
the upstream commercial Claude API can't modify Claude Code's own behavior,
but a layer that invokes Claude CLI can drive it proactively.
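The trigger check above reduces to a small any-signal-fires predicate. This is a sketch only; the thresholds mirror the suggested defaults and none of the names are existing OpenAB config keys:

```python
import time
from pathlib import Path
from statistics import fmean

# Sketch: decide whether to inject a synthetic /compact before delivering
# the next user message. Any single signal firing is enough.

def should_compact(jsonl_path: str,
                   recent_dispatch_ms: list[float],
                   last_compact_ts: float,
                   size_limit_mb: float = 10.0,
                   avg_limit_ms: float = 30_000,
                   max_age_s: float = 6 * 3600) -> bool:
    size_mb = Path(jsonl_path).stat().st_size / 1_000_000
    if size_mb > size_limit_mb:
        return True  # session file has grown past the size threshold
    if recent_dispatch_ms and fmean(recent_dispatch_ms) > avg_limit_ms:
        return True  # rolling average of recent turns is too slow
    return time.time() - last_compact_ts > max_age_s  # compact is stale
```

Because the check runs on OpenAB's side of the boundary, the maintenance turn stays invisible to Discord and the user only ever sees the fresh-context latency profile.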

Priority

Items (1)–(3) are perception-layer fixes: they keep the user informed that the
system is alive during slow turns. Cheap wins.

Item (4) is structural: it eliminates the dominant ~80% latency contributor
entirely for daily-driver use cases.

Logging suggestion (separate)

Consider splitting agent_dispatch_ms into:

  • claude_cli_ms (inner CLI + API time)
  • acp_overhead_ms (json-rpc serialization + Discord chunked-send)

Currently they're conflated. Splitting helps users (and you) distinguish
"Claude is slow today" from "OpenAB has overhead" when triaging issues.
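The split only needs two bracketing points in the dispatch path: after the Claude CLI returns and after the Discord send completes. A sketch, with `run_claude` and `send_to_discord` as stand-ins for the real internals:

```python
import time

# Sketch: split the existing agent_dispatch_ms into the two proposed
# components by timestamping the CLI return and the Discord send.

def timed_dispatch(run_claude, send_to_discord):
    t0 = time.monotonic()
    reply = run_claude()            # inner CLI + Anthropic API time
    t1 = time.monotonic()
    send_to_discord(reply)          # json-rpc wrap + chunked Discord send
    t2 = time.monotonic()
    return reply, {
        "claude_cli_ms": (t1 - t0) * 1000,
        "acp_overhead_ms": (t2 - t1) * 1000,
    }
```

With both numbers in the log, a 395 sec turn with a 200 ms overhead component immediately rules OpenAB out of the triage.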

Environment

  • Host: GB10 ARM64, Ubuntu 24.04
  • OpenAB image: ghcr.io/openabdev/openab-claude:0.8.3-beta.7 (also tested
    with the PR #791 fix, "reconnect Discord gateway on silent WS disconnect",
    swapped onto the meta instance)
  • Two compose instances side by side (meta + mentor)
  • Use case: continuous daily-driver, Discord-only interface

Use Case

Discord adapter: latency UX improvements

Proposed Solution

No response
