keeper: expose CCB_KEEPER_PING_TIMEOUT_S env override for config-check ping by SevenX77 · Pull Request #186 · bfly123/claude_code_bridge

SevenX77 · 2026-04-23T22:47:49Z

Summary

Replace the hardcoded 0.2s timeout in daemon_matches_project_config's CcbdClient.ping('ccbd') call with _keeper_ping_timeout_s(), a helper that reads CCB_KEEPER_PING_TIMEOUT_S (default 2.0s) and falls back safely on invalid / empty input.

Why

The keeper's reconcile loop calls CcbdClient(socket_path, timeout_s=0.2).ping('ccbd') on every tick to verify config identity. A 0.2s timeout is aggressive: whenever ccbd is in the middle of a paste + verify cycle or a completion-tracker poll burst, the ping races against a busy event loop, the keeper marks the lifecycle as failed:config_check_failed:timed out, and from that point every ccb ask returns socket_unreachable until the user manually runs ccb kill and restarts.

We've hit this repeatedly on a project that does heavy multi-agent dispatch (Gemini analyst + Codex reviewer in parallel). Raising the timeout to 2s (env-overridable) eliminates the false-positive "failed" transitions without weakening the actual-unreachable detection — a healthy ccbd responds to ping in <10ms, so the extra headroom costs nothing on the success path.
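The reasoning above relies on a timeout being an upper bound on waiting, not a fixed delay. The sketch below illustrates that with a plain local socket pair standing in for the keeper-to-ccbd connection. It does not use CcbdClient or the real ping protocol; `timed_ping` and `busy_for_s` are hypothetical names for this illustration.

```python
import socket
import threading
import time


def timed_ping(timeout_s, busy_for_s=0.0):
    """Send b"ping" over a local socket pair and time the round trip.

    Returns (reply, elapsed_seconds); reply is None if the wait timed out.
    """
    client, server = socket.socketpair()

    def responder():
        # Stand-in for the daemon: optionally stay "busy" before replying.
        if server.recv(16) == b"ping":
            time.sleep(busy_for_s)
            server.sendall(b"pong")

    worker = threading.Thread(target=responder)
    worker.start()
    client.settimeout(timeout_s)
    started = time.monotonic()
    try:
        client.sendall(b"ping")
        reply = client.recv(16)
    except socket.timeout:
        reply = None
    elapsed = time.monotonic() - started
    worker.join()
    client.close()
    server.close()
    return reply, elapsed
```

With an idle responder, `timed_ping(2.0)` returns as soon as the reply arrives, so the larger timeout adds no latency. With `busy_for_s=0.5`, a 0.2s timeout gives up while the same exchange succeeds under a 2.0s timeout, which is the false-positive pattern described above.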

Scope

lib/ccbd/keeper_runtime/loop.py — add _keeper_ping_timeout_s() helper; plumb into the single call site in daemon_matches_project_config.
lib/runtime_env/control_plane.py — allowlist CCB_KEEPER_PING_TIMEOUT_S so the keeper subprocess inherits the env var.

Mirrors the precedent set by CCB_CCBD_CLIENT_TIMEOUT_S (CLI→ccbd path).
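Why the allowlist entry matters can be shown with a small sketch of environment filtering. The names `ALLOWED_ENV_KEYS` and `build_child_env` are hypothetical; the actual structure in lib/runtime_env/control_plane.py is not shown in this description and may differ.

```python
# Hypothetical allowlist for illustration only; the real one lives in
# lib/runtime_env/control_plane.py and its exact shape is an assumption here.
ALLOWED_ENV_KEYS = frozenset({
    "CCB_CCBD_CLIENT_TIMEOUT_S",
    "CCB_KEEPER_PING_TIMEOUT_S",
})


def build_child_env(parent_env, allowed=ALLOWED_ENV_KEYS):
    """Copy only allowlisted variables into the environment for a child process."""
    return {key: value for key, value in parent_env.items() if key in allowed}
```

If the variable were missing from the allowlist, a filter like this would drop it before the keeper subprocess starts, and the helper would silently fall back to its default regardless of what the user exported.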

What is unchanged

All other CcbdClient(...) call sites with explicit timeout_s= (daemon_process health probe at 0.2s, etc.) — untouched.
One behavior does change: if CCB_KEEPER_PING_TIMEOUT_S is not set (or is empty / invalid / non-positive), the timeout is now 2.0s instead of upstream's 0.2s.

Alternative: keep default at 0.2s

If you prefer the upstream default unchanged and only want the env-override capability, the helper is trivial to switch — replace return 2.0 with return 0.2 in _keeper_ping_timeout_s(). The env-override behavior itself is what resolves the operational pain; the default bump is our recommendation but not load-bearing.

Test plan

pytest test/ -k "keeper" → 13 passed
Running on personal fork for ~24h: config_check_failed transitions eliminated; no observed regression.

Keeper calls `CcbdClient(...).ping('ccbd')` with a hardcoded 0.2s timeout during every reconcile tick. When ccbd is busy (paste+verify, poll loop), the ping races → keeper marks lifecycle.failed:config_check_failed:timed out → all `ccb ask` return socket_unreachable until manual restart.

- add `CCB_KEEPER_PING_TIMEOUT_S` (default 2.0s, invalid/neg/empty → default)
- allowlist the env in runtime_env/control_plane.py

Mirrors the earlier CCB_CCBD_CLIENT_TIMEOUT_S fix (CLI→ccbd path).

SevenX77 mentioned this pull request Apr 24, 2026

Discuss: per-agent cgroup v2 sub-scope isolation to prevent one agent starving siblings #192

Open

SevenX77 wants to merge 1 commit into bfly123:main from SevenX77:td-006-keeper-ping-timeout-env-override
