fix(task-store): retry on transient transport errors instead of dropping prompt#2090
yashrajshuklaaa wants to merge 1 commit into
Pull request overview
Adds retry handling in the shared KAgentTaskStore HTTP layer to prevent BYO agents from silently dropping prompts when the agent→controller hop encounters transient httpx.TransportError conditions (e.g., stale keep-alive connections reset by the mesh).
Changes:
- Introduce _request_with_retry() in KAgentTaskStore to retry once on httpx.TransportError.
- Route save/get/delete through the new retry helper and document the new error behavior.
- Add logging for transport retry attempts.
…ing prompt
Fixes kagent-dev#2086
Signed-off-by: Yashraj Shukla <shuklayashraj68@gmail.com>
fix: clean up stale docstring in _request_with_retry
Signed-off-by: Yashraj Shukla <shuklayashraj68@gmail.com>
Wanted to share how I actually approached this before anyone reviews.
When the agent→controller HTTP hop raises httpx.TransportError (an idle keep-alive connection reset by the Istio/HBONE mesh, a controller pod reschedule, etc.), the error previously propagated uncaught out of KAgentTaskStore.get/save, silently dropping the user prompt: no error was surfaced and there was no recovery short of a pod restart.
Fix:
Introduce _request_with_retry() in KAgentTaskStore, which catches httpx.TransportError, calls aclose() to flush the stale connection pool, and retries once on a fresh connection. Non-transport HTTP errors (4xx/5xx) are re-raised immediately without retrying. If the transport error persists after all retries, it is re-raised so the caller sees a real error rather than a silent drop.
The fix lives entirely in kagent-core/_task_store.py and automatically covers all four framework adapters (langgraph, adk, openai, crewai), since they all share KAgentTaskStore.
Fixes #2086