
Merge/codex integration #5

Open
williamkasasa wants to merge 15 commits into codeaashu:main from williamkasasa:merge/codex-integration

Conversation


williamkasasa commented Apr 4, 2026

Summary by CodeRabbit

  • New Features

    • Runnable Python backend with REST/SSE APIs for chat, orchestration, MES utilities, and history persistence
    • Multi-provider chat streaming, provider health probing, and local fixture/testing modes
    • Research Workbench (orchestrate, MES retrieval, log slimming, screen interpretation)
    • Buddy companion, collaboration annotations, file explorer/viewer, conversation sharing, and provider-config UI
  • Documentation

    • New setup, runbook, clean-room boundary, MES and vision runbooks, and operational/readme updates
  • Tests

    • Unit, live-provider, vision, and Playwright end-to-end test suites added
  • Chores

    • Added .gitignore and packaging/config updates


coderabbitai bot commented Apr 4, 2026

📝 Walkthrough

This PR creates the AG‑Claw reference workspace: a new Python backend (orchestration, MES services, provider abstraction, history persistence, tracing), a reworked Next.js web UI (provider-aware chat, research workbench, file explorer, collaboration, buddy system), datasets/registries, tests, docs, tooling, and MCP server rebranding.

Changes

Cohort / File(s) Summary
Backend Core (Python)
backend/agclaw_backend/__init__.py, backend/agclaw_backend/contracts.py, backend/agclaw_backend/http_api.py, backend/agclaw_backend/server.py, backend/agclaw_backend/history_store.py, backend/agclaw_backend/mes_services.py, backend/agclaw_backend/orchestrator.py, backend/agclaw_backend/providers.py, backend/agclaw_backend/tracing.py
New Python package implementing typed contracts, REST/SSE API server, provider probing/chat streaming, orchestration logic, MES retrieval/log-slim/screen-interpret, file-backed history persistence, and optional OpenTelemetry bootstrap.
Backend Data, Tests & Scripts
backend/agclaw_backend/data/*, backend/tests/*, backend/scripts/*, backend/pyproject.toml, backend/README.md, backend/TRACING.md, backend/test_fixtures.py
Seed MES datasets and registry, unit/integration/e2e-like tests (including live-provider guards), fixture server, benchmark script, packaging config and backend docs/runbooks.
Web API / Proxy
web/app/api/_backendProxy.ts, web/app/api/chat/route.ts, web/app/api/provider-health/route.ts, web/app/api/*/route.ts, web/app/health/route.ts, web/app/api/files/*
Added backend proxy helpers and provider-aware chat routing with SSE transformation, provider health probing, orchestration/MES proxy endpoints, file listing/read/write with root-bounded resolution, share and history endpoints.
Web UI & Components
web/components/... (chat, layout, settings, research, collaboration, file-viewer, buddy, share, ui)
Large UI additions and refactors: buddy helper, Research Workbench (orchestrate/retrieve/log-slim/interpret), SettingsDialog, FileExplorer/DesktopFileViewer, AnnotationThread, ChatInput/Layout changes, accessibility and branding updates.
Web Libraries & Types
web/lib/*, web/hooks/*, web/lib/constants.ts, web/lib/buddy.ts, web/lib/pretextSpike.ts, web/lib/store.ts, web/lib/types.ts
Provider-scoped model/config helpers, buddy profile generator, pretext measurement utilities, store expansions (research/buddy), new ChatProvider type and settings, exports formatting, collaboration socket type tightening.
Web Testing & E2E
web/e2e/*, web/playwright.config.ts, web/scripts/start-e2e-server.mjs, web/scripts/*
Playwright E2E test suite added, E2E server orchestration script, pretext eval/report, local start scripts and PS utilities.
Prompt Safety Pack
promptfoo/*
Added Promptfoo safety pack, provider stub that enforces advisory/human-review responses, config and README.
MCP Server Rebranding & Config
mcp-server/*, .vscode/mcp.json
Rebranded MCP server to agclaw-source-explorer, switched env var to AGCLAW_REFERENCE_SRC_ROOT (with legacy fallback), tightened safePath checks, adjusted package/metadata and Dockerfile/README.
Docs & Runbooks
README.md, docs/*.md, backend/README.md
Replaced repository README with AG‑Claw reference workspace docs, clean-room boundary, runbooks, vision/MES docs, naming inventory and migration/backlog artifacts.
Workspace & Config
.gitignore, package.json, server.json, tsconfig.json, web/next.config.mjs, web/tsconfig.json, web/package.json, web/.env.example
Added .gitignore, rebranded root package/manifest, updated TypeScript/Next configs, environment examples and build/test scripts.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Web Client
    participant WebAPI as Next.js API
    participant Backend as Python Backend
    participant Provider as Chat Provider
    participant MES as MES Registry/Docs
    participant History as History Store

    Client->>WebAPI: POST /api/chat (messages, settings)
    alt AGCLAW_BACKEND_URL configured
        WebAPI->>Backend: Forward /api/chat
        Backend->>Provider: Probe/POST chat request
        Provider-->>Backend: SSE stream responses
        Backend-->>WebAPI: SSE stream (proxied)
    else No backend proxy
        WebAPI->>Provider: Direct POST chat request
        Provider-->>WebAPI: SSE stream
    end
    WebAPI-->>Client: SSE stream (text chunks)

    Client->>WebAPI: POST /api/orchestrate (ResearchRequest)
    WebAPI->>Backend: /api/orchestrate
    Backend->>Backend: run_research_orchestration()
    Backend->>MES: retrieve_mes_context(query, dataset_ids)
    MES-->>Backend: results
    Backend->>History: append_orchestration_history(entry + artifact)
    Backend-->>WebAPI: ResearchResponse (summary, role_plans)
    WebAPI-->>Client: JSON response

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~105 minutes

Poem

🐰 In burrows bright the AG‑Claw hops anew,

Clean rooms, guardrails, datasets lined in view.
Backend hums and web panels softly sing,
Buddy and research bring the curious spring.
A gentle rabbit cheers: “Safe studies, pursue!”


coderabbitai bot left a comment


Actionable comments posted: 4

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
web/lib/workers/markdown.worker.ts (1)

59-59: ⚠️ Potential issue | 🟡 Minor

codeBlockCount counts fence delimiter lines, not blocks.

The logic counts all lines starting with ``` (both opening and closing fences), but the variable name and documentation ("Number of code blocks found") suggest it should count actual code blocks. In properly formatted markdown, each block has an opening and closing fence, making the actual count double what consumers would expect.

However, note that codeBlockCount is not used elsewhere in the codebase, so the practical impact is minimal. If this field becomes used, consider either:

  • Dividing by 2 (assumes all fences are properly paired), or
  • Tracking open/close state during parsing for robustness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/workers/markdown.worker.ts` at line 59, The variable codeBlockCount
currently counts fence delimiter tokens (both opening and closing) by using
tokens.filter(t => t.type === "code-fence").length; update the logic to count
actual code blocks instead: iterate the tokens array and toggle an inCodeBlock
flag on encountering a "code-fence" token, incrementing codeBlockCount only when
entering a code block (i.e., when inCodeBlock transitions from false to true) so
each fenced block is counted once; reference the codeBlockCount variable and the
tokens token.type checks in markdown.worker.ts (use this approach instead of
simply dividing by 2 to be robust to unpaired fences).
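
As an illustration of the toggle-state counting the prompt describes: the worker itself is TypeScript, so this Python sketch (hypothetical names, a plain line scan instead of the worker's token stream) only demonstrates the logic.

```python
def count_code_blocks(lines: list[str]) -> int:
    """Count fenced code blocks by toggling state on each ``` fence line.

    The count increments only when a fence *opens* a block (state flips
    False -> True), so a properly paired fence pair counts as one block,
    and a dangling unclosed fence still counts the block it opened.
    """
    in_code_block = False
    count = 0
    for line in lines:
        if line.lstrip().startswith("```"):
            in_code_block = not in_code_block
            if in_code_block:
                count += 1
    return count
```

Counting only on the opening transition is what makes this more robust to unpaired fences than dividing the raw fence-line count by two.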
web/components/ui/input.tsx (1)

46-50: ⚠️ Potential issue | 🟠 Major

Restore invalid state signaling on the input control.

When error is present, the field should explicitly expose invalid state using aria-invalid="true"; relying only on aria-describedby weakens screen reader feedback for form errors.

✅ Minimal fix
          <input
            id={inputId}
            ref={ref}
+           aria-invalid={error ? true : undefined}
            aria-describedby={describedBy || undefined}
            className={cn(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/ui/input.tsx` around lines 46 - 50, The input JSX currently
only sets aria-describedby; when an error prop is present you must restore
explicit invalid state signaling by adding aria-invalid (e.g.,
aria-invalid={!!error} or aria-invalid="true" when error exists) to the <input
/> element; update the input element in web/components/ui/input.tsx (the <input
id={inputId} ref={ref} ... />) to include aria-invalid tied to the error prop so
screen readers get explicit invalid state.
web/components/ui/textarea.tsx (1)

56-62: ⚠️ Potential issue | 🟠 Major

Add aria-invalid attribute to signal invalid state to assistive technologies.

The textarea currently lacks the aria-invalid attribute. When error exists, this attribute should be set to communicate the invalid state directly to screen readers and other assistive technologies.

✅ Minimal fix
         <textarea
           id={textareaId}
           ref={resolvedRef}
           value={value}
           onChange={handleChange}
+          aria-invalid={error ? true : undefined}
           aria-describedby={describedBy || undefined}
           className={cn(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/ui/textarea.tsx` around lines 56 - 62, The textarea is missing
aria-invalid to expose validation state to ATs; update the JSX for the
<textarea> (the element using textareaId, resolvedRef, value, handleChange,
describedBy and className={cn(...)}) to include aria-invalid when error is
present (e.g., set aria-invalid to true when error exists and undefined/false
otherwise) so screen readers receive the invalid state.
web/components/collaboration/AnnotationBadge.tsx (1)

24-41: ⚠️ Potential issue | 🟡 Minor

Add explicit button type to avoid unintended form submission.

The comment-toggle button should declare type="button"; otherwise, it defaults to submit in form contexts.

Suggested fix
       <button
+        type="button"
         onClick={() => setOpen((v) => !v)}
         className={cn(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/collaboration/AnnotationBadge.tsx` around lines 24 - 41, The
button inside the AnnotationBadge component currently lacks an explicit type,
which can cause it to act as a form submitter when rendered inside a form;
update the JSX for the button (the element that calls onClick={() => setOpen((v)
=> !v)} and renders MessageSquare) to include type="button" so it no longer
triggers form submission unexpectedly.
web/lib/store.ts (1)

440-463: ⚠️ Potential issue | 🟠 Major

Add a migration path for the persisted storage-key rename.

Switching the persist name to agclaw-chat means the app no longer reads the previous browser entry, so existing conversations, tags, searches, and settings will all appear lost after upgrade.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/store.ts` around lines 440 - 463, Add a migration step to preserve
users' existing data when renaming the persist key to "agclaw-chat": in the
merge function (and/or initialization path that configures the persist name)
detect when persisted is missing or empty and attempt to read and parse the
previous storage entry via window.localStorage.getItem('<OLD_PERSIST_NAME>'),
JSON.parse it, and merge that object into current just like the existing
persisted merge logic; ensure you still apply DEFAULT_SETTINGS and reset UI-only
flags (settingsOpen, researchOpen, buddyOpen, isSearchOpen, sidebarTab,
selectedConversationIds) so settings, conversations, pinnedIds, recentSearches
and tags are restored while UI state remains cleared. Replace
'<OLD_PERSIST_NAME>' with the actual prior persist key used before renaming.
🟠 Major comments (33)
promptfoo/providers/agclawProvider.js-2-3 (1)

2-3: ⚠️ Potential issue | 🟠 Major

Constructor can crash when options is omitted.

On Line 3, options.id is accessed unguarded. If Promptfoo instantiates without config, this throws at startup.

🛠️ Proposed fix
-  constructor(options) {
-    this.providerId = options.id || "agclaw-safety-provider";
+  constructor(options = {}) {
+    this.providerId = options.id || "agclaw-safety-provider";
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@promptfoo/providers/agclawProvider.js` around lines 2 - 3, The constructor
currently accesses options.id unguarded which will throw if options is
undefined; update the constructor of the agclaw provider to safely handle
missing config by giving options a default (e.g., constructor(options = {})) or
by initializing a local safeOptions = options || {} and then use safeOptions.id
to set this.providerId (keep symbol names constructor and providerId unchanged);
ensure any other accesses to options in the constructor similarly use the
safeOptions or default parameter.
web/components/share/ShareDialog.tsx-129-134 (1)

129-134: ⚠️ Potential issue | 🟠 Major

Password field is exposed as plain text.

On Line 129, the password input is missing type="password", so sensitive input is visible.

🔒 Proposed fix
               <input
+                type="password"
+                autoComplete="new-password"
                 value={password}
                 onChange={(event) => setPassword(event.target.value)}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/share/ShareDialog.tsx` around lines 129 - 134, The password
input in ShareDialog is rendered as plain text; change the input element that
uses value={password} and onChange={(event) => setPassword(event.target.value)}
to be a password field by adding type="password" (and optionally
autocomplete="current-password") so the sensitive value is masked.
web/components/share/ShareDialog.tsx-73-89 (1)

73-89: ⚠️ Potential issue | 🟠 Major

Add modal accessibility semantics (role="dialog" + labeling).

The container currently renders visually as a modal but lacks core dialog semantics for screen readers.

♿ Proposed fix
-    <div className="fixed inset-0 z-50 flex items-center justify-center bg-black/60 p-4">
+    <div
+      className="fixed inset-0 z-50 flex items-center justify-center bg-black/60 p-4"
+      role="dialog"
+      aria-modal="true"
+      aria-labelledby="share-dialog-title"
+    >
...
-            <h2 className="text-lg font-semibold text-surface-100">Share Conversation</h2>
+            <h2 id="share-dialog-title" className="text-lg font-semibold text-surface-100">
+              Share Conversation
+            </h2>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/share/ShareDialog.tsx` around lines 73 - 89, The modal markup
in ShareDialog lacks dialog semantics; update the outer container div (the fixed
inset-0 wrapper) to include role="dialog" and aria-modal="true", add an id on
the <h2> (e.g., "share-dialog-title") and an id on the descriptive <p> (e.g.,
"share-dialog-desc"), then set aria-labelledby and aria-describedby on the
dialog div to those ids so screen readers can properly announce the dialog title
and description; keep existing onClose handling and the X button as the dismiss
control.
backend/agclaw_backend/test_fixtures.py-69-69 (1)

69-69: ⚠️ Potential issue | 🟠 Major

Change stream default from True to False for OpenAI API compatibility.

The OpenAI Chat Completions API defaults stream to false when omitted. Currently, the fixture at lines 69 and 83 defaults to true, causing clients that omit the parameter to receive streaming responses unexpectedly.

Proposed fix
-            if body.get("stream", True):
+            if body.get("stream", False):

Apply to both line 69 and line 83.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/test_fixtures.py` at line 69, The fixture currently
treats the Chat Completions `stream` flag as True by default, causing omitted
client requests to be handled as streaming; update both occurrences of
body.get("stream", True) to body.get("stream", False) so the default matches
OpenAI's non-streaming behavior (search for the body.get("stream", ...) calls in
the test_fixtures fixture and replace True with False).
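
A minimal sketch of the corrected default, assuming a handler that branches on the flag (handle_chat and the returned mode strings are hypothetical, not the fixture's actual code):

```python
def handle_chat(body: dict) -> str:
    """Pick the response mode for a Chat Completions-style request body.

    OpenAI's API treats a missing "stream" field as false, so the
    fixture should default the same way rather than defaulting to true.
    """
    if body.get("stream", False):
        return "sse"   # stream chunks as server-sent events
    return "json"      # single non-streaming JSON completion
```

With this default, a client that omits the parameter gets the non-streaming path, matching real API behavior.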
backend/agclaw_backend/history_store.py-134-140 (1)

134-140: ⚠️ Potential issue | 🟠 Major

Prevent history/detail write divergence on partial failures.

Line 135-Line 140 can persist history while failing artifact write, leaving an entry whose detail_id cannot be loaded.

💾 Suggested fix
     with _LOCK:
-        with history_path.open("a", encoding="utf-8") as handle:
-            handle.write(json.dumps(asdict(entry), ensure_ascii=True) + "\n")
-        (artifact_dir / f"{record_id}.json").write_text(
+        (artifact_dir / f"{record_id}.json").write_text(
             json.dumps(detail_payload, ensure_ascii=True, indent=2),
             encoding="utf-8",
         )
+        with history_path.open("a", encoding="utf-8") as handle:
+            handle.write(json.dumps(asdict(entry), ensure_ascii=True) + "\n")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/history_store.py` around lines 134 - 140, The history
file is appended before the artifact file is written, which can leave a history
entry whose detail_id cannot be loaded if the artifact write fails; to fix,
write the artifact payload (detail_payload) first to a temporary file in
artifact_dir and atomically rename it to f"{record_id}.json" (or use write_text
to a temp and Path.replace) and only after that acquire _LOCK and append the
history entry to history_path (writing json.dumps(asdict(entry))). If the
artifact write fails, raise/log and do not append to history; this ensures no
divergence between history entries and their artifact files and references the
symbols _LOCK, history_path, entry, artifact_dir, record_id, and detail_payload.
backend/agclaw_backend/providers.py-142-147 (1)

142-147: ⚠️ Potential issue | 🟠 Major

Fix inconsistency: skip system roles from incoming messages in OpenAI builder.

The _build_openai_messages function coerces system role messages to user role (line 146), which differs from the _build_anthropic_messages implementation that explicitly skips them (lines 154–155). Since incoming messages are separate from the system_prompt parameter, system roles in the message list should be skipped rather than converted to user, matching the Anthropic behavior.

Suggested fix
 def _build_openai_messages(messages: list[dict[str, Any]], system_prompt: str) -> list[dict[str, str]]:
     prepared: list[dict[str, str]] = []
     if system_prompt.strip():
         prepared.append({"role": "system", "content": system_prompt.strip()})
     for message in messages:
         role = message.get("role", "user")
+        if role == "system":
+            continue
         if role == "tool":
             role = "user"
         prepared.append({"role": "assistant" if role == "assistant" else "user", "content": extract_message_text(message.get("content"))})
     return prepared
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/providers.py` around lines 142 - 147, In
_build_openai_messages, the loop currently coerces messages with role "system"
(and "tool") into "user", which is inconsistent with _build_anthropic_messages
that skips system messages; update the logic in _build_openai_messages to skip
any message whose role is "system" (and continue to map "tool" to "user" or skip
if desired), so prepared only gets entries created by mapping
assistant->"assistant" and others->"user" after extracting content via
extract_message_text; locate the loop handling messages and the prepared list
and change the role-handling branch to ignore "system" roles rather than
converting them.
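
Applied in full, the suggested fix yields builder behavior like this sketch (build_openai_messages mirrors the diff above; extract_message_text is replaced by a plain str() coercion so the example is self-contained):

```python
from typing import Any


def build_openai_messages(messages: list[dict[str, Any]],
                          system_prompt: str) -> list[dict[str, str]]:
    """Inject system_prompt once, skip inline system-role messages,
    map tool messages to user, and keep assistant/user roles as-is."""
    prepared: list[dict[str, str]] = []
    if system_prompt.strip():
        prepared.append({"role": "system", "content": system_prompt.strip()})
    for message in messages:
        role = message.get("role", "user")
        if role == "system":
            continue  # system content comes only from the dedicated parameter
        prepared.append({
            "role": "assistant" if role == "assistant" else "user",
            "content": str(message.get("content", "")),
        })
    return prepared
```

With the skip in place, a stray system message in the incoming list can no longer masquerade as user input, matching the Anthropic builder's behavior.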
web/lib/buddy.ts-159-171 (1)

159-171: ⚠️ Potential issue | 🟠 Major

Guard localStorage access in getBuddySeed to prevent initialization crashes.

Lines 164 and 170 can throw SecurityError in multiple scenarios (e.g., Firefox with disabled cookies, Chrome with third-party blocking, incognito mode, file:// URLs, or iframe restrictions), breaking buddy initialization in ChatLayout.tsx:62.

Wrap the localStorage calls in a try-catch block with a fallback:

Suggested fix
 export function getBuddySeed(): string {
   if (typeof window === "undefined") {
     return DEFAULT_BUDDY_SEED;
   }
-
-  const existing = window.localStorage.getItem(BUDDY_STORAGE_KEY);
-  if (existing) {
-    return existing;
-  }
-
-  const seed = `${DEFAULT_BUDDY_SEED}:${window.location.hostname || "local"}`;
-  window.localStorage.setItem(BUDDY_STORAGE_KEY, seed);
-  return seed;
+  try {
+    const existing = window.localStorage.getItem(BUDDY_STORAGE_KEY);
+    if (existing) {
+      return existing;
+    }
+
+    const seed = `${DEFAULT_BUDDY_SEED}:${window.location.hostname || "local"}`;
+    window.localStorage.setItem(BUDDY_STORAGE_KEY, seed);
+    return seed;
+  } catch {
+    return `${DEFAULT_BUDDY_SEED}:local`;
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/buddy.ts` around lines 159 - 171, The getBuddySeed function currently
calls window.localStorage.getItem and setItem which can throw SecurityError in
some environments; wrap both localStorage accesses in a try-catch inside
getBuddySeed and, on error, fall back to returning DEFAULT_BUDDY_SEED (or the
constructed `${DEFAULT_BUDDY_SEED}:${window.location.hostname || "local"}`)
without attempting to read/write storage; ensure you still use BUDDY_STORAGE_KEY
when storage is available and only persist the seed if setItem succeeds.
.vscode/mcp.json-1-14 (1)

1-14: ⚠️ Potential issue | 🟠 Major

Hardcoded developer-specific absolute paths will break for other contributors.

The file contains absolute Windows paths pointing to a specific OneDrive directory (D:/OneDrive - AG SOLUTION/...). This configuration will not work for any other developer cloning the repository.

Consider one of the following:

  1. Add .vscode/mcp.json to .gitignore and provide a .vscode/mcp.json.example template
  2. Use relative paths (e.g., ./mcp-server/dist/src/index.js)
  3. Document that developers must create this file locally with their own paths
🔧 Suggested template approach

Create .vscode/mcp.json.example:

{
  "servers": {
    "agclaw-source-explorer": {
      "type": "stdio",
      "command": "node",
      "args": [
        "${workspaceFolder}/mcp-server/dist/src/index.js"
      ],
      "env": {
        "AGCLAW_REFERENCE_SRC_ROOT": "${workspaceFolder}/src"
      }
    }
  }
}

And add to .gitignore:

.vscode/mcp.json
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.vscode/mcp.json around lines 1 - 14, The .vscode/mcp.json contains
hardcoded absolute Windows OneDrive paths (e.g., under the
"agclaw-source-explorer" server's "args" and "env" values) which will break for
other contributors; fix by either adding .vscode/mcp.json to .gitignore and
committing a template file (e.g., .vscode/mcp.json.example), or replace the
absolute paths with workspace-relative tokens like
${workspaceFolder}/mcp-server/dist/src/index.js and ${workspaceFolder}/src in
the "args" and "env" settings, or document that each developer must create their
own local file—update the repo to include the chosen approach and ensure the
example/template uses the "agclaw-source-explorer" server key so it's easy to
locate.
web/lib/export/plaintext.ts-41-44 (1)

41-44: ⚠️ Potential issue | 🟠 Major

Fix corrupted ellipsis character in truncation output.

The ellipsis character appears as the mojibake sequence â€¦ instead of the intended …. This UTF-8/Latin-1 encoding corruption will produce garbled output in user-visible exported files, degrading the export quality.

🐛 Proposed fix
           const text =
             !options.includeFileContents && raw.length > 500
-              ? raw.slice(0, 500) + " â€¦[truncated]"
+              ? raw.slice(0, 500) + " …[truncated]"
               : raw;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/export/plaintext.ts` around lines 41 - 44, The truncation suffix
currently uses the mojibake sequence (" â€¦[truncated]"); update the assignment
that builds the text (the variable text where options.includeFileContents and
raw are used) to append a proper ellipsis character (either the Unicode
character U+2026 "…[truncated]" or the escape "\u2026[truncated]") so exported
plaintext is not corrupted; ensure the change is applied where text is set using
raw.slice(0, 500) + ... to replace the bad bytes with the correct ellipsis.
web/lib/export/markdown.ts-35-38 (1)

35-38: ⚠️ Potential issue | 🟠 Major

Fix corrupted ellipsis character in truncation output.

Same encoding corruption issue as in plaintext.ts: the ellipsis appears as the mojibake â€¦ instead of the intended …. This affects user-visible Markdown exports.

🐛 Proposed fix
         const truncated =
           !options.includeFileContents && raw.length > 500
-            ? raw.slice(0, 500) + "\nâ€¦[truncated]"
+            ? raw.slice(0, 500) + "\n…[truncated]"
             : raw;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/export/markdown.ts` around lines 35 - 38, The truncation string uses
a corrupted ellipsis sequence ("â€¦[truncated]"); update the ternary that
defines truncated (the expression using options.includeFileContents and
raw.slice) to use a proper Unicode ellipsis character (… or \u2026) so the
fallback becomes "…[truncated]" instead of "â€¦[truncated]"; ensure the change
is made where truncated is defined and uses raw and options.includeFileContents.
web/app/api/orchestration/history/route.ts-5-6 (1)

5-6: ⚠️ Potential issue | 🟠 Major

Validate and clamp limit before proxying.

Forwarding arbitrary limit values enables abusive/expensive requests and unstable backend latency. Parse to an integer and enforce bounds.

Suggested fix
 export async function GET(request: Request) {
   const url = new URL(request.url);
-  const limit = url.searchParams.get("limit") ?? "10";
-  return proxyBackendGet(`/api/orchestration/history?limit=${encodeURIComponent(limit)}`);
+  const rawLimit = url.searchParams.get("limit");
+  const parsed = Number.parseInt(rawLimit ?? "10", 10);
+  const limit = Number.isFinite(parsed) ? Math.min(100, Math.max(1, parsed)) : 10;
+  return proxyBackendGet(`/api/orchestration/history?limit=${limit}`);
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/orchestration/history/route.ts` around lines 5 - 6, The code is
forwarding the raw "limit" query value to proxyBackendGet which allows abusive
or expensive requests; parse the value from url.searchParams.get("limit") into
an integer (e.g., via Number.parseInt or Number), treat NaN or non-positive
values as a default (10), clamp the parsed value to an allowed range (for
example min 1 and max 100), and then call proxyBackendGet with the clamped value
(encodeURIComponent(clampedLimit)) so the backend only receives safe, bounded
limits; update the variable names in the route handler where "limit" is read and
passed to proxyBackendGet.
web/app/layout.tsx-21-21 (1)

21-21: ⚠️ Potential issue | 🟠 Major

Remove or define the font CSS variables in Tailwind config.

The --font-inter and --font-jetbrains-mono CSS variables are referenced in tailwind.config.ts (lines 125–126) but are not defined in globals.css. This will cause font fallback to system fonts, creating visual regressions. Either remove these variable references from the Tailwind config or restore their definitions in globals.css.

References found in tailwind.config.ts
web/tailwind.config.ts:125:        sans: ["var(--font-inter)", "system-ui", "sans-serif"],
web/tailwind.config.ts:126:        mono: ["var(--font-jetbrains-mono)", "ui-monospace", "monospace"],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/layout.tsx` at line 21, The Tailwind theme keys for sans and mono
reference CSS variables --font-inter and --font-jetbrains-mono but those
variables are missing; either remove the variable references from the theme's
sans/mono arrays or restore definitions for --font-inter and
--font-jetbrains-mono in your global CSS by adding `@font-face` imports or setting
:root variables that point to the loaded font stacks (e.g., assign Inter and
JetBrains Mono to the respective variables), then ensure the body class
(font-sans) uses them; update the theme or globals so the symbols --font-inter
and --font-jetbrains-mono resolve.
web/playwright.config.ts-14-21 (1)

14-21: ⚠️ Potential issue | 🟠 Major

PowerShell command breaks cross-platform CI/development.

The webServer.command uses PowerShell syntax (powershell -NoProfile -Command, Test-Path, Remove-Item, $env:, $LASTEXITCODE) which will fail on Linux/macOS environments. Most CI runners (GitHub Actions, etc.) use Linux by default.

🛠️ Proposed cross-platform fix

Consider using a Node.js script or npm script that works cross-platform:

   webServer: {
-    command:
-      "powershell -NoProfile -Command \"if (Test-Path .next) { Remove-Item -LiteralPath .next -Recurse -Force -ErrorAction SilentlyContinue }; $env:AGCLAW_WEB_ROOT='..'; npm run build; if ($LASTEXITCODE -ne 0) { exit $LASTEXITCODE }; node scripts/start-e2e-server.mjs\"",
+    command: "node scripts/prepare-e2e.mjs && npm run build && AGCLAW_WEB_ROOT=.. node scripts/start-e2e-server.mjs",
     cwd: path.resolve(__dirname),

Then create scripts/prepare-e2e.mjs:

import { rm } from "fs/promises";
import { existsSync } from "fs";

if (existsSync(".next")) {
  await rm(".next", { recursive: true, force: true });
}

Alternatively, use cross-env and rimraf packages for cross-platform compatibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/playwright.config.ts` around lines 14 - 21, The PowerShell-based
webServer.command in playwright.config.ts is not cross-platform; replace the
inline PowerShell command with a cross-platform npm/script invocation and update
webServer.command to run that script (e.g., change webServer.command to "node
./scripts/prepare-e2e.mjs && npm run build && node scripts/start-e2e-server.mjs"
or an equivalent npm lifecycle script), add a new scripts/prepare-e2e.mjs that
removes the .next directory using fs/promises (rm) and existsSync for safe
deletion, and ensure any env var setting needed for AGCLAW_WEB_ROOT is done in a
cross-platform way (via cross-env in package.json scripts or within the
prepare-e2e.mjs) so webServer.command and related settings (url, timeout,
reuseExistingServer) remain unchanged.
web/app/api/_backendProxy.ts-20-35 (1)

20-35: ⚠️ Potential issue | 🟠 Major

Wrap upstream fetch failures and return explicit 502.

If backend is unreachable, fetch throws and this route fails with an unstructured 500. Return a controlled gateway error for callers.

💡 Suggested fix
-  const response = await fetch(`${backendBaseUrl}${path}`, {
-    method: request.method,
-    headers: {
-      "Content-Type": request.headers.get("Content-Type") ?? "application/json",
-    },
-    body: rawBody || undefined,
-    cache: "no-store",
-  });
+  let response: Response;
+  try {
+    response = await fetch(`${backendBaseUrl}${path}`, {
+      method: request.method,
+      headers: {
+        "Content-Type": request.headers.get("Content-Type") ?? "application/json",
+      },
+      body: rawBody || undefined,
+      cache: "no-store",
+    });
+  } catch {
+    return NextResponse.json({ error: "Upstream backend unavailable" }, { status: 502 });
+  }
@@
-  const response = await fetch(`${backendBaseUrl}${pathWithQuery}`, { cache: "no-store" });
+  let response: Response;
+  try {
+    response = await fetch(`${backendBaseUrl}${pathWithQuery}`, { cache: "no-store" });
+  } catch {
+    return NextResponse.json({ error: "Upstream backend unavailable" }, { status: 502 });
+  }

Also applies to: 50-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/_backendProxy.ts` around lines 20 - 35, Wrap the upstream fetch
calls (the ones calling fetch(`${backendBaseUrl}${path}`, ...) that assign to
the local response variable) in try/catch blocks so network or DNS errors are
caught; on catch return a controlled NextResponse with status 502 and a small
JSON or text payload indicating "Bad Gateway" (or similar) instead of allowing
an unhandled exception to produce a 500. Ensure both occurrences (the fetch
using rawBody and the other fetch at lines ~50-56) are protected, and continue
to preserve response.headers and status for successful fetches; use the existing
NextResponse constructor to build the 502 response.
backend/scripts/benchmark_backend.py-51-55 (1)

51-55: ⚠️ Potential issue | 🟠 Major

Validate --iterations to prevent empty-sample crashes.

iterations=0 (or negative) leads to empty samples_ms and runtime failures in stats/max.

💡 Suggested fix
     parser.add_argument("--iterations", type=int, default=10)
@@
     args = parser.parse_args()
+    if args.iterations < 1:
+        parser.error("--iterations must be >= 1")

Also applies to: 31-46

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/benchmark_backend.py` around lines 51 - 55, The
args.iterations value isn't validated so passing 0 or negative causes empty
samples_ms and failures; after parsing (args = parser.parse_args()) check if
args.iterations <= 0 and call parser.error("iterations must be a positive
integer") or otherwise exit with a clear message, and ensure any code that
builds samples_ms or computes stats (references to samples_ms and iterations)
assumes iterations > 0; apply the same validation for the earlier argument
parsing block (lines 31-46) so iterations is always a positive integer before
the benchmark run.
web/components/file-viewer/DesktopFileViewer.tsx-36-51 (1)

36-51: ⚠️ Potential issue | 🟠 Major

Handle save failures explicitly instead of throwing in event handler.

Throwing inside handleSave creates an unhandled rejection path and gives users no feedback when save fails.

💡 Suggested fix
   const handleSave = async () => {
@@
     setIsSaving(true);
     try {
       const response = await fetch("/api/files/write", {
@@
       if (!response.ok) {
-        throw new Error(`HTTP ${response.status}`);
+        return;
       }
       markSaved(activeTab.id);
+    } catch {
+      // show toast/banner here if available
     } finally {
       setIsSaving(false);
     }
   };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/file-viewer/DesktopFileViewer.tsx` around lines 36 - 51,
handleSave currently throws on non-OK fetch responses which creates unhandled
rejections and no user feedback; change handleSave to catch errors instead of
throwing: wrap the fetch/response.ok check in try/catch (keep the existing
finally to call setIsSaving(false)), and on error call the UI feedback mechanism
(e.g., show a toast/notification or set a save error state) and log the error
instead of throwing; ensure markSaved(activeTab.id) is only called on successful
response.ok and that any error paths do not leave isSaving true.
web/components/settings/ModelSettings.tsx-70-71 (1)

70-71: ⚠️ Potential issue | 🟠 Major

Guard max-token input against NaN state.

Clearing or entering invalid numeric text makes Number(event.target.value) become NaN, which can poison settings and downstream requests.

💡 Suggested fix
- onChange={(event) => updateSettings({ maxTokens: Number(event.target.value) })}
+ onChange={(event) => {
+   const parsed = Number.parseInt(event.target.value, 10);
+   if (Number.isNaN(parsed)) return;
+   const clamped = Math.min(200000, Math.max(1000, parsed));
+   updateSettings({ maxTokens: clamped });
+ }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/settings/ModelSettings.tsx` around lines 70 - 71, The onChange
handler for the max-tokens input currently calls updateSettings({ maxTokens:
Number(event.target.value) }) which can produce NaN when the input is empty or
invalid; change the handler in ModelSettings.tsx to coerce and guard the parsed
value: parse/trim the input (e.g. const raw = event.target.value.trim(); const
parsed = Number(raw)); if Number.isNaN(parsed) use a safe fallback (e.g. 0 or
previous valid value) before calling updateSettings (updateSettings({ maxTokens:
parsedOrFallback })). Ensure you reference the existing onChange callback and
updateSettings identifier so the input never writes NaN into settings.
backend/scripts/benchmark_backend.py-15-28 (1)

15-28: ⚠️ Potential issue | 🟠 Major

Add request timeouts and non-2xx handling in benchmark probes.

urlopen without timeout can hang indefinitely, and HTTPError currently aborts the whole run instead of recording a sample.

💡 Suggested fix
+from urllib.error import HTTPError, URLError
@@
 def _post_json(url: str, payload: dict[str, object]) -> None:
@@
-    with urlopen(request) as response:
-        response.read()
+    try:
+        with urlopen(request, timeout=10) as response:
+            response.read()
+    except HTTPError as error:
+        error.read()
+    except URLError:
+        raise
@@
 def _get(url: str) -> None:
-    with urlopen(url) as response:
-        response.read()
+    try:
+        with urlopen(url, timeout=10) as response:
+            response.read()
+    except HTTPError as error:
+        error.read()
+    except URLError:
+        raise
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/scripts/benchmark_backend.py` around lines 15 - 28, The benchmark
probe functions _post_json and _get currently call urlopen with no timeout and
let exceptions like HTTPError/URLError bubble up, which can hang or abort the
entire run; update both functions to pass a sensible timeout (e.g., as a
constant or parameter) to urlopen, wrap the call in a try/except that catches
urllib.error.HTTPError and URLError, record the response status and body (or
error status) as a sample instead of raising, and ensure non-2xx HTTP responses
are handled as valid samples (capture status code and response text) so the
benchmark run continues rather than aborting.
scripts/start-agclaw-local.ps1-26-28 (1)

26-28: ⚠️ Potential issue | 🟠 Major

Require the expected health status code.

Wait-HttpReady currently treats any non-5xx response as healthy. That can unblock startup on redirects or other unexpected responses even when the intended health endpoint is not actually ready.

✅ Suggested fix
-      if ($response.StatusCode -ge 200 -and $response.StatusCode -lt 500) {
+      if ($response.StatusCode -eq 200) {
         return $true
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-agclaw-local.ps1` around lines 26 - 28, The Wait-HttpReady
check currently treats any non-5xx status as healthy; update the health
validation in the function Wait-HttpReady so it only returns success for
explicit 2xx responses (e.g., require $response.StatusCode -ge 200 -and
$response.StatusCode -lt 300) instead of the broader -lt 500 check; locate the
Invoke-WebRequest call that assigns $response and change the conditional that
examines $response.StatusCode to enforce the 2xx range and keep retries/timeouts
intact.
scripts/start-agclaw-local.ps1-103-113 (1)

103-113: ⚠️ Potential issue | 🟠 Major

Don't print "stack started" before the web shell is reachable.

The backend gets a readiness probe, but the web process doesn't. A port collision or failed npm run dev still ends with a success banner here.

🌐 Suggested readiness check
 Write-Host "Starting AG-Claw web shell on port $WebPort..." -ForegroundColor Cyan
 Start-Process powershell -ArgumentList @('-NoExit', '-Command', $webCommand) -WorkingDirectory $webDir | Out-Null
+
+if (-not (Wait-HttpReady -Url "http://127.0.0.1:$WebPort/health" -TimeoutSeconds 30)) {
+  throw "Web shell did not become healthy on http://127.0.0.1:$WebPort"
+}
 
 Write-Host ""
 Write-Host "AG-Claw local stack started." -ForegroundColor Green
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/start-agclaw-local.ps1` around lines 103 - 113, The script currently
prints "AG-Claw local stack started." immediately after Start-Process, which can
hide start failures; change the flow so after launching the web shell with
Start-Process (use Start-Process ... -PassThru or capture the process object
from $webCommand), poll the web endpoint http://localhost:$WebPort (e.g., with
Invoke-WebRequest or Test-NetConnection) until it returns a successful response
or a configurable timeout is reached, and only then print the success banner and
UI instructions; if the port never becomes reachable, print an error message
with the process state and exit non‑zero instead of declaring success. Ensure
your readiness loop references Start-Process, $webCommand, $WebPort, and $webDir
so it’s clear where to add the check.
web/scripts/start-e2e-server.mjs-13-39 (1)

13-39: ⚠️ Potential issue | 🟠 Major

Handle spawn failures and any backend exit unconditionally.

The code currently has no error listeners on either child process, and the backend exit handler (line 54–59) only tears down the Next.js child when exit code is truthy. This means:

  • Spawn failures (e.g., python not found) are unhandled and can crash the launcher.
  • A backend exit with code 0 or via signal leaves the Next.js child running, causing E2E orchestration to hang.

The fix should add error listeners on both spawns and ensure the child is always killed when the backend exits, regardless of exit code or signal.

🔧 Suggested supervision fix
+const fail = (name) => (error) => {
+  console.error(`${name} failed to start:`, error);
+  shutdown("SIGTERM");
+  process.exit(1);
+};
+
 const backend = spawn(
   "python",
   ["-m", "agclaw_backend.server", "--host", "127.0.0.1", "--port", backendPort],
   {
     cwd: repoRoot,
     stdio: "inherit",
     env: {
       ...process.env,
       PYTHONPATH: path.resolve(repoRoot, "backend"),
       AGCLAW_BACKEND_MOCK_CHAT: "1",
       AGCLAW_BACKEND_MOCK_HEALTH: "1",
       AGCLAW_BACKEND_QUIET: "1",
     },
   }
 );
+backend.on("error", fail("backend"));
 
 const child = spawn(process.execPath, [nextBin, "start", "-p", port], {
   cwd: webDir,
   stdio: "inherit",
   env: {
     ...process.env,
     AGCLAW_WEB_ROOT: repoRoot,
     AGCLAW_BACKEND_URL: `http://127.0.0.1:${backendPort}`,
     NEXT_PUBLIC_APP_URL: `http://127.0.0.1:${port}`,
   },
 });
+child.on("error", fail("web"));
 
 const shutdown = (signal) => {
   if (!backend.killed) {
     backend.kill(signal);
   }
   if (!child.killed) {
     child.kill(signal);
   }
 };
 
 child.on("exit", (code) => {
   shutdown("SIGTERM");
   process.exit(code ?? 0);
 });
 
-backend.on("exit", (code) => {
-  if (code && !child.killed) {
-    child.kill("SIGTERM");
-    process.exit(code);
-  }
+backend.on("exit", (code, signal) => {
+  if (!child.killed) {
+    child.kill("SIGTERM");
+  }
+  process.exit(code ?? (signal ? 1 : 0));
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/scripts/start-e2e-server.mjs` around lines 13 - 39, Add error listeners
to both spawned processes and always terminate the Next.js child when the
backend exits: attach backend.on('error', ...) and child.on('error', ...)
handlers to log and exit/cleanup on spawn failures (handle cases like
executable-not-found), and change the backend.on('exit', ...) handler (and also
handle 'close'/'signal' if present) to unconditionally kill/cleanup the child
process (child.kill() and wait/exit) regardless of exit code or signal so the
launcher cannot hang when the backend stops.
web/app/api/provider-health/route.ts-48-50 (1)

48-50: ⚠️ Potential issue | 🟠 Major

API keys in query parameters pose a security risk.

Reading apiKey from query parameters (line 50) means the key appears in:

  • Server access logs
  • Browser history
  • Proxy/CDN logs
  • Referer headers if the response triggers navigation

Consider accepting API keys via request headers or switching to POST with a JSON body.

Suggested approach

Option 1: Use a custom header

const apiKey = request.headers.get("x-provider-api-key") ?? "";

Option 2: Switch to POST with body (requires client changes)

export async function POST(request: NextRequest) {
  const body = await request.json();
  const apiKey = body.apiKey ?? "";
  // ...
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/provider-health/route.ts` around lines 48 - 50, The handler
currently reads the sensitive apiKey from request.nextUrl.searchParams into the
apiKey variable (and then passes it to normalizeBaseUrl/health checks), which
leaks secrets; change the handler to instead read the API key from a secure
source such as a custom request header (e.g. use
request.headers.get("x-provider-api-key") to populate apiKey) or convert the
endpoint to accept POST and parse apiKey from the JSON body, and remove usage of
request.nextUrl.searchParams.get("apiKey"); ensure downstream calls
(normalizeBaseUrl, provider health logic) use the new apiKey source and update
any client calls accordingly.
web/app/api/provider-health/route.ts-25-40 (1)

25-40: ⚠️ Potential issue | 🟠 Major

Backend proxy also forwards apiKey in query string.

When proxying to the backend (line 31), the full query string including apiKey is forwarded. This propagates the same security concern to the backend logs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/provider-health/route.ts` around lines 25 - 40, proxyToBackend is
currently forwarding the entire query string (including sensitive apiKey) to the
backend via the target URL; modify proxyToBackend to copy
request.nextUrl.searchParams into a new URLSearchParams (or URL), remove the
"apiKey" param before building target (the variable target) so the forwarded
query string no longer contains apiKey, and if the backend needs authentication
send the key in a secure header (e.g., Authorization) instead of in the query
string.
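A minimal sketch of scrubbing the secret before building the forwarded query string; `stripSensitiveParams` is an illustrative helper name, not an identifier from the PR:

```javascript
// Copy the incoming query string and drop secret parameters before
// forwarding it to the backend. Header-based auth can then carry the key.
function stripSensitiveParams(search, sensitive = ["apiKey"]) {
  const params = new URLSearchParams(search);
  for (const key of sensitive) {
    params.delete(key);
  }
  return params.toString();
}
```

The handler would build `target` from this scrubbed string and, if the backend needs the key, send it in an `Authorization` header instead.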
web/components/research/ResearchWorkbench.tsx-243-350 (1)

243-350: ⚠️ Potential issue | 🟠 Major

Persisted orchestration history is unreachable until a new run succeeds.

Recent orchestration runs and selectedHistoryDetail are nested under orchestrateResponse && ..., so opening the workbench with existing history but no current response renders nothing. That makes the history viewer unusable on a fresh page load.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/research/ResearchWorkbench.tsx` around lines 243 - 350, The
Recent orchestration runs and persisted detail UI are currently nested inside
the orchestrateResponse conditional so they are not rendered when
orchestrationHistory exists but orchestrateResponse is null; refactor the JSX so
that the block rendering orchestrationHistory and the selectedHistoryDetail
panel is moved out of (or rendered alongside) the orchestrateResponse && ...
conditional, ensuring orchestrationHistory.map(...) (and the button that calls
loadHistoryDetail) and the selectedHistoryDetail rendering (using
selectedHistoryDetail and setSelectedHistoryDetail) are shown independently of
orchestrateResponse.
backend/agclaw_backend/http_api.py-63-70 (1)

63-70: ⚠️ Potential issue | 🟠 Major

/api/chat is not actually streaming yet.

_send_sse() concatenates every chunk, sets Content-Length, and writes once. When the Next.js route proxies this backend, users will not see incremental tokens until the provider call has fully finished.

Also applies to: 121-133

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/http_api.py` around lines 63 - 70, The current
_send_sse concatenates all chunks and sets Content-Length, preventing true
streaming; change _send_sse to stream each SSE event as it's produced: remove
setting Content-Length, write each event individually by converting each chunk
via _sse_bytes (or formatting per-event SSE), call self.wfile.write(...) for
each event followed by self.wfile.flush() so the client receives incremental
tokens, and ensure the same streaming pattern is applied to the analogous
handler used around the other chunked response (the method at the other
commented block that also builds SSEs) so neither path buffers all chunks before
sending.
web/app/api/chat/route.ts-131-149 (1)

131-149: ⚠️ Potential issue | 🟠 Major

Forward the request abort signal to the upstream fetches.

The UI's Stop button aborts the browser request, but none of these provider/backend fetch() calls receive req.signal. Generation can keep running after the client disconnects, which wastes tokens and continues work the user already canceled.

Also applies to: 179-194, 267-277, 374-405

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/chat/route.ts` around lines 131 - 149, The proxyToBackend fetch
doesn't forward the incoming request abort signal, so pass the client's
AbortSignal to upstream fetches by adding signal: req.signal in the fetch init
(e.g., update proxyToBackend's fetch call to include { method: "POST", headers:
..., body: JSON.stringify(body), signal: req.signal }). Do the same for the
other upstream/provider fetches referenced in the review (the similar fetch
blocks at lines 179-194, 267-277, 374-405) so all provider/backend fetches abort
when the client aborts.
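If the route also wants its own upper bound, the client signal and a timeout can be combined; `upstreamSignal` is an illustrative helper name, and `AbortSignal.any` requires Node 20+ or a modern browser runtime:

```javascript
// Build the value to pass as `signal:` in each upstream fetch init.
// Aborts when either the client disconnects or the timeout elapses.
function upstreamSignal(clientSignal, timeoutMs) {
  return AbortSignal.any([clientSignal, AbortSignal.timeout(timeoutMs)]);
}
```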
backend/agclaw_backend/http_api.py-50-53 (1)

50-53: ⚠️ Potential issue | 🟠 Major

Harden request parsing with size limits and structured 400/413 responses.

_read_json() trusts Content-Length, reads the whole body into memory, and later code casts untrusted values directly with json.loads(...), int(...), ChatProvider(...), and OrchestratorRole(...). One oversized or malformed request currently ends in a dropped handler thread instead of a predictable JSON error response.

Also applies to: 77-105, 109-193

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/http_api.py` around lines 50 - 53, Update _read_json
to enforce a maximum body size (e.g. configurable MAX_BODY_BYTES), do not trust
Content-Length blindly (cap it and read only up to the cap, or stream in chunks)
and raise/return a specific 413 error when the payload exceeds the limit; wrap
json.loads in a try/except catching json.JSONDecodeError and return a structured
400 response when JSON is malformed; likewise, wrap downstream casts/parses
(int(...), ChatProvider(...), OrchestratorRole(...)) in try/except to return
structured 400 responses for invalid types or unknown enum values; apply the
same validations and error handling to the other request-parsing code paths that
call json.loads/int/ChatProvider/OrchestratorRole so all failures produce
deterministic 400/413 responses instead of letting the handler thread crash.
web/app/api/chat/route.ts-178-244 (1)

178-244: ⚠️ Potential issue | 🟠 Major

Handle Anthropic non-stream responses before trying to parse SSE.

When body.stream === false, this branch sends a non-streaming request upstream, but the response is still fed through readSseStream(). Anthropic returns a normal JSON body in that mode, so this stream never emits a final chunk and the caller hangs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/app/api/chat/route.ts` around lines 178 - 244, The stream-handling code
wrongly always feeds upstreamResponse into readSseStream even when the client
requested a non-stream response (body.stream === false); modify the
ReadableStream start logic to first detect non-stream mode (check body.stream
=== false) or upstreamResponse content-type, then await and parse
upstreamResponse.json() instead of calling readSseStream, extract the final
text/message from the JSON payload and controller.enqueue a single sseMessage
with that text followed by controller.enqueue(sseDone()) and controller.close();
otherwise fall back to the existing readSseStream handling. Reference symbols:
body.stream, upstreamResponse, readSseStream,
controller.enqueue(sseMessage(...)), sseDone().
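Extracting the final text from the JSON payload could be sketched as follows; `extractFinalText` is an illustrative helper, and the body shape (content blocks with `type`/`text`) follows the Anthropic Messages API, so verify it against the provider docs:

```javascript
// Pull the assistant text out of an Anthropic-style non-streaming body.
// Assumes the Messages API shape: { content: [{ type: "text", text: "..." }] }.
function extractFinalText(body) {
  const blocks = Array.isArray(body?.content) ? body.content : [];
  return blocks
    .filter((block) => block && block.type === "text")
    .map((block) => block.text)
    .join("");
}
```

The non-stream branch would then enqueue a single sseMessage with this text followed by sseDone, instead of calling readSseStream.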
web/components/research/ResearchWorkbench.tsx-692-733 (1)

692-733: ⚠️ Potential issue | 🟠 Major

Give this overlay real dialog semantics and keyboard handling.

The workbench opens as a modal, but it lacks role="dialog", aria-modal, focus management, and Escape handling. Keyboard and screen-reader users can still interact with the page behind it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/research/ResearchWorkbench.tsx` around lines 692 - 733, The
overlay rendered by ResearchWorkbench needs proper dialog semantics and keyboard
handling: add role="dialog" and aria-modal="true" to the top-level container and
link aria-labelledby to the Research Workbench heading (give the <h2> an id);
implement focus management in the ResearchWorkbench component so that when the
modal opens focus moves to the close button (or first focusable control) and
when it closes focus returns to the previously focused element; add an Escape
key handler (useEffect) that calls closeResearch; and implement a basic focus
trap (either a small internal trap using keydown/tab logic in useEffect or by
integrating a focus-trap library) so keyboard users cannot tab out to the
underlying page. Use identifiers like ResearchWorkbench, closeResearch,
setActiveSection, activeSection, and SECTION_META to locate the code to change.
web/components/research/ResearchWorkbench.tsx-203-213 (1)

203-213: ⚠️ Potential issue | 🟠 Major

Remove the hard-coded local workspace path from the client payload.

This sends a developer-machine path in every orchestration request. It will be wrong for almost every user, and it also gets written into persisted orchestration history.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/research/ResearchWorkbench.tsx` around lines 203 - 213, In
ResearchWorkbench.tsx remove the hard-coded local path in the orchestration
payload: stop sending context.workspace_root = "D:/OneDrive - AG
SOLUTION/claude-code" inside the body JSON.stringify block used when building
the orchestration request; instead derive the value from a safe source (e.g., a
prop/state named workspaceRoot passed into the ResearchWorkbench component, or a
runtime config/env var such as NEXT_PUBLIC_WORKSPACE_ROOT) or omit
workspace_root entirely if unavailable, and update any references to context in
the same request-building code to use that variable (look for the JSON.stringify
call that includes prompt/orchestratePrompt, provider, model, roles, context).
web/lib/store.ts-395-400 (1)

395-400: ⚠️ Potential issue | 🟠 Major

Resetting the API section can leave provider and model out of sync.

This branch now resets provider/localMode but leaves settings.model untouched. If a user was on Ollama or vLLM and hits “reset API”, the next chat request goes out as provider: "anthropic" paired with a local model id and fails.

💡 Possible fix
          api: {
            localMode: DEFAULT_SETTINGS.localMode,
            provider: DEFAULT_SETTINGS.provider,
            apiUrl: DEFAULT_SETTINGS.apiUrl,
            apiKey: DEFAULT_SETTINGS.apiKey,
            streamingEnabled: DEFAULT_SETTINGS.streamingEnabled,
+           model: getDefaultModelForProvider(DEFAULT_SETTINGS.provider),
          },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/store.ts` around lines 395 - 400, Resetting the api block leaves
settings.model inconsistent with provider/localMode; when you reset api (the api
object with keys api.localMode, provider, apiUrl, apiKey, streamingEnabled) also
update settings.model to a safe default (e.g., DEFAULT_SETTINGS.model or null)
so the model matches the new provider/localMode, ensuring provider/localMode are
set before assigning model; update the reset logic to set settings.model
alongside api using DEFAULT_SETTINGS.model (or clear it) to keep provider and
model in sync.
backend/tests/test_http_api.py-24-24 (1)

24-24: ⚠️ Potential issue | 🟠 Major

Replace fixed sleep with readiness polling to avoid flaky startup.

time.sleep(0.05) on Line 24 is race-prone on slower CI runners. Poll /health with a deadline instead.

Suggested fix
-        time.sleep(0.05)
+        deadline = time.time() + 3
+        while True:
+            try:
+                with urlopen(f"http://127.0.0.1:{cls.port}/health", timeout=0.2):
+                    break
+            except Exception:
+                if time.time() >= deadline:
+                    raise TimeoutError("Backend test server did not become ready in time")
+                time.sleep(0.05)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_http_api.py` at line 24, Replace the brittle
time.sleep(0.05) wait with a readiness poll loop: repeatedly GET the /health
endpoint (with a short per-request timeout) until it returns a healthy status or
a deadline (e.g., 5–10s) is reached, failing the test if the deadline elapses;
locate the occurrence of time.sleep(0.05) in backend/tests/test_http_api.py and
swap it for this polling logic so the test waits deterministically for the
service to be ready.
backend/tests/test_http_api.py-42-43 (1)

42-43: ⚠️ Potential issue | 🟠 Major

Add explicit timeout parameter to all urlopen() calls in tests to prevent test hangs.

Currently, every urlopen(...) call in this file lacks a timeout, which can stall the entire test suite if any request hangs. All instances need explicit timeout values.

Affected lines: 42, 48, 54, 60, 85, 107, 126, 144, 156, 184, 194, 200, 208, 228, 264

Suggested fix pattern
-        with urlopen(self._url("/health")) as response:
+        with urlopen(self._url("/health"), timeout=3) as response:
             payload = json.loads(response.read().decode("utf-8"))
-            urlopen(request)
+            urlopen(request, timeout=3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/tests/test_http_api.py` around lines 42 - 43, The tests use
urllib.request.urlopen without a timeout which can hang the test suite; update
every urlopen(...) call in backend/tests/test_http_api.py (e.g., the call inside
the health-check block and all other occurrences at the listed affected lines)
to pass an explicit timeout argument (e.g., urlopen(self._url("/health"),
timeout=...) or urlopen(req, timeout=...)), choosing a reasonable short timeout
value (e.g., 5–10 seconds) and apply the same change for all listed instances to
ensure tests fail fast instead of hanging; keep the modification limited to
adding the timeout parameter where urlopen is invoked.

Comment on lines +161 to +163
def get_orchestration_detail(detail_id: str) -> OrchestrationArtifactBundle | None:
path = _artifact_dir() / f"{detail_id}.json"
if not path.exists():

⚠️ Potential issue | 🔴 Critical

Validate detail_id before building filesystem paths.

Line 162 currently trusts detail_id directly, which enables path traversal attempts against the artifact directory.

🔒 Suggested fix
-from uuid import uuid4
+from uuid import UUID, uuid4
...
 def get_orchestration_detail(detail_id: str) -> OrchestrationArtifactBundle | None:
-    path = _artifact_dir() / f"{detail_id}.json"
+    try:
+        safe_id = str(UUID(detail_id))
+    except ValueError:
+        return None
+    path = _artifact_dir() / f"{safe_id}.json"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/agclaw_backend/history_store.py` around lines 161 - 163, The
get_orchestration_detail function builds a filesystem path from the untrusted
detail_id, enabling path traversal; before using detail_id to construct path (in
get_orchestration_detail and anywhere _artifact_dir() / f"{detail_id}.json" is
used) validate/sanitize it: reject or return None for any value containing path
separators or parent-dir tokens ("/", "\\", ".."), and enforce a strict allowed
pattern (e.g., UUID or /^[A-Za-z0-9._-]+$/) so only safe filenames are accepted;
alternatively, normalize the candidate filename and compare its resolved parent
to ensure it stays inside _artifact_dir() and refuse requests that escape the
artifact directory.

Comment on lines +12 to +19
dependencies = []
# Optional: tracing and instrumentation for observability. These are best-effort
# -- the code will run even if these packages are not installed.
dependencies = [
"opentelemetry-sdk>=1.18.0",
"opentelemetry-exporter-otlp>=1.18.0",
"opentelemetry-instrumentation-requests>=1.18.0",
]

⚠️ Potential issue | 🔴 Critical

Duplicate dependencies key violates TOML specification.

The dependencies key is defined twice (lines 12 and 15). This is invalid TOML—duplicate keys in the same table are prohibited. While some lenient parsers may silently use the last value, others (including pip and build tools) may reject the file entirely or behave unpredictably.

Remove the empty dependencies = [] on line 12.

Proposed fix
 requires-python = ">=3.11"
 authors = [{ name = "AG Solution" }]
-dependencies = []
 # Optional: tracing and instrumentation for observability. These are best-effort
 # -- the code will run even if these packages are not installed.
 dependencies = [
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-dependencies = []
 # Optional: tracing and instrumentation for observability. These are best-effort
 # -- the code will run even if these packages are not installed.
 dependencies = [
 "opentelemetry-sdk>=1.18.0",
 "opentelemetry-exporter-otlp>=1.18.0",
 "opentelemetry-instrumentation-requests>=1.18.0",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/pyproject.toml` around lines 12 - 19, the pyproject.toml defines the
table key "dependencies" twice, which violates TOML; remove the first, empty
declaration (the line `dependencies = []`) so only the intended dependencies
list (the second "dependencies" array containing the opentelemetry packages)
remains, ensuring the file has a single "dependencies" key.

Comment on lines +37 to +45
const params = new URLSearchParams({
provider: settings.provider,
apiUrl: settings.apiUrl,
});
if (settings.apiKey) {
params.set("apiKey", settings.apiKey);
}
const response = await fetch(`/api/provider-health?${params.toString()}`, {
signal: AbortSignal.timeout(5000),

⚠️ Potential issue | 🔴 Critical

Do not send apiKey in the URL.

Putting the API key in the query string leaks it into browser history, server logs, proxy logs, and cache keys. Move it to a POST body or an authorization header instead.

🔐 Suggested client-side change
-      const params = new URLSearchParams({
-        provider: settings.provider,
-        apiUrl: settings.apiUrl,
-      });
-      if (settings.apiKey) {
-        params.set("apiKey", settings.apiKey);
-      }
-      const response = await fetch(`/api/provider-health?${params.toString()}`, {
+      const response = await fetch("/api/provider-health", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          provider: settings.provider,
+          apiUrl: settings.apiUrl,
+          apiKey: settings.apiKey || undefined,
+        }),
         signal: AbortSignal.timeout(5000),
       });

The route handler will need the matching POST-body change as well.

📝 Committable suggestion


Suggested change
-      const params = new URLSearchParams({
-        provider: settings.provider,
-        apiUrl: settings.apiUrl,
-      });
-      if (settings.apiKey) {
-        params.set("apiKey", settings.apiKey);
-      }
-      const response = await fetch(`/api/provider-health?${params.toString()}`, {
+      const response = await fetch("/api/provider-health", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({
+          provider: settings.provider,
+          apiUrl: settings.apiUrl,
+          apiKey: settings.apiKey || undefined,
+        }),
        signal: AbortSignal.timeout(5000),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/components/settings/ApiSettings.tsx` around lines 37 - 45, the code
leaks settings.apiKey by adding it to URLSearchParams and calling
fetch(`/api/provider-health?...`). Remove apiKey from the query string: change
the fetch call that currently uses params and AbortSignal.timeout(5000) to a
POST to "/api/provider-health" with a JSON body (or an Authorization header)
carrying provider, apiUrl, and apiKey, and adjust the server route handler to
read apiKey from the request body or header rather than the query string.

Comment on lines +43 to +45
stream: settings.streamingEnabled,
settings,
}),

⚠️ Potential issue | 🔴 Critical

Handle non-stream chat responses before SSE parsing.

streamChat now requests non-stream mode when settings.streamingEnabled is false, but the reader loop still only parses SSE `data:` lines. In non-stream mode this can drop the assistant payload entirely.

💡 Suggested fix
   if (!response.ok) {
     const err = await response.text();
     yield { type: "error", error: err };
     return;
   }
+
+  const contentType = response.headers.get("content-type") ?? "";
+  if (!settings.streamingEnabled || !contentType.includes("text/event-stream")) {
+    const text = await response.text();
+    if (text.trim().length > 0) {
+      yield { type: "text", content: text };
+    }
+    yield { type: "done" };
+    return;
+  }
 
   const reader = response.body?.getReader();

Also applies to: 64-93

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web/lib/api.ts` around lines 43 - 45, The reader loop in streamChat only
parses SSE "data:" lines and thus ignores full non-stream responses when
settings.streamingEnabled is false; update the streamChat implementation(s) to
detect non-stream responses (when settings.streamingEnabled is false or
Content-Type is application/json), read and parse the full JSON response body,
extract the assistant payload and feed it into the same controller/consumer path
(or return it) used by SSE parsing, and preserve existing SSE handling for
streaming mode; apply this fix in the streamChat reader logic referenced in
web/lib/api.ts (both the initial call site around
streamChat/settings.streamingEnabled and the second similar reader block between
lines 64-93).


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
README.md (1)

27-113: Consolidate duplicated startup instructions to avoid drift.

“Quick Start” and the dedicated “Run The Backend/Web UI” sections repeat near-identical steps. Consider keeping one canonical command block per runtime and linking to it from Quick Start.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 27 - 113, The README duplicates startup steps between
the "Quick Start: See The UI", "Run The Backend" and "Run The Web UI" sections;
refactor by extracting one canonical command block for backend startup and one
for web-ui startup (keep the exact PowerShell commands currently under "Run The
Backend" and "Run The Web UI"), remove the duplicate blocks from "Quick Start:
See The UI" and replace them with short links or pointers that reference the
canonical sections (or anchor links to "Run The Backend" and "Run The Web UI");
ensure headings ("Quick Start: See The UI", "Run The Backend", "Run The Web UI")
remain and update any paths/env examples to a single source of truth so
instructions cannot drift.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 51e29ec4-3dd8-4d33-ac0a-3ca02a2a6e9d

📥 Commits

Reviewing files that changed from the base of the PR and between a95ff68 and bd437e4.

📒 Files selected for processing (1)
  • README.md

Comment on lines +33 to 37
```powershell
Set-Location "d:\OneDrive - AG SOLUTION\claude-code"
$env:PYTHONPATH = (Resolve-Path .\backend)
python -m agclaw_backend.server --host 127.0.0.1 --port 8008
```

⚠️ Potential issue | 🟡 Minor

Use repo-relative paths instead of machine-specific absolute paths.

Commands currently depend on d:\OneDrive - AG SOLUTION\claude-code, which will fail for most users and in CI-like environments. Prefer repo-root-relative instructions (or $PWD) so setup is portable.

Suggested doc update pattern
- Set-Location "d:\OneDrive - AG SOLUTION\claude-code\web"
+ Set-Location "<repo-root>\web"
- Set-Location "d:\OneDrive - AG SOLUTION\claude-code"
+ Set-Location "<repo-root>"

Also applies to: 42-47, 54-58, 67-70, 94-99, 106-110, 117-120, 129-134, 143-145, 151-152, 158-159

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 33 - 37, The README uses machine-specific absolute
paths (e.g., Set-Location "d:\OneDrive - ...\claude-code") which breaks
portability; update the PowerShell snippets that set location and PYTHONPATH to
use repo-root-relative references (e.g., use $PWD, $PSScriptRoot, or git
rev-parse --show-toplevel) and resolve the backend path with Resolve-Path
./backend so the commands that set $env:PYTHONPATH and run python -m
agclaw_backend.server work on any machine or CI; apply the same change pattern
to all similar blocks referenced (lines noted in the comment).
