Unify retry behavior for OpenAI errors and enhance macOS tool #427
Merged
The OpenAI streaming path used its own is_retryable_error classifier that omitted transient TLS faults (BadRecordMac / received fatal alert / TLS handshake EOF / connection aborted), DNS/route failures, and HTTP/2 stream faults that every other provider already retries via the shared is_transient_transport_error. It also did not retry 429 rate limits. Delegate the OpenAI path to is_transient_transport_error (like Claude, Anthropic, Copilot, OpenRouter) and add 429 handling, unifying retry behavior across providers. Adds regression tests asserting BadRecordMac / received fatal alert / tls handshake eof / 429 are retryable while auth errors remain non-retryable. Closes #338 (gaps #1 and #2).
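The unified decision described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the marker strings mirror the faults named in the message, the real is_transient_transport_error may match differently and covers more cases (DNS/route, HTTP/2), and the `is_retryable` signature here is hypothetical.

```rust
/// Hypothetical stand-in for the shared classifier. The markers are taken
/// from the commit message; the real matcher is assumed to be broader.
fn is_transient_transport_error(message: &str) -> bool {
    const MARKERS: [&str; 4] = [
        "badrecordmac",
        "received fatal alert",
        "tls handshake eof",
        "connection aborted",
    ];
    let lower = message.to_lowercase();
    MARKERS.iter().any(|marker| lower.contains(marker))
}

/// Sketch of the OpenAI-side decision: a 429 is retryable, and everything
/// else is delegated to the shared transport classifier.
fn is_retryable(status: Option<u16>, message: &str) -> bool {
    status == Some(429) || is_transient_transport_error(message)
}
```

With this shape, an auth failure such as a 401 with an "invalid api key" message matches neither branch and stays non-retryable.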
Loading older compacted history used to reset scroll_offset to 0, which teleported the viewport to the new absolute top every time more history streamed in. This was the janky jump the user hit when scrolling to the top of a long transcript.

Anchor the viewport instead: when a load is triggered we capture the reader's distance from the bottom of the transcript (invariant under a top-side prepend) and let the next render resolve it back into an absolute offset against the larger total. The content under the reader stays put and the prepend is seamless. A reconcile tick then folds the resolved position back into scroll_offset so manual scrolling resumes from the on-screen position.

Also:
- Prefetch older history within ~one viewport of the top (instead of only at the literal top) so scrolling up keeps flowing instead of stalling.
- Fold unsatisfied upward scroll intent into the load as overshoot.
- Ctrl+[ / prompt-jump up at the top of the loaded window now pulls in older compacted history instead of doing nothing.
- Publish last_total_wrapped_lines / last_resolved_chat_scroll render statics to support the anchor math.

Adds unit tests covering distance-from-bottom invariance, anchor reconcile, anchored scroll-up loading, and prompt-jump-triggered loading.
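The anchor arithmetic can be illustrated with a small sketch. It assumes scroll_offset counts wrapped lines from the top of the loaded transcript; the function names are hypothetical and the real render path carries more state.

```rust
/// Distance from the reader's position to the end of the transcript.
/// Assumes `scroll_offset` is measured in wrapped lines from the top.
fn distance_from_bottom(total_lines: usize, scroll_offset: usize) -> usize {
    total_lines.saturating_sub(scroll_offset)
}

/// After older history is prepended, resolve the captured distance back
/// into an absolute offset against the new, larger total.
fn resolve_offset(new_total_lines: usize, distance: usize) -> usize {
    new_total_lines.saturating_sub(distance)
}
```

For example, a reader at offset 3 of 100 lines is 97 lines from the bottom; after 40 lines are prepended the same content sits at offset 43 of 140, so nothing moves on screen.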
Add a macOS-only `computer` tool that lets the agent control the desktop GUI directly: screenshot, move/click/double-click/right-click/drag the mouse, type text, press key chords, scroll, read the focused app's Accessibility (UI) tree, report the cursor position, and check Accessibility/Screen-Recording permissions. Synthetic input uses Core Graphics CGEvents (core-graphics is already a transitive dep; the highsierra feature enables scroll). Screenshots shell out to /usr/sbin/screencapture and report the Retina point/pixel scale so click coordinates (points) line up with what the model sees (pixels). The UI tree is read via System Events (osascript). The tool is gated behind cfg(target_os = "macos") and is the desktop analog of the existing browser tool. Closes #340.
- The osascript UI-tree handler used the reserved word `line` and referenced System Events terminology (`UI elements`) outside a tell block, producing a -2741 syntax error. Wrap the handler in `using terms from application "System Events"` and rename `line` to `ln`.
- CGDisplay::pixels_wide can report points (not pixels) on Retina, giving a wrong 1.00x scale. Read true pixel dimensions from the PNG IHDR header so the reported point/pixel scale is correct (e.g. 2.00x), letting the model convert image pixel coordinates to click points.

Add a png_dimensions unit test plus ignored live tests for cursor/move/screenshot/ui/permissions.
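Reading the pixel size from the IHDR header needs only the first 24 bytes of the file. The sketch below follows the PNG layout (8-byte signature, 4-byte chunk length, the `IHDR` tag, then big-endian width and height); the signature of `png_dimensions` and the `point_scale` helper are assumptions for illustration, not the project's actual code.

```rust
/// Parse width and height (in pixels) from a PNG's IHDR chunk.
/// Returns None if the data is too short or is not a PNG.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if bytes.len() < 24 || !bytes.starts_with(&SIGNATURE) || &bytes[12..16] != b"IHDR" {
        return None;
    }
    // Width and height are stored big-endian right after the chunk tag.
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// Hypothetical helper: pixels per point, given the screenshot's pixel
/// width and the display width in points.
fn point_scale(pixel_width: u32, point_width: u32) -> f64 {
    f64::from(pixel_width) / f64::from(point_width)
}
```

A 2880-pixel-wide screenshot of a 1440-point-wide display yields a scale of 2.0, so a pixel coordinate from the image is divided by 2 to get the click point.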
When the user has not explicitly configured an Anthropic reasoning effort, Claude Opus models now default to their strongest supported thinking level (xhigh on Opus 4.7/4.8, clamped to high on older Opus). Other models keep their own default so cheaper models stay cheap. Explicit user config still wins. The surfaced reasoning_effort() status reflects the resolved default.
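The resolution order above might look like the following sketch. The model-id substrings ("opus", "4-7", "4-8") and the `resolve_effort` name are assumptions made for illustration; the real capability check is presumably table-driven rather than string matching.

```rust
/// Resolve the reasoning effort to send. `None` means "leave the model's
/// own default untouched". Model-id matching here is a hypothetical
/// simplification.
fn resolve_effort(model: &str, configured: Option<&str>) -> Option<String> {
    // Explicit user config always wins.
    if let Some(explicit) = configured {
        return Some(explicit.to_string());
    }
    // Non-Opus models keep their own default so cheaper models stay cheap.
    if !model.contains("opus") {
        return None;
    }
    // Opus defaults to its strongest supported level.
    let supports_xhigh = model.contains("4-7") || model.contains("4-8");
    Some(if supports_xhigh { "xhigh" } else { "high" }.to_string())
}
```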
Restructure the computer tool into a directory module and expand it from visible coordinate control into full macOS control, with background-capable mechanisms preferred over disrupting the user's screen.

New capabilities:
- Tier 1 AX (background, no cursor): find_element, element_at, press, set_value, get_value, perform_action, select_menu. Verified end-to-end: set a TextEdit field via AX while it was NOT frontmost (no cursor move).
- Tier 2 windows/apps: list_apps/windows, activate/hide/quit_app, focus/move/resize/minimize/close_window, window_screenshot.
- Tier 3/4: clipboard get/set, run_applescript/run_jxa (headless scripting), wait_for (AX poll), notify, system_state, set_brightness, key_down/key_up.
- Tier 5: ocr via Vision (Swift bridge), per-window capture.
- setup/check_permissions: report + request + deep-link to the exact System Settings panes + poll Accessibility until granted (the one toggle macOS won't let any API flip).

Interface: one action-dispatched tool with progressive disclosure. The always-on schema describes only common actions + a discover action that returns full specs per category on demand, so the base prompt cost stays ~700 chars-est tokens regardless of how many actions exist (guarded by a schema_is_compact test). Default policy documented: prefer background AX/scripting over visible coordinate input when the target is resolvable.

Modules: osa (osascript/JXA + TCC error mapping), keys, input (CGEvent), screen (capture/OCR), ax, win, sys, setup, discover. Unit tests for parsing/discovery/schema size; ignored live tests for the full surface (all pass on a Retina Mac). Roadmap in docs/proposals/computer-use-maximal-control.md. Refs #340.
… bug fixes

Exhaustive live coverage of all 42 actions surfaced and fixed three bugs:
- key_down/key_up rejected modifier-only holds (e.g. "shift"); now emit a FlagsChanged event with the modifier keycode.
- select_menu failed with 'Can't get menu bar 1'; address menu bar of the target process and activate it first. Verified it actually clicks the item.
- quit_app hung indefinitely when an app showed a modal (unsaved-changes sheet); now bounded and reports the dialog instead of freezing.

Robustness:
- All external commands (osascript/JXA/screencapture) run under a wall-clock timeout via osa::run_command_timed, so an unresponsive app can never freeze the agent. AX action verbs use a 10s timeout; quit uses 8s.
- dry_run param: mutating actions report intent without acting (is_mutating classifier + gate).
- cap_output truncates large textual results (ui tree / clipboard / ocr) at 16k chars to protect context; images unaffected.
- Audited temp-file cleanup (all screenshot/OCR temp files removed).

Tests: 20 unit (incl. timeout, dry_run, cap, modifier-chord, is_mutating) + an exhaustive coverage_tests suite that exercises every action live (8 suites, all pass on a Retina Mac). Clippy clean. Refs #340.
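A character-based cap like the one described for cap_output has to cut on a char boundary rather than a byte index, or multi-byte text would panic on slicing. A minimal sketch, assuming the limit is passed in by the caller; the signature and the truncation marker text are hypothetical:

```rust
/// Truncate `text` to at most `max_chars` characters, appending a note
/// when anything was dropped. The marker wording is an assumption.
fn cap_output(text: &str, max_chars: usize) -> String {
    // `char_indices().nth(max_chars)` finds the byte offset of the first
    // character past the limit, so the slice below is always valid UTF-8.
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => {
            format!("{}\n[truncated at {} chars]", &text[..byte_idx], max_chars)
        }
        None => text.to_string(),
    }
}
```

Short results pass through unchanged, so the common case costs one scan and one copy.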
Rename the macOS computer-use tool from `computer` to `macos_computer_use` (API tool names cannot contain spaces, so the requested "macos computer use" maps to the snake_case form). The tool remains gated behind cfg(target_os = "macos") for both the module and registry registration, so it does not exist on Linux/Windows. Updates name(), registry key, discover output, error/context strings, the bad-action test assertion, and doc comments.
…iders (#341)

NVIDIA NIM DeepSeek-V4 reasoning models (deepseek-v4-flash/pro) only enable thinking when the request includes chat_template_kwargs; jcode had no way to send non-standard request-body fields, so reasoning was unreachable.

Add a generic, model-agnostic extra_body passthrough for the OpenAI-compatible / OpenRouter provider:
- New optional NamedProviderConfig.extra_body (JSON object) in config.toml, e.g. [providers.x.extra_body.chat_template_kwargs].
- New JCODE_OPENAI_EXTRA_BODY env var (JSON object string), resolvable from the provider's env file, so it works for built-in profiles like nvidia-nim.
- Resolved once at construction; merged last into the chat/completions body so it can satisfy or override jcode-generated fields. Env overrides config on key collisions. Invalid input is logged and ignored, never fatal.

Live-verified against NVIDIA NIM deepseek-v4-flash: with extra_body the model streams reasoning; without it, no reasoning. Adds 8 unit tests covering the resolver, merge precedence, and the documented nested-table TOML config path. Documents both config and env paths in README.
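The merge precedence described above (generated body, then config extra_body, then env extra_body, with later sources winning) can be sketched with plain maps. This is a simplification under stated assumptions: values are kept as raw JSON strings to stay dependency-free, the merge is shallow at the top level, and `merge_extra_body` is a hypothetical name. The actual code presumably operates on parsed JSON objects.

```rust
use std::collections::BTreeMap;

/// Top-level request body: field name mapped to its raw JSON text.
type Body = BTreeMap<String, String>;

/// Merge extra fields into the generated body. Later sources override
/// earlier ones on key collisions: generated < config < env.
fn merge_extra_body(generated: &Body, config_extra: &Body, env_extra: &Body) -> Body {
    let mut body = generated.clone();
    body.extend(config_extra.clone());
    body.extend(env_extra.clone());
    body
}
```

Because the extra fields are merged last, they can both add non-standard keys and override fields the client generated itself.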
The tool ran on the user's own live machine but only nudged the model toward background *mechanisms*; it never told it to avoid acting *proactively* or disrupting in-progress work. Add explicit restraint guidance:
- description: 'this is the user's live machine: act only on the requested task (not proactively) and prefer BACKGROUND AX/scripting over moving the cursor or stealing focus.'
- discover default-policy: don't move the cursor / change frontmost app / move windows unless asked or strictly required; never take proactive control upfront; when a visible action is unavoidable, do the minimum and restore focus.
- fix stale DisruptPolicy doc reference (no such type existed) and document that the policy is conveyed via description + discover.
- bump schema_is_compact bound 2800->3000 to fit the safety sentence.
…rovider (#343)

The native openai-api (Responses API) base URL was hardcoded to https://api.openai.com/v1, so users with a local/proxied Responses endpoint could not point openai-api at it (openai-compatible is a different protocol).

Add an API-key-only base override resolved in responses_url(): JCODE_OPENAI_API_BASE > OPENAI_BASE_URL > OPENAI_API_BASE. jcode still appends /responses, derives the WS and /compact endpoints from the same base, and points the /models catalog probe at it too. ChatGPT/Codex OAuth mode stays pinned to the fixed Codex backend; malformed overrides are logged and ignored (never fatal), and a trailing /responses is trimmed to avoid doubling.

Verified end to end against a local mock Responses server: with the override, 'jcode -p openai-api run hello' POSTs to http://127.0.0.1:8317/v1/responses and renders the mock reply; without it, requests still go to api.openai.com. Adds 3 unit tests (override precedence/validation, WS+compact derivation, and that ChatGPT mode ignores the override).
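The override precedence and the trailing-/responses trimming can be sketched as below. The environment lookup is passed in as a closure so the logic is testable without touching the process environment. Two points are assumptions: validation is reduced to a scheme check, and a malformed higher-priority value falls through to the next variable (the message only says malformed overrides are ignored). The signature is hypothetical.

```rust
/// Resolve the Responses endpoint from an env-style lookup.
/// Precedence: JCODE_OPENAI_API_BASE > OPENAI_BASE_URL > OPENAI_API_BASE.
fn responses_url(lookup: impl Fn(&str) -> Option<String>) -> String {
    const DEFAULT_BASE: &str = "https://api.openai.com/v1";
    const VARS: [&str; 3] = ["JCODE_OPENAI_API_BASE", "OPENAI_BASE_URL", "OPENAI_API_BASE"];

    let base = VARS
        .iter()
        .copied()
        .filter_map(|name| lookup(name))
        .map(|raw| raw.trim().trim_end_matches('/').to_string())
        // Simplified validation: anything without an http(s) scheme is ignored.
        .find(|candidate| candidate.starts_with("http://") || candidate.starts_with("https://"))
        .unwrap_or_else(|| DEFAULT_BASE.to_string());

    // Trim a trailing /responses so it is not doubled when re-appended.
    let base = base.strip_suffix("/responses").unwrap_or(base.as_str()).to_string();
    format!("{}/responses", base)
}
```

In production the closure would typically be `|name| std::env::var(name).ok()`.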
Info widgets used to pin to an absolute screen row. While the user scrolled, the negative-space pocket flowed up/down the screen but the widget tried to stay glued to its row, so it churned (shrink/hide/re-home) against the ragged free-width profile, which caused the distracting movement during scroll.

Now, while the user is actively scrolling (auto_scroll_paused), each anchored widget pins to a transcript line (content_top) and rides with it, sticking to the same pocket of negative space and simply scrolling along with the text. Fresh widgets are seated at the bottom of their pocket so they have maximum runway to ride up before recycling off the top. Pinned-at-bottom/streaming keeps the prior screen-anchored behaviour.

Quantified via the stability harness: new content_travel metric (motion relative to the transcript, scroll-ride subtracted) plus a recycle counter. On a 100x24 viewport, rich widget set, long-line-every-7 content:

    screen-anchored:  flicker/100=114.5 keepVis=36% content-travel/100=86.2
    content-anchored: flicker/100=  2.2 keepVis=75% content-travel/100= 8.7

Periodic/flat content drops to ~0 content-travel (perfectly stuck). Adds SimMode::ContentAnchored + analyze_frames_with_scroll, and regression tests for zero drift + no overlaps while riding the scroll.
…sing-signature 400 (#339)

Gemini-3 via the Antigravity/Cloud Code backend attaches an opaque thoughtSignature to function-call parts, but:
- a parallel multi-call turn only signs its FIRST call (siblings persist unsigned), and
- locally synthesized tool calls (batch sub-calls, manual tool use, auto-poke continuations, recovery) and pre-signature/imported sessions carry none.

Live-probing the backend shows the rule precisely: an assistant turn is rejected with HTTP 400 "Function call is missing a thought_signature in functionCall parts" only when ALL of its function calls are unsigned; if any one carries a valid signature the turn is accepted, and a previously-emitted signature is accepted when replayed on later calls.

Fix: in the shared gemini build_contents() (used by both the gemini and antigravity providers), track the most recent real thoughtSignature across the whole conversation and apply it to any function call that lacks one. We never fabricate a signature when none has ever been seen.

Verified live against the Antigravity backend with an A/B harness on a fully-unsigned follow-up turn (the #339 trigger): as-is = 400x4, carry-forward = 200x4. A real multi-tool antigravity session (parallel write calls across auto-poke rounds) now completes with zero signature 400s. Adds/updates 5 build_contents unit tests; full gemini suites pass.
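The carry-forward rule can be illustrated on a flattened list of function calls in conversation order. This is a sketch under assumptions: the `FunctionCall` struct and `carry_forward_signatures` name are hypothetical, and the real build_contents() works on full message parts rather than a flat slice.

```rust
/// Hypothetical, simplified view of a function-call part.
#[derive(Debug, Clone, PartialEq)]
struct FunctionCall {
    name: String,
    thought_signature: Option<String>,
}

/// Walk the calls in conversation order, remembering the most recent real
/// signature and applying it to any later call that lacks one. Calls seen
/// before any real signature stay unsigned: nothing is fabricated.
fn carry_forward_signatures(calls: &mut [FunctionCall]) {
    let mut last_seen: Option<String> = None;
    for call in calls.iter_mut() {
        match call.thought_signature.clone() {
            Some(signature) => last_seen = Some(signature),
            None => call.thought_signature = last_seen.clone(),
        }
    }
}
```

In a parallel turn where only the first call is signed, the siblings inherit that signature; a later fully unsigned turn inherits the most recent one seen before it.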
Adds /commit-push (alias /commit-and-push) which behaves like /commit but also pushes the resulting commits to the remote tracking branch (using git push -u when no upstream exists). Never force-pushes or rewrites already-pushed history. Wired through local and remote key handling, command registry, /help topic, and added launch-notice/help/synthetic-turn tests.
Replace the pinned-image side panel with inline image rendering in the chat flow. Images (pasted screenshots, read of an image file, generated images) now appear directly below the message they belong to.

Design:
- New jcode-tui-mermaid inline_image module: hash-addressed, lazy. inline_image_dims() does a header-only parse (no full decode) for prepare-time sizing; materialize_inline_image() decodes + caches only at draw time. Shares the existing RENDER_CACHE / render_image_widget_fit path, decoupled from mermaid SVG machinery.
- New ui_inline_image section: builds a dedicated PreparedMessages section with a dim label + correctly-sized fit placeholder per image. Sized to fit width preserving aspect, capped to a fraction of the viewport so tall screenshots cannot bury the transcript.
- ImageRegion gains a render mode (Crop for mermaid, Fit for raster images); draw_messages branches on it and lazily materializes only the on-screen images.
- Side panel and Alt+M auto-hide no longer treat images as pinned content.

Lazy: a session with many images only ever decodes/transmits the ones currently on screen.

Tests: inline_image dims/id, fit-rows capping, section geometry, payload registry; updated obsolete pinned-image-panel tests.
Add claude-fable-5 to the Anthropic model catalog/picker. Confirmed live via GET /v1/models: native 1M input context, adaptive thinking, and effort levels low/medium/high/xhigh/max. Wire it into ALL_CLAUDE_MODELS, AVAILABLE_MODELS, known-Claude prefixes, native-1M context classification, reasoning-effort capability checks (incl. xhigh), and API pricing estimate. Ordered after Opus 4.8 so flagship auto-selection is unchanged.
… resets

The first-run onboarding gate relied solely on launch_count <= 5 in setup_hints.json. That file only counts interactive TTY launches and was recently recreated from scratch (count reset to 2), so every plain 'jcode' spawn (e.g. the niri Alt+' release hotkey) dropped an experienced user into the onboarding/resume flow.

Three fixes:
- SetupHintsState::load now falls back to the .bak sibling the atomic writer always keeps, so a missing/corrupt setup_hints.json no longer silently resets launch_count.
- SetupHintsState is #[serde(default)] so partial/older files parse instead of erroring into defaults.
- The new-user heuristic now requires the absence of established native session history (>=10 session_*.json files) in addition to a low launch count; imported_* transcripts intentionally don't count.

The duplicated heuristic in suggestion_prompts now shares this logic.
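The combined heuristic reduces to two conditions that must both hold. A minimal sketch, using the thresholds from the message above; the function names and the file-name check are hypothetical simplifications of whatever the real code does when scanning the session directory.

```rust
/// True for files that count as native session history.
/// imported_* transcripts deliberately do not match.
fn counts_as_native_session(file_name: &str) -> bool {
    file_name.starts_with("session_") && file_name.ends_with(".json")
}

/// New-user heuristic: a low launch count AND no established native
/// session history (fewer than 10 session_*.json files).
fn is_new_user(launch_count: u32, file_names: &[&str]) -> bool {
    let native_sessions = file_names
        .iter()
        .filter(|name| counts_as_native_session(name))
        .count();
    launch_count <= 5 && native_sessions < 10
}
```

With this shape, a reset launch_count alone no longer classifies a user with a long session history as new.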
No description provided.