
Unify retry behavior for OpenAI errors and enhance macOS tool #427

Merged
quangdang46 merged 19 commits into quangdang46:master from 1jehuang:master on Jun 10, 2026


@quangdang46 (Owner)

No description provided.

1jehuang and others added 19 commits on June 7, 2026 22:22

The OpenAI streaming path used its own is_retryable_error classifier that
omitted transient TLS faults (BadRecordMac / received fatal alert / TLS
handshake EOF / connection aborted), DNS/route failures, and HTTP/2 stream
faults that every other provider already retries via the shared
is_transient_transport_error. It also did not retry 429 rate limits.

Delegate the OpenAI path to is_transient_transport_error (like Claude,
Anthropic, Copilot, OpenRouter) and add 429 handling, unifying retry
behavior across providers. Adds regression tests asserting BadRecordMac /
received fatal alert / tls handshake eof / 429 are retryable while auth
errors remain non-retryable.

Closes #338 (gaps #1 and #2).
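
The unified decision can be sketched as below. This is a minimal illustration, assuming errors reach the classifier as a message string plus an optional HTTP status; the marker strings, the `is_retryable_openai_error` name, and both signatures are hypothetical, not jcode's actual code.

```rust
/// Hypothetical stand-in for jcode's shared `is_transient_transport_error`;
/// the real function's signature and matched strings are assumptions.
fn is_transient_transport_error(err: &str) -> bool {
    const TRANSIENT_MARKERS: [&str; 7] = [
        "badrecordmac",         // TLS record MAC failure
        "received fatal alert", // peer aborted the TLS session
        "tls handshake eof",    // handshake cut off mid-flight
        "connection aborted",
        "dns error",            // name resolution failure
        "no route to host",
        "stream error",         // HTTP/2 stream reset
    ];
    let lower = err.to_ascii_lowercase();
    TRANSIENT_MARKERS.iter().any(|m| lower.contains(m))
}

/// OpenAI retry decision after the change: rate limits (429) are retried,
/// auth failures never are, and everything else defers to the shared
/// transport classifier that the other providers already use.
fn is_retryable_openai_error(status: Option<u16>, err: &str) -> bool {
    match status {
        Some(429) => true,
        Some(401) | Some(403) => false,
        _ => is_transient_transport_error(err),
    }
}
```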

Loading older compacted history used to reset scroll_offset to 0, which
teleported the viewport to the new absolute top every time more history
streamed in: the janky jump the user hit when scrolling to the top of a
long transcript.

Anchor the viewport instead: when a load is triggered we capture the
reader's distance from the bottom of the transcript (invariant under a
top-side prepend) and let the next render resolve it back into an absolute
offset against the larger total. The content under the reader stays put and
the prepend is seamless. A reconcile tick then folds the resolved position
back into scroll_offset so manual scrolling resumes from the on-screen
position.
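
The anchor math above reduces to a subtraction in each direction. A minimal sketch, assuming `scroll_offset` counts wrapped lines from the top of the loaded transcript; `ScrollAnchor`, `capture_anchor`, and `resolve_anchor` are hypothetical names:

```rust
/// Hypothetical anchor captured when an older-history load is triggered.
/// Distance from the bottom is unchanged by a prepend at the top.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ScrollAnchor {
    lines_from_bottom: usize,
}

/// `scroll_offset` is the index of the first visible wrapped line,
/// counted from the top of the loaded transcript.
fn capture_anchor(scroll_offset: usize, total_wrapped_lines: usize) -> ScrollAnchor {
    ScrollAnchor {
        lines_from_bottom: total_wrapped_lines.saturating_sub(scroll_offset),
    }
}

/// Resolve the anchor against the new (larger) total after the prepend.
fn resolve_anchor(anchor: ScrollAnchor, new_total_wrapped_lines: usize) -> usize {
    new_total_wrapped_lines.saturating_sub(anchor.lines_from_bottom)
}
```

Because a prepend only adds lines above the reader, the captured distance from the bottom still identifies the same line, so resolving it against the new total keeps that line on screen.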

Also:
- Prefetch older history within ~one viewport of the top (instead of only
  at the literal top) so scrolling up keeps flowing instead of stalling.
- Fold unsatisfied upward scroll intent into the load as overshoot.
- Ctrl+[ / prompt-jump up at the top of the loaded window now pulls in
  older compacted history instead of doing nothing.
- Publish last_total_wrapped_lines / last_resolved_chat_scroll render
  statics to support the anchor math.

Adds unit tests covering distance-from-bottom invariance, anchor reconcile,
anchored scroll-up loading, and prompt-jump-triggered loading.
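
The prefetch threshold and the overshoot folding listed above can be sketched the same way. Both functions are hypothetical and assume the same top-based `scroll_offset`; the one-viewport threshold follows the commit text, while the exact comparison is an assumption.

```rust
/// Hypothetical prefetch rule: start loading older history once the viewport
/// top is within one viewport height of the top of the loaded window.
fn should_prefetch(scroll_offset: usize, viewport_height: usize, has_older_history: bool) -> bool {
    has_older_history && scroll_offset <= viewport_height
}

/// Outcome of a scroll-up request against the loaded window.
#[derive(Debug, PartialEq)]
struct ScrollUp {
    new_offset: usize,
    /// Lines the user asked for that could not be applied yet; these are
    /// folded into the pending load and applied once older lines arrive.
    overshoot: usize,
}

fn scroll_up(scroll_offset: usize, requested_lines: usize) -> ScrollUp {
    let applied = requested_lines.min(scroll_offset);
    ScrollUp {
        new_offset: scroll_offset - applied,
        overshoot: requested_lines - applied,
    }
}
```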

Add a macOS-only `computer` tool that lets the agent control the desktop
GUI directly: screenshot, move/click/double-click/right-click/drag the
mouse, type text, press key chords, scroll, read the focused app's
Accessibility (UI) tree, report the cursor position, and check
Accessibility/Screen-Recording permissions.

Synthetic input uses Core Graphics CGEvents (core-graphics is already a
transitive dep; the highsierra feature enables scroll). Screenshots shell
out to /usr/sbin/screencapture and report the Retina point/pixel scale so
click coordinates (points) line up with what the model sees (pixels). The
UI tree is read via System Events (osascript).

The tool is gated behind cfg(target_os = "macos") and is the desktop
analog of the existing browser tool. Closes #340.

- The osascript UI-tree handler used the reserved word `line` and referenced
  System Events terminology (`UI elements`) outside a tell block, producing a
  -2741 syntax error. Wrap the handler in `using terms from application
  "System Events"` and rename `line` to `ln`.
- CGDisplay::pixels_wide can report points (not pixels) on Retina, giving a
  wrong 1.00x scale. Read true pixel dimensions from the PNG IHDR header so the
  reported point/pixel scale is correct (e.g. 2.00x), letting the model convert
  image pixel coordinates to click points. Add a png_dimensions unit test plus
  ignored live tests for cursor/move/screenshot/ui/permissions.
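
The IHDR read and the point/pixel conversion are small enough to show directly. This sketch follows the PNG file layout (8-byte signature, then the IHDR chunk carrying big-endian width and height); `backing_scale` and `pixel_to_point` are hypothetical helpers, and the tool's real `png_dimensions` may differ in signature.

```rust
/// Read width and height from a PNG's IHDR chunk without decoding the image.
/// Layout: 8-byte signature, 4-byte chunk length, "IHDR", then big-endian
/// u32 width and u32 height.
fn png_dimensions(bytes: &[u8]) -> Option<(u32, u32)> {
    const SIGNATURE: [u8; 8] = [0x89, b'P', b'N', b'G', 0x0D, 0x0A, 0x1A, 0x0A];
    if bytes.len() < 24 || bytes[..8] != SIGNATURE || &bytes[12..16] != b"IHDR" {
        return None;
    }
    let width = u32::from_be_bytes([bytes[16], bytes[17], bytes[18], bytes[19]]);
    let height = u32::from_be_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]);
    Some((width, height))
}

/// Pixels per point, e.g. 2.0 on a typical Retina display.
fn backing_scale(image_width_px: u32, display_width_pt: u32) -> f64 {
    image_width_px as f64 / display_width_pt as f64
}

/// Convert a coordinate the model read off the screenshot (pixels) into
/// the point space that CGEvent clicks use.
fn pixel_to_point(px: f64, scale: f64) -> f64 {
    px / scale
}
```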

When the user has not explicitly configured an Anthropic reasoning effort,
Claude Opus models now default to their strongest supported thinking level
(xhigh on Opus 4.7/4.8, clamped to high on older Opus). Other models keep
their own default so cheaper models stay cheap. Explicit user config still
wins. The surfaced reasoning_effort() status reflects the resolved default.
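
A minimal sketch of that resolution order, assuming an `Effort` enum and model ids that start with `claude-opus-4-7` / `claude-opus-4-8`; both the enum and the prefix matching are illustrative assumptions rather than jcode's catalog logic:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Effort {
    Low,
    Medium,
    High,
    XHigh,
    Max,
}

/// Hypothetical resolver: explicit config wins; Opus defaults to its
/// strongest supported level; other models return None (keep their default).
fn resolve_reasoning_effort(model: &str, configured: Option<Effort>) -> Option<Effort> {
    if configured.is_some() {
        return configured;
    }
    if !model.contains("opus") {
        return None;
    }
    // Assumed id prefixes for the Opus versions that accept xhigh.
    let supports_xhigh = ["claude-opus-4-7", "claude-opus-4-8"]
        .iter()
        .any(|p| model.starts_with(p));
    Some(if supports_xhigh { Effort::XHigh } else { Effort::High })
}
```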

Restructure the computer tool into a directory module and expand it from
visible coordinate control into full macOS control, with background-capable
mechanisms preferred over disrupting the user's screen.

New capabilities:
- Tier 1 AX (background, no cursor): find_element, element_at, press,
  set_value, get_value, perform_action, select_menu. Verified end-to-end:
  set a TextEdit field via AX while it was NOT frontmost (no cursor move).
- Tier 2 windows/apps: list_apps/windows, activate/hide/quit_app,
  focus/move/resize/minimize/close_window, window_screenshot.
- Tier 3/4: clipboard get/set, run_applescript/run_jxa (headless scripting),
  wait_for (AX poll), notify, system_state, set_brightness, key_down/key_up.
- Tier 5: ocr via Vision (Swift bridge), per-window capture.
- setup/check_permissions: report + request + deep-link to the exact System
  Settings panes + poll Accessibility until granted (the one toggle macOS
  won't let any API flip).

Interface: one action-dispatched  tool with progressive disclosure.
The always-on schema describes only common actions + a  action that
returns full specs per category on demand, so the base prompt cost stays
~700 tokens (estimated from character count) regardless of how many actions exist (guarded by a
schema_is_compact test). Default policy documented: prefer background
AX/scripting over visible coordinate input when the target is resolvable.
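
Progressive disclosure amounts to a short always-on description plus an on-demand lookup. The sketch below is hypothetical throughout: the category names, the action spellings (`click`, `type_text`, `key`, and so on), and the shape of `dispatch` are assumptions for illustration; only the `discover` name (from the module list) and the compactness guard come from the commits.

```rust
/// Hypothetical per-category specs, returned only on demand.
fn category_spec(category: &str) -> Option<&'static str> {
    match category {
        "ax" => Some("find_element, element_at, press, set_value, get_value, perform_action, select_menu"),
        "windows" => Some("list_apps, list_windows, activate_app, hide_app, quit_app, focus_window, move_window, resize_window, minimize_window, close_window, window_screenshot"),
        "system" => Some("clipboard_get, clipboard_set, run_applescript, run_jxa, wait_for, notify, system_state, set_brightness, key_down, key_up"),
        _ => None,
    }
}

/// The always-on schema text: common actions plus a pointer to `discover`.
fn base_description() -> String {
    let common = ["screenshot", "click", "type_text", "key", "scroll", "discover"];
    format!(
        "macOS control. Common actions: {}. Call discover with a category for full specs.",
        common.join(", ")
    )
}

/// Single entry point: every call names an action.
fn dispatch(action: &str, category: Option<&str>) -> Result<String, String> {
    match action {
        "discover" => category
            .and_then(category_spec)
            .map(str::to_string)
            .ok_or_else(|| "discover needs a category: ax, windows, system".to_string()),
        "screenshot" | "click" | "type_text" | "key" | "scroll" => Ok(format!("ran {action}")),
        other => Err(format!("unknown action: {other}")),
    }
}
```

Adding actions grows only the per-category specs, so a length check on `base_description()` plays the role of the `schema_is_compact` guard.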

Modules: osa (osascript/JXA + TCC error mapping), keys, input (CGEvent),
screen (capture/OCR), ax, win, sys, setup, discover. Unit tests for parsing/
discovery/schema size; ignored live tests for the full surface (all pass on a
Retina Mac). Roadmap in docs/proposals/computer-use-maximal-control.md.

Refs #340.

… bug fixes

Exhaustive live coverage of all 42 actions surfaced and fixed three bugs:
- key_down/key_up rejected modifier-only holds (e.g. "shift"); now emit a
  FlagsChanged event with the modifier keycode.
- select_menu failed with 'Can't get menu bar 1'; address menu bar of the
  target process and activate it first. Verified it actually clicks the item.
- quit_app hung indefinitely when an app showed a modal (unsaved-changes
  sheet); now bounded and reports the dialog instead of freezing.

Robustness:
- All external commands (osascript/JXA/screencapture) run under a wall-clock
  timeout via osa::run_command_timed, so an unresponsive app can never freeze
  the agent. AX action verbs use a 10s timeout; quit uses 8s.
- dry_run param: mutating actions report intent without acting (is_mutating
  classifier + gate).
- cap_output truncates large textual results (ui tree / clipboard / ocr) at
  16k chars to protect context; images unaffected.
- Audited temp-file cleanup (all screenshot/OCR temp files removed).
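
A wall-clock bound on an external command can be built from `std::process::Child::try_wait` polling. This is a hypothetical re-implementation, not `osa::run_command_timed` itself (its real signature is not shown here); the 20 ms poll interval is an assumption, and `cap_output` mirrors the 16k-char truncation described above with an invented truncation marker.

```rust
use std::io::Read;
use std::process::{Command, Output, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Run `cmd`, killing it if it is still running after `timeout`.
/// stdout/stderr are drained on background threads so a chatty child
/// cannot block on a full pipe while we poll for exit.
fn run_command_timed(mut cmd: Command, timeout: Duration) -> Result<Output, String> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;
    let mut out = child.stdout.take().expect("stdout piped");
    let mut err = child.stderr.take().expect("stderr piped");
    let out_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = out.read_to_end(&mut buf);
        buf
    });
    let err_reader = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = err.read_to_end(&mut buf);
        buf
    });
    let deadline = Instant::now() + timeout;
    loop {
        match child.try_wait().map_err(|e| e.to_string())? {
            Some(status) => {
                return Ok(Output {
                    status,
                    stdout: out_reader.join().unwrap_or_default(),
                    stderr: err_reader.join().unwrap_or_default(),
                });
            }
            None if Instant::now() >= deadline => {
                // Kill and reap so no zombie is left behind; the detached
                // reader threads exit once the pipes close.
                let _ = child.kill();
                let _ = child.wait();
                return Err(format!("timed out after {timeout:?}"));
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    }
}

/// Truncate long textual results on a char boundary, noting what was cut.
fn cap_output(text: &str, max_chars: usize) -> String {
    match text.char_indices().nth(max_chars) {
        Some((byte_idx, _)) => {
            let dropped = text[byte_idx..].chars().count();
            format!("{}\n[truncated {dropped} chars]", &text[..byte_idx])
        }
        None => text.to_string(),
    }
}
```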

Tests: 20 unit (incl. timeout, dry_run, cap, modifier-chord, is_mutating) +
an exhaustive coverage_tests suite that exercises every action live (8 suites,
all pass on a Retina Mac). Clippy clean.

Refs #340.

Rename the macOS computer-use tool from `computer` to
`macos_computer_use` (API tool names cannot contain spaces, so the
requested "macos computer use" maps to the snake_case form). The tool
remains gated behind cfg(target_os = "macos") for both the module and
registry registration, so it does not exist on Linux/Windows.

Updates name(), registry key, discover output, error/context strings,
the bad-action test assertion, and doc comments.

…iders (#341)

NVIDIA NIM DeepSeek-V4 reasoning models (deepseek-v4-flash/pro) only enable
thinking when the request includes chat_template_kwargs; jcode had no way to
send non-standard request-body fields, so reasoning was unreachable.

Add a generic, model-agnostic extra_body passthrough for the OpenAI-compatible
/ OpenRouter provider:
- New optional NamedProviderConfig.extra_body (JSON object) in config.toml,
  e.g. [providers.x.extra_body.chat_template_kwargs].
- New JCODE_OPENAI_EXTRA_BODY env var (JSON object string), resolvable from the
  provider's env file, so it works for built-in profiles like nvidia-nim.
- Resolved once at construction; merged last into the chat/completions body so
  it can satisfy or override jcode-generated fields. Env overrides config on key
  collisions. Invalid input is logged and ignored, never fatal.

Live-verified against NVIDIA NIM deepseek-v4-flash: with extra_body the model
streams reasoning; without it, no reasoning. Adds 8 unit tests covering the
resolver, merge precedence, and the documented nested-table TOML config path.
Documents both config and env paths in README.
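
The resolution and merge order described above can be sketched with a tiny JSON model so the example needs no crates. Assumptions: merging is shallow (top-level keys only), and the `thinking` key used in the usage check is a placeholder, not a documented `chat_template_kwargs` field.

```rust
use std::collections::BTreeMap;

/// Tiny JSON model so the sketch needs no crates.
#[derive(Clone, Debug, PartialEq)]
enum Json {
    Bool(bool),
    Str(String),
    Obj(BTreeMap<String, Json>),
}

type JsonObj = BTreeMap<String, Json>;

/// Resolve extra_body once: config first, then env on top, so env wins on
/// top-level key collisions. Shallow merging is an assumption.
fn resolve_extra_body(config: Option<JsonObj>, env: Option<JsonObj>) -> JsonObj {
    let mut merged = config.unwrap_or_default();
    merged.extend(env.unwrap_or_default());
    merged
}

/// Merge extra_body last into the request body so it can add fields or
/// override jcode-generated ones.
fn apply_extra_body(mut body: JsonObj, extra: &JsonObj) -> JsonObj {
    for (key, value) in extra {
        body.insert(key.clone(), value.clone());
    }
    body
}
```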

The tool ran on the user's own live machine but only nudged the model
toward background *mechanisms*; it never told it to avoid acting
*proactively* or disrupting in-progress work. Add explicit restraint
guidance:
- description: "this is the user's live machine: act only on the
  requested task (not proactively) and prefer BACKGROUND AX/scripting
  over moving the cursor or stealing focus."
- discover default-policy: don't move the cursor / change frontmost app /
  move windows unless asked or strictly required; never take proactive
  control upfront; when a visible action is unavoidable, do the minimum
  and restore focus.
- fix stale DisruptPolicy doc reference (no such type existed) and
  document that the policy is conveyed via description + discover.
- bump schema_is_compact bound 2800->3000 to fit the safety sentence.

…rovider (#343)

The native openai-api (Responses API) base URL was hardcoded to
https://api.openai.com/v1, so users with a local/proxied Responses endpoint
could not point openai-api at it (openai-compatible is a different protocol).

Add an API-key-only base override resolved in responses_url():
JCODE_OPENAI_API_BASE > OPENAI_BASE_URL > OPENAI_API_BASE. jcode still appends
/responses, derives the WS and /compact endpoints from the same base, and points
the /models catalog probe at it too. ChatGPT/Codex OAuth mode stays pinned to the
fixed Codex backend; malformed overrides are logged and ignored (never fatal), and
a trailing /responses is trimmed to avoid doubling.

Verified end to end against a local mock Responses server: with the override,
'jcode -p openai-api run hello' POSTs to http://127.0.0.1:8317/v1/responses and
renders the mock reply; without it, requests still go to api.openai.com. Adds 3
unit tests (override precedence/validation, WS+compact derivation, and that
ChatGPT mode ignores the override).
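
The base-URL resolution can be sketched as a pure function over an environment lookup. The precedence order, the default base, and the `/responses` trimming come from the commit; falling through to the next variable when a value is malformed, and the `http(s)://` validity check, are assumptions. ChatGPT/Codex OAuth mode is out of scope here because it never consults the override.

```rust
const DEFAULT_API_BASE: &str = "https://api.openai.com/v1";
/// Precedence from the commit message: first set, well-formed value wins.
const BASE_ENV_VARS: [&str; 3] = ["JCODE_OPENAI_API_BASE", "OPENAI_BASE_URL", "OPENAI_API_BASE"];

/// `lookup` stands in for std::env::var so the logic is testable.
/// Falling through to the next variable on a malformed value is an assumption.
fn api_base_override(lookup: impl Fn(&str) -> Option<String>) -> Option<String> {
    for key in BASE_ENV_VARS {
        let Some(raw) = lookup(key) else { continue };
        let base = raw.trim().trim_end_matches('/');
        // Trim a trailing /responses so appending it again does not double it.
        let base = base.strip_suffix("/responses").unwrap_or(base);
        if base.starts_with("http://") || base.starts_with("https://") {
            return Some(base.to_string());
        }
        eprintln!("ignoring malformed {key}={raw:?}");
    }
    None
}

/// API-key mode only: append /responses to the override or the default base.
fn responses_url(lookup: impl Fn(&str) -> Option<String>) -> String {
    let base = api_base_override(lookup).unwrap_or_else(|| DEFAULT_API_BASE.to_string());
    format!("{base}/responses")
}
```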

Info widgets used to pin to an absolute screen row. While the user scrolled,
the negative-space pocket flowed up/down the screen but the widget tried to
stay glued to its row, so it churned (shrink/hide/re-home) against the
ragged free-width profile, causing the distracting movement during scroll.

Now, while the user is actively scrolling (auto_scroll_paused), each anchored
widget pins to a transcript line (content_top) and rides with it, sticking to
the same pocket of negative space and simply scrolling along with the text.
Fresh widgets are seated at the bottom of their pocket so they have maximum
runway to ride up before recycling off the top. Pinned-at-bottom/streaming
keeps the prior screen-anchored behaviour.

Quantified via the stability harness: new content_travel metric (motion
relative to the transcript, scroll-ride subtracted) plus a recycle counter.
On a 100x24 viewport, rich widget set, long-line-every-7 content:
  screen-anchored:  flicker/100=114.5  keepVis=36%  content-travel/100=86.2
  content-anchored: flicker/100=  2.2  keepVis=75%  content-travel/100= 8.7
Periodic/flat content drops to ~0 content-travel (perfectly stuck).

Adds SimMode::ContentAnchored + analyze_frames_with_scroll, and regression
tests for zero drift + no overlaps while riding the scroll.
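
The difference between the two anchoring modes fits in one function. A hypothetical sketch: `WidgetAnchor`, `widget_row`, and `seat_in_pocket` are invented names, `scroll_top` is the transcript line shown at the top of the viewport, and pockets are half-open row ranges.

```rust
/// Hypothetical widget anchors. Screen anchors pin to a viewport row;
/// content anchors pin to a transcript line (content_top) and ride the scroll.
#[derive(Clone, Copy, Debug)]
enum WidgetAnchor {
    Screen { row: usize },
    Content { content_top: usize },
}

/// Viewport row where the widget's top is drawn, or None when it is out of
/// view (a content-anchored widget that rode off the top can be recycled).
fn widget_row(anchor: WidgetAnchor, scroll_top: usize, viewport_rows: usize) -> Option<usize> {
    match anchor {
        WidgetAnchor::Screen { row } => (row < viewport_rows).then_some(row),
        WidgetAnchor::Content { content_top } => content_top
            .checked_sub(scroll_top)
            .filter(|&row| row < viewport_rows),
    }
}

/// Seat a fresh widget at the bottom of its free pocket [pocket_top, pocket_bottom)
/// so it has the most runway before riding off the top.
fn seat_in_pocket(
    pocket_top: usize,
    pocket_bottom: usize,
    widget_height: usize,
    scroll_top: usize,
) -> Option<WidgetAnchor> {
    let row = pocket_bottom.checked_sub(widget_height)?;
    (row >= pocket_top).then(|| WidgetAnchor::Content { content_top: scroll_top + row })
}
```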

…sing-signature 400 (#339)

Gemini-3 via the Antigravity/Cloud Code backend attaches an opaque
thoughtSignature to function-call parts, but:
- a parallel multi-call turn only signs its FIRST call (siblings persist
  unsigned), and
- locally synthesized tool calls (batch sub-calls, manual tool use, auto-poke
  continuations, recovery) and pre-signature/imported sessions carry none.

Live-probing the backend shows the rule precisely: an assistant turn is rejected
with HTTP 400 "Function call is missing a thought_signature in functionCall
parts" only when ALL of its function calls are unsigned; if any one carries a
valid signature the turn is accepted, and a previously-emitted signature is
accepted when replayed on later calls.

Fix: in the shared gemini build_contents() (used by both the gemini and
antigravity providers), track the most recent real thoughtSignature across the
whole conversation and apply it to any function call that lacks one. We never
fabricate a signature when none has ever been seen.

Verified live against the Antigravity backend with an A/B harness on a
fully-unsigned follow-up turn (the #339 trigger): as-is, 4/4 requests returned 400; with
carry-forward, 4/4 returned 200. A real multi-tool antigravity session (parallel write calls across
auto-poke rounds) now completes with zero signature 400s. Adds/updates 5
build_contents unit tests; full gemini suites pass.
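
The carry-forward rule is a single pass over the conversation. This sketch uses simplified, hypothetical types in place of Gemini's `Content`/`Part` structures; the behavior (remember the latest real signature, never fabricate one) follows the commit text.

```rust
/// Hypothetical, simplified shapes standing in for Gemini function-call parts.
#[derive(Clone, Debug, PartialEq)]
struct FunctionCall {
    name: String,
    thought_signature: Option<String>,
}

#[derive(Debug)]
struct AssistantTurn {
    calls: Vec<FunctionCall>,
}

/// Walk the whole conversation in order, remembering the most recent real
/// signature and copying it onto any later call that lacks one. Calls that
/// precede the first real signature stay unsigned: nothing is fabricated.
fn carry_forward_signatures(turns: &mut [AssistantTurn]) {
    let mut last_seen: Option<String> = None;
    for turn in turns.iter_mut() {
        for call in turn.calls.iter_mut() {
            if call.thought_signature.is_some() {
                last_seen = call.thought_signature.clone();
            } else if last_seen.is_some() {
                call.thought_signature = last_seen.clone();
            }
        }
    }
}
```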

Adds /commit-push (alias /commit-and-push), which behaves like /commit but
also pushes the resulting commits to the remote tracking branch (using
git push -u when no upstream exists). Never force-pushes or rewrites
already-pushed history.

Wired through local and remote key handling, command registry, /help
topic, and added launch-notice/help/synthetic-turn tests.
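
The push step's decision can be sketched as argv construction. `push_args` is hypothetical; the upstream probe mentioned in the comment is the standard `git rev-parse --abbrev-ref --symbolic-full-name @{u}`, which fails when no upstream is configured, but whether jcode detects upstreams this way is an assumption.

```rust
/// Hypothetical helper: build the `git push` argv for /commit-push.
/// `upstream` is what `git rev-parse --abbrev-ref --symbolic-full-name @{u}`
/// prints when the branch tracks a remote, or None when it has no upstream.
/// Deliberately never adds --force or --force-with-lease.
fn push_args(upstream: Option<&str>, remote: &str, branch: &str) -> Vec<String> {
    match upstream {
        // Tracking branch exists: a plain push updates it.
        Some(_) => vec!["push".to_string()],
        // No upstream yet: publish and set tracking in one step.
        None => vec!["push".into(), "-u".into(), remote.into(), branch.into()],
    }
}
```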

Replace the pinned-image side panel with inline image rendering in the
chat flow. Images (pasted screenshots, read of an image file, generated
images) now appear directly below the message they belong to.

Design:
- New jcode-tui-mermaid inline_image module: hash-addressed, lazy.
  inline_image_dims() does a header-only parse (no full decode) for
  prepare-time sizing; materialize_inline_image() decodes + caches only
  at draw time. Shares the existing RENDER_CACHE / render_image_widget_fit
  path, decoupled from mermaid SVG machinery.
- New ui_inline_image section: builds a dedicated PreparedMessages section
  with a dim label + correctly-sized fit placeholder per image. Sized to
  fit width preserving aspect, capped to a fraction of the viewport so
  tall screenshots cannot bury the transcript.
- ImageRegion gains a render mode (Crop for mermaid, Fit for raster
  images); draw_messages branches on it and lazily materializes only the
  on-screen images.
- Side panel and Alt+M auto-hide no longer treat images as pinned content.
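
The fit-to-width sizing with a viewport cap can be sketched as follows. `fit_rows`, the percentage cap, and the cell aspect parameter are hypothetical; the sketch returns only a row count, whereas a real layout would also narrow the image when the cap applies.

```rust
/// Rows an image occupies when scaled to the available width with its
/// aspect ratio preserved, capped to a share of the viewport.
/// `cell_aspect` is cell height / cell width in pixels (about 2.0 in many
/// terminals); `max_viewport_pct` is a hypothetical cap parameter.
fn fit_rows(
    img_w: u32,
    img_h: u32,
    cols: u16,
    viewport_rows: u16,
    max_viewport_pct: u16,
    cell_aspect: f64,
) -> u16 {
    if img_w == 0 || img_h == 0 || cols == 0 {
        return 0;
    }
    // Height in rows after scaling the image to `cols` cells wide.
    let natural = (cols as f64 * img_h as f64 / (img_w as f64 * cell_aspect)).ceil() as u16;
    // Integer percentage keeps the cap free of float rounding surprises.
    let cap = (viewport_rows as u32 * max_viewport_pct as u32 / 100) as u16;
    natural.min(cap).max(1)
}
```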

Lazy: a session with many images only ever decodes/transmits the ones
currently on screen.

Tests: inline_image dims/id, fit-rows capping, section geometry, payload
registry; updated obsolete pinned-image-panel tests.

Add claude-fable-5 to the Anthropic model catalog/picker. Confirmed live
via GET /v1/models: native 1M input context, adaptive thinking, and effort
levels low/medium/high/xhigh/max. Wire it into ALL_CLAUDE_MODELS,
AVAILABLE_MODELS, known-Claude prefixes, native-1M context classification,
reasoning-effort capability checks (incl. xhigh), and API pricing estimate.
Ordered after Opus 4.8 so flagship auto-selection is unchanged.

… resets

The first-run onboarding gate relied solely on launch_count <= 5 in
setup_hints.json. That file only counts interactive TTY launches and was
recently recreated from scratch (count reset to 2), so every plain
'jcode' spawn (e.g. the niri Alt+' release hotkey) dropped an
experienced user into the onboarding/resume flow.

Three fixes:
- SetupHintsState::load now falls back to the .bak sibling the atomic
  writer always keeps, so a missing/corrupt setup_hints.json no longer
  silently resets launch_count.
- SetupHintsState is #[serde(default)] so partial/older files parse
  instead of erroring into defaults.
- The new-user heuristic now requires the absence of established native
  session history (>=10 session_*.json files) in addition to a low
  launch count; imported_* transcripts intentionally don't count. The
  duplicated heuristic in suggestion_prompts now shares this logic.
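
A minimal sketch of the three fixes, assuming the backup file is named `<file>.bak` and that session files live in a single directory; `count_native_sessions`, `is_new_user`, and `load_with_fallback` are hypothetical names, while the thresholds (launch_count <= 5, fewer than 10 native sessions) come from the text.

```rust
use std::ffi::OsString;
use std::fs;
use std::path::{Path, PathBuf};

/// Count native sessions (session_*.json); imported_* transcripts do not count.
fn count_native_sessions(dir: &Path) -> usize {
    let Ok(entries) = fs::read_dir(dir) else { return 0 };
    entries
        .filter_map(Result::ok)
        .filter(|e| {
            let name = e.file_name();
            let name = name.to_string_lossy();
            name.starts_with("session_") && name.ends_with(".json")
        })
        .count()
}

/// New-user heuristic: low launch count AND no established session history.
fn is_new_user(launch_count: u32, native_sessions: usize) -> bool {
    launch_count <= 5 && native_sessions < 10
}

/// Try the primary file, then its `.bak` sibling (name assumed to be
/// `<file>.bak`), so a missing or corrupt file does not reset state.
fn load_with_fallback<T>(path: &Path, parse: impl Fn(&str) -> Option<T>) -> Option<T> {
    let mut bak: OsString = path.as_os_str().to_owned();
    bak.push(".bak");
    [path.to_path_buf(), PathBuf::from(bak)]
        .iter()
        .find_map(|p| fs::read_to_string(p).ok().and_then(|s| parse(&s)))
}
```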

quangdang46 merged commit 3b7fefa into quangdang46:master on Jun 10, 2026
2 of 10 checks passed

2 participants