kernel · dprevoznik · Jun 21, 2026 · Jun 21, 2026 · Jun 21, 2026 · Jun 21, 2026
diff --git a/README.md b/README.md
@@ -1,6 +1,6 @@
 # Kernel Skills
 
-Official AI agent skills from the Kernel for installing useful skills for our CLI and SDKs that you can load into popular coding agents.
+Official AI agent skills from [Kernel](https://kernel.sh) for installing useful skills for our CLI and SDKs that you can load into popular coding agents.
 
 ## Installation
 
@@ -72,6 +72,7 @@ Command-line interface skills for using Kernel CLI commands.
 | **kernel-cli** | Complete guide to Kernel CLI - cloud browser platform with automation, deployment, and management |
 | **kernel-agent-browser** | Best practices for `agent-browser -p kernel` automation, bot detection handling, iframes, login persistence |
 | **kernel-auth** | Setup and manage Kernel authentication connections for any website with safety checks and reauthentication support |
+| **cua-cli** | Drive a Kernel browser from shell via the `cua` binary: one-shot subcommands, named sessions, TUI, profile persistence, transcripts, live-view handoff |
 | **profile-website-bot-detection** | Profile a website for bot detection vendors using stealth vs non-stealth Kernel browsers; compare effectiveness and identify vendor products |
 
 ### kernel-sdks
@@ -82,6 +83,7 @@ SDK skills for building browser automation with TypeScript and Python.
 |-------|-------------|
 | **typescript-sdk** | Build automation with Kernel's Typescript SDK |
 | **python-sdk** | Build automation with kernel's Python SDK |
+| **cua-agent** | Build TypeScript apps that embed Kernel cua's loop with `CuaAgent` / `CuaAgentHarness`: provider switching, custom tools, session repos, event-stream debugging |
 
 ### generate-video
 

diff --git a/plugins/kernel-cli/skills/cua-cli/SKILL.md b/plugins/kernel-cli/skills/cua-cli/SKILL.md
@@ -0,0 +1,257 @@
+---
+name: cua-cli
+description: Drive a Kernel cloud browser from the shell using the `cua` CLI. Use this skill when you need to open URLs, click elements, type into fields, take screenshots, or chain multi-step browser tasks across shell calls. Supports named sessions for stateful workflows, profile persistence for logins, transcript-based debugging, live-view handoff, and mixing vision-driven computer-use with `playwright_execute` for DOM-precise steps the model picks per action (`--playwright`). For building your own TS agent on top of cua, see `cua-agent`.
+---
+
+# cua-cli
+
+`cua` is a single-binary CLI that drives a real Chrome session running in Kernel. It's designed for agentic use: each subcommand returns a one-line result on stdout and a deterministic exit code, so you can chain calls together and parse the output. An LLM picks targets semantically from screenshots — there are no CSS selectors.
+
+## When to use this skill
+
+- **Use this skill** when you need shell-callable computer-use steps (`cua open`, `cua click`, `cua do …`), an interactive TUI, or want to chain browser actions in a shell pipeline.
+- **Reach for the `cua-agent` skill** (in the `kernel-sdks` plugin) when you're writing a TypeScript app that needs to embed cua's prompt → screenshot → tool-call loop programmatically.
+- **Reach for `kernel-agent-browser`** when you need deterministic browser scripting (semantic selectors, `find role`, `wait --text`, accessibility-tree snapshots).
+- **Reach for `kernel-cli`** for raw Kernel browser management (`kernel browsers create`, `kernel browsers exec`, profile / proxy CRUD).
+
+## Prerequisites
+
+- A Kernel account and `KERNEL_API_KEY`. See `kernel-cli` for install + auth.
+- At least one model-provider API key, matched to the model you pick (table below).
+- Node 20+ for the npm install.
+
+## Install
+
+```bash
+# Global install — puts `cua` on $PATH
+npm i -g @onkernel/cua-cli
+
+# Or zero-install one-shot
+npx -y -p @onkernel/cua-cli cua --help
+```
+
+## Environment variables
+
+| Env | Used for |
+| --- | --- |
+| `KERNEL_API_KEY` | Kernel API key (always required) |
+| `OPENAI_API_KEY` | OpenAI models (`-m openai:…`) |
+| `ANTHROPIC_OAUTH_TOKEN` / `ANTHROPIC_API_KEY` | Anthropic models (`-m anthropic:…`); OAuth token wins if both are set |
+| `GOOGLE_API_KEY` / `GEMINI_API_KEY` | Google / Gemini models (`-m google:…`); `GOOGLE_API_KEY` wins if both are set |
+| `YUTORI_API_KEY` | Yutori Navigator (`-m yutori:…`) |
+| `TZAFON_API_KEY` | Tzafon (`-m tzafon:…`) |
+| `KERNEL_BASE_URL` | Override Kernel base URL |
+| `XDG_DATA_HOME` | Sessions / transcripts dir (defaults to `~/.local/share`) |
+| `CUA_IMAGE_PROTOCOL` | Force inline image protocol (`kitty` / `iterm2` / `none` / `auto`) |
+
+## One-shot subcommands
+
+Each call provisions a fresh Kernel browser by default, runs the action, prints a one-line result, and tears the browser down. Chain via `-s <name>` (next section) to keep state.
+
+| Subcommand | What it does | Stdout | Exit code |
+| --- | --- | --- | --- |
+| `cua open <url>` | Navigate to a URL. | `ok` | 0 ok, 2 error |
+| `cua click "<desc>"` | Find element matching natural-language description and click it. | `ok clicked (x, y)` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
+| `cua type "<field>" "<text>"` | Focus a field by description and type. | `ok typed` or `not_found <reason>` | 0 ok, 1 not_found, 2 error |
+| `cua press <key> [<key>...]` | Send a key combo (`cua press ctrl l`, `cua press Return`). | `ok pressed` | 0 ok, 2 error |
+| `cua url` | Print the current URL. | the URL | 0 ok, 2 error |
+| `cua observe ["<question>"]` | Describe the page; optionally answer a question. | the description | 0 ok, 2 error |
+| `cua screenshot --out <file\|->` | Save a PNG. `--out -` writes bytes to stdout. | the path or `(stdout)` | 0 ok, 2 error |
+| `cua do "<instruction>"` | Open-ended; agent plans and acts. Bound by `--max-steps` (default 3). | the assistant's final text | 0 ok, 2 error |
+
+Useful flags:
+
+- `-m <model>` — pick the LLM (default `openai:gpt-5.5`). `cua models` to list.
+- `--max-steps <n>` — bound the loop on `cua do`.
+- `--profile <id-or-name>` — load a Kernel browser profile for persisted cookies / storage. Existing ids or names are reused; a non-id name is created if missing. Pass `--profile-no-save-changes` for read-only.
+- `--playwright` — expose the `playwright_execute` tool alongside computer-use so the model can run Playwright/TS against the live browser for steps that are cleaner as DOM operations (form fills, structured extraction, `waitForSelector`). Off by default. See "Mixing vision and DOM" below.
+- `-v` — verbose progress on stderr (provisioning, tool calls, transcript path).
+
+`click` and `type` match **semantically**, not by selector — use natural-language descriptions of what's visible on screen.
+
+The cua CLI always provisions **stealth-on** browsers. If you need non-stealth or a custom viewport / proxy, pre-create the browser via `kernel browsers create` and attach the cua session to it.
+
+If you're unsure of a flag or subcommand, `cua --help` and `cua <subcommand> --help` print the current surface.
+
+## Named sessions
+
+Without `-s`, each subcommand provisions a brand-new browser. To keep state across calls, allocate a named session first:
+
+```bash
+cua --profile github session start login    # provisions a Kernel browser, prints `name=login`
+cua -s login open https://github.com/login
+cua -s login type "email field"    "$EMAIL"
+cua -s login type "password field" "$PASSWORD"
+cua -s login click "Sign in"
+cua -s login url                            # prints post-login URL
+cua session stop login                      # tears down the Kernel browser
+```
+
+Inspect:
+
+```bash
+cua session list                            # NAME / KERNEL_ID / AGE / LIVE_URL
+cua session show login                      # full JSON metadata
+```
+
+Pass `--profile` when starting the named session; later `cua -s <name> …` calls attach to the same browser, so they don't need the profile flag again.
+
+**Liveness**: Kernel browsers time out from inactivity. If you see `error session "login" is no longer alive on Kernel …`, re-provision with the same profile and name:
+
+```bash
+cua session stop login                       # safe even if the Kernel browser is already gone
+cua --profile github session start login     # re-attach name=login to a fresh browser, same profile
+```
+
+Named-session metadata lives in `$XDG_DATA_HOME/cua/named-sessions/<name>.json`.
+
+## Free-form mode
+
+```bash
+cua --print "open hn and tell me the top story"   # one-shot, streams text
+cua --print -o jsonl "..."                        # one-shot, streams JSONL events
+cua "..."                                         # interactive TUI (real terminal)
+```
+
+`--print` exits when the agent finishes; the TUI runs until Ctrl+C. Add `--jsonl-include-deltas` for token deltas, `--jsonl-include-images` for base64 screenshots in `tool_result` events.
+
+**If you're an agent driving cua from a shell, always pass `--print` or `--print -o jsonl`.** The bare `cua "..."` form opens an interactive TUI that needs a real terminal — it will hang in a non-interactive context.
+
+Inside the TUI, `/playwright on` and `/playwright off` toggle the `playwright_execute` tool mid-session without restarting.
+
+## Model selection
+
+Run `cua models` for the current catalog. Pick with `-m <ref>` (default `openai:gpt-5.5`). Switch per call or per named session.
+
+| Model ref | Provider |
+| --- | --- |
+| `openai:gpt-5.5` | OpenAI (default) |
+| `anthropic:claude-opus-4-7` | Anthropic (supports `--thinking off\|minimal\|low\|medium\|high\|xhigh`) |
+| `google:gemini-3-flash-preview` | Google / Gemini |
+| `yutori:n1.5-latest` | Yutori Navigator |
+
+Not every provider's native vocabulary includes navigation. If a model can click and type but can't navigate (`goto`, `back`, `forward`, `url`), pick a different model.
+
+## Live view URL and manual login fallback
+
+Stealth-on doesn't always beat bot detection. When automation gets stuck on a login, hand off to a human via the live view URL.
+
+```bash
+cua --profile mysite session start login
+cua session show login | jq -r .live_url   # share this URL with the user
+# user logs in manually in the live view
+cua -s login url                           # confirm post-login URL
+cua session stop login                     # profile state saves on teardown
+```
+
+If you only have a session id (e.g. from `cua session list`), the `kernel` CLI also surfaces it:
+
+```bash
+kernel browsers view <session-id>
+```
+
+## Mixing vision and DOM
+
+Vision (computer-use tools) and DOM (Playwright) are peer capabilities against the same browser. Real workloads mix both — a login flow may be visual and bot-detected while the data extraction on the other side is a stable table with reliable selectors. cua exposes both so the *model* picks per action:
+
+- **Reach for computer use when**: DOM is brittle or unknown, the target is canvas/video/pixel UI, bot detection makes human-like input matter, or the check is visual ("did the modal close?").
+- **Reach for Playwright when**: selectors are stable, you want structured extraction (`page.$$eval` over a list) instead of asking the model to read pixels, you're filling many form fields or hidden inputs, or you need to wait on a network/DOM condition rather than a visual cue.
+
+### Model-picked (`--playwright`)
+
+Pass `--playwright` and the model gets both toolsets and chooses per step:
+
+```bash
+cua --playwright --profile mysite session start work
+cua -s work --print "log in to example.com, then pull the last 10 orders as JSON"
+# The model picks computer_use for the login (visual, bot-detected)
+# and playwright_execute for the extraction (structured, stable DOM).
+```
+
+Inside `playwright_execute`, `page`, `context`, and `browser` are in scope and the code may `return` a JSON-serializable value that comes back as the tool result. Each call runs in a fresh JS context (locals don't persist), but the browser session does. Screenshots aren't auto-attached — the model requests one on a follow-up turn when it needs to see the page.
+
+Verified end-to-end against Anthropic, Tzafon, and Yutori CUA models; OpenAI and Google are unit-tested. In the TUI, `/playwright on` / `/playwright off` toggle the tool mid-session.
+
+### Developer-scripted split
+
+If you'd rather orchestrate the split yourself instead of letting the model pick — for repeatable batch jobs, or when you know the DOM shape in advance — `kernel browsers exec` runs Playwright against the same `kernel_session_id`:
+
+```bash
+cua --profile mysite session start work
+cua -s work open https://example.com/checkout
+cua -s work click "Continue to payment"
+
+cua session show work | jq -r .kernel_session_id   # → <session-id>
+kernel browsers exec <session-id> --code "
+  const frame = page.frameLocator('#payment-iframe');
+  await frame.locator('#card-number').fill(process.env.CARD_NUMBER);
+  await frame.locator('#submit').click();
+"
+
+cua -s work observe "did the payment succeed?"
+```
+
+## Debugging
+
+- **Verbose stderr**: `cua -v --print "…"` writes provisioning info, tool calls, and the transcript path to stderr.
+- **Live event stream**: `cua --print -o jsonl "…"` emits one event per line (`tool_call`, `tool_result`, `assistant_text_done`, etc.). Add `--jsonl-include-images` to inline screenshots in `tool_result`.
+- **Persisted transcript**: every `--print`, TUI, and `-s <name>` invocation appends to `$XDG_DATA_HOME/cua/sessions/<cwd-hash>/<id>.jsonl`. Find the exact path:
+  ```bash
+  cua -v --print "..."                       # stderr includes: [cua] session=<path>
+  cua session show login | jq -r .transcript_path
+  ```
+  Roles: `user`, `assistant`, `toolResult`. There's also a custom `cua-browser` entry written once per session with `kernel_session_id` / `live_url` / `profile_id`.
+- **Screenshots**: `cua screenshot --out shot.png` or inspect `image` blocks in `toolResult` transcript entries.
+- **Page URL**: `cua url` to confirm post-action navigation.
+
+A few `jq` starters against a transcript path:
+
+```bash
+# Every tool call the agent made, in order
+jq -c 'select(.role == "assistant") | .content[]?
+       | select(.type == "tool_use") | {name, input}' "$TRANSCRIPT"
+
+# Final assistant text (the answer)
+jq -r 'select(.role == "assistant") | .content[]?
+       | select(.type == "text") | .text' "$TRANSCRIPT" | tail -1
+```
+
+## Gotchas
+
+- **Element descriptions are semantic, not selectors.** `cua click "Sign in button"` looks at the screenshot — describe what the user sees, not a CSS selector.
+- **Viewport defaults to 1920x1080.** Pre-create the browser with `kernel browsers create` if you need something else.
+- **Keyboard navigation > mouse-wheel scroll.** `cua press Page_Down` / `Home` / arrow keys is more reliable than scroll wheel via the LLM.
+- **Multi-step state requires `-s <name>`.** A second one-shot subcommand can't see what the first one did.
+- **Profile saves on close, not continuously.** Tear down cleanly with `cua session stop` or you'll lose recent state.
+- **`--max-steps` defaults to 3 on `cua do`.** Bump it for non-trivial tasks.
+
+## Quick reference
+
+```bash
+# One-shot, fresh browser
+cua --print "open hn and tell me the top story"
+
+# Named session for multi-step
+cua --profile github session start login
+cua -s login open https://example.com
+cua -s login click "Log in"
+cua -s login type "email field" "$EMAIL"
+cua -s login click "Submit"
+cua -s login url
+cua session stop login
+
+# List models, switch model per call
+cua models
+cua --print -m anthropic:claude-opus-4-7 "..."
+
+# Get the live view URL
+cua session show login | jq -r .live_url
+kernel browsers view <session-id>   # alternative
+
+# Let the model mix vision and DOM via playwright_execute
+cua --playwright --print "log in and pull the last 10 orders as JSON"
+
+# Or orchestrate the split yourself against the same session
+cua session show login | jq -r .kernel_session_id
+kernel browsers exec <session-id> --code "..."
+```