Add headless MCP server for AutoControl by JE-Chen · Pull Request #178 · Integration-Automation/AutoControlGUI

JE-Chen · 2026-04-25T12:10:15Z

Summary

Adds a complete headless Model Context Protocol (MCP) server to AutoControl so MCP-compatible clients (Claude Desktop, Claude Code, custom tool-use loops) can drive the host machine through AutoControl. 35 focused commits land tools, protocol coverage, transports, and infrastructure.

Tool surface (~90 tools)

Mouse, keyboard, screen + screenshot-as-image, image / OCR, accessibility tree, VLM locator, windows (focus / list / move / resize / minimize / maximize / restore / close), clipboard text + image, action recording / replay / edit, scheduler, triggers, hotkey daemon, screen recording, multi-monitor, drag, send-to-window, process / shell, ac_diff_screenshots, ac_wait_for_image / ac_wait_for_pixel, plus a curated set of short aliases (`click`, `type`, `screenshot`, ...).

Protocol coverage (MCP 2025-06-18)

`tools/list`, `tools/call`, `tools/list_changed`, `resources/list`, `resources/read`, `resources/subscribe`, `notifications/resources/updated`, `prompts/list`, `prompts/get`, `sampling/createMessage` (server-initiated), `elicitation/create` (destructive-tool gating), `logging/setLevel` + `notifications/message`, `roots/list` + `notifications/roots/list_changed`, progress notifications, cancellation, schema validation, plugin auto-register.

Transports

- stdio (`python -m je_auto_control.utils.mcp_server`, `je_auto_control_mcp` console script)
- HTTP at `/mcp` with SSE streaming, bearer-token auth, optional TLS

Infrastructure

- Tool annotations (readOnly / destructive / idempotent / openWorld)
- Read-only mode (env var or constructor flag)
- JSONL audit log with secret redaction
- Token-bucket rate limiter
- Auto-screenshot on tool error
- Plugin hot-reload via PluginWatcher
- In-memory fake backend for CI (`JE_AUTOCONTROL_FAKE_BACKEND=1`)
- CLI introspection (`--list-tools`, `--list-resources`, `--list-prompts`, `--read-only`, `--fake-backend`)
- Bilingual docs (Eng + Zh)

All work obeys CLAUDE.md feature-delivery rules: headless core in `utils/`, public re-exports from `je_auto_control/__init__.py`, executor commands wired (`AC_start_mcp_server`, `AC_start_mcp_http_server`), top-level package stays Qt-free, every module under 750 lines, no new runtime dependencies (psutil only required by the optional process tools).

Test plan

- `py -m pytest test/unit_test/headless/ -q`: 260 tests pass on Windows
- `py -c "import sys, je_auto_control; assert not any('PySide6' in m for m in sys.modules)"`: facade stays Qt-free
- Manual: register the server with Claude Desktop / Claude Code via `claude mcp add autocontrol -- python -m je_auto_control.utils.mcp_server` and exercise a few tools
- Manual: HTTPS smoke test with bearer token (`JE_AUTOCONTROL_MCP_TOKEN=...`)
- Manual: confirm `--list-tools` JSON renders on a fresh Python install

Expose the headless automation API through a stdlib-only Model Context Protocol server so MCP clients (Claude Desktop, Claude Code, custom tool-use loops) can drive AutoControl over JSON-RPC 2.0 stdio. Ships 24 tools across mouse, keyboard, screen, image, OCR, window, clipboard, executor, and run-history.

Surface readOnlyHint, destructiveHint, idempotentHint, openWorldHint on every default tool so MCP clients can require user confirmation before destructive actions (typing, clicking, executing scripts) and auto-approve read-only queries (positions, sizes, OCR, history).

Honor the JE_AUTOCONTROL_MCP_READONLY env var (and a read_only parameter on build_default_tool_registry) to drop every tool that can mutate state. Lets shared / hardened deployments expose only observers (positions, OCR, clipboard reads, history) without code changes.
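The annotation-driven read-only filter described above can be sketched as follows. This is a minimal illustration, not AutoControl's registry code: it assumes each tool is described by a plain dict shaped like an MCP `tools/list` entry, and `filter_tools`, `read_only_requested`, the accepted truthy spellings, and the example annotation values are all hypothetical. Only the four hint names and the `JE_AUTOCONTROL_MCP_READONLY` variable come from the PR.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical descriptors shaped like MCP tools/list entries; the
# annotation values here are illustrative, not copied from the PR.
TOOLS = [
    {"name": "ac_list_monitors",
     "annotations": {"readOnlyHint": True, "destructiveHint": False,
                     "idempotentHint": True, "openWorldHint": False}},
    {"name": "ac_click_mouse",
     "annotations": {"readOnlyHint": False, "destructiveHint": True,
                     "idempotentHint": False, "openWorldHint": False}},
]


def read_only_requested(env: Mapping[str, str] = os.environ) -> bool:
    """Treat common truthy spellings of the env var as 'on' (an assumption)."""
    value = env.get("JE_AUTOCONTROL_MCP_READONLY", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def filter_tools(tools: List[dict], read_only: Optional[bool] = None) -> List[dict]:
    """Keep only tools whose annotations mark them read-only when requested."""
    if read_only is None:
        read_only = read_only_requested()
    if not read_only:
        return list(tools)
    return [tool for tool in tools
            if tool.get("annotations", {}).get("readOnlyHint", False)]
```

Filtering on the annotations, rather than on a separate allow-list, keeps the hints shown to the client and the server-side policy from drifting apart.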

ac_screenshot now returns a base64 PNG image content block so the model can actually see the screen, not just record that a file was saved. file_path is optional now; when given, the image is also written to disk and the resolved path is appended as a text block. Add MCPContent value type and content-block normalisation in the JSON-RPC dispatcher so future tools can return any mix of text and image content.
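The returned content can be sketched as a list of MCP content blocks: an image block (`type`, base64 `data`, `mimeType`, as in the MCP content types), followed by an optional text block carrying the resolved path. `screenshot_content` is a hypothetical helper that takes PNG bytes already captured by some other means; it is not the PR's MCPContent type.

```python
import base64
import os
from typing import List, Optional


def screenshot_content(png_bytes: bytes, file_path: Optional[str] = None) -> List[dict]:
    """Build MCP content blocks for a screenshot (hypothetical helper)."""
    blocks = [{
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }]
    if file_path is not None:
        # Optional side effect: persist the image and report where it went.
        resolved = os.path.realpath(file_path)
        with open(resolved, "wb") as handle:
            handle.write(png_bytes)
        blocks.append({"type": "text", "text": resolved})
    return blocks
```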

Expose record/stop_record, read/write JSON action files, and the recording-edit helpers (trim, adjust_delays, scale_coordinates) as MCP tools so a model can capture a manual session, persist it, and replay it on a different resolution or speed.

ac_drag composes set→press→move→release for click-and-drag flows. ac_send_key_to_window / ac_send_mouse_to_window post events to a specific window without stealing focus, useful when an automation needs to drive a background app while the user keeps using the foreground.

Split tools.py into a tools/ package (_base, _handlers, _factories) to keep every module under the 750-line limit, and add five new semantic-locator tools: ac_a11y_list / ac_a11y_find / ac_a11y_click target widgets through the accessibility tree (stable across visual restyles), and ac_vlm_locate / ac_vlm_click fall back to a vision language model for ad-hoc descriptions. Both paths are far more robust than pixel-template matching for dynamic UIs.

Expose 15 automation-orchestration tools so a model can build full event-driven workflows: ac_scheduler_* manages interval and cron jobs, ac_trigger_add/remove/list/start/stop covers the four trigger kinds (image, window, pixel, file), and ac_hotkey_* binds global hotkeys to action JSON files. Each control-plane tool returns the registered record so the model can chain follow-ups without a separate listing call.

Implement resources/list and resources/read so MCP clients can browse data the server has to offer without invoking a tool. The default provider chains a directory listing of action JSON files (autocontrol://files/<name>), the recent run-history snapshot (autocontrol://history), and the executor command catalogue (autocontrol://commands). A pluggable ResourceProvider lets callers swap in custom sources. Path traversal is blocked at the provider boundary.
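Blocking path traversal at the provider boundary can be sketched like this. It assumes file resources are addressed as `autocontrol://files/<name>` (from the PR text) and that the provider owns a single root directory; `resolve_file_uri` is a hypothetical name. Resolving with `os.path.realpath` before the containment check also catches `..` segments, absolute names, and symlinks that point outside the root.

```python
import os
from urllib.parse import unquote

FILES_PREFIX = "autocontrol://files/"


def resolve_file_uri(uri: str, root: str) -> str:
    """Map autocontrol://files/<name> onto a path inside ``root``.

    Raises ValueError for foreign URIs or for names that escape the root.
    """
    if not uri.startswith(FILES_PREFIX):
        raise ValueError(f"unsupported resource URI: {uri}")
    name = unquote(uri[len(FILES_PREFIX):])  # decode %2e%2e and friends first
    real_root = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(real_root, name))
    if os.path.commonpath([real_root, candidate]) != real_root:
        raise ValueError(f"path escapes resource root: {name}")
    return candidate
```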

Implement prompts/list and prompts/get so MCP clients can surface reusable task templates as slash commands. Default catalogue ships five recipes: automate_ui_task (locator-priority order), record_and_generalize (capture and replay-with-semantics), compare_screenshots, find_widget (cheapest reliable strategy first), explain_action_file (plain-language summary). Pluggable PromptProvider lets callers add project-specific templates.

Wrap the existing JSON-RPC dispatcher in a stdlib HTTP server so MCP clients that prefer HTTP — or that need to reach the server from a container or another machine — can connect without stdio. Speaks the JSON-only flavour of MCP Streamable HTTP: POST /mcp returns the JSON-RPC reply, notifications get 202 Accepted, GET returns 405 (no SSE streaming yet). Default bind is 127.0.0.1 per the project's least-privilege policy. Wired into the executor as AC_start_mcp_http_server.

Tools that declare a 'ctx' parameter receive a ToolCallContext that can push notifications/progress (when the client supplied a progressToken) and observe cooperative cancellation. Server tracks active tools/call requests and responds to notifications/cancelled by setting the context flag so long-running tools can abort with OperationCancelledError, surfaced to the client as JSON-RPC -32800.
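A minimal sketch of such a context, assuming a `notify` callback that writes JSON-RPC messages to the client. `CallContext`, `CancelledByClient`, `slow_tool`, and `call_tool` are hypothetical stand-ins for the PR's ToolCallContext / OperationCancelledError; the `notifications/progress` params (`progressToken`, `progress`, `total`) follow the MCP spec, and -32800 is the code the PR reports for cancellation.

```python
import threading
from typing import Callable, Optional

REQUEST_CANCELLED = -32800  # error code the PR surfaces for cancelled calls


class CancelledByClient(Exception):
    """Raised inside a tool once the client has cancelled the request."""


class CallContext:
    """Hypothetical per-call context: progress pushes plus a cancel flag."""

    def __init__(self, progress_token=None,
                 notify: Optional[Callable[[dict], None]] = None):
        self.progress_token = progress_token
        self._notify = notify or (lambda message: None)
        self._cancelled = threading.Event()

    def report_progress(self, progress: float, total: Optional[float] = None) -> None:
        if self.progress_token is None:  # client did not request progress
            return
        params = {"progressToken": self.progress_token, "progress": progress}
        if total is not None:
            params["total"] = total
        self._notify({"jsonrpc": "2.0", "method": "notifications/progress",
                      "params": params})

    def cancel(self) -> None:
        """Called by the notifications/cancelled handler, possibly on another thread."""
        self._cancelled.set()

    def check_cancelled(self) -> None:
        if self._cancelled.is_set():
            raise CancelledByClient()


def slow_tool(steps: int, ctx: CallContext) -> str:
    """Example long-running tool: checks for cancellation between steps."""
    for step in range(steps):
        ctx.check_cancelled()
        ctx.report_progress(step + 1, steps)
    return "done"


def call_tool(request_id, steps: int, ctx: CallContext) -> dict:
    try:
        text = slow_tool(steps, ctx)
    except CancelledByClient:
        return {"jsonrpc": "2.0", "id": request_id,
                "error": {"code": REQUEST_CANCELLED, "message": "Request cancelled"}}
    return {"jsonrpc": "2.0", "id": request_id,
            "result": {"content": [{"type": "text", "text": text}]}}
```

Cancellation is cooperative: the flag only takes effect at the next `check_cancelled()` call, so a tool that never checks runs to completion.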

Tools running in concurrent mode (the default under serve_stdio) can now call server.request_sampling(messages, ...) to ask the client model a question — the server emits a server-initiated sampling/createMessage request and blocks the calling worker until the client returns a response. handle_line gains response-correlation so inbound JSON-RPC responses (id + result/error, no method) are routed to the matching pending request. Stdio is opted into concurrent tools/call dispatch so a sampling round-trip never deadlocks the reader thread.
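The response-correlation step can be sketched as a table of pending server-initiated requests keyed by JSON-RPC id. `PendingRequests` is hypothetical, and `write_line` stands in for the stdout writer. The caller blocks on a `threading.Event` while the reader thread hands inbound responses (an `id` plus `result`/`error`, no `method`) to `handle_line`. The slot is registered before the request is written, so even an immediate reply cannot be lost.

```python
import itertools
import json
import threading
from typing import Callable, Dict


class PendingRequests:
    """Hypothetical correlation table for server-initiated JSON-RPC requests."""

    def __init__(self, write_line: Callable[[str], None]):
        self._write_line = write_line
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._waiting: Dict[int, dict] = {}

    def request(self, method: str, params: dict, timeout: float = 30.0) -> dict:
        request_id = next(self._ids)
        slot = {"event": threading.Event(), "reply": None}
        with self._lock:
            self._waiting[request_id] = slot
        self._write_line(json.dumps({"jsonrpc": "2.0", "id": request_id,
                                     "method": method, "params": params}))
        if not slot["event"].wait(timeout):
            with self._lock:
                self._waiting.pop(request_id, None)
            raise TimeoutError(f"{method} got no reply within {timeout}s")
        reply = slot["reply"]
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "client error"))
        return reply["result"]

    def handle_line(self, line: str) -> bool:
        """Route an inbound response to its waiter; return False otherwise."""
        message = json.loads(line)
        if "method" in message or "id" not in message:
            return False  # a request or notification, not a response
        with self._lock:
            slot = self._waiting.pop(message["id"], None)
        if slot is None:
            return False  # unknown or already timed-out id
        slot["reply"] = message
        slot["event"].set()
        return True
```

Because `request` blocks the calling worker, the reader thread must stay free to call `handle_line`. This is why the PR moves stdio `tools/call` dispatch onto concurrent workers.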

Run a small stdlib JSON-Schema validator before invoking the handler so a model that hallucinates an arg (missing field, wrong type, value outside an enum) gets a clean -32602 with a field-level message instead of a Python TypeError. The validator covers the schema features the bundled tools actually use; we deliberately don't depend on the jsonschema library.
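A validator covering just `required`, `type`, and `enum` can be sketched in a few lines. It is not the PR's implementation, and `CLICK_SCHEMA` is a hypothetical example schema. Note the explicit `bool` check: in Python `True` is an `int`, so without it a boolean would slip through an `"integer"` field.

```python
from typing import List, Optional

_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}
INVALID_PARAMS = -32602

# Hypothetical schema for a click-style tool.
CLICK_SCHEMA = {
    "type": "object",
    "required": ["x", "y"],
    "properties": {
        "x": {"type": "integer"},
        "y": {"type": "integer"},
        "button": {"type": "string", "enum": ["left", "right"]},
    },
}


def validate(args: dict, schema: dict) -> List[str]:
    """Return field-level problems for a small subset of JSON Schema."""
    problems = []
    for field in schema.get("required", []):
        if field not in args:
            problems.append(f"missing required field '{field}'")
    properties = schema.get("properties", {})
    for field, value in args.items():
        spec = properties.get(field)
        if spec is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"unexpected field '{field}'")
            continue
        expected = _TYPES.get(spec.get("type"))
        # bool subclasses int, so reject it explicitly for non-boolean types.
        if expected is not None and (
                not isinstance(value, expected)
                or (isinstance(value, bool) and spec["type"] != "boolean")):
            problems.append(f"'{field}' should be {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{field}' must be one of {spec['enum']}")
    return problems


def check_call(request_id, args: dict, schema: dict) -> Optional[dict]:
    """Return a JSON-RPC -32602 error response, or None if the args are valid."""
    problems = validate(args, schema)
    if not problems:
        return None
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": INVALID_PARAMS, "message": "; ".join(problems)}}
```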

register_tool / unregister_tool now emit notifications/tools/list_changed so the connected client refreshes its cached catalogue, and the initialize handshake advertises listChanged=true. Add make_plugin_tool / register_plugin_tools to wrap arbitrary AC_* plugin callables as MCP tools — the JSON Schema is derived from inspect.signature so a plugin you drop into your plugin directory shows up as a model-callable tool with named arguments.
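Deriving a JSON Schema from `inspect.signature` can be sketched as follows. Parameters without defaults become `required`, simple annotations map to JSON types, and `*args` / `**kwargs` are skipped because they have no named field. `schema_from_signature` and the `AC_greet` plugin are hypothetical illustrations; the PR's make_plugin_tool may map types differently.

```python
import inspect
from typing import Callable

_ANNOTATION_TYPES = {int: "integer", float: "number", str: "string",
                     bool: "boolean", list: "array", dict: "object"}


def schema_from_signature(func: Callable) -> dict:
    """Build a JSON Schema object from a callable's parameters (sketch)."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs cannot be expressed as named fields
        spec = {}
        json_type = _ANNOTATION_TYPES.get(param.annotation)
        if json_type is not None:
            spec["type"] = json_type
        if param.default is param.empty:
            required.append(name)
        else:
            spec["default"] = param.default
        properties[name] = spec
    schema = {"type": "object", "properties": properties}
    if required:
        schema["required"] = required
    return schema


def AC_greet(name: str, times: int = 1) -> str:  # hypothetical plugin callable
    return ", ".join([f"hello {name}"] * times)
```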

When a POST sets Accept: text/event-stream, the response now streams progress notifications followed by the final JSON-RPC result as SSE events. Lets a model see live status updates from long-running tools (wait_for_window, OCR scans, etc.) over HTTP. A per-server lock serialises SSE requests since they swap the shared notifier/writer state — JSON POST requests stay fully concurrent.
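The SSE framing can be sketched as follows. Each JSON-RPC message becomes one event: an `event:` line, a `data:` line, and a terminating blank line. The `message` event name is an assumption (it is also the SSE default type). Compact `json.dumps` output never contains a raw newline, so a single `data:` line is enough.

```python
import json
from typing import Iterable, Iterator


def sse_event(message: dict) -> bytes:
    """Frame one JSON-RPC message as a Server-Sent Events event."""
    payload = json.dumps(message, separators=(",", ":"))
    # The blank line after data: terminates the event.
    return f"event: message\ndata: {payload}\n\n".encode("utf-8")


def sse_stream(progress: Iterable[dict], final: dict) -> Iterator[bytes]:
    """Yield progress notifications first, then the final JSON-RPC result."""
    for note in progress:
        yield sse_event(note)
    yield sse_event(final)
```

An HTTP handler would write each chunk to the response and flush after it, so the client sees progress while the tool is still running.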

ac_diff_screenshots takes two PNG paths and returns the bounding boxes of the regions that changed. Pixel diff is computed in numpy (already a transitive dep through opencv-python), connected components are found via 4-connectivity flood fill, and tiny components are filtered out to ignore JPEG / antialias noise. Lets a model verify what its last action actually changed without re-OCRing the screen.
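The diff-and-flood-fill step can be sketched in pure Python on small grayscale grids (lists of rows). The PR uses numpy for the diff; this version avoids it only to stay dependency-free. `changed_boxes`, `threshold`, and `min_area` are hypothetical names, and boxes are `(left, top, right, bottom)` with inclusive bounds.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def changed_boxes(before: Sequence[Sequence[int]], after: Sequence[Sequence[int]],
                  threshold: int = 0, min_area: int = 4) -> List[Box]:
    """Bounding boxes of 4-connected regions whose pixels differ."""
    height, width = len(before), len(before[0])
    changed = [[abs(after[y][x] - before[y][x]) > threshold for x in range(width)]
               for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if not changed[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over up/down/left/right neighbours.
            queue = deque([(x, y)])
            seen[y][x] = True
            left, top, right, bottom, area = x, y, x, y, 0
            while queue:
                cx, cy = queue.popleft()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and changed[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # drop specks such as antialiasing noise
                boxes.append((left, top, right, bottom))
    return boxes
```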

ac_screen_record_start / ac_screen_record_stop / ac_screen_record_list wrap the existing ScreenRecorder so a model can capture a video of its own automation run for later review or for sharing with the user. Recordings live on disk under a model-supplied path.

Add ac_list_monitors and a monitor_index parameter to ac_screenshot. Index 0 reports the virtual desktop spanning every connected display; 1+ are individual monitors. When monitor_index is given, capture goes through mss directly so multi-display setups work where PIL.ImageGrab silently captures only the primary screen.

ac_get_clipboard_image returns the clipboard's image as a base64 PNG content block (or a clear text fallback when the clipboard holds no image). ac_set_clipboard_image places a Pillow-readable file on the clipboard — Windows uses CF_DIB via ctypes; macOS and Linux raise a clean NotImplementedError so the model gets a useful error rather than a crash.

Optional Authorization: Bearer <token> validation, configurable via constructor arg or JE_AUTOCONTROL_MCP_TOKEN env var. Missing token returns 401, wrong token returns 403, comparison uses hmac.compare_digest to avoid timing leaks. Optional ssl_context wraps the listening socket so the same transport can serve HTTPS — required for any non-localhost deployment.
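The 401/403 split can be sketched as a pure function that the HTTP handler would call before dispatching. `check_bearer` is a hypothetical name. For brevity, the `Bearer` scheme is matched case-sensitively here. Both values are encoded to bytes because `hmac.compare_digest` accepts `str` only when it is ASCII.

```python
import hmac
from typing import Optional


def check_bearer(authorization: Optional[str], expected_token: Optional[str]) -> Optional[int]:
    """Return None when the request may proceed, else the HTTP status to send."""
    if not expected_token:
        return None  # auth not configured
    if not authorization or not authorization.startswith("Bearer "):
        return 401  # credentials missing
    presented = authorization[len("Bearer "):].strip()
    # compare_digest's running time does not depend on where the values differ.
    if not hmac.compare_digest(presented.encode("utf-8"), expected_token.encode("utf-8")):
        return 403  # credentials present but wrong
    return None
```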

Every tools/call now produces one audit record with timestamp, tool name, sanitised arguments (password/token/secret/api_key/authorization values are replaced with '<redacted>'), status (ok / error / cancelled), duration, and the error message on failure. The default sink is the JE_AUTOCONTROL_MCP_AUDIT env var; when unset, auditing is a no-op so unconfigured deployments pay nothing.
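Recursive redaction and the one-line JSONL record can be sketched like this. The sketch assumes exact, case-insensitive key matches against the five names listed above; the PR may match differently. `redact`, `audit_line`, and the record field names are illustrative.

```python
import json
import time
from typing import Any, Optional

SENSITIVE_KEYS = {"password", "token", "secret", "api_key", "authorization"}
REDACTED = "<redacted>"


def redact(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
                for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def audit_line(tool: str, arguments: dict, status: str, duration: float,
               error: Optional[str] = None) -> str:
    """Serialise one tools/call audit record as a single JSON line."""
    record = {"timestamp": time.time(), "tool": tool,
              "arguments": redact(arguments), "status": status,
              "duration": round(duration, 4)}
    if error is not None:
        record["error"] = error
    return json.dumps(record)
```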

Optional RateLimiter on the MCPServer guards against runaway loops — a model that gets stuck calling the same tool 1000x per second now hits a -32000 'Rate limit exceeded' response instead of pegging the host. No limiter is installed by default so existing deployments keep their current behaviour; opt in with MCPServer(rate_limiter=RateLimiter(...)).
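A token bucket allows `rate` calls per second on average with bursts up to `capacity`. A minimal sketch is below. `TokenBucket` and `rate_limit_error` are hypothetical stand-ins, not the PR's RateLimiter. The clock is injectable so the refill logic can be tested without sleeping.

```python
import threading
import time
from typing import Callable, Optional

RATE_LIMITED = -32000


class TokenBucket:
    """Refill ``rate`` tokens per second, holding at most ``capacity``."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, never above capacity.
            self._tokens = min(self.capacity,
                               self._tokens + (now - self._updated) * self.rate)
            self._updated = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False


def rate_limit_error(bucket: TokenBucket, request_id) -> Optional[dict]:
    """Return a JSON-RPC error when the bucket is empty, else None."""
    if bucket.try_acquire():
        return None
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": RATE_LIMITED, "message": "Rate limit exceeded"}}
```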

--list-tools / --list-resources / --list-prompts emit the default catalogue as JSON and exit, so CI checks and manual debugging can inspect the server's surface without launching a real MCP client. --read-only filters --list-tools to the read-only subset (matches JE_AUTOCONTROL_MCP_READONLY runtime). With no flags the entry point still launches the stdio server.

install_fake_backend() / uninstall_fake_backend() swap the mouse / keyboard / screen / clipboard wrappers with recorders that mutate an in-memory FakeState rather than the real OS. Lets a CI runner exercise every MCP tool end-to-end without a display server. Toggle via JE_AUTOCONTROL_FAKE_BACKEND=1 or the new --fake-backend CLI flag. ac_click_mouse adapter relaxed to pass through string keycodes unchanged so it works under either backend.

When the client advertises the roots capability during initialize, the server reciprocates with roots.listChanged in its capabilities, fires a roots/list request once notifications/initialized arrives, and re-fetches on every notifications/roots/list_changed. The first file:// URI in the response becomes the FileSystemProvider root, so one MCP server can follow the user across projects without a restart. ResourceProvider gains a no-op set_workspace_root hook; ChainProvider fans out, FileSystemProvider re-targets.

Add an MCPLogBridge logging.Handler that, while serve_stdio is running, forwards every project-logger record to the MCP client as a notifications/message. The handler is attached / detached automatically per stdio session. logging/setLevel requests retune the bridge level on the fly, and the initialize handshake now advertises the logging capability so clients know to expect them.
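A `logging.Handler` that forwards records as `notifications/message` can be sketched as follows. The `level` / `logger` / `data` params and the RFC 5424 level names (debug through emergency) follow the MCP logging utility. The mapping between Python levels and MCP names, and the `NotificationLogHandler` class itself, are assumptions rather than the PR's MCPLogBridge.

```python
import logging
from typing import Callable

# Python levels -> MCP (RFC 5424) names, checked from most to least severe.
_TO_MCP = [(logging.CRITICAL, "critical"), (logging.ERROR, "error"),
           (logging.WARNING, "warning"), (logging.INFO, "info"),
           (logging.DEBUG, "debug")]
# MCP names -> Python levels, used for logging/setLevel requests.
_FROM_MCP = {"debug": logging.DEBUG, "info": logging.INFO, "notice": logging.INFO,
             "warning": logging.WARNING, "error": logging.ERROR,
             "critical": logging.CRITICAL, "alert": logging.CRITICAL,
             "emergency": logging.CRITICAL}


def mcp_level(levelno: int) -> str:
    for threshold, name in _TO_MCP:
        if levelno >= threshold:
            return name
    return "debug"


class NotificationLogHandler(logging.Handler):
    """Forward log records to an MCP client as notifications/message."""

    def __init__(self, send: Callable[[dict], None], level: int = logging.INFO):
        super().__init__(level)
        self._send = send

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._send({"jsonrpc": "2.0", "method": "notifications/message",
                        "params": {"level": mcp_level(record.levelno),
                                   "logger": record.name,
                                   "data": self.format(record)}})
        except Exception:
            self.handleError(record)

    def set_mcp_level(self, name: str) -> None:
        """Apply a logging/setLevel request such as {"level": "warning"}."""
        self.setLevel(_FROM_MCP[name])
```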

ac_wait_for_image / ac_wait_for_pixel are polling wait helpers that fill the gap left by ac_wait_for_window / ac_wait_for_text, useful for 'click then wait until the spinner disappears' style flows. Both tools accept a ToolCallContext, so they emit progress notifications and abort on notifications/cancelled. A timeout raises TimeoutError, which is surfaced to the model as a clean tool error.

Fill the gap left by ac_focus_window / ac_close_window with ac_window_move (combined move + resize via the new MoveWindow helper) and ac_window_minimize / ac_window_maximize / ac_window_restore, built on ShowWindow with the matching SW_* flags. All tools resolve a matching hwnd via find_window first, so the model can target a window by title substring instead of tracking handles itself.

ac_launch_process spawns a detached subprocess from an argv list (no shell expansion); ac_shell runs a command line via shlex.split and reports exit_code/stdout/stderr; ac_list_processes and ac_kill_process wrap psutil for inspection and cleanup. Working directories are validated against os.path.realpath; missing psutil returns a clear runtime error rather than ImportError.
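The `ac_shell` behaviour can be sketched with `shlex.split` plus `subprocess.run`. Because no shell is involved, quoting is honoured but pipes, globbing, and variable expansion are not. `run_shell` and its timeout default are hypothetical. `shlex.split` follows POSIX rules, so Windows paths containing backslashes need quoting.

```python
import os
import shlex
import subprocess
from typing import Optional


def run_shell(command_line: str, cwd: Optional[str] = None, timeout: float = 60.0) -> dict:
    """Run a command without a shell and report exit_code/stdout/stderr."""
    argv = shlex.split(command_line)  # quoting honoured; no pipes or globbing
    if not argv:
        raise ValueError("empty command line")
    if cwd is not None:
        cwd = os.path.realpath(cwd)  # resolve symlinks before validating
        if not os.path.isdir(cwd):
            raise ValueError(f"working directory does not exist: {cwd}")
    completed = subprocess.run(argv, cwd=cwd, capture_output=True, text=True,
                               timeout=timeout, check=False)
    return {"exit_code": completed.returncode,
            "stdout": completed.stdout, "stderr": completed.stderr}
```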

When JE_AUTOCONTROL_MCP_ERROR_SHOTS is set to a directory, every failed tools/call triggers a debug screenshot saved as <tool>_<ts>.png under that path. The artifact path is appended to both the error message returned to the model and the audit JSONL record, giving a one-step forensic trail for flaky automations. Disabled by default — costs nothing when the env var is unset.

Default registry now also exposes a curated set of model-friendly aliases (click, type, screenshot, drag, find_image, shell, ...) that point at the canonical ac_* tools. Reduces prompt verbosity without renaming the underlying API. Toggle off via JE_AUTOCONTROL_MCP_ALIASES=0 or build_default_tool_registry(aliases=False); read-only mode automatically filters destructive aliases out before expansion.

PluginWatcher polls a plugin directory and (un)registers MCP tools on change: new files become tools, modified files re-register under the same names with the updated handler, deleted files drop their tools. Each register/unregister already triggers notifications/tools/list_changed so connected clients see the catalogue refresh in real time.
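One poll of such a watcher can be sketched as a snapshot diff. Take a `{filename: mtime}` map of the plugin directory, compare it with the previous poll, then register added files, re-register modified ones, and unregister removed ones. `snapshot` and `diff_snapshots` are hypothetical helpers, and the `.py` filter is an assumption about how plugins are stored. Coarse filesystem timestamps can hide two edits made within the same tick.

```python
import os
from typing import Dict, Set, Tuple


def snapshot(directory: str) -> Dict[str, float]:
    """Map every .py file in ``directory`` to its modification time."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file() and entry.name.endswith(".py"):
            result[entry.name] = entry.stat().st_mtime
    return result


def diff_snapshots(old: Dict[str, float],
                   new: Dict[str, float]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, removed) file names between two polls."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {name for name in set(old) & set(new) if old[name] != new[name]}
    return added, modified, removed
```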

resources/subscribe / resources/unsubscribe wire a producer callback into a provider. ResourceProvider gains optional subscribe/unsubscribe hooks; ChainProvider fans out to children; LiveScreenProvider exposes autocontrol://screen/live and pushes notifications/resources/updated every poll_seconds while at least one client is subscribed. resources.subscribe capability flag flipped to true so clients know to attempt subscriptions.

Add MCPServer.request_elicitation that fires a server-initiated elicitation/create when JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE=1 and the client advertised the elicitation capability. Tools whose annotations mark them destructive (and not read-only) are gated: the user sees a confirmation prompt before the action runs, and declining returns a clean -32000 error to the model. Servers that talk to non-elicitation clients fall through with a logged warning so the feature is opt-in and never blocks unexpectedly.
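The gating decision can be sketched as below, with the client round-trip abstracted as an `elicit` callable that takes `elicitation/create` params (`message`, `requestedSchema`) and returns a result whose `action` is `accept`, `decline`, or `cancel`, as in the MCP spec. `gate_tool_call`, `needs_confirmation`, `ConfirmationDeclined`, and the `confirm` field are hypothetical. A missing `destructiveHint` is treated as false here, following the PR's "mark them destructive" wording, although the MCP spec itself defaults that hint to true.

```python
import logging
import os
from typing import Callable, Mapping

log = logging.getLogger(__name__)

CONFIRM_SCHEMA = {"type": "object",
                  "properties": {"confirm": {"type": "boolean"}},
                  "required": ["confirm"]}


class ConfirmationDeclined(Exception):
    """The server would map this to a JSON-RPC -32000 error."""


def needs_confirmation(annotations: Mapping, env: Mapping[str, str] = os.environ) -> bool:
    """Gate tools explicitly marked destructive and not read-only, when opted in."""
    destructive = (annotations.get("destructiveHint", False)
                   and not annotations.get("readOnlyHint", False))
    return bool(destructive) and env.get("JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE") == "1"


def gate_tool_call(tool_name: str, annotations: Mapping,
                   client_supports_elicitation: bool,
                   elicit: Callable[[dict], dict],
                   env: Mapping[str, str] = os.environ) -> None:
    """Ask the user before a destructive tool runs; raise if they decline."""
    if not needs_confirmation(annotations, env):
        return
    if not client_supports_elicitation:
        # Opt-in feature: never block a client that cannot ask the user.
        log.warning("client lacks elicitation; running %s unconfirmed", tool_name)
        return
    result = elicit({"message": f"Allow the destructive tool '{tool_name}' to run?",
                     "requestedSchema": CONFIRM_SCHEMA})
    accepted = (result.get("action") == "accept"
                and result.get("content", {}).get("confirm") is True)
    if not accepted:
        raise ConfirmationDeclined(f"user declined {tool_name}")
```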

codacy-production · 2026-04-25T12:11:11Z

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: 1021 complexity · 25 duplication


- S3516: drop invariant ``return 0`` from main(); CLI returns None.
- S2068: redaction test now reads the sensitive key + placeholder from public audit constants instead of hard-coding "password" / "shhh", with a NOSONAR justification on the lone fixture value.
- S1542: keep the AC_* plugin convention with a NOSONAR comment.
- S1192: extract _TOOLS_CALL_METHOD and _MIME_JSON constants.
- S1244: use math.isclose for the scheduler interval assertion.
- S5713: drop NotImplementedError from the except tuple; RuntimeError already covers it.
- Prospector pyflakes: remove the unused ``import logging``.
- Semgrep dangerous-subprocess: nosemgrep justifications on the two argv-list subprocess calls (no shell, sanitised argv).

- README.md / zh-TW / zh-CN: add MCP feature bullet, MCP quick-start section (registering with Claude Desktop / Code, programmatic start, HTTP+SSE+auth+TLS, --list-* CLI flags, surface table, security notes), and a Mermaid architecture diagram showing the client → transport → mcp_server → core → backends path. The filesystem tree below is kept and updated with mcp_server/ and the new auto_control_window.py.
- docs/source/{Eng,Zh}/doc/mcp_server/mcp_server_doc.rst: rewrite to cover the full surface that landed in this PR: ~90 tools, resources / prompts / sampling / roots / logging / progress / cancellation / elicitation, HTTP+SSE transport, audit log, rate limiter, auto-screenshot on error, plugin hot-reload, fake backend for CI, and CLI introspection flags.

main() returns None (Sonar S3516 fix from the previous round), so ``rc = main(argv)`` followed by ``assert rc is None`` trips Sonar S3699 ("Remove this use of the output from 'main'; 'main' doesn't return anything"). Drop the binding entirely and call main() purely for its side effect.

sonarqubecloud · 2026-04-25T12:37:37Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.8% Duplication on New Code


JE-Chen added 30 commits April 25, 2026 18:40

JE-Chen added 5 commits April 25, 2026 19:54

JE-Chen added 3 commits April 25, 2026 20:24

JE-Chen merged commit 6961cd9 into main Apr 25, 2026
8 checks passed

JE-Chen merged 38 commits into main from dev

1 participant