Expose the headless automation API through a stdlib-only Model Context Protocol server so MCP clients (Claude Desktop, Claude Code, custom tool-use loops) can drive AutoControl over JSON-RPC 2.0 stdio. Ships 24 tools across mouse, keyboard, screen, image, OCR, window, clipboard, executor, and run-history.
Surface readOnlyHint, destructiveHint, idempotentHint, openWorldHint on every default tool so MCP clients can require user confirmation before destructive actions (typing, clicking, executing scripts) and auto-approve read-only queries (positions, sizes, OCR, history).
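A minimal sketch of how a client might act on these hints. The two tool names appear in this PR, but the per-tool hint values and the `needs_confirmation` policy are assumptions for illustration, not the server's actual table:

```python
from typing import Dict

# Hypothetical annotation records; the real values live in the tool registry.
TOOL_ANNOTATIONS: Dict[str, Dict[str, bool]] = {
    "ac_screenshot": {
        "readOnlyHint": True,
        "destructiveHint": False,
        "idempotentHint": True,
        "openWorldHint": False,
    },
    "ac_click_mouse": {
        "readOnlyHint": False,
        "destructiveHint": True,
        "idempotentHint": False,
        "openWorldHint": False,
    },
}


def needs_confirmation(annotations: Dict[str, bool]) -> bool:
    """Ask the user unless the tool is explicitly read-only.

    Missing hints fall back to the cautious defaults: not read-only,
    possibly destructive.
    """
    if annotations.get("readOnlyHint", False):
        return False
    return annotations.get("destructiveHint", True)
```

Treating a tool with no annotations as destructive keeps the policy safe when a plugin forgets to declare its hints.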
Honor the JE_AUTOCONTROL_MCP_READONLY env var (and a read_only parameter on build_default_tool_registry) to drop every tool that can mutate state. Lets shared / hardened deployments expose only observers (positions, OCR, clipboard reads, history) without code changes.
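The filtering step could look roughly like this. The set of accepted truthy values and both helper names are assumptions; only the env var name comes from this PR:

```python
import os
from typing import Dict, Mapping, Optional


def read_only_requested(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the read-only env var holds a truthy value (assumed set)."""
    env = os.environ if environ is None else environ
    raw = env.get("JE_AUTOCONTROL_MCP_READONLY", "")
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def visible_tools(tools: Dict[str, bool], read_only: bool) -> Dict[str, bool]:
    """Filter a name -> mutates-state mapping down to observers in read-only mode."""
    if not read_only:
        return dict(tools)
    return {name: mutates for name, mutates in tools.items() if not mutates}
```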
ac_screenshot now returns a base64 PNG image content block so the model can actually see the screen, not just record that a file was saved. file_path is optional now; when given, the image is also written to disk and the resolved path is appended as a text block. Add MCPContent value type and content-block normalisation in the JSON-RPC dispatcher so future tools can return any mix of text and image content.
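For orientation, a sketch of the result shape described above: an MCP image content block carrying base64 PNG data, optionally followed by a text block with the saved path. The helper names are hypothetical and the real MCPContent type may differ:

```python
import base64
from typing import Any, Dict, List, Optional


def image_block(png_bytes: bytes) -> Dict[str, str]:
    """Build an MCP image content block from raw PNG bytes."""
    return {
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }


def text_block(text: str) -> Dict[str, str]:
    """Build an MCP text content block."""
    return {"type": "text", "text": text}


def screenshot_result(png_bytes: bytes, file_path: Optional[str] = None) -> Dict[str, Any]:
    """Return the image, plus the saved path as text when one was given."""
    content: List[Dict[str, str]] = [image_block(png_bytes)]
    if file_path is not None:
        content.append(text_block(file_path))
    return {"content": content, "isError": False}
```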
Expose record/stop_record, read/write JSON action files, and the recording-edit helpers (trim, adjust_delays, scale_coordinates) as MCP tools so a model can capture a manual session, persist it, and replay it on a different resolution or speed.
ac_drag composes set→press→move→release for click-and-drag flows. ac_send_key_to_window / ac_send_mouse_to_window post events to a specific window without stealing focus, useful when an automation needs to drive a background app while the user keeps using the foreground.
Split tools.py into a tools/ package (_base, _handlers, _factories) to keep every module under the 750-line limit, and add five new semantic-locator tools: ac_a11y_list / ac_a11y_find / ac_a11y_click target widgets through the accessibility tree (stable across visual restyles), and ac_vlm_locate / ac_vlm_click fall back to a vision language model for ad-hoc descriptions. Both paths are far more robust than pixel-template matching for dynamic UIs.
Expose 15 automation-orchestration tools so a model can build full event-driven workflows: ac_scheduler_* manages interval and cron jobs, ac_trigger_add/remove/list/start/stop covers the four trigger kinds (image, window, pixel, file), and ac_hotkey_* binds global hotkeys to action JSON files. Each control-plane tool returns the registered record so the model can chain follow-ups without a separate listing call.
Implement resources/list and resources/read so MCP clients can browse data the server has to offer without invoking a tool. The default provider chains a directory listing of action JSON files (autocontrol://files/<name>), the recent run-history snapshot (autocontrol://history), and the executor command catalogue (autocontrol://commands). A pluggable ResourceProvider lets callers swap in custom sources. Path traversal is blocked at the provider boundary.
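One common way to block path traversal at a provider boundary is to resolve the candidate path and check that it still sits under the root. This is a sketch of that technique with a hypothetical helper name, not necessarily the check the provider uses:

```python
import os


def resolve_action_file(root: str, name: str) -> str:
    """Resolve `name` under `root`, rejecting anything that escapes the root."""
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, name))
    # commonpath equals the root only when the candidate is inside it.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes resource root: {name!r}")
    return candidate
```

Resolving with `realpath` before comparing also catches symlinks that point outside the root.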
Implement prompts/list and prompts/get so MCP clients can surface reusable task templates as slash commands. Default catalogue ships five recipes: automate_ui_task (locator-priority order), record_and_generalize (capture and replay-with-semantics), compare_screenshots, find_widget (cheapest reliable strategy first), explain_action_file (plain-language summary). Pluggable PromptProvider lets callers add project-specific templates.
Wrap the existing JSON-RPC dispatcher in a stdlib HTTP server so MCP clients that prefer HTTP — or that need to reach the server from a container or another machine — can connect without stdio. Speaks the JSON-only flavour of MCP Streamable HTTP: POST /mcp returns the JSON-RPC reply, notifications get 202 Accepted, GET returns 405 (no SSE streaming yet). Default bind is 127.0.0.1 per the project's least-privilege policy. Wired into the executor as AC_start_mcp_http_server.
Tools that declare a 'ctx' parameter receive a ToolCallContext that can push notifications/progress (when the client supplied a progressToken) and observe cooperative cancellation. Server tracks active tools/call requests and responds to notifications/cancelled by setting the context flag so long-running tools can abort with OperationCancelledError, surfaced to the client as JSON-RPC -32800.
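A sketch of the cooperative pattern, using stand-in classes. `CallContext` and `OperationCancelled` are hypothetical names, not the real ToolCallContext or OperationCancelledError, whose methods are not shown in this PR text:

```python
import threading
from typing import Any, Callable, Dict, Optional


class OperationCancelled(Exception):
    """Stand-in for the server's cancellation error."""


class CallContext:
    """Stand-in context: a cancel flag plus an optional progress channel."""

    def __init__(self, progress_token: Optional[str] = None,
                 notify: Optional[Callable[[Dict[str, Any]], None]] = None) -> None:
        self._cancelled = threading.Event()
        self._progress_token = progress_token
        self._notify = notify

    def cancel(self) -> None:
        """Called by the server when notifications/cancelled arrives."""
        self._cancelled.set()

    def check_cancelled(self) -> None:
        if self._cancelled.is_set():
            raise OperationCancelled("request was cancelled by the client")

    def progress(self, done: int, total: int) -> None:
        # Progress is only sent when the client supplied a progressToken.
        if self._progress_token is None or self._notify is None:
            return
        self._notify({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": self._progress_token,
                       "progress": done, "total": total},
        })


def slow_tool(ctx: CallContext, steps: int) -> int:
    """A long-running tool that polls the cancel flag between steps."""
    for index in range(steps):
        ctx.check_cancelled()
        ctx.progress(index + 1, steps)
    return steps
```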
Tools running in concurrent mode (the default under serve_stdio) can now call server.request_sampling(messages, ...) to ask the client model a question — the server emits a server-initiated sampling/createMessage request and blocks the calling worker until the client returns a response. handle_line gains response-correlation so inbound JSON-RPC responses (id + result/error, no method) are routed to the matching pending request. Stdio is opted into concurrent tools/call dispatch so a sampling round-trip never deadlocks the reader thread.
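Response correlation can be sketched as a table of pending request ids, each with an event the calling worker blocks on. The class and method names below are hypothetical, and the id format is an assumption:

```python
import itertools
import threading
from typing import Any, Dict


class PendingRequests:
    """Match inbound JSON-RPC responses to server-initiated requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counter = itertools.count(1)
        self._waiting: Dict[str, Dict[str, Any]] = {}

    def open(self) -> str:
        """Register a new outbound request and return its id."""
        request_id = f"srv-{next(self._counter)}"
        with self._lock:
            self._waiting[request_id] = {"event": threading.Event(), "reply": None}
        return request_id

    def route(self, message: Dict[str, Any]) -> bool:
        """Deliver a response (id, no method) to its waiter; False if unmatched."""
        if "method" in message or "id" not in message:
            return False
        with self._lock:
            slot = self._waiting.get(message["id"])
        if slot is None:
            return False
        slot["reply"] = message
        slot["event"].set()
        return True

    def wait(self, request_id: str, timeout: float) -> Dict[str, Any]:
        """Block the calling worker until the reply arrives or the timeout expires."""
        with self._lock:
            slot = self._waiting[request_id]
        try:
            if not slot["event"].wait(timeout):
                raise TimeoutError(f"no response for {request_id}")
            return slot["reply"]
        finally:
            with self._lock:
                self._waiting.pop(request_id, None)
```

The reader thread calls `route` for every inbound line, which is why the tool must run on a separate worker: a tool blocking inside `wait` on the reader thread would never see its own reply.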
Run a small stdlib JSON-Schema validator before invoking the handler so a model that hallucinates an arg (missing field, wrong type, value outside an enum) gets a clean -32602 with a field-level message instead of a Python TypeError. The validator covers the schema features the bundled tools actually use; we deliberately don't depend on the jsonschema library.
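To illustrate the idea, here is a deliberately tiny validator covering required fields, primitive types, and enums. It is a sketch under those assumptions, not the bundled validator, and the function names are hypothetical:

```python
from typing import Any, Dict, List

_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def validate_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return field-level error messages; an empty list means the args are valid."""
    errors: List[str] = []
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"{name}: missing required field")
    for name, value in args.items():
        spec = schema.get("properties", {}).get(name)
        if spec is None:
            continue  # unknown fields are ignored in this sketch
        expected = spec.get("type")
        if expected in _TYPES:
            matches = isinstance(value, _TYPES[expected])
            # bool is a subclass of int, so reject it for numeric fields.
            if expected in ("integer", "number") and isinstance(value, bool):
                matches = False
            if not matches:
                errors.append(f"{name}: expected {expected}")
                continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors


def invalid_params_error(errors: List[str]) -> Dict[str, Any]:
    """Wrap validation errors in a JSON-RPC Invalid params error object."""
    return {"code": -32602, "message": "Invalid params: " + "; ".join(errors)}
```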
register_tool / unregister_tool now emit notifications/tools/list_changed so the connected client refreshes its cached catalogue, and the initialize handshake advertises listChanged=true. Add make_plugin_tool / register_plugin_tools to wrap arbitrary AC_* plugin callables as MCP tools. The JSON Schema is derived from inspect.signature, so a plugin you drop into your plugin directory shows up as a model-callable tool with named arguments.
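Deriving a schema from a signature might look like the following sketch. The type mapping, the handling of unannotated parameters, and the `AC_demo_click` plugin are all assumptions for illustration; make_plugin_tool may behave differently:

```python
import inspect
from typing import Any, Callable, Dict, List

_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def schema_from_signature(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build a JSON Schema object from a callable's named parameters."""
    properties: Dict[str, Dict[str, str]] = {}
    required: List[str] = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs have no fixed schema
        json_type = _JSON_TYPES.get(param.annotation)
        # Unannotated or unmapped parameters accept any JSON value.
        properties[name] = {"type": json_type} if json_type else {}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}


def AC_demo_click(x: int, y: int, button: str = "left") -> None:
    """Hypothetical plugin callable used only to exercise the sketch."""
```

Parameters without a default become required fields, so the model is told up front which arguments it must supply.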
When a POST sets Accept: text/event-stream, the response now streams progress notifications followed by the final JSON-RPC result as SSE events. Lets a model see live status updates from long-running tools (wait_for_window, OCR scans, etc.) over HTTP. A per-server lock serialises SSE requests since they swap the shared notifier/writer state — JSON POST requests stay fully concurrent.
ac_diff_screenshots takes two PNG paths and returns the bounding boxes of the regions that changed. Pixel diff is computed in numpy (already a transitive dep through opencv-python), connected components are found via 4-connectivity flood fill, and tiny components are filtered out to ignore JPEG / antialias noise. Lets a model verify what its last action actually changed without re-OCRing the screen.
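The component-finding step can be sketched in pure Python on small pixel grids. This swaps numpy for nested lists to stay dependency-free, and the `min_pixels` threshold is an assumed parameter name, so treat it as an illustration of 4-connectivity flood fill rather than the tool's code:

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y), inclusive


def changed_boxes(before: Sequence[Sequence[int]],
                  after: Sequence[Sequence[int]],
                  min_pixels: int = 2) -> List[Box]:
    """Return bounding boxes of changed regions, skipping tiny components."""
    height, width = len(before), len(before[0])
    changed = [[before[y][x] != after[y][x] for x in range(width)]
               for y in range(height)]
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if not changed[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the 4 direct neighbours.
            queue = deque([(x, y)])
            seen[y][x] = True
            min_x = max_x = x
            min_y = max_y = y
            count = 0
            while queue:
                cx, cy = queue.popleft()
                count += 1
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and changed[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if count >= min_pixels:
                boxes.append((min_x, min_y, max_x, max_y))
    return boxes
```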
ac_screen_record_start / ac_screen_record_stop / ac_screen_record_list wrap the existing ScreenRecorder so a model can capture a video of its own automation run for later review or for sharing with the user. Recordings live on disk under a model-supplied path.
Add ac_list_monitors and a monitor_index parameter to ac_screenshot. Index 0 reports the virtual desktop spanning every connected display; 1+ are individual monitors. When monitor_index is given, capture goes through mss directly so multi-display setups work where PIL.ImageGrab silently captures only the primary screen.
ac_get_clipboard_image returns the clipboard's image as a base64 PNG content block (or a clear text fallback when the clipboard holds no image). ac_set_clipboard_image places a Pillow-readable file on the clipboard — Windows uses CF_DIB via ctypes; macOS and Linux raise a clean NotImplementedError so the model gets a useful error rather than a crash.
Optional Authorization: Bearer <token> validation, configurable via constructor arg or JE_AUTOCONTROL_MCP_TOKEN env var. Missing token returns 401, wrong token returns 403, comparison uses hmac.compare_digest to avoid timing leaks. Optional ssl_context wraps the listening socket so the same transport can serve HTTPS — required for any non-localhost deployment.
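The status-code rules above map onto a small check like this one. The function name is hypothetical; `hmac.compare_digest` is the stdlib call the commit names:

```python
import hmac
from typing import Optional


def check_bearer(header: Optional[str], expected_token: Optional[str]) -> int:
    """Return the HTTP status an Authorization header should produce."""
    if expected_token is None:
        return 200  # auth is optional: no token configured, nothing to check
    if not header or not header.startswith("Bearer "):
        return 401  # missing token
    supplied = header[len("Bearer "):]
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(supplied.encode("utf-8"),
                               expected_token.encode("utf-8")):
        return 403  # wrong token
    return 200
```

Encoding both sides to bytes keeps `compare_digest` usable even when a token contains non-ASCII characters.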
Every tools/call now produces one audit record with timestamp, tool name, sanitised arguments (password/token/secret/api_key/authorization values are replaced with '<redacted>'), status (ok / error / cancelled), duration, and the error message on failure. The default sink is the JE_AUTOCONTROL_MCP_AUDIT env var; when unset, auditing is a no-op so unconfigured deployments pay nothing.
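A sketch of the redaction and record-building steps. The sensitive key list and the placeholder come from the commit message; the case-insensitive exact-key match, the record field names, and the helper names are assumptions:

```python
import json
import time
from typing import Any, Optional

SENSITIVE_KEYS = ("password", "token", "secret", "api_key", "authorization")
REDACTED = "<redacted>"


def sanitise(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else sanitise(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [sanitise(item) for item in value]
    return value


def audit_record(tool: str, arguments: dict, status: str,
                 duration_s: float, error: Optional[str] = None) -> str:
    """Serialise one tools/call outcome as a single JSON line."""
    record = {
        "timestamp": time.time(),
        "tool": tool,
        "arguments": sanitise(arguments),
        "status": status,
        "duration_s": duration_s,
    }
    if error is not None:
        record["error"] = error
    return json.dumps(record)
```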
Optional RateLimiter on the MCPServer guards against runaway loops — a model that gets stuck calling the same tool 1000x per second now hits a -32000 'Rate limit exceeded' response instead of pegging the host. No limiter is installed by default so existing deployments keep their current behaviour; opt in with MCPServer(rate_limiter=RateLimiter(...)).
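The PR text does not show the RateLimiter constructor, so the following is a generic token-bucket sketch with hypothetical names (`TokenBucket`, `rate_limit_error`) rather than the shipped class:

```python
import time
from typing import Any, Callable, Dict


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; never blocks."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def rate_limit_error() -> Dict[str, Any]:
    """JSON-RPC error object for a rejected call."""
    return {"code": -32000, "message": "Rate limit exceeded"}
```

Injecting the clock makes the limiter testable without sleeping.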
--list-tools / --list-resources / --list-prompts emit the default catalogue as JSON and exit, so CI checks and manual debugging can inspect the server's surface without launching a real MCP client. --read-only filters --list-tools to the read-only subset (matches JE_AUTOCONTROL_MCP_READONLY runtime). With no flags the entry point still launches the stdio server.
install_fake_backend() / uninstall_fake_backend() swap the mouse / keyboard / screen / clipboard wrappers with recorders that mutate an in-memory FakeState rather than the real OS. Lets a CI runner exercise every MCP tool end-to-end without a display server. Toggle via JE_AUTOCONTROL_FAKE_BACKEND=1 or the new --fake-backend CLI flag. ac_click_mouse adapter relaxed to pass through string keycodes unchanged so it works under either backend.
When the client advertises the roots capability during initialize, the server reciprocates with roots.listChanged in its capabilities, fires a roots/list request once notifications/initialized arrives, and re-fetches on every notifications/roots/list_changed. The first file:// URI in the response becomes the FileSystemProvider root, so one MCP server can follow the user across projects without a restart. ResourceProvider gains a no-op set_workspace_root hook; ChainProvider fans out, FileSystemProvider re-targets.
Add an MCPLogBridge logging.Handler that, while serve_stdio is running, forwards every project-logger record to the MCP client as a notifications/message. The handler is attached / detached automatically per stdio session. logging/setLevel requests retune the bridge level on the fly, and the initialize handshake now advertises the logging capability so clients know to expect them.
ac_wait_for_image / ac_wait_for_pixel are polling wait helpers that fill the gap left by ac_wait_for_window / ac_wait_for_text; they are useful for 'click then wait until the spinner disappears' style flows. Both tools accept a ToolCallContext, so they emit progress notifications and abort on notifications/cancelled. Timeout becomes a TimeoutError surfaced as a clean tool error to the model.
Fill the gap left by ac_focus_window / ac_close_window with ac_window_move (combined move + resize via the new MoveWindow helper), ac_window_minimize / _maximize / _restore on top of ShowWindow with the right SW_* flags. All tools resolve a matching hwnd via find_window first so the model can target by title substring instead of tracking handles itself.
ac_launch_process spawns a detached subprocess from an argv list (no shell expansion); ac_shell runs a command line via shlex.split and reports exit_code/stdout/stderr; ac_list_processes and ac_kill_process wrap psutil for inspection and cleanup. Working directories are validated against os.path.realpath; missing psutil returns a clear runtime error rather than ImportError.
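A sketch of the shell-free pattern the ac_shell description implies: split the command line with `shlex.split`, then hand the argv list to `subprocess.run`. The function name and the timeout default are assumptions:

```python
import shlex
import subprocess
import sys
from typing import Dict, Union


def run_command_line(command_line: str,
                     timeout: float = 30.0) -> Dict[str, Union[int, str]]:
    """Run a command without a shell and report exit_code/stdout/stderr."""
    argv = shlex.split(command_line)
    if not argv:
        raise ValueError("empty command line")
    # An argv list with the default shell=False means no shell expansion happens.
    completed = subprocess.run(argv, capture_output=True, text=True,
                               timeout=timeout, check=False)
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```

For example, `run_command_line(f"{shlex.quote(sys.executable)} --version")` runs the current interpreter; shell metacharacters such as `;` or `|` in the string are passed through as literal arguments rather than interpreted.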
When JE_AUTOCONTROL_MCP_ERROR_SHOTS is set to a directory, every failed tools/call triggers a debug screenshot saved as <tool>_<ts>.png under that path. The artifact path is appended to both the error message returned to the model and the audit JSONL record, giving a one-step forensic trail for flaky automations. Disabled by default — costs nothing when the env var is unset.
Default registry now also exposes a curated set of model-friendly aliases (click, type, screenshot, drag, find_image, shell, ...) that point at the canonical ac_* tools. Reduces prompt verbosity without renaming the underlying API. Toggle off via JE_AUTOCONTROL_MCP_ALIASES=0 or build_default_tool_registry(aliases=False); read-only mode automatically filters destructive aliases out before expansion.
PluginWatcher polls a plugin directory and (un)registers MCP tools on change: new files become tools, modified files re-register under the same names with the updated handler, deleted files drop their tools. Each register/unregister already triggers notifications/tools/list_changed so connected clients see the catalogue refresh in real time.
resources/subscribe / resources/unsubscribe wire a producer callback into a provider. ResourceProvider gains optional subscribe/unsubscribe hooks; ChainProvider fans out to children; LiveScreenProvider exposes autocontrol://screen/live and pushes notifications/resources/updated every poll_seconds while at least one client is subscribed. resources.subscribe capability flag flipped to true so clients know to attempt subscriptions.
Add MCPServer.request_elicitation that fires a server-initiated elicitation/create when JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE=1 and the client advertised the elicitation capability. Tools whose annotations mark them destructive (and not read-only) are gated: the user sees a confirmation prompt before the action runs, and declining returns a clean -32000 error to the model. Servers that talk to non-elicitation clients fall through with a logged warning so the feature is opt-in and never blocks unexpectedly.
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 1021 |
| Duplication | 25 |
- S3516: drop invariant ``return 0`` from main(); CLI returns None.
- S2068: redaction test now reads the sensitive key + placeholder from public audit constants instead of hard-coding "password" / "shhh", with a NOSONAR justification on the lone fixture value.
- S1542: keep the AC_* plugin convention with a NOSONAR comment.
- S1192: extract _TOOLS_CALL_METHOD and _MIME_JSON constants.
- S1244: use math.isclose for the scheduler interval assertion.
- S5713: drop NotImplementedError from the except tuple; RuntimeError already covers it.
- Prospector pyflakes: remove the unused ``import logging``.
- Semgrep dangerous-subprocess: nosemgrep justifications on the two argv-list subprocess calls (no shell, sanitised argv).
- README.md / zh-TW / zh-CN: add MCP feature bullet, MCP quick-start
section (registering with Claude Desktop / Code, programmatic
start, HTTP+SSE+auth+TLS, --list-* CLI flags, surface table,
security notes), and a Mermaid architecture diagram showing the
client → transport → mcp_server → core → backends path. The
filesystem tree below is kept and updated with mcp_server/ and
the new auto_control_window.py.
- docs/source/{Eng,Zh}/doc/mcp_server/mcp_server_doc.rst: rewrite
to cover the full surface that landed in this PR — ~90 tools,
resources / prompts / sampling / roots / logging / progress /
cancellation / elicitation, HTTP+SSE transport, audit log,
rate limiter, auto-screenshot on error, plugin hot-reload,
fake backend for CI, and CLI introspection flags.
main() returns None (Sonar S3516 fix from the previous round), so
``rc = main(argv)`` followed by ``assert rc is None`` trips Sonar
S3699 ("Remove this use of the output from 'main'; 'main' doesn't
return anything"). Drop the binding entirely and call main() purely
for its side effect.
Summary
Adds a complete headless Model Context Protocol (MCP) server to AutoControl so MCP-compatible clients (Claude Desktop, Claude Code, custom tool-use loops) can drive the host machine through AutoControl. 35 focused commits land tools, protocol coverage, transports, and infrastructure.
Tool surface (~90 tools)
Mouse, keyboard, screen + screenshot-as-image, image / OCR, accessibility tree, VLM locator, windows (focus / list / move / resize / minimize / maximize / restore / close), clipboard text + image, action recording / replay / edit, scheduler, triggers, hotkey daemon, screen recording, multi-monitor, drag, send-to-window, process / shell, ac_diff_screenshots, ac_wait_for_image / ac_wait_for_pixel, plus a curated set of short aliases (`click`, `type`, `screenshot`, ...).
Protocol coverage (MCP 2025-06-18)
`tools/list`, `tools/call`, `tools/list_changed`, `resources/list`, `resources/read`, `resources/subscribe`, `notifications/resources/updated`, `prompts/list`, `prompts/get`, `sampling/createMessage` (server-initiated), `elicitation/create` (destructive-tool gating), `logging/setLevel` + `notifications/message`, `roots/list` + `notifications/roots/list_changed`, progress notifications, cancellation, schema validation, plugin auto-register.
Transports
Infrastructure
All work obeys CLAUDE.md feature-delivery rules: headless core in `utils/`, public re-exports from `je_auto_control/__init__.py`, executor commands wired (`AC_start_mcp_server`, `AC_start_mcp_http_server`), top-level package stays Qt-free, every module under 750 lines, no new runtime dependencies (psutil only required by the optional process tools).
Test plan