Add headless MCP server for AutoControl #178

Merged
JE-Chen merged 38 commits into main from dev on Apr 25, 2026

JE-Chen (Member) commented Apr 25, 2026

Summary

Adds a complete headless Model Context Protocol (MCP) server to AutoControl so MCP-compatible clients (Claude Desktop, Claude Code, custom tool-use loops) can drive the host machine through AutoControl. 35 focused commits land tools, protocol coverage, transports, and infrastructure.

Tool surface (~90 tools)

Mouse, keyboard, screen + screenshot-as-image, image / OCR, accessibility tree, VLM locator, windows (focus / list / move / resize / minimize / maximize / restore / close), clipboard text + image, action recording / replay / edit, scheduler, triggers, hotkey daemon, screen recording, multi-monitor, drag, send-to-window, process / shell, ac_diff_screenshots, ac_wait_for_image / ac_wait_for_pixel, plus a curated set of short aliases (`click`, `type`, `screenshot`, ...).

Protocol coverage (MCP 2025-06-18)

`tools/list`, `tools/call`, `notifications/tools/list_changed`, `resources/list`, `resources/read`, `resources/subscribe`, `notifications/resources/updated`, `prompts/list`, `prompts/get`, `sampling/createMessage` (server-initiated), `elicitation/create` (destructive-tool gating), `logging/setLevel` + `notifications/message`, `roots/list` + `notifications/roots/list_changed`, progress notifications, cancellation, schema validation, plugin auto-register.

Transports

  • stdio (`python -m je_auto_control.utils.mcp_server`, `je_auto_control_mcp` console script)
  • HTTP at `/mcp` with SSE streaming, bearer-token auth, optional TLS

Infrastructure

  • Tool annotations (readOnly / destructive / idempotent / openWorld)
  • Read-only mode (env var or constructor flag)
  • JSONL audit log with secret redaction
  • Token-bucket rate limiter
  • Auto-screenshot on tool error
  • Plugin hot-reload via PluginWatcher
  • In-memory fake backend for CI (`JE_AUTOCONTROL_FAKE_BACKEND=1`)
  • CLI introspection (`--list-tools`, `--list-resources`, `--list-prompts`, `--read-only`, `--fake-backend`)
  • Bilingual docs (Eng + Zh)

All work obeys CLAUDE.md feature-delivery rules: headless core in `utils/`, public re-exports from `je_auto_control/__init__.py`, executor commands wired (`AC_start_mcp_server`, `AC_start_mcp_http_server`), top-level package stays Qt-free, every module under 750 lines, no new runtime dependencies (psutil only required by the optional process tools).

Test plan

  • `py -m pytest test/unit_test/headless/ -q` — 260 tests pass on Windows
  • `py -c "import sys, je_auto_control; assert not any('PySide6' in m for m in sys.modules)"` — facade stays Qt-free
  • Manual: register the server with Claude Desktop / Claude Code via `claude mcp add autocontrol -- python -m je_auto_control.utils.mcp_server` and exercise a few tools
  • Manual: HTTPS smoke test with bearer token (`JE_AUTOCONTROL_MCP_TOKEN=...`)
  • Manual: confirm `--list-tools` JSON renders on a fresh Python install

JE-Chen added 30 commits April 25, 2026 18:40
Expose the headless automation API through a stdlib-only Model
Context Protocol server so MCP clients (Claude Desktop, Claude Code,
custom tool-use loops) can drive AutoControl over JSON-RPC 2.0
stdio. Ships 24 tools across mouse, keyboard, screen, image, OCR,
window, clipboard, executor, and run-history.
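For readers new to the stdio transport: MCP over stdio exchanges one JSON-RPC 2.0 message per line. The sketch below shows that loop with only the stdlib. `dispatch_line`, `serve` and the handler table are hypothetical names, not this module's actual API; the error codes are the standard JSON-RPC ones (-32700 parse error, -32600 invalid request, -32601 method not found, -32603 internal error).

```python
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO

# Hypothetical handler table: JSON-RPC method name -> callable(params) -> result.
Handler = Callable[[Dict[str, Any]], Any]


def _error(msg_id: Any, code: int, message: str) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": msg_id,
                       "error": {"code": code, "message": message}})


def dispatch_line(line: str, handlers: Dict[str, Handler]) -> Optional[str]:
    """Dispatch one newline-delimited JSON-RPC 2.0 message.

    Returns the serialized reply, or None for notifications (no "id").
    """
    try:
        msg = json.loads(line)
    except json.JSONDecodeError:
        return _error(None, -32700, "Parse error")
    if not isinstance(msg, dict):
        return _error(None, -32600, "Invalid Request")
    msg_id = msg.get("id")
    params = msg.get("params") or {}
    handler = handlers.get(msg.get("method", ""))
    if msg_id is None:
        # Notifications carry no id and must never receive a reply.
        if handler is not None:
            handler(params)
        return None
    if handler is None:
        return _error(msg_id, -32601, f"Method not found: {msg.get('method')}")
    try:
        result = handler(params)
    except Exception as exc:  # report handler failures as JSON-RPC errors
        return _error(msg_id, -32603, str(exc))
    return json.dumps({"jsonrpc": "2.0", "id": msg_id, "result": result})


def serve(handlers: Dict[str, Handler],
          stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read one message per line until EOF; write one reply per line."""
    for line in stdin:
        if not line.strip():
            continue
        reply = dispatch_line(line, handlers)
        if reply is not None:
            stdout.write(reply + "\n")
            stdout.flush()
```

A notification such as `notifications/initialized` runs its handler but produces no output line, which is why `dispatch_line` returns `None` for it.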
Surface readOnlyHint, destructiveHint, idempotentHint, openWorldHint
on every default tool so MCP clients can require user confirmation
before destructive actions (typing, clicking, executing scripts) and
auto-approve read-only queries (positions, sizes, OCR, history).
Honor the JE_AUTOCONTROL_MCP_READONLY env var (and a read_only
parameter on build_default_tool_registry) to drop every tool that
can mutate state. Lets shared / hardened deployments expose only
observers (positions, OCR, clipboard reads, history) without code
changes.
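A rough sketch of how annotation hints could drive the read-only filter. `ToolSpec`, `read_only_enabled` and `visible_tools` are hypothetical, and the accepted env-var values are an assumption; only the variable name and the hint keys come from this PR.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToolSpec:
    """Hypothetical tool record; the hint names are the MCP annotation keys."""
    name: str
    annotations: Dict[str, bool] = field(default_factory=dict)


def read_only_enabled(flag: Optional[bool] = None) -> bool:
    """An explicit flag wins; otherwise consult the env var.

    Assumption: "1", "true" or "yes" (case-insensitive) enables read-only mode.
    """
    if flag is not None:
        return flag
    value = os.environ.get("JE_AUTOCONTROL_MCP_READONLY", "")
    return value.strip().lower() in {"1", "true", "yes"}


def visible_tools(tools: List[ToolSpec],
                  read_only: Optional[bool] = None) -> List[ToolSpec]:
    if not read_only_enabled(read_only):
        return list(tools)
    # Keep only tools that explicitly declare readOnlyHint=True; a tool
    # without annotations is treated as potentially mutating.
    return [t for t in tools if t.annotations.get("readOnlyHint") is True]
```

Treating an unannotated tool as mutating errs on the side of hiding it, which seems the safer default for a hardened deployment.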
ac_screenshot now returns a base64 PNG image content block so the
model can actually see the screen, not just record that a file was
saved. file_path is optional now; when given, the image is also
written to disk and the resolved path is appended as a text block.

Add MCPContent value type and content-block normalisation in the
JSON-RPC dispatcher so future tools can return any mix of text and
image content.
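An MCP image content block carries base64 data plus a `mimeType`. A minimal sketch of building one and of normalising arbitrary return values into content blocks follows; the helper names are hypothetical, while the block shapes follow the MCP `tools/call` result format (a `content` list plus `isError`).

```python
import base64
import json
from typing import Any, List, Optional


def image_blocks(png_bytes: bytes, saved_path: Optional[str] = None) -> List[dict]:
    """Wrap PNG bytes as an image content block, plus an optional path note."""
    blocks = [{
        "type": "image",
        "data": base64.b64encode(png_bytes).decode("ascii"),
        "mimeType": "image/png",
    }]
    if saved_path is not None:
        blocks.append({"type": "text", "text": saved_path})
    return blocks


def normalize_content(value: Any) -> List[dict]:
    """Coerce a tool's return value into a list of content blocks."""
    if isinstance(value, list) and all(isinstance(b, dict) and "type" in b for b in value):
        return value  # already content blocks
    if isinstance(value, str):
        return [{"type": "text", "text": value}]
    # Anything else (dicts, numbers, ...) is rendered as JSON text.
    return [{"type": "text", "text": json.dumps(value, default=str)}]


def tool_result(value: Any, is_error: bool = False) -> dict:
    return {"content": normalize_content(value), "isError": is_error}
```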
Expose record/stop_record, read/write JSON action files, and the
recording-edit helpers (trim, adjust_delays, scale_coordinates) as
MCP tools so a model can capture a manual session, persist it, and
replay it on a different resolution or speed.
ac_drag composes set→press→move→release for click-and-drag flows.
ac_send_key_to_window / ac_send_mouse_to_window post events to a
specific window without stealing focus, useful when an automation
needs to drive a background app while the user keeps using the
foreground.
Split tools.py into a tools/ package (_base, _handlers, _factories)
to keep every module under the 750-line limit, and add five new
semantic-locator tools: ac_a11y_list / ac_a11y_find / ac_a11y_click
target widgets through the accessibility tree (stable across visual
restyles), and ac_vlm_locate / ac_vlm_click fall back to a vision
language model for ad-hoc descriptions. Both paths are far more
robust than pixel-template matching for dynamic UIs.
Expose 15 automation-orchestration tools so a model can build full
event-driven workflows: ac_scheduler_* manages interval and cron
jobs, ac_trigger_add/remove/list/start/stop covers the four
trigger kinds (image, window, pixel, file), and ac_hotkey_* binds
global hotkeys to action JSON files. Each control-plane tool
returns the registered record so the model can chain follow-ups
without a separate listing call.
Implement resources/list and resources/read so MCP clients can
browse data the server has to offer without invoking a tool. The
default provider chains a directory listing of action JSON files
(autocontrol://files/<name>), the recent run-history snapshot
(autocontrol://history), and the executor command catalogue
(autocontrol://commands). A pluggable ResourceProvider lets callers
swap in custom sources. Path traversal is blocked at the provider
boundary.
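One way to enforce the path-traversal guard is to resolve the requested name with `os.path.realpath` and require the result to stay under the root. This is a sketch assuming `autocontrol://files/<name>` maps onto a single directory; the helper names are hypothetical.

```python
import os
from urllib.parse import unquote

FILES_PREFIX = "autocontrol://files/"


def resolve_within(root: str, name: str) -> str:
    """Resolve name under root, refusing anything that escapes it."""
    root_real = os.path.realpath(root)
    # realpath also follows symlinks, so a link pointing outside is refused.
    candidate = os.path.realpath(os.path.join(root_real, name))
    # commonpath raises ValueError on Windows for paths on different
    # drives, which is also the outcome we want for an escape attempt.
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes resource root: {name!r}")
    return candidate


def file_uri_to_path(root: str, uri: str) -> str:
    if not uri.startswith(FILES_PREFIX):
        raise ValueError(f"not a file resource: {uri!r}")
    # Decode percent-escapes first so "%2E%2E/" cannot sneak past the check.
    return resolve_within(root, unquote(uri[len(FILES_PREFIX):]))
```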
Implement prompts/list and prompts/get so MCP clients can surface
reusable task templates as slash commands. Default catalogue ships
five recipes: automate_ui_task (locator-priority order),
record_and_generalize (capture and replay-with-semantics), compare_screenshots,
find_widget (cheapest reliable strategy first), explain_action_file
(plain-language summary). Pluggable PromptProvider lets callers add
project-specific templates.
Wrap the existing JSON-RPC dispatcher in a stdlib HTTP server so MCP
clients that prefer HTTP — or that need to reach the server from a
container or another machine — can connect without stdio. Speaks
the JSON-only flavour of MCP Streamable HTTP: POST /mcp returns the
JSON-RPC reply, notifications get 202 Accepted, GET returns 405 (no
SSE streaming yet). Default bind is 127.0.0.1 per the project's
least-privilege policy. Wired into the executor as
AC_start_mcp_http_server.
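The status-code mapping described above, sketched with `http.server`. `route`, `make_handler`, `serve_http` and the default port 8765 are hypothetical, and returning 404 for paths other than `/mcp` is an assumption the commit does not state.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable, Optional, Tuple

# Hypothetical dispatcher: takes one JSON-RPC message, returns the reply
# text, or None when the message needs no reply (e.g. a notification).
Dispatch = Callable[[str], Optional[str]]


def route(method: str, path: str, body: bytes, dispatch: Dispatch) -> Tuple[int, bytes]:
    """Map one HTTP request to (status, JSON body)."""
    if path != "/mcp":
        return 404, b""
    if method != "POST":
        return 405, b""  # JSON-only flavour: no SSE stream on GET
    reply = dispatch(body.decode("utf-8"))
    if reply is None:
        return 202, b""  # accepted, nothing to send back
    return 200, reply.encode("utf-8")


def make_handler(dispatch: Dispatch) -> type:
    class MCPHandler(BaseHTTPRequestHandler):
        def _send(self, status: int, payload: bytes) -> None:
            self.send_response(status)
            if payload:
                self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", "0"))
            self._send(*route("POST", self.path, self.rfile.read(length), dispatch))

        def do_GET(self) -> None:
            self._send(*route("GET", self.path, b"", dispatch))

    return MCPHandler


def serve_http(dispatch: Dispatch, port: int = 8765) -> ThreadingHTTPServer:
    # Loopback bind by default, in line with the least-privilege policy.
    return ThreadingHTTPServer(("127.0.0.1", port), make_handler(dispatch))
```

Calling `serve_http(...).serve_forever()` would start listening; keeping `route` free of socket code makes the mapping testable on its own.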
Tools that declare a 'ctx' parameter receive a ToolCallContext that
can push notifications/progress (when the client supplied a
progressToken) and observe cooperative cancellation. Server tracks
active tools/call requests and responds to notifications/cancelled
by setting the context flag so long-running tools can abort with
OperationCancelledError, surfaced to the client as JSON-RPC -32800.
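A sketch of a per-call context under stated assumptions: `CallContext` and `CancelledByClient` are hypothetical stand-ins for the PR's `ToolCallContext` and `OperationCancelledError`, whose real methods are not shown here. The parameter names (`progressToken`, `progress`, `total`, `requestId`) follow the MCP `notifications/progress` and `notifications/cancelled` messages.

```python
import threading
from typing import Any, Callable, Dict, Optional

Notify = Callable[[str, Dict[str, Any]], None]


class CancelledByClient(Exception):
    """Hypothetical stand-in for the PR's OperationCancelledError."""


class CallContext:
    """Hypothetical per-call context: progress goes out, cancellation comes in."""

    def __init__(self, progress_token: Any = None, notify: Optional[Notify] = None):
        self.progress_token = progress_token
        self._notify = notify or (lambda method, params: None)
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        self._cancelled.set()

    @property
    def cancelled(self) -> bool:
        return self._cancelled.is_set()

    def check_cancelled(self) -> None:
        if self._cancelled.is_set():
            raise CancelledByClient("request cancelled by client")

    def report_progress(self, progress: float, total: Optional[float] = None) -> None:
        if self.progress_token is None:
            return  # the client did not ask for progress updates
        params: Dict[str, Any] = {"progressToken": self.progress_token, "progress": progress}
        if total is not None:
            params["total"] = total
        self._notify("notifications/progress", params)


def on_cancelled(active: Dict[Any, CallContext], params: Dict[str, Any]) -> None:
    """Handle notifications/cancelled by flagging the matching in-flight call."""
    ctx = active.get(params.get("requestId"))
    if ctx is not None:
        ctx.cancel()


def cancelled_reply(request_id: Any) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": -32800, "message": "Request cancelled"}}
```

Cancellation stays cooperative: nothing interrupts the tool, which has to call `check_cancelled()` between steps.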
Tools running in concurrent mode (the default under serve_stdio)
can now call server.request_sampling(messages, ...) to ask the
client model a question — the server emits a server-initiated
sampling/createMessage request and blocks the calling worker until
the client returns a response. handle_line gains response-correlation
so inbound JSON-RPC responses (id + result/error, no method) are
routed to the matching pending request. Stdio is opted into
concurrent tools/call dispatch so a sampling round-trip never
deadlocks the reader thread.
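Response correlation can be sketched as a table of pending ids, each with a `threading.Event` that the reader thread sets when the matching response arrives. `PendingRequests` is hypothetical, and the `srv-` id prefix is an assumption meant to keep server-issued ids visibly distinct from client-issued ones.

```python
import itertools
import threading
from typing import Any, Callable, Dict


class PendingRequests:
    """Hypothetical correlation table for server-initiated JSON-RPC requests."""

    def __init__(self, send: Callable[[Dict[str, Any]], None]):
        self._send = send
        self._ids = itertools.count(1)
        self._lock = threading.Lock()
        self._waiting: Dict[str, Dict[str, Any]] = {}

    def request(self, method: str, params: Dict[str, Any], timeout: float = 30.0) -> Any:
        slot: Dict[str, Any] = {"event": threading.Event(), "reply": None}
        with self._lock:
            req_id = f"srv-{next(self._ids)}"
            self._waiting[req_id] = slot  # register before sending to avoid a race
        self._send({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
        if not slot["event"].wait(timeout):
            with self._lock:
                self._waiting.pop(req_id, None)
            raise TimeoutError(f"no response to {method} within {timeout} s")
        reply = slot["reply"]
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "client returned an error"))
        return reply.get("result")

    def route_response(self, msg: Dict[str, Any]) -> bool:
        """Return True if msg answered one of our requests (has id, no method)."""
        if "method" in msg or "id" not in msg:
            return False
        with self._lock:
            slot = self._waiting.pop(msg["id"], None)
        if slot is None:
            return False
        slot["reply"] = msg
        slot["event"].set()
        return True
```

Because the caller blocks on `Event.wait` while a different thread routes the reply, the reader thread must never be the one waiting; that is the deadlock the concurrent dispatch mode avoids.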
Run a small stdlib JSON-Schema validator before invoking the
handler so a model that hallucinates an arg (missing field, wrong
type, value outside an enum) gets a clean -32602 with a
field-level message instead of a Python TypeError. The validator
covers the schema features the bundled tools actually use; we
deliberately don't depend on the jsonschema library.
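A minimal validator in the same spirit, covering `required`, a single string `type`, `enum` and `additionalProperties: false`. The function names and message wording are hypothetical; -32602 is JSON-RPC's standard "Invalid params" code.

```python
from typing import Any, Dict, List

# Subset of JSON Schema "type" keywords mapped to Python types.
_TYPES = {
    "string": str, "integer": int, "number": (int, float),
    "boolean": bool, "object": dict, "array": list,
}


def validate_args(args: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the args are valid."""
    errors: List[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field '{name}'")
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field '{name}'")
            continue
        expected = spec.get("type")
        py_type = _TYPES.get(expected) if isinstance(expected, str) else None
        if py_type is not None:
            # bool is a subclass of int in Python, so reject it explicitly
            # for numeric fields.
            bool_as_number = expected in ("integer", "number") and isinstance(value, bool)
            if not isinstance(value, py_type) or bool_as_number:
                errors.append(f"field '{name}' must be {expected}")
                continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"field '{name}' must be one of {spec['enum']}")
    return errors


def invalid_params(request_id: Any, errors: List[str]) -> Dict[str, Any]:
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": -32602, "message": "; ".join(errors)}}
```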
register_tool / unregister_tool now emit
notifications/tools/list_changed so the connected client refreshes
its cached catalogue, and the initialize handshake advertises
listChanged=true.

Add make_plugin_tool / register_plugin_tools to wrap arbitrary
AC_* plugin callables as MCP tools — the JSON Schema is derived
from inspect.signature so a plugin you drop into your plugin
directory shows up as a model-callable tool with named arguments.
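Deriving a schema from `inspect.signature` might look like the following. `schema_from_signature` and the annotation-to-type table are assumptions: parameters without a default become `required`, `*args`/`**kwargs` are skipped, only JSON-friendly defaults are copied, and string annotations (e.g. under `from __future__ import annotations`) fall through untyped in this sketch.

```python
import inspect
from typing import Any, Callable, Dict, List

# Assumed mapping from plain annotations to JSON Schema types.
_PY_TO_JSON = {str: "string", int: "integer", float: "number",
               bool: "boolean", list: "array", dict: "object"}
_JSON_SCALARS = (str, int, float, bool, type(None))


def schema_from_signature(func: Callable[..., Any]) -> Dict[str, Any]:
    props: Dict[str, Dict[str, Any]] = {}
    required: List[str] = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs have no stable name to expose
        spec: Dict[str, Any] = {}
        json_type = _PY_TO_JSON.get(param.annotation)
        if json_type is not None:
            spec["type"] = json_type
        if param.default is inspect.Parameter.empty:
            required.append(name)
        elif isinstance(param.default, _JSON_SCALARS):
            spec["default"] = param.default  # only JSON-friendly defaults
        props[name] = spec
    return {"type": "object", "properties": props, "required": required}
```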
When a POST sets Accept: text/event-stream, the response now
streams progress notifications followed by the final JSON-RPC
result as SSE events. Lets a model see live status updates from
long-running tools (wait_for_window, OCR scans, etc.) over HTTP.
A per-server lock serialises SSE requests since they swap the
shared notifier/writer state — JSON POST requests stay fully
concurrent.
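SSE framing for that flow, as a sketch: each JSON-RPC message becomes one `data:` event terminated by a blank line, and the response is sent with `Content-Type: text/event-stream`. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def sse_event(message: Dict[str, Any]) -> bytes:
    """Frame one JSON-RPC message as a Server-Sent Events 'data:' event.

    json.dumps without indent emits a single line (embedded newlines are
    escaped), so one data field suffices; the blank line ends the event.
    """
    return f"data: {json.dumps(message)}\n\n".encode("utf-8")


def stream_tool_call(progress: Iterable[Tuple[float, float]], result: Dict[str, Any],
                     request_id: Any, progress_token: Any) -> Iterator[bytes]:
    """Yield progress notifications, then the final JSON-RPC result."""
    for done, total in progress:
        yield sse_event({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {"progressToken": progress_token, "progress": done, "total": total},
        })
    yield sse_event({"jsonrpc": "2.0", "id": request_id, "result": result})
```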
ac_diff_screenshots takes two PNG paths and returns the bounding boxes that
changed. Pixel diff is computed in numpy (already a transitive
dep through opencv-python), connected components are found via
4-connectivity flood fill, and tiny components are filtered out
to ignore JPEG / antialias noise. Lets a model verify what its
last action actually changed without re-OCRing the screen.
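The connected-component step, sketched in plain Python over two same-sized grayscale grids (the PR computes the pixel diff in numpy instead). `changed_boxes`, `threshold` and `min_pixels` are hypothetical names; the fill is an iterative breadth-first search over the four orthogonal neighbours, and components smaller than `min_pixels` are dropped as noise.

```python
from collections import deque
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def changed_boxes(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]],
                  threshold: int = 0, min_pixels: int = 4) -> List[Box]:
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same size")
    if not a:
        return []
    h, w = len(a), len(a[0])
    mask = [[abs(a[y][x] - b[y][x]) > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one component, tracking its size and extent.
            queue = deque([(x, y)])
            seen[y][x] = True
            count, left, top, right, bottom = 0, x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                count += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if count >= min_pixels:
                boxes.append((left, top, right, bottom))
    return boxes
```

With 4-connectivity, two pixels that touch only at a corner form separate components, which keeps boxes tight around distinct widgets.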
ac_screen_record_start / ac_screen_record_stop / ac_screen_record_list
wrap the existing ScreenRecorder so a model can capture a video of
its own automation run for later review or for sharing with the
user. Recordings live on disk under a model-supplied path.
Add ac_list_monitors and a monitor_index parameter to ac_screenshot.
Index 0 reports the virtual desktop spanning every connected
display; 1+ are individual monitors. When monitor_index is given,
capture goes through mss directly so multi-display setups work
where PIL.ImageGrab silently captures only the primary screen.
ac_get_clipboard_image returns the clipboard's image as a base64
PNG content block (or a clear text fallback when the clipboard
holds no image). ac_set_clipboard_image places a Pillow-readable
file on the clipboard — Windows uses CF_DIB via ctypes; macOS and
Linux raise a clean NotImplementedError so the model gets a
useful error rather than a crash.
Optional Authorization: Bearer <token> validation, configurable
via constructor arg or JE_AUTOCONTROL_MCP_TOKEN env var. Missing
token returns 401, wrong token returns 403, comparison uses
hmac.compare_digest to avoid timing leaks.

Optional ssl_context wraps the listening socket so the same
transport can serve HTTPS — required for any non-localhost
deployment.
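A sketch of the bearer check and TLS wrapping with the stdlib. The 401/403 split and `hmac.compare_digest` come from the commit; the helper names are hypothetical, and treating the `Bearer` scheme case-insensitively is an assumption (HTTP auth schemes are case-insensitive).

```python
import hmac
import ssl
from typing import Optional


def check_bearer(header: Optional[str], expected: Optional[str]) -> int:
    """Return 200 to proceed, 401 if no bearer token, 403 if it is wrong."""
    if not expected:
        return 200  # auth not configured
    scheme, _, token = (header or "").partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return 401
    # compare_digest avoids short-circuiting on the first differing byte.
    if not hmac.compare_digest(token.encode("utf-8"), expected.encode("utf-8")):
        return 403
    return 200


def enable_tls(server, certfile: str, keyfile: str):
    """Wrap an http.server listening socket so it serves HTTPS."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    return server
```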
Every tools/call now produces one audit record with timestamp,
tool name, sanitised arguments (password/token/secret/api_key/
authorization values are replaced with '<redacted>'), status (ok /
error / cancelled), duration, and the error message on failure.
The default sink is the JE_AUTOCONTROL_MCP_AUDIT env var; when
unset, auditing is a no-op so unconfigured deployments pay nothing.
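Redaction plus JSONL append, sketched under the assumption that keys are matched exactly and case-insensitively (so a key such as `api_token` would not be caught here). The helper names and record field names are hypothetical.

```python
import json
import os
import time
from typing import Any, Optional

SENSITIVE_KEYS = {"password", "token", "secret", "api_key", "authorization"}
REDACTED = "<redacted>"


def redact(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {k: REDACTED if str(k).lower() in SENSITIVE_KEYS else redact(v)
                for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def audit(tool: str, arguments: dict, status: str, started: float,
          error: Optional[str] = None, path: Optional[str] = None) -> Optional[dict]:
    """Append one JSONL record; a no-op when no sink is configured.

    `started` is a time.monotonic() reading taken when the call began.
    """
    path = path or os.environ.get("JE_AUTOCONTROL_MCP_AUDIT")
    if not path:
        return None
    record = {
        "timestamp": time.time(),
        "tool": tool,
        "arguments": redact(arguments),
        "status": status,  # "ok" / "error" / "cancelled"
        "duration_ms": round((time.monotonic() - started) * 1000, 1),
    }
    if error is not None:
        record["error"] = error
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```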
Optional RateLimiter on the MCPServer guards against runaway
loops — a model that gets stuck calling the same tool 1000x per
second now hits a -32000 'Rate limit exceeded' response instead
of pegging the host. No limiter is installed by default so
existing deployments keep their current behaviour; opt in with
MCPServer(rate_limiter=RateLimiter(...)).
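A token bucket refills at a fixed rate up to a burst capacity and rejects a call when it is empty. This `TokenBucket` is a hypothetical sketch, not the PR's `RateLimiter`; the injectable clock exists only to make it testable.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Hypothetical limiter: refills `rate` tokens/s, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False


# What a rejected tools/call could carry back to the client.
RATE_LIMIT_ERROR = {"code": -32000, "message": "Rate limit exceeded"}
```

With `rate=2, capacity=3`, a model can fire three calls back to back, then gets one more call every half second.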
--list-tools / --list-resources / --list-prompts emit the
default catalogue as JSON and exit, so CI checks and manual
debugging can inspect the server's surface without launching a
real MCP client. --read-only filters --list-tools to the
read-only subset (matches JE_AUTOCONTROL_MCP_READONLY runtime).
With no flags the entry point still launches the stdio server.
install_fake_backend() / uninstall_fake_backend() swap the
mouse / keyboard / screen / clipboard wrappers with recorders that
mutate an in-memory FakeState rather than the real OS. Lets a CI
runner exercise every MCP tool end-to-end without a display
server. Toggle via JE_AUTOCONTROL_FAKE_BACKEND=1 or the new
--fake-backend CLI flag. ac_click_mouse adapter relaxed to pass
through string keycodes unchanged so it works under either
backend.
When the client advertises the roots capability during initialize,
the server reciprocates with roots.listChanged in its capabilities,
fires a roots/list request once notifications/initialized arrives,
and re-fetches on every notifications/roots/list_changed. The first
file:// URI in the response becomes the FileSystemProvider root, so
one MCP server can follow the user across projects without a
restart. ResourceProvider gains a no-op set_workspace_root hook;
ChainProvider fans out, FileSystemProvider re-targets.
Add an MCPLogBridge logging.Handler that, while serve_stdio is
running, forwards every project-logger record to the MCP client as
a notifications/message. The handler is attached / detached
automatically per stdio session. logging/setLevel requests retune
the bridge level on the fly, and the initialize handshake now
advertises the logging capability so clients know to expect them.
Polling wait helpers (ac_wait_for_image / ac_wait_for_pixel) fill the gap left by ac_wait_for_window /
ac_wait_for_text — useful for 'click then wait until the spinner
disappears' style flows. Both tools accept a ToolCallContext, so
they emit progress notifications and abort on
notifications/cancelled. Timeout becomes a TimeoutError surfaced
as a clean tool error to the model.
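The polling loop behind such helpers can be sketched generically: re-evaluate a predicate until it is truthy, report each miss, honour cancellation, and raise `TimeoutError` at the deadline. `wait_until`, `WaitCancelled` and the parameter names are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class WaitCancelled(Exception):
    """Hypothetical: raised when the caller signals cancellation."""


def wait_until(predicate: Callable[[], Any], timeout: float = 10.0,
               interval: float = 0.25,
               on_poll: Optional[Callable[[int], None]] = None,
               is_cancelled: Callable[[], bool] = lambda: False) -> Any:
    """Poll predicate until it returns something truthy, then return it."""
    deadline = time.monotonic() + timeout
    attempt = 0
    while True:
        if is_cancelled():
            raise WaitCancelled("wait cancelled")
        attempt += 1
        result = predicate()
        if result:
            return result
        if on_poll is not None:
            on_poll(attempt)  # e.g. forward as a progress notification
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(min(interval, remaining))
```

For a "wait until the spinner disappears" flow, the predicate would return `True` once the spinner can no longer be located.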
Fill the gap left by ac_focus_window / ac_close_window with
ac_window_move (combined move + resize via the new MoveWindow
helper), ac_window_minimize / _maximize / _restore on top of
ShowWindow with the right SW_* flags. All tools resolve a
matching hwnd via find_window first so the model can target by
title substring instead of tracking handles itself.
ac_launch_process spawns a detached subprocess from an argv list
(no shell expansion); ac_shell runs a command line via shlex.split
and reports exit_code/stdout/stderr; ac_list_processes and
ac_kill_process wrap psutil for inspection and cleanup. Working
directories are validated against os.path.realpath; missing psutil
returns a clear runtime error rather than ImportError.
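The `ac_shell` behaviour described above, sketched with `shlex.split` and `subprocess.run` (no `shell=True`, so pipes and globbing are not interpreted). `run_shell` and its `timeout` default are hypothetical; note that `shlex.split` applies POSIX quoting, which treats backslashes in Windows paths as escapes.

```python
import os
import shlex
import subprocess
from typing import Dict, Optional, Union


def run_shell(command_line: str, cwd: Optional[str] = None,
              timeout: float = 60.0) -> Dict[str, Union[int, str]]:
    """Split a command line without a shell and report exit code and output."""
    argv = shlex.split(command_line)  # POSIX quoting rules; no pipes or globs
    if not argv:
        raise ValueError("empty command line")
    if cwd is not None:
        cwd = os.path.realpath(cwd)
        if not os.path.isdir(cwd):
            raise ValueError(f"working directory does not exist: {cwd}")
    completed = subprocess.run(argv, cwd=cwd, capture_output=True, text=True,
                               timeout=timeout, check=False)
    return {"exit_code": completed.returncode,
            "stdout": completed.stdout, "stderr": completed.stderr}
```

A command that outlives `timeout` raises `subprocess.TimeoutExpired`, which a tool wrapper would report as a tool error.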
JE-Chen added 5 commits April 25, 2026 19:54
When JE_AUTOCONTROL_MCP_ERROR_SHOTS is set to a directory, every
failed tools/call triggers a debug screenshot saved as
<tool>_<ts>.png under that path. The artifact path is appended to
both the error message returned to the model and the audit JSONL
record, giving a one-step forensic trail for flaky automations.
Disabled by default — costs nothing when the env var is unset.
Default registry now also exposes a curated set of model-friendly
aliases (click, type, screenshot, drag, find_image, shell, ...)
that point at the canonical ac_* tools. Reduces prompt verbosity
without renaming the underlying API. Toggle off via
JE_AUTOCONTROL_MCP_ALIASES=0 or
build_default_tool_registry(aliases=False); read-only mode
automatically filters destructive aliases out before expansion.
PluginWatcher polls a plugin directory and (un)registers MCP tools
on change: new files become tools, modified files re-register
under the same names with the updated handler, deleted files drop
their tools. Each register/unregister already triggers
notifications/tools/list_changed so connected clients see the
catalogue refresh in real time.
resources/subscribe / resources/unsubscribe wire a producer
callback into a provider. ResourceProvider gains optional
subscribe/unsubscribe hooks; ChainProvider fans out to children;
LiveScreenProvider exposes autocontrol://screen/live and pushes
notifications/resources/updated every poll_seconds while at least
one client is subscribed. resources.subscribe capability flag
flipped to true so clients know to attempt subscriptions.
Add MCPServer.request_elicitation that fires a server-initiated
elicitation/create when JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE=1
and the client advertised the elicitation capability. Tools whose
annotations mark them destructive (and not read-only) are gated:
the user sees a confirmation prompt before the action runs, and
declining returns a clean -32000 error to the model. Servers that
talk to non-elicitation clients fall through with a logged warning
so the feature is opt-in and never blocks unexpectedly.

codacy-production Bot commented Apr 25, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 1021 complexity · 25 duplication

JE-Chen added 3 commits April 25, 2026 20:24
- S3516: drop invariant ``return 0`` from main(); CLI returns None.
- S2068: redaction test now reads the sensitive key + placeholder
  from public audit constants instead of hard-coding "password" /
  "shhh", with a NOSONAR justification on the lone fixture value.
- S1542: keep the AC_* plugin convention with a NOSONAR comment.
- S1192: extract _TOOLS_CALL_METHOD and _MIME_JSON constants.
- S1244: use math.isclose for the scheduler interval assertion.
- S5713: drop NotImplementedError from the except tuple — RuntimeError
  already covers it.
- Prospector pyflakes: remove the unused ``import logging``.
- Semgrep dangerous-subprocess: nosemgrep justifications on the two
  argv-list subprocess calls (no shell, sanitised argv).
- README.md / zh-TW / zh-CN: add MCP feature bullet, MCP quick-start
  section (registering with Claude Desktop / Code, programmatic
  start, HTTP+SSE+auth+TLS, --list-* CLI flags, surface table,
  security notes), and a Mermaid architecture diagram showing the
  client → transport → mcp_server → core → backends path. The
  filesystem tree below is kept and updated with mcp_server/ and
  the new auto_control_window.py.
- docs/source/{Eng,Zh}/doc/mcp_server/mcp_server_doc.rst: rewrite
  to cover the full surface that landed in this PR — ~90 tools,
  resources / prompts / sampling / roots / logging / progress /
  cancellation / elicitation, HTTP+SSE transport, audit log,
  rate limiter, auto-screenshot on error, plugin hot-reload,
  fake backend for CI, and CLI introspection flags.
main() returns None (Sonar S3516 fix from the previous round), so
``rc = main(argv)`` followed by ``assert rc is None`` trips Sonar
S3699 ("Remove this use of the output from 'main'; 'main' doesn't
return anything"). Drop the binding entirely and call main() purely
for its side effect.
@sonarqubecloud

JE-Chen merged commit 6961cd9 into main on Apr 25, 2026
8 checks passed

1 participant