A voice-powered AI assistant that lives next to your cursor on Linux. It can see your screen, talk to you, and point at things — like having a real teacher sitting beside you.
This is a personal project I'm actively building and improving. Inspired by Farza's original Clicky (the macOS app), rebuilt from scratch for Linux.
- Push-to-talk (Ctrl+Alt) or wake word ("Hey Jarvis") activation
- Takes screenshots and sends them to Claude for visual understanding
- Cursor overlay that follows your mouse and flies to UI elements Claude points at
- Teacher mode: guides your real cursor to targets and auto-clicks when you ask
- Computer agent mode: Claude can control your desktop (click, type, scroll, drag, bash)
- Three voice providers: Worker (AssemblyAI STT + Claude + Polly TTS), Nova Sonic (speech-to-speech via Bedrock), or Hybrid (Nova Sonic STT + Claude vision + Worker TTS)
- Conversation state machine with follow-up window (no wake word needed for quick replies)
- Voice Activity Detection (VAD) for automatic end-of-speech in wake word mode
- Persistent memory via optional MemPalace integration (semantic search + knowledge graph)
- Multi-monitor support with per-screen pointing
- Conversation history saved to disk
- Works on both X11 and Wayland (evdev fallback for global hotkeys)
- Circuit breaker + retry logic for API resilience
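For readers unfamiliar with the last item, here is a minimal sketch of how a circuit breaker and retry with exponential backoff can work together. It is not the project's actual code: `CircuitBreaker`, `CircuitOpenError`, `retry`, and all thresholds are hypothetical names and values chosen for illustration.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when calls are blocked because the circuit is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures consecutive errors."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise  # do not hammer an open circuit
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

A caller would typically wrap each API request as `retry(lambda: breaker.call(send_request))`, where `send_request` stands in for whatever function performs the HTTP call.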
- `clicky_linux_desktop.py`: tray + overlay + push-to-talk + wake word (full experience)
- `clicky_linux_cli.py`: terminal fallback (text input, screenshot capture, TTS)
- Python 3.10+
- Node.js 18+ (for the Cloudflare Worker)
- A Cloudflare account (free tier works)
- API keys for: Anthropic, AssemblyAI
- AWS credentials for TTS (Polly) and optional Bedrock routing
- Optional: ElevenLabs API key as TTS fallback
- Optional audio player for TTS: `ffplay`, `mpg123`, `mpv`, or `cvlc`
The Worker holds your API keys so they never ship in the app.
cd worker
npm install
Add your secrets:
npx wrangler secret put ANTHROPIC_API_KEY
npx wrangler secret put ASSEMBLYAI_API_KEY
npx wrangler secret put AWS_ACCESS_KEY_ID
npx wrangler secret put AWS_SECRET_ACCESS_KEY
npx wrangler secret put AWS_SESSION_TOKEN # optional, for temp credentials
npx wrangler secret put ELEVENLABS_API_KEY # optional TTS fallback
Configure non-secret vars in wrangler.toml:
[vars]
POLLY_VOICE_ID = "Joanna"
POLLY_ENGINE = "neural"
ELEVENLABS_VOICE_ID = "your-elevenlabs-voice-id"
AWS_REGION = "us-east-1"
BEDROCK_MODEL_ID = "global.anthropic.claude-sonnet-4-6"
Deploy:
npx wrangler deploy
You'll get a URL like https://your-worker-name.your-subdomain.workers.dev.
cd worker
npx wrangler dev
Create worker/.dev.vars with your keys (this file is gitignored).
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
System packages:
# Required
sudo apt install libportaudio2 portaudio19-dev
# Optional: agent mode on X11
sudo apt install xdotool xclip
# Optional: Wayland global hotkeys (evdev)
sudo apt install libevdev-dev
Optional dependencies (not in requirements.txt):
# Nova Sonic voice provider (speech-to-speech via Bedrock)
pip install aws-sdk-bedrock-runtime smithy-aws-core
# Persistent memory (semantic search + knowledge graph)
pip install mempalace
mempalace init ~/clicky-palace
Desktop (full experience):
python clicky_linux_desktop.py --worker-url https://your-worker-name.your-subdomain.workers.dev
Terminal only:
python clicky_linux_cli.py --worker-url https://your-worker-name.your-subdomain.workers.dev
Desktop runtime (clicky_linux_desktop.py):
| Flag | Description |
|---|---|
| `--worker-url` | Cloudflare Worker base URL |
| `--model` | Chat model ID (default: `claude-sonnet-4-6`) |
| `--show-cursor` | Keep cursor overlay always visible |
| `--hide-cursor` | Transient cursor (show only during interactions) |
| `--system-prompt` | Custom system prompt for `/chat` |
| `--no-tts` | Disable TTS playback |
| `--voice-provider` | `worker` (default), `nova-sonic`, or `hybrid` |
Terminal runtime (clicky_linux_cli.py):
| Flag | Description |
|---|---|
| `--worker-url` | Cloudflare Worker base URL |
| `--model` | Chat model ID |
| `--system-prompt` | Custom system prompt |
| `--no-screenshot` | Text-only mode, no screen capture |
| `--no-tts` | Disable TTS playback |
| Variable | Description |
|---|---|
| `CLICKY_WORKER_URL` | Worker URL (alternative to `--worker-url`) |
| `CLICKY_MODEL` | Model ID (alternative to `--model`) |
| `CLICKY_SYSTEM_PROMPT` | System prompt |
| `CLICKY_VOICE_PROVIDER` | Voice provider: `worker`, `nova-sonic`, `hybrid` |
| `CLICKY_SHOW_CURSOR` | 1/0: cursor overlay visibility |
| `CLICKY_WAKE_WORD_ENABLED` | 1/0: "Hey Jarvis" wake word (default: on) |
| `CLICKY_AGENT_MODE` | 1/0: computer agent mode (default: on) |
| `CLICKY_ENABLE_TEACHER_CURSOR_GUIDANCE` | 1/0: guide real cursor to targets |
| `CLICKY_ENABLE_TEACHER_AUTO_CLICK` | 1/0: auto-click pointed targets |
| `CLICKY_FORCE_POINTING_FALLBACK` | 1/0: always attempt pointing even without coordinates |
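As a rough illustration of how these variables could be read, the sketch below parses a few of them into a settings object. The names `env_flag`, `Settings`, and `load_settings` are hypothetical, and the rule that a CLI flag wins over the environment is an assumption; the defaults (`worker`, wake word on, agent mode on) come from the tables above.

```python
import os
from dataclasses import dataclass


def env_flag(name, default, env=None):
    """Interpret a 1/0 environment variable as a bool."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip() == "1"


@dataclass
class Settings:
    worker_url: str
    voice_provider: str
    wake_word_enabled: bool
    agent_mode: bool


def load_settings(cli_worker_url=None, env=None):
    """Build Settings; assumes --worker-url overrides CLICKY_WORKER_URL."""
    env = os.environ if env is None else env
    return Settings(
        worker_url=cli_worker_url or env.get("CLICKY_WORKER_URL", ""),
        voice_provider=env.get("CLICKY_VOICE_PROVIDER", "worker"),
        wake_word_enabled=env_flag("CLICKY_WAKE_WORD_ENABLED", True, env),
        agent_mode=env_flag("CLICKY_AGENT_MODE", True, env),
    )
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.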
Tray app with a cursor overlay (PySide6). Push-to-talk or wake word triggers recording. Audio streams over WebSocket to AssemblyAI for transcription, then the transcript + screenshot go to Claude via streaming SSE through the Worker /chat route. Response plays through Worker /tts (AWS Polly, optional ElevenLabs fallback).
Claude can embed `[POINT:x,y:label:screenN]` tags in responses to make the cursor fly to specific UI elements. A Computer Use locator pass then refines the coordinates. The overlay animates along a Bezier curve to the target, shows a speech bubble, then flies back.
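To make the tag format concrete, here is a small sketch of how such tags might be pulled out of a response before the text is spoken. It assumes integer pixel coordinates, a label without colons, and a required `screenN` suffix; `PointTag`, `POINT_RE`, and `extract_points` are illustrative names, not the project's real parser.

```python
import re
from typing import NamedTuple


class PointTag(NamedTuple):
    x: int
    y: int
    label: str
    screen: int


# Assumed grammar: [POINT:<x>,<y>:<label>:screen<N>] with integer coordinates.
POINT_RE = re.compile(r"\[POINT:(\d+),(\d+):([^:\]]+):screen(\d+)\]")


def extract_points(text):
    """Return (text_without_tags, [PointTag, ...])."""
    points = [
        PointTag(int(m.group(1)), int(m.group(2)), m.group(3).strip(), int(m.group(4)))
        for m in POINT_RE.finditer(text)
    ]
    cleaned = POINT_RE.sub("", text)
    # Collapse the whitespace left behind where a tag was removed.
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()
    return cleaned, points
```

The cleaned text would go to TTS, while each `PointTag` gives the overlay a target and a screen index.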
The Worker has three routes:
- `/chat`: proxies to Anthropic or AWS Bedrock (selected by a `bedrock:` model prefix)
- `/tts`: text-to-speech via AWS Polly (primary) or ElevenLabs (fallback)
- `/transcribe-token`: fetches a temporary AssemblyAI real-time token
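The prefix-based selection on `/chat` can be pictured roughly as follows. The Worker itself is TypeScript (`worker/src/index.ts`); this Python sketch only mirrors the idea. `select_backend` is a hypothetical name, and stripping the prefix before forwarding the model ID is an assumption.

```python
BEDROCK_PREFIX = "bedrock:"


def select_backend(model):
    """Return (backend, model_id) for a requested model string."""
    if model.startswith(BEDROCK_PREFIX):
        # Assumption: the prefix is removed before the ID is sent to Bedrock.
        return "bedrock", model[len(BEDROCK_PREFIX):]
    return "anthropic", model
```

Under these assumptions, `--model bedrock:global.anthropic.claude-sonnet-4-6` would route through Bedrock, while `--model claude-sonnet-4-6` would go to Anthropic directly.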
Conversation state machine: idle → listening → processing → responding → follow_up → idle. Follow-up window lets you speak again without the wake word.
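The transitions above can be sketched as a small table-driven state machine. This is an illustration, not the code in `desktop_app.py`: `State`, `TRANSITIONS`, and `Conversation` are hypothetical names, and the `follow_up → listening` edge is inferred from the follow-up window description.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"
    FOLLOW_UP = "follow_up"


# Allowed transitions. FOLLOW_UP -> LISTENING models speaking again inside the
# follow-up window; FOLLOW_UP -> IDLE models the window expiring.
TRANSITIONS = {
    State.IDLE: {State.LISTENING},
    State.LISTENING: {State.PROCESSING},
    State.PROCESSING: {State.RESPONDING},
    State.RESPONDING: {State.FOLLOW_UP},
    State.FOLLOW_UP: {State.LISTENING, State.IDLE},
}


class Conversation:
    def __init__(self):
        self.state = State.IDLE

    def advance(self, new_state):
        """Move to new_state, rejecting transitions not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {new_state.value}"
            )
        self.state = new_state
```

Keeping the transitions in one dictionary makes it easy to see that only `idle → listening` requires the wake word or hotkey; a real implementation would likely also need cancel and error edges back to `idle`.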
clicky_linux_desktop.py # Desktop runtime entry point
clicky_linux_cli.py # Terminal runtime entry point
clicky_desktop_runtime/
  main.py # CLI parsing + app bootstrap
  core.py # Config, screenshots, pointing, TTS, history
  desktop_app.py # PySide6 tray, overlay, hotkeys, state machine
  services.py # WorkerClient, AssemblyAI streaming, audio recorder, TTS player, wake word, VAD
  computer_agent.py # Computer Use agent loop (click, type, scroll, bash)
  memory_service.py # MemPalace integration (semantic memory + knowledge graph)
  nova_sonic_provider.py # Amazon Nova Sonic speech-to-speech pipeline
worker/
  src/index.ts # Three routes: /chat, /tts, /transcribe-token
  wrangler.toml # Cloudflare Worker config
tests/ # Test suite
Work in progress. Actively improving it. PRs and ideas welcome.