Clicky for Linux

A voice-powered AI assistant that lives next to your cursor on Linux. It can see your screen, talk to you, and point at things — like having a real teacher sitting beside you.

This is a personal project I'm actively building and improving. It is inspired by Farza's original Clicky (the macOS app) and rebuilt from scratch for Linux.

Features

Push-to-talk (Ctrl+Alt) or wake word ("Hey Jarvis") activation
Takes screenshots and sends them to Claude for visual understanding
Cursor overlay that follows your mouse and flies to UI elements Claude points at
Teacher mode: guides your real cursor to targets and auto-clicks when you ask
Computer agent mode: Claude can control your desktop (click, type, scroll, drag, bash)
Three voice providers: Worker (AssemblyAI STT + Claude + Polly TTS), Nova Sonic (speech-to-speech via Bedrock), or Hybrid (Nova Sonic STT + Claude vision + Worker TTS)
Conversation state machine with follow-up window (no wake word needed for quick replies)
Voice Activity Detection (VAD) for automatic end-of-speech in wake word mode
Persistent memory via optional MemPalace integration (semantic search + knowledge graph)
Multi-monitor support with per-screen pointing
Conversation history saved to disk
Works on both X11 and Wayland (evdev fallback for global hotkeys)
Circuit breaker + retry logic for API resilience
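
The circuit breaker and retry behavior in the last item can be pictured with a small sketch. This is a minimal illustration, not the project's actual implementation: the names `CircuitBreaker`, `CircuitOpenError`, and `call_with_retry`, along with the thresholds and delays, are assumptions made for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call (half-open).
            # A single failure now reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success resets the failure count
        return result


def call_with_retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry func with exponential backoff; never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return func()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Wrapping a request as `call_with_retry(lambda: breaker.call(send_request))` would retry transient failures with backoff while refusing to keep calling an API that fails repeatedly; `send_request` is a hypothetical function here.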

Two runtimes

clicky_linux_desktop.py — tray + overlay + push-to-talk + wake word (full experience)
clicky_linux_cli.py — terminal fallback (text input, screenshot capture, TTS)

Setup

Prerequisites

Python 3.10+
Node.js 18+ (for the Cloudflare Worker)
A Cloudflare account (free tier works)
API keys for: Anthropic, AssemblyAI
AWS credentials for TTS (Polly) and optional Bedrock routing
Optional: ElevenLabs API key as TTS fallback
Optional audio player for TTS: ffplay, mpg123, mpv, or cvlc

1. Cloudflare Worker (API proxy)

The Worker holds your API keys so they never ship in the app.

cd worker
npm install

Add your secrets:

npx wrangler secret put ANTHROPIC_API_KEY
npx wrangler secret put ASSEMBLYAI_API_KEY
npx wrangler secret put AWS_ACCESS_KEY_ID
npx wrangler secret put AWS_SECRET_ACCESS_KEY
npx wrangler secret put AWS_SESSION_TOKEN          # optional, for temp credentials
npx wrangler secret put ELEVENLABS_API_KEY          # optional TTS fallback

Configure non-secret vars in wrangler.toml:

[vars]
POLLY_VOICE_ID = "Joanna"
POLLY_ENGINE = "neural"
ELEVENLABS_VOICE_ID = "your-elevenlabs-voice-id"
AWS_REGION = "us-east-1"
BEDROCK_MODEL_ID = "global.anthropic.claude-sonnet-4-6"

Deploy:

npx wrangler deploy

You'll get a URL like https://your-worker-name.your-subdomain.workers.dev.

2. Local development (optional)

cd worker
npx wrangler dev

The dev server reads secrets from worker/.dev.vars, so create that file with your keys first, one KEY="value" pair per line (the file is gitignored).

3. Install Linux dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

System packages (apt shown, for Debian/Ubuntu; use the equivalent packages on other distributions):

# Required
sudo apt install libportaudio2 portaudio19-dev

# Optional: agent mode on X11
sudo apt install xdotool xclip

# Optional: Wayland global hotkeys (evdev)
sudo apt install libevdev-dev

Optional dependencies (not in requirements.txt):

# Nova Sonic voice provider (speech-to-speech via Bedrock)
pip install aws-sdk-bedrock-runtime smithy-aws-core

# Persistent memory (semantic search + knowledge graph)
pip install mempalace
mempalace init ~/clicky-palace

4. Run

Desktop (full experience):

python clicky_linux_desktop.py --worker-url https://your-worker-name.your-subdomain.workers.dev

Terminal only:

python clicky_linux_cli.py --worker-url https://your-worker-name.your-subdomain.workers.dev

CLI flags

Desktop runtime (clicky_linux_desktop.py):

Flag	Description
`--worker-url`	Cloudflare Worker base URL
`--model`	Chat model ID (default: `claude-sonnet-4-6`)
`--show-cursor`	Keep cursor overlay always visible
`--hide-cursor`	Transient cursor (show only during interactions)
`--system-prompt`	Custom system prompt for /chat
`--no-tts`	Disable TTS playback
`--voice-provider`	`worker` (default), `nova-sonic`, or `hybrid`

Terminal runtime (clicky_linux_cli.py):

Flag	Description
`--worker-url`	Cloudflare Worker base URL
`--model`	Chat model ID
`--system-prompt`	Custom system prompt
`--no-screenshot`	Text-only mode, no screen capture
`--no-tts`	Disable TTS playback
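
As a rough illustration of the terminal runtime flags in the table above, here is an `argparse` sketch. The flag names and descriptions come from the table; using the `CLICKY_*` environment variables as defaults is an assumption about how the two are combined, and `build_parser` is a hypothetical name rather than the project's real function.

```python
import argparse
import os


def build_parser(environ=None):
    """Hypothetical parser mirroring the documented terminal flags."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="clicky_linux_cli.py")
    # Assumption: an explicit flag wins, otherwise the env var is used.
    parser.add_argument("--worker-url", default=env.get("CLICKY_WORKER_URL"),
                        help="Cloudflare Worker base URL")
    parser.add_argument("--model", default=env.get("CLICKY_MODEL"),
                        help="Chat model ID")
    parser.add_argument("--system-prompt", default=env.get("CLICKY_SYSTEM_PROMPT"),
                        help="Custom system prompt")
    parser.add_argument("--no-screenshot", action="store_true",
                        help="Text-only mode, no screen capture")
    parser.add_argument("--no-tts", action="store_true",
                        help="Disable TTS playback")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Passing the environment in as a dictionary keeps the sketch easy to test without touching the real process environment.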

Environment variables

Variable	Description
`CLICKY_WORKER_URL`	Worker URL (alternative to `--worker-url`)
`CLICKY_MODEL`	Model ID (alternative to `--model`)
`CLICKY_SYSTEM_PROMPT`	System prompt
`CLICKY_VOICE_PROVIDER`	Voice provider: `worker`, `nova-sonic`, `hybrid`
`CLICKY_SHOW_CURSOR`	`1`/`0` — cursor overlay visibility
`CLICKY_WAKE_WORD_ENABLED`	`1`/`0` — "Hey Jarvis" wake word (default: on)
`CLICKY_AGENT_MODE`	`1`/`0` — computer agent mode (default: on)
`CLICKY_ENABLE_TEACHER_CURSOR_GUIDANCE`	`1`/`0` — guide real cursor to targets
`CLICKY_ENABLE_TEACHER_AUTO_CLICK`	`1`/`0` — auto-click pointed targets
`CLICKY_FORCE_POINTING_FALLBACK`	`1`/`0` — always attempt pointing even without coordinates
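
The `1`/`0` switches in the table above could be read along these lines. This is a sketch under stated assumptions: `env_flag` and `load_flags` are hypothetical helpers, any value other than `1` is treated as off, and only the two defaults the table documents ("default: on") are filled in.

```python
import os


def env_flag(name, default, environ=None):
    """Read a 1/0 environment switch, falling back to default when unset."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    # Assumption: only "1" means on; anything else counts as off.
    return raw.strip() == "1"


def load_flags(environ=None):
    """Collect the switches whose defaults the table states explicitly."""
    return {
        "wake_word_enabled": env_flag("CLICKY_WAKE_WORD_ENABLED", True, environ),
        "agent_mode": env_flag("CLICKY_AGENT_MODE", True, environ),
    }
```

For example, running with `CLICKY_AGENT_MODE=0` set would turn agent mode off while leaving the wake word at its documented default.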

Architecture

The desktop runtime is a PySide6 tray app with a cursor overlay. Push-to-talk or the wake word triggers recording. Audio streams over a WebSocket to AssemblyAI for transcription; the transcript and a screenshot then go to Claude through the Worker /chat route, which streams the reply back as server-sent events (SSE). The reply is spoken through the Worker /tts route (AWS Polly, with an optional ElevenLabs fallback).
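
To make the streaming step more concrete, here is a generic parser for the server-sent events wire format (`event:` and `data:` fields, with a blank line ending each event). It is a sketch of the standard format only: `iter_sse_events` is a hypothetical helper, and the event names and payloads the Worker actually emits are not specified here.

```python
def iter_sse_events(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []  # "message" is the SSE default event name
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

A caller would typically feed this the decoded lines of a streaming HTTP response and append each `data` chunk to the reply as it arrives.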

Claude can embed [POINT:x,y:label:screenN] tags in its responses to make the cursor fly to specific UI elements. A Computer Use locator pass then refines the coordinates. The overlay animates along a Bezier curve to the target, shows a speech bubble, then flies back.
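
The tag handling and the flight path can be sketched as follows. Everything here is illustrative: the regular expression assumes integer pixel coordinates and labels without colons, the curve is a quadratic Bezier with an arbitrarily chosen control point, and `extract_points` and `bezier_path` are hypothetical names, not the project's functions.

```python
import re
from typing import NamedTuple

# Assumed shape of the tag: [POINT:x,y:label:screenN]
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^:\]]*):screen(\d+)\]")


class PointTarget(NamedTuple):
    x: int
    y: int
    label: str
    screen: int


def extract_points(text):
    """Return (text without tags, list of targets found in the text)."""
    targets = [
        PointTarget(int(m[1]), int(m[2]), m[3].strip(), int(m[4]))
        for m in POINT_TAG.finditer(text)
    ]
    spoken = re.sub(r"\s{2,}", " ", POINT_TAG.sub("", text)).strip()
    return spoken, targets


def bezier_path(start, end, steps=20, lift=0.25):
    """Sample a quadratic Bezier curve from start to end (inclusive)."""
    (x0, y0), (x2, y2) = start, end
    dx, dy = x2 - x0, y2 - y0
    # Control point: the midpoint, offset perpendicular to the straight line.
    cx = (x0 + x2) / 2 - dy * lift
    cy = (y0 + y2) / 2 + dx * lift
    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * x0 + b * cx + c * x2, a * y0 + b * cy + c * y2))
    return points
```

Stripping the tags before speaking keeps the coordinates out of the TTS output, and an overlay could step through the sampled points on a timer to produce the arc.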

The Worker has three routes:

/chat — proxies to Anthropic or AWS Bedrock (selected by bedrock: model prefix)
/tts — text-to-speech via AWS Polly (primary) or ElevenLabs (fallback)
/transcribe-token — fetches a temporary AssemblyAI real-time token
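
One reading of the prefix-based selection on /chat is sketched below, in Python for consistency with the other examples even though the Worker itself is TypeScript. It assumes the model ID carries a literal `bedrock:` prefix that is stripped before the request is forwarded; `resolve_backend` is a hypothetical name and the real Worker may behave differently.

```python
BEDROCK_PREFIX = "bedrock:"


def resolve_backend(model):
    """Pick a backend from the model ID and return (backend, model_id)."""
    if model.startswith(BEDROCK_PREFIX):
        # Assumption: the prefix is only a routing hint and is removed.
        return "bedrock", model[len(BEDROCK_PREFIX):]
    return "anthropic", model
```

Under this reading, `--model bedrock:<model-id>` would route through AWS Bedrock, while a bare model ID would go to the Anthropic API.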

The conversation state machine runs idle → listening → processing → responding → follow_up → idle. During the follow_up window you can speak again without the wake word.
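
A table-driven sketch of those states is shown below. The state names come from the description above; the event names (`activate`, `speech_end`, and so on) and the `Conversation` class are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"
    FOLLOW_UP = "follow_up"


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "activate"): State.LISTENING,          # hotkey or wake word
    (State.LISTENING, "speech_end"): State.PROCESSING,
    (State.PROCESSING, "reply_ready"): State.RESPONDING,
    (State.RESPONDING, "reply_done"): State.FOLLOW_UP,
    (State.FOLLOW_UP, "speech_start"): State.LISTENING,  # no wake word needed
    (State.FOLLOW_UP, "timeout"): State.IDLE,
}


class Conversation:
    def __init__(self):
        self.state = State.IDLE

    def handle(self, event):
        """Apply an event; return False if it is not valid in this state."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            return False
        self.state = next_state
        return True
```

Keeping the transitions in one table makes it easy to see that follow_up is the only state that can return to listening without a fresh activation.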

Project structure

clicky_linux_desktop.py          # Desktop runtime entry point
clicky_linux_cli.py              # Terminal runtime entry point
clicky_desktop_runtime/
  main.py                        # CLI parsing + app bootstrap
  core.py                        # Config, screenshots, pointing, TTS, history
  desktop_app.py                 # PySide6 tray, overlay, hotkeys, state machine
  services.py                    # WorkerClient, AssemblyAI streaming, audio recorder, TTS player, wake word, VAD
  computer_agent.py              # Computer Use agent loop (click, type, scroll, bash)
  memory_service.py              # MemPalace integration (semantic memory + knowledge graph)
  nova_sonic_provider.py         # Amazon Nova Sonic speech-to-speech pipeline
worker/
  src/index.ts                   # Three routes: /chat, /tts, /transcribe-token
  wrangler.toml                  # Cloudflare Worker config
tests/                           # Test suite

Status

Work in progress. Actively improving it. PRs and ideas welcome.
