AkhtarXx/clicky-linux

Clicky for Linux

A voice-powered AI assistant that lives next to your cursor on Linux. It can see your screen, talk to you, and point at things — like having a real teacher sitting beside you.

This is a personal project I'm actively building and improving. It is inspired by Farza's original Clicky (the macOS app) and rebuilt from scratch for Linux.

Features

  • Push-to-talk (Ctrl+Alt) or wake word ("Hey Jarvis") activation
  • Takes screenshots and sends them to Claude for visual understanding
  • Cursor overlay that follows your mouse and flies to UI elements Claude points at
  • Teacher mode: guides your real cursor to targets and auto-clicks when you ask
  • Computer agent mode: Claude can control your desktop (click, type, scroll, drag, bash)
  • Three voice providers: Worker (AssemblyAI STT + Claude + Polly TTS), Nova Sonic (speech-to-speech via Bedrock), or Hybrid (Nova Sonic STT + Claude vision + Worker TTS)
  • Conversation state machine with follow-up window (no wake word needed for quick replies)
  • Voice Activity Detection (VAD) for automatic end-of-speech in wake word mode
  • Persistent memory via optional MemPalace integration (semantic search + knowledge graph)
  • Multi-monitor support with per-screen pointing
  • Conversation history saved to disk
  • Works on both X11 and Wayland (evdev fallback for global hotkeys)
  • Circuit breaker + retry logic for API resilience
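
As an illustration of the circuit breaker + retry idea in the last bullet, here is a minimal sketch. It is not the project's actual code: the class and function names, the failure threshold, the cool-down, and the backoff delays are all assumptions chosen for the example.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is refused because the breaker is open."""


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures, then refuses calls
    until `reset_after` seconds have passed (hypothetical defaults)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open, skipping call")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def call_with_retry(breaker, func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry `func` through the breaker with exponential backoff."""
    for attempt in range(attempts):
        try:
            return breaker.call(func)
        except CircuitOpenError:
            raise  # do not hammer an upstream that is known to be down
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The breaker keeps repeated failures from piling up requests against an API that is already down, while the retry wrapper smooths over one-off errors.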

Two runtimes

  • clicky_linux_desktop.py — tray + overlay + push-to-talk + wake word (full experience)
  • clicky_linux_cli.py — terminal fallback (text input, screenshot capture, TTS)

Setup

Prerequisites

  • Python 3.10+
  • Node.js 18+ (for the Cloudflare Worker)
  • A Cloudflare account (free tier works)
  • API keys for: Anthropic, AssemblyAI
  • AWS credentials for TTS (Polly) and optional Bedrock routing
  • Optional: ElevenLabs API key as TTS fallback
  • Optional audio player for TTS: ffplay, mpg123, mpv, or cvlc
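
To check which of the optional audio players is available before you run Clicky, a small probe like the one below can help. This is a sketch only: the function name is hypothetical, and the assumption that the first match in list order should win is mine, not something the project documents.

```python
import shutil

# Candidate players, in the order the prerequisites list them.
CANDIDATE_PLAYERS = ("ffplay", "mpg123", "mpv", "cvlc")


def find_audio_player(which=shutil.which):
    """Return the name of the first player found on PATH, or None."""
    for name in CANDIDATE_PLAYERS:
        if which(name):
            return name
    return None
```

Calling `find_audio_player()` returns a player name if one is installed, or `None` if none of the four is on your PATH.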

1. Cloudflare Worker (API proxy)

The Worker holds your API keys so they never ship in the app.

cd worker
npm install

Add your secrets:

npx wrangler secret put ANTHROPIC_API_KEY
npx wrangler secret put ASSEMBLYAI_API_KEY
npx wrangler secret put AWS_ACCESS_KEY_ID
npx wrangler secret put AWS_SECRET_ACCESS_KEY
npx wrangler secret put AWS_SESSION_TOKEN          # optional, for temp credentials
npx wrangler secret put ELEVENLABS_API_KEY          # optional TTS fallback

Configure non-secret vars in wrangler.toml:

[vars]
POLLY_VOICE_ID = "Joanna"
POLLY_ENGINE = "neural"
ELEVENLABS_VOICE_ID = "your-elevenlabs-voice-id"
AWS_REGION = "us-east-1"
BEDROCK_MODEL_ID = "global.anthropic.claude-sonnet-4-6"

Deploy:

npx wrangler deploy

You'll get a URL like https://your-worker-name.your-subdomain.workers.dev.

2. Local development (optional)

cd worker
npx wrangler dev

Before starting the dev server, create worker/.dev.vars with your keys (this file is gitignored).

3. Install Linux dependencies

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

System packages:

# Required
sudo apt install libportaudio2 portaudio19-dev

# Optional: agent mode on X11
sudo apt install xdotool xclip

# Optional: Wayland global hotkeys (evdev)
sudo apt install libevdev-dev

Optional dependencies (not in requirements.txt):

# Nova Sonic voice provider (speech-to-speech via Bedrock)
pip install aws-sdk-bedrock-runtime smithy-aws-core

# Persistent memory (semantic search + knowledge graph)
pip install mempalace
mempalace init ~/clicky-palace

4. Run

Desktop (full experience):

python clicky_linux_desktop.py --worker-url https://your-worker-name.your-subdomain.workers.dev

Terminal only:

python clicky_linux_cli.py --worker-url https://your-worker-name.your-subdomain.workers.dev

CLI flags

Desktop runtime (clicky_linux_desktop.py):

| Flag | Description |
| --- | --- |
| --worker-url | Cloudflare Worker base URL |
| --model | Chat model ID (default: claude-sonnet-4-6) |
| --show-cursor | Keep cursor overlay always visible |
| --hide-cursor | Transient cursor (show only during interactions) |
| --system-prompt | Custom system prompt for /chat |
| --no-tts | Disable TTS playback |
| --voice-provider | worker (default), nova-sonic, or hybrid |

Terminal runtime (clicky_linux_cli.py):

| Flag | Description |
| --- | --- |
| --worker-url | Cloudflare Worker base URL |
| --model | Chat model ID |
| --system-prompt | Custom system prompt |
| --no-screenshot | Text-only mode, no screen capture |
| --no-tts | Disable TTS playback |

Environment variables

| Variable | Description |
| --- | --- |
| CLICKY_WORKER_URL | Worker URL (alternative to --worker-url) |
| CLICKY_MODEL | Model ID (alternative to --model) |
| CLICKY_SYSTEM_PROMPT | System prompt |
| CLICKY_VOICE_PROVIDER | Voice provider: worker, nova-sonic, hybrid |
| CLICKY_SHOW_CURSOR | 1/0: cursor overlay visibility |
| CLICKY_WAKE_WORD_ENABLED | 1/0: "Hey Jarvis" wake word (default: on) |
| CLICKY_AGENT_MODE | 1/0: computer agent mode (default: on) |
| CLICKY_ENABLE_TEACHER_CURSOR_GUIDANCE | 1/0: guide real cursor to targets |
| CLICKY_ENABLE_TEACHER_AUTO_CLICK | 1/0: auto-click pointed targets |
| CLICKY_FORCE_POINTING_FALLBACK | 1/0: always attempt pointing even without coordinates |
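
The sketch below shows one way the flags and environment variables could be combined into a single configuration. It is illustrative rather than the project's real parser: `env_flag` and `resolve_config` are hypothetical names, and the rule that a command-line flag wins over the matching environment variable is an assumption (the source only calls the variables an "alternative"). Only the defaults documented above are used.

```python
import argparse
import os


def env_flag(name, default, environ=os.environ):
    """Interpret a 1/0 environment variable as a bool."""
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip() == "1"


def resolve_config(argv, environ=os.environ):
    """Merge CLI flags with environment fallbacks (flags win, by assumption)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--worker-url", default=environ.get("CLICKY_WORKER_URL"))
    parser.add_argument(
        "--model", default=environ.get("CLICKY_MODEL", "claude-sonnet-4-6")
    )
    parser.add_argument(
        "--voice-provider",
        choices=["worker", "nova-sonic", "hybrid"],
        default=environ.get("CLICKY_VOICE_PROVIDER", "worker"),
    )
    args = parser.parse_args(argv)
    return {
        "worker_url": args.worker_url,
        "model": args.model,
        "voice_provider": args.voice_provider,
        # Both documented as defaulting to on.
        "wake_word_enabled": env_flag("CLICKY_WAKE_WORD_ENABLED", True, environ),
        "agent_mode": env_flag("CLICKY_AGENT_MODE", True, environ),
    }
```

For example, `resolve_config(["--model", "m1"], {"CLICKY_MODEL": "m0"})` yields a config whose model is `m1`, because the flag overrides the variable in this sketch.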

Architecture

Clicky is a tray app with a cursor overlay, built with PySide6. Push-to-talk or the wake word triggers recording. Audio streams over WebSocket to AssemblyAI for transcription, then the transcript and a screenshot go to Claude through the Worker /chat route, which streams the reply back as server-sent events (SSE). The reply is played through the Worker /tts route (AWS Polly, with an optional ElevenLabs fallback).
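
To make the streaming step concrete, here is a minimal sketch of pulling text out of an SSE stream. It assumes the Worker relays Anthropic's Messages streaming events unchanged (`content_block_delta` events carrying a `text_delta`); the README does not confirm that, and `iter_text_deltas` is a hypothetical helper, not a function from this repository.

```python
import json


def iter_text_deltas(lines):
    """Yield text fragments from an iterable of decoded SSE lines."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON event
        if not isinstance(event, dict):
            continue
        if event.get("type") == "content_block_delta":
            delta = event.get("delta", {})
            if delta.get("type") == "text_delta":
                yield delta.get("text", "")
```

Feeding the generator the lines of a streaming HTTP response yields the reply piece by piece, so text can be handled as soon as the first fragments arrive.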

Claude can embed [POINT:x,y:label:screenN] tags in its responses to make the cursor fly to specific UI elements. A Computer Use locator pass then refines the coordinates. The overlay animates along a Bezier curve to the target, shows a speech bubble, then flies back.
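
The pointing flow can be sketched in two steps: extract the tags from a reply, then sample a curved path to the target. This is an illustration under assumptions, not the project's implementation. It assumes coordinates are non-negative integers and labels contain no colons, it uses a quadratic Bezier with an arbitrary `lift` factor, and it leaves out the locator refinement pass. All names here are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches [POINT:x,y:label:screenN] with integer coordinates (assumed).
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^\]:]*):screen(\d+)\]")


@dataclass
class PointTarget:
    x: int
    y: int
    label: str
    screen: int


def extract_points(text):
    """Return (text with tags removed, list of PointTarget)."""
    targets = [
        PointTarget(int(m[1]), int(m[2]), m[3].strip(), int(m[4]))
        for m in POINT_TAG.finditer(text)
    ]
    spoken = re.sub(r"\s{2,}", " ", POINT_TAG.sub("", text)).strip()
    return spoken, targets


def bezier_path(start, end, lift=0.25, steps=20):
    """Sample a quadratic Bezier curve from start to end."""
    (x0, y0), (x2, y2) = start, end
    # Control point: the midpoint pushed perpendicular to the straight line.
    cx = (x0 + x2) / 2 - (y2 - y0) * lift
    cy = (y0 + y2) / 2 + (x2 - x0) * lift
    points = []
    for i in range(steps + 1):
        t = i / steps
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * x0 + b * cx + c * x2, a * y0 + b * cy + c * y2))
    return points
```

An overlay could move through the sampled points on a timer to get the arcing "fly to" motion, then replay them in reverse to fly back.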

The Worker has three routes:

  • /chat — proxies to Anthropic or AWS Bedrock (selected by a bedrock: prefix on the model ID)
  • /tts — text-to-speech via AWS Polly (primary) or ElevenLabs (fallback)
  • /transcribe-token — fetches a temporary AssemblyAI real-time token
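
The prefix-based selection for /chat can be pictured with the short sketch below. The Worker itself is TypeScript (worker/src/index.ts); this Python version only illustrates the routing rule. Whether the prefix is stripped before the model ID is forwarded is an assumption, and `select_upstream` is a hypothetical name.

```python
BEDROCK_PREFIX = "bedrock:"


def select_upstream(model):
    """Return (provider, model_id) for a requested model string."""
    if model.startswith(BEDROCK_PREFIX):
        # Assumption: the prefix is dropped before calling Bedrock.
        return "bedrock", model[len(BEDROCK_PREFIX):]
    return "anthropic", model
```

Under this sketch, a model string of `bedrock:global.anthropic.claude-sonnet-4-6` would go to Bedrock, while plain `claude-sonnet-4-6` would go to Anthropic.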

The conversation state machine runs idle → listening → processing → responding → follow_up → idle. The follow-up window lets you speak again without the wake word.
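
A table-driven sketch of that state machine follows. The state names come from the description above. The event names (`activate`, `speech_end`, and so on) are hypothetical, as is the choice to ignore events that do not apply to the current state.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    PROCESSING = "processing"
    RESPONDING = "responding"
    FOLLOW_UP = "follow_up"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.IDLE, "activate"): State.LISTENING,          # hotkey or wake word
    (State.LISTENING, "speech_end"): State.PROCESSING,
    (State.PROCESSING, "reply_ready"): State.RESPONDING,
    (State.RESPONDING, "playback_done"): State.FOLLOW_UP,
    (State.FOLLOW_UP, "speech_start"): State.LISTENING,  # no wake word needed
    (State.FOLLOW_UP, "timeout"): State.IDLE,
}


def next_state(state, event):
    """Return the next state, or stay put if the event does not apply."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in one table makes the follow-up behavior easy to see: from follow_up, speech goes straight back to listening, and a timeout returns to idle.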

Project structure

clicky_linux_desktop.py          # Desktop runtime entry point
clicky_linux_cli.py              # Terminal runtime entry point
clicky_desktop_runtime/
  main.py                        # CLI parsing + app bootstrap
  core.py                        # Config, screenshots, pointing, TTS, history
  desktop_app.py                 # PySide6 tray, overlay, hotkeys, state machine
  services.py                    # WorkerClient, AssemblyAI streaming, audio recorder, TTS player, wake word, VAD
  computer_agent.py              # Computer Use agent loop (click, type, scroll, bash)
  memory_service.py              # MemPalace integration (semantic memory + knowledge graph)
  nova_sonic_provider.py         # Amazon Nova Sonic speech-to-speech pipeline
worker/
  src/index.ts                   # Three routes: /chat, /tts, /transcribe-token
  wrangler.toml                  # Cloudflare Worker config
tests/                           # Test suite

Status

Work in progress. Actively improving it. PRs and ideas welcome.
