
AI-Powered Desktop Automation Agent
Resonix GUI Agent is an autonomous AI agent that can see your screen, understand what's happening, and help you with desktop tasks - automatically, proactively, and intelligently.
- 👾 Autonomous Observation - Watches your screen in the background and detects when you need help
- 👁 AI Vision - Uses GPT-4V, Claude Vision, or local LLaVA to understand screen content
- 🖱️ Full Desktop Control - Click, type, scroll, hotkeys, drag & drop
- 🔄 Multi-Display Support - Works with multiple monitors and HiDPI displays
- 📝 OCR Built-in - Reads text from any screen
- 🧠 Smart Decision Making - Knows when to help and when to wait
Resonix starts
↓
GUI Agent automatically begins observation 👾👾👾
↓
Every 15 seconds → Analyzes screen
↓
Smart scene detection:
- Using another browser → "Use Resonix instead?"
- Search intent → "I can help search"
- Error detected → "I can fix that"
- Working → waits quietly
↓
Proactively offers help when needed
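
The flow above can be sketched in TypeScript as a lookup from detected scene to suggestion, run on a timer. This is a minimal illustration, not the actual Resonix implementation: the `Scene` labels, `suggestFor`, `observeOnce`, `startObservation`, and the injected `detectScene` and `offer` callbacks are all hypothetical names chosen for this example.

```typescript
// Hypothetical scene labels mirroring the four cases in the flow above.
type Scene = "other-browser" | "search-intent" | "error" | "working";

// Suggestion text for each scene; null means the agent stays quiet.
const SUGGESTIONS: Record<Scene, string | null> = {
  "other-browser": "Use Resonix instead?",
  "search-intent": "I can help search",
  "error": "I can fix that",
  "working": null,
};

// Decide what, if anything, to offer for a detected scene.
function suggestFor(scene: Scene): string | null {
  return SUGGESTIONS[scene];
}

// Run one observation cycle: classify the screen, then offer help or wait.
// `detectScene` and `offer` are injected placeholders, not Resonix functions.
async function observeOnce(
  detectScene: () => Promise<Scene>,
  offer: (message: string) => void,
): Promise<void> {
  const suggestion = suggestFor(await detectScene());
  if (suggestion !== null) offer(suggestion);
}

// Repeat the cycle on a fixed interval (15 seconds in the flow above).
// Returns a function that stops the loop.
function startObservation(
  detectScene: () => Promise<Scene>,
  offer: (message: string) => void,
  intervalMs = 15000,
): () => void {
  const timer = setInterval(() => {
    observeOnce(detectScene, offer).catch((err) => console.error(err));
  }, intervalMs);
  return () => clearInterval(timer);
}
```

Keeping the scene-to-suggestion mapping in a plain table makes the "waits quietly" case explicit: a `null` entry simply produces no offer.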
# Install
npm install -g resonix
# Start (GUI Agent is built-in and auto-starts!)
resonix
# Or run GUI Agent directly
resonix gui-agent --mode auto
# Auto mode (AI makes decisions)
resonix gui-agent --mode auto
# Interactive mode (you give commands)
resonix gui-agent --mode interactive
# Disable vision
resonix gui-agent --no-vision
# Custom interval
resonix gui-agent --interval 3000
# For AI Vision (choose one)
OPENAI_API_KEY=sk-... # GPT-4V (best)
ANTHROPIC_API_KEY=sk-... # Claude Vision
# For local (free)
# Just run Ollama with LLaVA model
- macOS (Windows/Linux coming soon)
- For full functionality: Accessibility permission (you are prompted for it automatically)
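
The vision provider setup described above (an OpenAI key, an Anthropic key, or a local LLaVA model through Ollama) could be resolved along the following lines. This is only a sketch under assumptions: the `VisionProvider` identifiers and the `pickVisionProvider` function are hypothetical, and the precedence (OpenAI first, then Anthropic, then local) is a guess rather than documented Resonix behavior.

```typescript
// Hypothetical provider identifiers; the real names inside Resonix may differ.
type VisionProvider = "gpt-4v" | "claude-vision" | "local-llava";

// Pick a provider from environment variables, falling back to local LLaVA.
// The precedence order here is an assumption made for illustration.
function pickVisionProvider(
  env: Record<string, string | undefined>,
): VisionProvider {
  if (env["OPENAI_API_KEY"]) return "gpt-4v";
  if (env["ANTHROPIC_API_KEY"]) return "claude-vision";
  return "local-llava";
}
```

In a Node.js process this would typically be called as `pickVisionProvider(process.env)`. An empty or unset key is treated as absent, so the local model acts as the default.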
src/agents/tools/gui-agent/
├── smart-agent.ts # Autonomous AI decision making
├── perception.ts # Screen capture, OCR, multi-display
├── accessibility.ts # macOS UI element detection
├── auto-vision.ts # AI vision provider
├── executor/ # Cross-platform mouse/keyboard control
└── index.ts # Built-in integration
Open source under MIT license. Contributions welcome!
MIT
Resonix - AI That Actually Helps