feat: add voice (TTS/STT, playback) and rich interactive components by DevRohit06 · Pull Request #20 · DevRohit06/discli

DevRohit06 · 2026-04-16T18:06:42Z

Summary

Adds full-duplex voice capabilities: TTS speak (ElevenLabs, OpenAI) + STT listen/transcribe (Deepgram, OpenAI Whisper) + audio playback (files/URLs)
Adds rich interactive components: modals, multi-step workflows with state tracking and timeouts, persistent dashboards with pagination and auto-refresh
Surfaces everything via both CLI commands (discli voice, discli interact) and serve-mode JSONL actions (17 new actions)

What's new

New modules:

src/discli/voice_engine.py — VoiceEngine managing per-guild connections, AudioPlayer (queue-based playback), AudioListener (VAD via silero-vad + STT pipeline)
src/discli/interact_engine.py — InteractEngine with workflow state machine, dashboard renderer, and prefix-based interaction router (modal: / wf: / dash:)
src/discli/tts.py — TTSProvider protocol + ElevenLabs and OpenAI implementations (lazy SDK imports)
src/discli/stt.py — STTProvider protocol + Deepgram (streaming) and OpenAI Whisper (batch) implementations
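A provider protocol with lazy SDK imports could look roughly like the sketch below. It is illustrative only: the method name `synthesize`, the class `LazySDKProvider`, and the error wording are assumptions, not the contents of tts.py.

```python
import importlib
from typing import Protocol, runtime_checkable


@runtime_checkable
class TTSProvider(Protocol):
    """Structural interface: anything with synthesize(text) -> bytes qualifies."""

    def synthesize(self, text: str) -> bytes: ...


class LazySDKProvider:
    """Hypothetical provider that imports its SDK on first use, not at import time."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._sdk = None

    def _load(self):
        # Import only when needed, so the core package works without the extra.
        if self._sdk is None:
            try:
                self._sdk = importlib.import_module(self._module_name)
            except ImportError as exc:
                raise RuntimeError(
                    f"optional dependency {self._module_name!r} is not installed"
                ) from exc
        return self._sdk

    def synthesize(self, text: str) -> bytes:
        self._load()
        # A real provider would call the SDK here; this sketch just encodes the text.
        return text.encode("utf-8")
```

Deferring the import this way is one means of keeping the core install free of the optional voice extras, which matches the "installs cleanly without optional voice extras" item in the test plan.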

New CLI commands:

discli voice — join, leave, speak, play, stop, pause, resume, listen, status
discli interact — modal, workflow start, dashboard create/delete

New serve-mode actions (17): voice_connect, voice_disconnect, voice_move, voice_speak, voice_play, voice_stop, voice_pause, voice_resume, voice_listen_start, voice_listen_stop, voice_status, voice_set_config, workflow_start, workflow_cancel, dashboard_create, dashboard_update, dashboard_delete

New serve-mode events: voice_transcription, voice_playback_started/finished, voice_connected/disconnected, workflow_step_completed/finished/timeout, dashboard_interaction
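For orientation, a JSONL action loop of this kind can be sketched as one JSON object in, one JSON object out. The field names `action`, `event`, and `error`, the `ACTIONS` table, and the `connected` payload key are assumptions for illustration; the actual wire format in serve.py is not shown in this description.

```python
import json
from typing import Callable


def voice_status(request: dict) -> dict:
    # Hypothetical handler: a real one would query the voice engine.
    return {"event": "voice_status", "connected": False}


# Hypothetical dispatch table mapping action names to handlers.
ACTIONS: dict[str, Callable[[dict], dict]] = {"voice_status": voice_status}


def handle_line(line: str) -> str:
    """Parse one JSONL request line and return one JSONL response line."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps({"event": "error", "error": "invalid JSON"})
    if not isinstance(request, dict):
        return json.dumps({"event": "error", "error": "request must be an object"})
    name = request.get("action")
    handler = ACTIONS.get(name)
    if handler is None:
        return json.dumps({"event": "error", "error": f"unknown action: {name}"})
    return json.dumps(handler(request))
```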

Optional dependency groups:

voice — PyNaCl, discord-ext-voice-recv, silero-vad, audioop-lts
elevenlabs, deepgram, openai-voice, local-voice (piper-tts, faster-whisper)
all-voice — convenience meta-package (voice + elevenlabs + deepgram)

Security: permission profiles updated. A voice scope is added and denied in the chat profile, the interact scope is allowed in the chat profile, and voice status is allowed in the readonly profile.
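One way to read these profile rules is sketched below, following the commit notes: the chat profile denies voice and allows interact, and readonly allows voice status. The `PROFILES` layout, the `is_allowed` helper, and the default-deny fallback are assumptions; security.py may structure this differently.

```python
# Hypothetical profile table; only the rules named in this PR are modeled.
PROFILES = {
    "chat": {"allow": {"interact"}, "deny": {"voice"}},
    "readonly": {"allow": {"voice status"}, "deny": set()},
}


def is_allowed(profile: str, command: str) -> bool:
    """Return True if `command` (e.g. "voice join") is permitted for `profile`."""
    rules = PROFILES[profile]
    parts = command.split()
    if not parts:
        return False
    # An exact command match (such as "voice status") is checked first.
    if command in rules["allow"]:
        return True
    scope = parts[0]
    if scope in rules["deny"]:
        return False
    # Default-deny in this sketch: the scope must be explicitly allowed.
    return scope in rules["allow"]
```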

Version: 0.7.0 → 0.8.0

Design doc: docs/plans/2026-04-11-voice-and-interactive-design.md
Implementation plan: docs/plans/2026-04-11-voice-and-interactive-plan.md

Test plan

31/31 unit tests passing (12 new for engines + security)
discli --help lists voice and interact groups
discli voice --help shows all 9 subcommands
discli interact --help shows modal / workflow / dashboard
Core package installs cleanly without optional voice extras
Manual: discli voice join #general && discli voice speak "hello" (requires ELEVENLABS_API_KEY + ffmpeg)
Manual: discli voice listen round-trip via Deepgram (requires DEEPGRAM_API_KEY)
Manual: workflow + dashboard JSON specs tested end-to-end in serve mode
Voice receive via discord-ext-voice-recv verified against live Discord voice channel

🤖 Generated with Claude Code

…tions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ementations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… management Implements voice_engine.py with VoiceError, DEFAULT_CONFIG, AudioPlayer (queue-based playback with PCMVolumeTransformer), AudioListener (VAD-segmented STT via silero-vad), and VoiceEngine (multi-guild connection manager wrapping TTS/STT providers). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
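The queue-based playback idea can be illustrated with a small asyncio sketch. `QueuePlayer`, `run_until_empty`, and the `play_one` callback are hypothetical names; the real AudioPlayer wraps Discord voice playback and PCMVolumeTransformer, which this stand-in does not model.

```python
import asyncio


class QueuePlayer:
    """Plays queued sources one at a time, in FIFO order."""

    def __init__(self, play_one) -> None:
        self._play_one = play_one  # async callable standing in for real audio output
        self._queue: asyncio.Queue = asyncio.Queue()

    async def enqueue(self, source: str) -> None:
        await self._queue.put(source)

    async def run_until_empty(self) -> None:
        # Drain the queue sequentially so items never overlap.
        while not self._queue.empty():
            source = await self._queue.get()
            await self._play_one(source)
            self._queue.task_done()


async def demo() -> list:
    played = []

    async def fake_play(source: str) -> None:
        await asyncio.sleep(0)  # yield control, as real playback would
        played.append(source)

    player = QueuePlayer(fake_play)
    for source in ("hello.mp3", "https://example.com/clip.ogg"):
        await player.enqueue(source)
    await player.run_until_empty()
    return played
```

Calling `asyncio.run(demo())` returns the two sources in the order they were enqueued.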

…routing Implements InteractEngine with stateful workflow execution, multi-page dashboard management, prefix-based custom_id routing, and a modal builder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
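A stateful workflow with a per-step timeout might be modeled as in the sketch below. The `Workflow` dataclass and its fields are assumptions, and the returned strings assume the listed events expand to workflow_step_completed, workflow_finished, and workflow_timeout.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workflow:
    steps: list                 # ordered step names
    timeout_s: float            # max idle time allowed between steps
    index: int = 0              # position of the next step to complete
    answers: dict = field(default_factory=dict)
    last_activity: float = field(default_factory=time.monotonic)

    def advance(self, value, now: Optional[float] = None) -> str:
        """Record `value` for the current step and return the resulting event name."""
        now = time.monotonic() if now is None else now
        if self.index >= len(self.steps):
            return "workflow_finished"
        if now - self.last_activity > self.timeout_s:
            return "workflow_timeout"
        self.answers[self.steps[self.index]] = value
        self.index += 1
        self.last_activity = now
        if self.index == len(self.steps):
            return "workflow_finished"
        return "workflow_step_completed"
```

Passing `now` explicitly keeps the timeout logic testable without sleeping.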

Implements interact_group with modal trigger, workflow start, dashboard create/delete subcommands using InteractEngine. Registers group in cli.py alongside existing voice_group. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…resume, listen, status) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…etc.) Adds lazy VoiceEngine initialization and 12 voice action handlers to serve.py: voice_connect, voice_disconnect, voice_move, voice_speak, voice_play, voice_stop, voice_pause, voice_resume, voice_listen_start, voice_listen_stop, voice_status, voice_set_config. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…nteraction routing)

- Add interact engine lazy init (mirrors voice engine pattern)
- Intercept wf: and dash: prefixed component interactions in on_interaction, routing them to InteractEngine before generic emit
- Add 5 action handlers: workflow_start, workflow_cancel, dashboard_create, dashboard_update, dashboard_delete
- Register new actions in _actions dispatch dict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
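The interception described here amounts to checking a custom_id prefix before falling back to the generic path. A minimal sketch follows; the `route` function, the target labels, and the shape of the remainder after the prefix are all hypothetical.

```python
# Prefixes named in the commit message, mapped to illustrative target labels.
PREFIX_TARGETS = (("wf:", "workflow"), ("dash:", "dashboard"))


def route(custom_id: str) -> tuple:
    """Return (target, remainder); unmatched ids fall through to the generic emit."""
    for prefix, target in PREFIX_TARGETS:
        if custom_id.startswith(prefix):
            return target, custom_id[len(prefix):]
    return "generic", custom_id
```

Checking prefixes first and defaulting to "generic" leaves existing component interactions untouched.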

- Add 'voice' scope to chat profile denial list
- Add 'interact' scope to chat profile allowed list
- Add 'voice status' to readonly profile allowed list
- Update profile descriptions to reflect new capabilities
- Add comprehensive tests for voice and interact permission scopes

All 31 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nt.md

- Updated CLAUDE.md Project Overview to mention voice audio and interactive components
- Added voice_engine.py, interact_engine.py, tts.py, stt.py to Key modules section
- Updated serve command description to reflect voice and interactive action dispatch
- Updated security.py description to include voice/interact permission scopes
- Added voice extras installation examples to Commands section
- Updated agents/discord-agent.md with voice command group (join, leave, speak, play, listen, etc.)
- Updated agents/discord-agent.md with interact command group (modal, workflow, dashboard)
- Added voice and interact actions to serve stdin commands
- Added voice and interact examples to serve stdin and stdout event sections

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Uses cached voice_states and channel.members — no voice connection required. Added matching serve-mode actions voice_where and voice_members. Both allowed in readonly profile since they're pure reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Core: click>=8.3.2, discord.py>=2.7.1
- Voice: pynacl>=1.5.0, discord-ext-voice-recv>=0.5.2a179, silero-vad>=6.2.1, audioop-lts>=0.2.2
- TTS/STT: elevenlabs>=2.43.0, deepgram-sdk>=6.1.1, openai>=2.32.0
- Local voice: faster-whisper>=1.2.1, piper-tts>=1.4.2
- Dev: pytest>=9.0.3, pytest-asyncio>=1.3.0
- Add uv.lock for reproducible installs (102 packages)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DevRohit06 and others added 14 commits April 12, 2026 00:13

chore: add voice and interactive optional dependency groups

6830433

feat: add TTS provider protocol with ElevenLabs and OpenAI implementa…

6a1a7e0

…tions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add STT provider protocol with Deepgram and OpenAI Whisper impl…

7d9f583

…ementations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat: add voice CLI commands (join, leave, speak, play, stop, pause, …

4cc59a0

…resume, listen, status) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: bump version to 0.8.0

dfc3f03

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

DevRohit06 wants to merge 14 commits into dev from feat/voice-and-interactive
