feat: add voice (TTS/STT, playback) and rich interactive components #20
DevRohit06 wants to merge 14 commits into dev from …
…tions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ementations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… management

Implements voice_engine.py with VoiceError, DEFAULT_CONFIG, AudioPlayer (queue-based playback with PCMVolumeTransformer), AudioListener (VAD-segmented STT via silero-vad), and VoiceEngine (multi-guild connection manager wrapping TTS/STT providers).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
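A minimal asyncio sketch of the queue-based playback idea behind AudioPlayer, assuming sources play strictly one after another in FIFO order. It deliberately avoids discord.py: `QueuedPlayer` and the `play` coroutine it receives are hypothetical stand-ins (a real player would hand each source to the voice client and wait for its completion callback), and volume scaling via PCMVolumeTransformer is left out.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical: plays one source and returns once that source has finished.
PlayFn = Callable[[str], Awaitable[None]]


class QueuedPlayer:
    """Plays queued sources one at a time, in the order they were enqueued."""

    def __init__(self, play: PlayFn) -> None:
        self._play = play
        self._queue: asyncio.Queue[str] = asyncio.Queue()
        self._worker: asyncio.Task[None] | None = None

    def enqueue(self, source: str) -> None:
        """Add a source; start a worker task if none is currently running."""
        self._queue.put_nowait(source)
        if self._worker is None or self._worker.done():
            self._worker = asyncio.create_task(self._drain_queue())

    async def _drain_queue(self) -> None:
        # Exits once the queue is empty; the next enqueue() starts a new worker.
        while not self._queue.empty():
            source = self._queue.get_nowait()
            try:
                await self._play(source)
            finally:
                self._queue.task_done()

    async def wait_idle(self) -> None:
        """Block until every enqueued source has been played."""
        await self._queue.join()


async def demo() -> list[str]:
    played: list[str] = []

    async def fake_play(source: str) -> None:
        await asyncio.sleep(0)  # yield control, as real streaming would
        played.append(source)

    player = QueuedPlayer(fake_play)
    for source in ("intro.mp3", "tts:hello", "outro.mp3"):
        player.enqueue(source)
    await player.wait_idle()
    return played


print(asyncio.run(demo()))  # ['intro.mp3', 'tts:hello', 'outro.mp3']
```

Starting the worker lazily and letting it exit when the queue drains means an idle guild holds no background task in this sketch; stop, pause, and resume handling are omitted.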
…routing

Implements InteractEngine with stateful workflow execution, multi-page dashboard management, prefix-based custom_id routing, and a modal builder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
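A hedged sketch of what "stateful workflow execution" can mean in practice: a run tracks which step a user is on, records each answer, and reports whether the step advanced, the workflow finished, or the user waited too long. `WorkflowRun`, its fields, and the 300-second default are assumptions, not InteractEngine's real interface; the three transition names only loosely echo the workflow events listed in the PR summary.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class WorkflowRun:
    """One user's progress through a fixed sequence of prompts (illustrative)."""

    steps: list[str]              # prompt shown at each step
    timeout_s: float = 300.0      # assumed idle limit between steps
    index: int = 0                # position of the current step
    answers: dict[str, str] = field(default_factory=dict)
    last_activity: float = field(default_factory=time.monotonic)

    @property
    def finished(self) -> bool:
        return self.index >= len(self.steps)

    def submit(self, answer: str, now: float | None = None) -> str:
        """Apply one answer and return the resulting transition name."""
        now = time.monotonic() if now is None else now
        if self.finished:
            raise RuntimeError("workflow already finished")
        if now - self.last_activity > self.timeout_s:
            return "timeout"  # state is left unchanged
        self.answers[self.steps[self.index]] = answer
        self.index += 1
        self.last_activity = now
        return "finished" if self.finished else "step_completed"


run = WorkflowRun(steps=["Project name?", "Deadline?"], last_activity=0.0)
print(run.submit("discli", now=1.0))  # step_completed
print(run.submit("Friday", now=2.0))  # finished
print(run.answers)  # {'Project name?': 'discli', 'Deadline?': 'Friday'}
```

Passing `now` explicitly keeps the timeout rule deterministic in tests; an engine would keep one such run per user or channel.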
Implements interact_group with modal trigger, workflow start, and dashboard create/delete subcommands using InteractEngine. Registers the group in cli.py alongside the existing voice_group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…resume, listen, status)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
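`listen` depends on AudioListener splitting incoming audio into utterances before STT. The PR uses silero-vad (a neural voice-activity detector) for this; the sketch below swaps in a much simpler energy threshold purely to show the segmentation logic, including a "hangover" that keeps short pauses inside one utterance. `segment_speech`, the 500 threshold, and the 2-frame hangover are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def segment_speech(frames: Sequence[Sequence[int]],
                   threshold: float = 500.0,
                   hangover: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) frame indices of speech runs; `end` is exclusive.

    A segment opens on the first frame at or above `threshold` and closes once
    `hangover` consecutive quieter frames follow, so brief pauses do not split
    one utterance in two.
    """
    segments: list[tuple[int, int]] = []
    start: int | None = None
    quiet = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= hangover:
                # Trim the trailing quiet frames off the segment.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # audio ended mid-utterance
        segments.append((start, len(frames) - quiet))
    return segments


silence = [0] * 160
speech = [1200, -1200] * 80
frames = [silence, speech, speech, silence, speech, silence, silence, silence, speech]
print(segment_speech(frames))  # [(1, 5), (8, 9)]
```

In a listener, each closed segment would be handed to the configured STT provider; a neural VAD such as silero-vad takes the place of the fixed threshold and is generally far more robust to background noise.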
…etc.)

Adds lazy VoiceEngine initialization and 12 voice action handlers to serve.py: voice_connect, voice_disconnect, voice_move, voice_speak, voice_play, voice_stop, voice_pause, voice_resume, voice_listen_start, voice_listen_stop, voice_status, voice_set_config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
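The two ideas in this commit, lazy engine construction and per-action handlers looked up by name (a later commit registers them in an `_actions` dispatch dict), can be sketched with stdlib asyncio and json. `Server`, `FakeVoiceEngine`, the handler signature, and the `req_id` echo field are assumptions made for illustration; only the action names `voice_connect` and `voice_status` come from this PR.

```python
from __future__ import annotations

import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import Any

Command = dict[str, Any]
Handler = Callable[[Command], Awaitable[Command]]


class FakeVoiceEngine:
    """Hypothetical stand-in for VoiceEngine: tracks connected guild IDs."""

    def __init__(self) -> None:
        self.connected: set[str] = set()


class Server:
    def __init__(self) -> None:
        self._voice: FakeVoiceEngine | None = None  # not built until needed
        self._actions: dict[str, Handler] = {
            "voice_connect": self._voice_connect,
            "voice_status": self._voice_status,
        }

    def _voice_engine(self) -> FakeVoiceEngine:
        # Lazy init: the optional voice stack is only touched on first use.
        if self._voice is None:
            self._voice = FakeVoiceEngine()
        return self._voice

    async def _voice_connect(self, cmd: Command) -> Command:
        self._voice_engine().connected.add(cmd["guild_id"])
        return {"ok": True}

    async def _voice_status(self, cmd: Command) -> Command:
        return {"ok": True, "connected": sorted(self._voice_engine().connected)}

    async def handle_line(self, line: str) -> str:
        """Parse one JSONL command, dispatch it, and return one JSONL reply."""
        cmd = json.loads(line)
        handler = self._actions.get(cmd.get("action", ""))
        if handler is None:
            reply: Command = {"ok": False, "error": f"unknown action: {cmd.get('action')}"}
        else:
            reply = await handler(cmd)
        if "req_id" in cmd:
            reply["req_id"] = cmd["req_id"]  # lets the caller match replies to requests
        return json.dumps(reply)


async def demo() -> list[str]:
    server = Server()
    lines = [
        '{"action": "voice_connect", "guild_id": "123", "req_id": 1}',
        '{"action": "voice_status", "req_id": 2}',
        '{"action": "voice_fly"}',
    ]
    return [await server.handle_line(line) for line in lines]


for reply in asyncio.run(demo()):
    print(reply)
# {"ok": true, "req_id": 1}
# {"ok": true, "connected": ["123"], "req_id": 2}
# {"ok": false, "error": "unknown action: voice_fly"}
```

Keeping the engine behind an accessor means a serve session that never issues a voice action never constructs it.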
…nteraction routing)

- Add interact engine lazy init (mirrors voice engine pattern)
- Intercept wf: and dash: prefixed component interactions in on_interaction, routing them to InteractEngine before generic emit
- Add 5 action handlers: workflow_start, workflow_cancel, dashboard_create, dashboard_update, dashboard_delete
- Register new actions in _actions dispatch dict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
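One way to picture the interception: split the component's custom_id at the first `:` and let the interact engine claim the `wf` and `dash` prefixes before anything reaches the generic emit path. `route_interaction`, the handler tables, and ids such as `wf:onboarding:2` are hypothetical; the PR does not specify how the rest of the id is laid out.

```python
from collections.abc import Callable

Handler = Callable[[str], str]


def route_interaction(custom_id: str, routes: dict[str, Handler], fallback: Handler) -> str:
    """Send prefixed custom_ids to their owner; everything else to the fallback.

    The prefix is the text before the first ':'; the handler receives the rest.
    """
    prefix, sep, rest = custom_id.partition(":")
    handler = routes.get(prefix) if sep else None
    if handler is None:
        return fallback(custom_id)
    return handler(rest)


def emit_generic(custom_id: str) -> str:
    # Placeholder for the serve loop's ordinary interaction event.
    return f"generic event for {custom_id}"


engine_routes: dict[str, Handler] = {
    "wf": lambda rest: f"workflow step -> {rest}",
    "dash": lambda rest: f"dashboard action -> {rest}",
}

for cid in ("wf:onboarding:2", "dash:stats:next", "confirm_button"):
    print(route_interaction(cid, engine_routes, emit_generic))
# workflow step -> onboarding:2
# dashboard action -> stats:next
# generic event for confirm_button
```

In this sketch, ids without a claimed prefix fall through unchanged, so components that do not use the `wf:`/`dash:` convention keep the generic path.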
- Add 'voice' scope to chat profile denial list
- Add 'interact' scope to chat profile allowed list
- Add 'voice status' to readonly profile allowed list
- Update profile descriptions to reflect new capabilities
- Add comprehensive tests for voice and interact permission scopes

All 31 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
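A sketch of how the profile rules in this commit might be evaluated, assuming a profile is a set of allowed command scopes plus a set of denied ones, that a scope matches a command on a whole-word prefix, and that denial wins. `Profile`, `is_allowed`, and the matching rule are assumptions; the `chat` and `readonly` contents below only reflect what this commit lists, not the full profiles in security.py.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Allowed and denied command scopes (illustrative structure)."""

    allowed: frozenset[str] = frozenset()
    denied: frozenset[str] = frozenset()


def _matches(command: str, scope: str) -> bool:
    # "voice" covers "voice" and "voice speak", but not "voiceover".
    return command == scope or command.startswith(scope + " ")


def is_allowed(profile: Profile, command: str) -> bool:
    if any(_matches(command, s) for s in profile.denied):
        return False  # denial wins over any allow rule
    return any(_matches(command, s) for s in profile.allowed)


chat = Profile(allowed=frozenset({"interact"}), denied=frozenset({"voice"}))
readonly = Profile(allowed=frozenset({"voice status"}))

print(is_allowed(chat, "interact dashboard create"))  # True
print(is_allowed(chat, "voice speak"))                # False
print(is_allowed(readonly, "voice status"))           # True
print(is_allowed(readonly, "voice join"))             # False
```

Matching on whole words keeps a narrow grant such as `voice status` from leaking into other `voice` subcommands.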
…nt.md

- Updated CLAUDE.md Project Overview to mention voice audio and interactive components
- Added voice_engine.py, interact_engine.py, tts.py, and stt.py to the Key modules section
- Updated the serve command description to reflect voice and interactive action dispatch
- Updated the security.py description to include voice/interact permission scopes
- Added voice extras installation examples to the Commands section
- Updated agents/discord-agent.md with the voice command group (join, leave, speak, play, listen, etc.)
- Updated agents/discord-agent.md with the interact command group (modal, workflow, dashboard)
- Added voice and interact actions to serve stdin commands
- Added voice and interact examples to the serve stdin and stdout event sections

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Uses cached voice_states and channel.members, so no voice connection is required. Added matching serve-mode actions voice_where and voice_members. Both are allowed in the readonly profile since they are pure reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core: click>=8.3.2, discord.py>=2.7.1
- Voice: pynacl>=1.5.0, discord-ext-voice-recv>=0.5.2a179, silero-vad>=6.2.1, audioop-lts>=0.2.2
- TTS/STT: elevenlabs>=2.43.0, deepgram-sdk>=6.1.1, openai>=2.32.0
- Local voice: faster-whisper>=1.2.1, piper-tts>=1.4.2
- Dev: pytest>=9.0.3, pytest-asyncio>=1.3.0
- Add uv.lock for reproducible installs (102 packages)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
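Because these SDKs sit behind optional dependency groups, code that uses them has to import them lazily and say which extra is missing when one is absent (the PR summary describes tts.py as using lazy SDK imports). A minimal sketch; `require` and `MissingExtraError` are hypothetical names chosen for illustration.

```python
import importlib
from types import ModuleType


class MissingExtraError(ImportError):
    """Raised when an optional SDK is not installed (hypothetical name)."""


def require(module: str, extra: str) -> ModuleType:
    """Import `module` at call time, or explain which extra provides it."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise MissingExtraError(
            f"{module!r} is not installed; install the {extra!r} extra"
        ) from exc


json_module = require("json", "core")  # stdlib, so this always succeeds
try:
    require("surely_not_installed_sdk", "voice")
except MissingExtraError as err:
    print(err)  # 'surely_not_installed_sdk' is not installed; install the 'voice' extra
```

With this approach a TTS provider would call `require("elevenlabs", "elevenlabs")` inside its constructor rather than at module import time, so an install without voice extras only fails when a voice feature is actually used.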
Summary
…`discli voice`, `discli interact`) and serve-mode JSONL actions (17 new actions).

What's new
New modules:
- `src/discli/voice_engine.py`: `VoiceEngine` managing per-guild connections, `AudioPlayer` (queue-based playback), `AudioListener` (VAD via silero-vad + STT pipeline)
- `src/discli/interact_engine.py`: `InteractEngine` with workflow state machine, dashboard renderer, and prefix-based interaction router (`modal:`/`wf:`/`dash:`)
- `src/discli/tts.py`: `TTSProvider` protocol + ElevenLabs and OpenAI implementations (lazy SDK imports)
- `src/discli/stt.py`: `STTProvider` protocol + Deepgram (streaming) and OpenAI Whisper (batch) implementations

New CLI commands:
- `discli voice`: join, leave, speak, play, stop, pause, resume, listen, status
- `discli interact`: modal, workflow start, dashboard create/delete

New serve-mode actions (17):
`voice_connect`, `voice_disconnect`, `voice_move`, `voice_speak`, `voice_play`, `voice_stop`, `voice_pause`, `voice_resume`, `voice_listen_start`, `voice_listen_stop`, `voice_status`, `voice_set_config`, `workflow_start`, `workflow_cancel`, `dashboard_create`, `dashboard_update`, `dashboard_delete`

New serve-mode events:
`voice_transcription`, `voice_playback_started`/`finished`, `voice_connected`/`disconnected`, `workflow_step_completed`/`finished`/`timeout`, `dashboard_interaction`

Optional dependency groups:
- `voice`: PyNaCl, discord-ext-voice-recv, silero-vad, audioop-lts
- `elevenlabs`, `deepgram`, `openai-voice`, `local-voice` (piper-tts, faster-whisper)
- `all-voice`: convenience meta-package (`voice` + `elevenlabs` + `deepgram`)

Security: permission profiles updated. A `voice` scope was added (denied in the `chat` profile), an `interact` scope was added to `chat`, and `voice status` is allowed in `readonly`.

Version: 0.7.0 → 0.8.0
Design doc: `docs/plans/2026-04-11-voice-and-interactive-design.md`
Implementation plan: `docs/plans/2026-04-11-voice-and-interactive-plan.md`

Test plan
- `discli --help` lists voice and interact groups
- `discli voice --help` shows all 9 subcommands
- `discli interact --help` shows modal / workflow / dashboard
- `discli voice join #general && discli voice speak "hello"` (requires `ELEVENLABS_API_KEY` + ffmpeg)
- `discli voice listen` round-trip via Deepgram (requires `DEEPGRAM_API_KEY`)
- `discord-ext-voice-recv` verified against a live Discord voice channel

🤖 Generated with Claude Code