feat: add voice (TTS/STT, playback) and rich interactive components #20

Open
DevRohit06 wants to merge 14 commits into dev from feat/voice-and-interactive

@DevRohit06
Owner

Summary

  • Adds full-duplex voice capabilities: TTS speak (ElevenLabs, OpenAI) + STT listen/transcribe (Deepgram, OpenAI Whisper) + audio playback (files/URLs)
  • Adds rich interactive components: modals, multi-step workflows with state tracking and timeouts, persistent dashboards with pagination and auto-refresh
  • Surfaces everything via both CLI commands (discli voice, discli interact) and serve-mode JSONL actions (17 new actions)

What's new

New modules:

  • src/discli/voice_engine.py: VoiceEngine managing per-guild connections, AudioPlayer (queue-based playback), AudioListener (VAD via silero-vad + STT pipeline)
  • src/discli/interact_engine.py: InteractEngine with workflow state machine, dashboard renderer, and prefix-based interaction router (modal: / wf: / dash:)
  • src/discli/tts.py: TTSProvider protocol + ElevenLabs and OpenAI implementations (lazy SDK imports)
  • src/discli/stt.py: STTProvider protocol + Deepgram (streaming) and OpenAI Whisper (batch) implementations
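
The "protocol + lazy SDK imports" pattern named for tts.py and stt.py can be pictured roughly as below. This is a minimal sketch, not the PR's code: the `synthesize` method name, the `require_sdk` helper, the `EchoTTS` stand-in, and the error wording are all hypothetical.

```python
import importlib
from types import ModuleType
from typing import Protocol, runtime_checkable


@runtime_checkable
class TTSProvider(Protocol):
    """Structural interface; the method name is an assumption."""

    async def synthesize(self, text: str) -> bytes: ...


def require_sdk(module_name: str, extra: str) -> ModuleType:
    """Import a provider SDK only at the point it is first needed."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"'{module_name}' is not installed; install the '{extra}' optional extra"
        ) from exc


class EchoTTS:
    """Hypothetical stand-in provider that needs no SDK at all."""

    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")
```

Deferring the import this way is one plausible reason the core package can install cleanly without the voice extras: a missing SDK only surfaces when a provider is actually used.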

New CLI commands:

  • discli voice — join, leave, speak, play, stop, pause, resume, listen, status
  • discli interact — modal, workflow start, dashboard create/delete

New serve-mode actions (17): voice_connect, voice_disconnect, voice_move, voice_speak, voice_play, voice_stop, voice_pause, voice_resume, voice_listen_start, voice_listen_stop, voice_status, voice_set_config, workflow_start, workflow_cancel, dashboard_create, dashboard_update, dashboard_delete

New serve-mode events: voice_transcription, voice_playback_started/finished, voice_connected/disconnected, workflow_step_completed/finished/timeout, dashboard_interaction
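
To illustrate how a serve-mode JSONL action might flow through a dispatch table, here is a hedged sketch. Only the action name `voice_status` comes from this PR; the request fields (`action`, `guild_id`), the response shape (`ok`, `result`, `error`), and the handler body are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def voice_status(request: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical handler; a real one would query the voice engine.
    return {"connected": False, "guild_id": request.get("guild_id")}


# Assumed dispatch table keyed by action name.
ACTIONS: Dict[str, Handler] = {"voice_status": voice_status}


def handle_line(line: str) -> str:
    """Parse one JSONL request and return one JSONL response line."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "invalid JSON"})
    if not isinstance(request, dict):
        return json.dumps({"ok": False, "error": "request must be an object"})
    action = request.get("action")
    handler = ACTIONS.get(action) if isinstance(action, str) else None
    if handler is None:
        return json.dumps({"ok": False, "error": "unknown action"})
    return json.dumps({"ok": True, "result": handler(request)})
```

Each input line maps to exactly one output line, which keeps the stream easy to consume from another process.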

Optional dependency groups:

  • voice — PyNaCl, discord-ext-voice-recv, silero-vad, audioop-lts
  • elevenlabs, deepgram, openai-voice, local-voice (piper-tts, faster-whisper)
  • all-voice — convenience meta-package (voice + elevenlabs + deepgram)

Security: permission profiles updated. A voice scope is added (and denied in the chat profile), the interact scope is allowed in the chat profile, and voice status is allowed in readonly.
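
The profile changes can be read as an allow/deny check along these lines. This is a sketch under stated assumptions: the `PROFILES` table layout and the `is_allowed` helper are hypothetical, and only the entries named in this PR (voice denied and interact allowed in chat, voice status allowed in readonly) are shown.

```python
from typing import Dict, Set

# Hypothetical profile table; real profiles contain more entries.
PROFILES: Dict[str, Dict[str, Set[str]]] = {
    "chat": {"allow": {"interact"}, "deny": {"voice"}},
    "readonly": {"allow": {"voice status"}, "deny": set()},
}


def is_allowed(profile: str, command: str) -> bool:
    """Check a command such as 'voice status' against a profile."""
    rules = PROFILES[profile]
    scope = command.partition(" ")[0]  # first word is the scope
    # Deny entries win over allow entries.
    if command in rules["deny"] or scope in rules["deny"]:
        return False
    # Allow either the exact command or its whole scope.
    return command in rules["allow"] or scope in rules["allow"]
```

Matching on both the full command and its scope lets readonly permit `voice status` without opening the rest of the voice group.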

Version: 0.7.0 → 0.8.0

Design doc: docs/plans/2026-04-11-voice-and-interactive-design.md
Implementation plan: docs/plans/2026-04-11-voice-and-interactive-plan.md

Test plan

  • 31/31 unit tests passing (12 new for engines + security)
  • discli --help lists voice and interact groups
  • discli voice --help shows all 9 subcommands
  • discli interact --help shows modal / workflow / dashboard
  • Core package installs cleanly without optional voice extras
  • Manual: discli voice join #general && discli voice speak "hello" (requires ELEVENLABS_API_KEY + ffmpeg)
  • Manual: discli voice listen round-trip via Deepgram (requires DEEPGRAM_API_KEY)
  • Manual: workflow + dashboard JSON specs tested end-to-end in serve mode
  • Voice receive via discord-ext-voice-recv verified against live Discord voice channel

🤖 Generated with Claude Code

DevRohit06 and others added 14 commits April 12, 2026 00:13
…tions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ementations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… management

Implements voice_engine.py with VoiceError, DEFAULT_CONFIG, AudioPlayer (queue-based
playback with PCMVolumeTransformer), AudioListener (VAD-segmented STT via silero-vad),
and VoiceEngine (multi-guild connection manager wrapping TTS/STT providers).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
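
The queue-based playback described in the commit above could look roughly like this. It is a simplified sketch, not the AudioPlayer implementation: `run_player`, `demo`, the `None` sentinel, and the event payload fields are assumptions, and real playback (including PCMVolumeTransformer) is replaced by a no-op coroutine. Only the event names come from the PR description.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional


async def run_player(
    queue: "asyncio.Queue[Optional[str]]",
    play_one: Callable[[str], Awaitable[None]],
    emit: Callable[[Dict[str, Any]], None],
) -> None:
    """Play queued sources one at a time until a None sentinel arrives."""
    while True:
        source = await queue.get()
        if source is None:
            break
        emit({"event": "voice_playback_started", "source": source})
        await play_one(source)
        emit({"event": "voice_playback_finished", "source": source})


async def demo() -> List[Dict[str, Any]]:
    events: List[Dict[str, Any]] = []

    async def fake_play(source: str) -> None:
        await asyncio.sleep(0)  # stand-in for real audio playback

    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    for item in ("intro.mp3", "outro.mp3", None):
        queue.put_nowait(item)
    await run_player(queue, fake_play, events.append)
    return events
```

Running `asyncio.run(demo())` yields a started/finished pair per source, in queue order.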
…routing

Implements InteractEngine with stateful workflow execution, multi-page
dashboard management, prefix-based custom_id routing, and a modal builder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
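
As a rough picture of stateful workflow execution with timeouts, consider the sketch below. Everything here is hypothetical: the `Workflow` class, its fields, and the per-step deadline policy are assumptions, and the returned event names are one reading of the abbreviated list in the PR description (workflow_step_completed/finished/timeout).

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Workflow:
    steps: List[str]
    timeout_s: float
    index: int = 0
    answers: Dict[str, Any] = field(default_factory=dict)
    deadline: float = 0.0

    def start(self, now: Optional[float] = None) -> None:
        """Arm the deadline for the first step."""
        now = time.monotonic() if now is None else now
        self.deadline = now + self.timeout_s

    def submit(self, value: Any, now: Optional[float] = None) -> str:
        """Record the current step's answer and return an event name."""
        now = time.monotonic() if now is None else now
        if self.index >= len(self.steps):
            raise ValueError("workflow already finished")
        if now > self.deadline:
            return "workflow_timeout"
        self.answers[self.steps[self.index]] = value
        self.index += 1
        self.deadline = now + self.timeout_s  # re-arm for the next step
        if self.index == len(self.steps):
            return "workflow_finished"
        return "workflow_step_completed"
```

Passing `now` explicitly makes the timeout path easy to test without sleeping.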
Implements interact_group with modal trigger, workflow start, dashboard create/delete subcommands using InteractEngine. Registers group in cli.py alongside existing voice_group.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…resume, listen, status)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…etc.)

Adds lazy VoiceEngine initialization and 12 voice action handlers to
serve.py: voice_connect, voice_disconnect, voice_move, voice_speak,
voice_play, voice_stop, voice_pause, voice_resume, voice_listen_start,
voice_listen_stop, voice_status, voice_set_config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nteraction routing)

- Add interact engine lazy init (mirrors voice engine pattern)
- Intercept wf: and dash: prefixed component interactions in on_interaction, routing them to InteractEngine before generic emit
- Add 5 action handlers: workflow_start, workflow_cancel, dashboard_create, dashboard_update, dashboard_delete
- Register new actions in _actions dispatch dict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
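
The interception of wf: and dash: prefixed interactions described in the commit above might reduce to a prefix split like this. It is a minimal sketch: `split_custom_id` and `ENGINE_PREFIXES` are hypothetical names, and the layout of whatever follows the prefix is an assumption.

```python
from typing import Optional, Tuple

# Prefixes the commit message says are intercepted in serve mode.
ENGINE_PREFIXES = ("wf", "dash")


def split_custom_id(custom_id: str) -> Tuple[Optional[str], str]:
    """Return (prefix, rest) for engine-owned ids, else (None, custom_id)."""
    prefix, sep, rest = custom_id.partition(":")
    if sep and prefix in ENGINE_PREFIXES:
        return prefix, rest
    # Anything else falls through to the generic event emit.
    return None, custom_id
```

An id such as `wf:abc:next` would be claimed by the engine, while an unprefixed id passes through unchanged.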
- Add 'voice' scope to chat profile denial list
- Add 'interact' scope to chat profile allowed list
- Add 'voice status' to readonly profile allowed list
- Update profile descriptions to reflect new capabilities
- Add comprehensive tests for voice and interact permission scopes

All 31 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nt.md

- Updated CLAUDE.md Project Overview to mention voice audio and interactive components
- Added voice_engine.py, interact_engine.py, tts.py, stt.py to Key modules section
- Updated serve command description to reflect voice and interactive action dispatch
- Updated security.py description to include voice/interact permission scopes
- Added voice extras installation examples to Commands section
- Updated agents/discord-agent.md with voice command group (join, leave, speak, play, listen, etc.)
- Updated agents/discord-agent.md with interact command group (modal, workflow, dashboard)
- Added voice and interact actions to serve stdin commands
- Added voice and interact examples to serve stdin and stdout event sections

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Uses cached voice_states and channel.members — no voice connection required.
Added matching serve-mode actions voice_where and voice_members.
Both allowed in readonly profile since they're pure reads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Core: click>=8.3.2, discord.py>=2.7.1
- Voice: pynacl>=1.5.0, discord-ext-voice-recv>=0.5.2a179, silero-vad>=6.2.1, audioop-lts>=0.2.2
- TTS/STT: elevenlabs>=2.43.0, deepgram-sdk>=6.1.1, openai>=2.32.0
- Local voice: faster-whisper>=1.2.1, piper-tts>=1.4.2
- Dev: pytest>=9.0.3, pytest-asyncio>=1.3.0
- Add uv.lock for reproducible installs (102 packages)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>