A production-ready voice bot testing and interaction framework with streaming Speech-to-Text, Text-to-Speech, and AI-powered evaluation
Vokal is a TypeScript framework for building, testing, and evaluating voice-based applications. It provides a provider-agnostic architecture for Speech-to-Text, Text-to-Speech, and AI-powered evaluation services. It currently supports Google Cloud providers (Speech-to-Text, Text-to-Speech via Neurolink SDK, and Gemini AI), and its extensible design allows additional provider integrations.
Perfect for:
- Testing voice bots and conversational AI
- IVR (Interactive Voice Response) system validation
- Voice UI/UX testing and evaluation
- Speech synthesis and recognition workflows
- Automated voice conversation testing

Key features:
- Text-to-Speech (TTS) - High-quality neural speech synthesis via Neurolink SDK
- Streaming Speech-to-Text - Real-time audio transcription with voice activity detection
- Voice Interaction Pipeline - Complete TTS → Listen → STT conversation flows
- Background Audio Mixing - Realistic test environments (office, cafe, rain, nature, phone, crowd)
- AI-Powered Evaluation - Semantic response validation using Google Gemini
- Comprehensive Test Suites - JSON-based test configuration with detailed reporting
- Automatic Retries - Built-in retry logic with exponential backoff
- Performance Metrics - Pass rates, confidence scores, and detailed analytics
- Full TypeScript Support - Complete type safety with strict mode
- Security First - Input validation, sanitization, and secure credential handling
- Easy Configuration - JSON-based configuration with sensible defaults
- Modular Architecture - Use individual services or the complete framework
- Powerful CLI - Command-line interface for all operations
- Installation
- Quick Start
- Core Services
- CLI Usage
- Configuration
- Architecture
- Background Sounds
- Contributing
node -v # Should be 18.x or higher
npm -v # Should be 8.x or higher
npm install @juspay/vokal
Or clone and build from source:
git clone https://github.com/juspay/vokal.git
cd vokal
npm install
npm run build
Create a .env file in your project root:
# Option 1: Service Account (Recommended - Full Features)
GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Option 2: API Key (Limited Features)
GOOGLE_AI_API_KEY=your_api_key_here
GEMINI_API_KEY=your_gemini_api_key_here
Tip: Service account authentication provides access to advanced features like configurable VAD timeouts and enhanced STT capabilities.
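As a rough illustration of the two options above, the following sketch picks an authentication mode from a set of environment variables. The variable names come from the .env example; the `AuthMode` type and `detectAuthMode` function are hypothetical and not part of Vokal's API, and the precedence shown (service account first) is an assumption.

```typescript
// Hypothetical helper, not part of Vokal: report which credential style is configured.
type AuthMode =
  | { kind: "service-account"; keyFile: string }
  | { kind: "api-key"; apiKey: string }
  | { kind: "none" };

type Env = Record<string, string | undefined>;

function detectAuthMode(env: Env): AuthMode {
  const keyFile = env["GOOGLE_APPLICATION_CREDENTIALS"]?.trim();
  if (keyFile) {
    // Assumption: the service account wins when both styles are present
    return { kind: "service-account", keyFile };
  }
  // Take the first API key variable that holds a non-blank value
  const apiKey = [env["GOOGLE_AI_API_KEY"], env["GEMINI_API_KEY"]]
    .map((value) => value?.trim())
    .find((value) => value);
  if (apiKey) {
    return { kind: "api-key", apiKey };
  }
  return { kind: "none" };
}

const mode = detectAuthMode({ GEMINI_API_KEY: "your_gemini_api_key_here" });
console.log(mode.kind);
```

In a Node.js program the same function could be called with `process.env` after the .env file has been loaded.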
import { createVoiceTest } from '@juspay/vokal';
const voiceTest = createVoiceTest();
// Generate and save speech
const audioPath = await voiceTest.generateSpeech({
text: "Welcome to Vokal! Your voice testing framework.",
languageCode: 'en-US',
voiceName: 'en-US-Neural2-F'
});
console.log('Audio saved to:', audioPath);
import { VoiceInteractionService } from '@juspay/vokal';
const voiceBot = new VoiceInteractionService();
// Run complete voice interaction
const result = await voiceBot.runVoiceInteraction(
"What is your name?",
{
language: 'en-US',
voice: 'en-US-Neural2-D',
backgroundSound: 'office',
backgroundVolume: 0.15
}
);
console.log('User said:', result.transcript);
console.log('Confidence:', result.confidence);
import { VoiceBotTestService } from '@juspay/vokal';
// Run test suite from configuration
const testService = VoiceBotTestService.create('./test-config.json');
const results = await testService.runTestSuite();
console.log(`Pass Rate: ${results.summary.passRate}%`);
console.log(`Average Score: ${results.summary.averageScore}`);
console.log(`Results: ${results.summary.resultsFile}`);
| Service | Description | Use Case |
|---|---|---|
| VoiceTestService | Text-to-Speech with background audio via Neurolink | Generate test audio with realistic environments |
| VoiceInteractionService | Complete TTS + Listen + STT pipeline | Full conversation simulation |
| VoiceBotTestService | Automated test suite execution | Test multiple scenarios with AI evaluation |
| AIComparisonService | AI-powered response evaluation | Semantic answer validation using Gemini |
| AudioMixerService | Background audio mixing | Add realistic noise to test scenarios |
| AudioRecordingService | Microphone recording via naudiodon | Capture user responses |
| STTHandlerManager | Provider-agnostic STT management | Unified interface for multiple STT providers |
Vokal includes a comprehensive command-line interface:
# Basic TTS generation
vokal voice generate "Hello, world!" --voice en-US-Neural2-F --lang en-US
# With background audio
vokal voice generate "Welcome" --voice en-US-Neural2-D --lang en-US --bg cafe --bgvol 0.2 --play
# Advanced settings
vokal voice generate "Fast speech" --voice en-US-Neural2-A --rate 1.5 --pitch 5.0 --output speech.mp3
# List all voices
vokal voices
# Filter by language
vokal voices en-US
# JSON output
vokal voices en-IN --format json
# List available background sounds
vokal backgrounds
# Test system audio capability
vokal test-audio
# Play an audio file
vokal play ./output.wav
# Create sample configuration
vokal test --save-sample
# Run test suite
vokal test ./config.json
# Run with specific provider and debug mode
vokal test --provider google-ai --debug --verbose
# Display comprehensive usage examples
vokal example
Run vokal --help for complete CLI documentation.
Create a JSON file to define your test scenarios:
{
"metadata": {
"name": "My Voice Bot Tests",
"version": "1.0.0",
"description": "Voice bot test suite"
},
"settings": {
"defaultLanguage": "en-US",
"defaultVoice": "en-US-Neural2-D",
"recordingDuration": 10000,
"passingScore": 0.7,
"sttProvider": "google-ai",
"ttsProvider": "google-ai",
"aiProvider": "google-ai",
"vadSettings": {
"silenceThreshold": 0.02,
"silenceDuration": 2000,
"speechTimeout": 10000
}
},
"questions": [
{
"id": "greeting",
"question": "Hello! How can I help you?",
"intent": "User greets and asks for help",
"expectedElements": ["Greeting", "Request for assistance"],
"sampleResponse": "Hi, I need help with my account"
}
]
}
See examples/sample-config.json for a complete example.
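To make the role of `passingScore` concrete, here is a minimal sketch of how per-question scores might be turned into a pass rate and an average score. It assumes each question receives a score between 0 and 1 and passes when that score is at least `passingScore`. The `QuestionScore` type, the `summarize` function, and the `billing` question id are illustrative and not Vokal's implementation.

```typescript
// Hypothetical result shape for one evaluated question
interface QuestionScore {
  id: string; // matches the "id" of a question in the config
  score: number; // assumed range: 0 (no match) to 1 (full match)
}

interface Summary {
  passed: number;
  failed: number;
  passRate: number; // percentage, 0 to 100
  averageScore: number; // mean of all scores, 0 to 1
}

function summarize(scores: QuestionScore[], passingScore: number): Summary {
  if (scores.length === 0) {
    // Avoid dividing by zero for an empty suite
    return { passed: 0, failed: 0, passRate: 0, averageScore: 0 };
  }
  const passed = scores.filter((s) => s.score >= passingScore).length;
  const total = scores.reduce((sum, s) => sum + s.score, 0);
  return {
    passed,
    failed: scores.length - passed,
    passRate: (passed / scores.length) * 100,
    averageScore: total / scores.length,
  };
}

// One question above the 0.7 threshold from the config, one below it
const summary = summarize(
  [
    { id: "greeting", score: 0.75 },
    { id: "billing", score: 0.5 },
  ],
  0.7
);
console.log(`Pass rate: ${summary.passRate}%`);
```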
Vokal is built with a provider-agnostic architecture using the Handler pattern for extensibility.
Google Cloud (Default)
- TTS: Google Cloud Text-to-Speech via Neurolink SDK
- STT: Google Cloud Speech-to-Text via GoogleAISTTHandler
- AI Evaluation: Google Gemini via AIComparisonService
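The Handler pattern can be sketched as a registry that maps provider names to factory functions, so calling code never depends on a concrete provider. Everything below (`TranscriptHandler`, `HandlerRegistry`, the `echo` provider) is a hypothetical, simplified stand-in and not Vokal's actual STTHandlerManager API; in particular, the real STT handlers are streaming rather than one-shot.

```typescript
// Simplified, hypothetical handler contract
interface TranscriptHandler {
  readonly name: string;
  transcribe(audio: Uint8Array): string;
}

type HandlerFactory = () => TranscriptHandler;

class HandlerRegistry {
  private readonly factories = new Map<string, HandlerFactory>();

  // Register a provider under a unique name
  register(provider: string, factory: HandlerFactory): void {
    if (this.factories.has(provider)) {
      throw new Error(`Provider already registered: ${provider}`);
    }
    this.factories.set(provider, factory);
  }

  // Create a fresh handler instance for the named provider
  get(provider: string): TranscriptHandler {
    const factory = this.factories.get(provider);
    if (!factory) {
      throw new Error(`Unknown provider: ${provider}`);
    }
    return factory();
  }

  list(): string[] {
    return Array.from(this.factories.keys());
  }
}

// A toy provider that reports the audio size instead of transcribing it
const registry = new HandlerRegistry();
registry.register("echo", () => ({
  name: "echo",
  transcribe: (audio) => `received ${audio.length} bytes`,
}));

const echoHandler = registry.get("echo");
console.log(echoHandler.transcribe(new Uint8Array(4)));
```

Adding a provider then means registering one more factory; code that calls `registry.get(...)` stays unchanged.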
vokal/
├── src/
│   ├── services/                     # Core voice services
│   │   ├── voice-test.ts             # TTS service with Neurolink
│   │   ├── voice-interaction.ts      # Complete pipeline orchestration
│   │   ├── voice-bot-test.ts         # Test suite execution
│   │   ├── ai-comparison.ts          # AI-powered evaluation
│   │   ├── audio-mixer.ts            # Background audio processing
│   │   └── audio-recording.ts        # Microphone capture
│   ├── providers/                    # Provider implementations
│   │   ├── google-ai-stt.handler.ts  # Google Cloud STT
│   │   ├── stt-handler-manager.ts    # Provider manager
│   │   └── stt-registry.ts           # Provider registry
│   ├── types/                        # TypeScript type definitions
│   ├── utils/                        # Utilities (logging, retry, validation, security)
│   ├── constants/                    # Audio configuration constants
│   ├── errors/                       # Custom error classes
│   └── cli/                          # Command-line interface
├── examples/                         # Example configurations
│   ├── sample-config.json            # Test suite example
│   ├── basic-example.js              # Basic usage template
│   └── stt-handler-example.ts        # STT provider example
├── assets/                           # Background audio files
│   ├── office-ambience.wav
│   ├── cafe-ambience.wav
│   ├── nature-sounds.wav
│   ├── rain-light.wav
│   ├── phone-static.wav
│   └── crowd-distant.wav
├── memory-bank/                      # AI assistant context
└── docs/                             # Documentation (coming soon)
// Handler pattern for provider abstraction
interface STTHandler {
startStreaming(config, onResult, onSpeechStart, onSpeechEnd, onError);
stopStreaming();
}
// Register providers
STTHandlerManager.registerHandler('google-ai', GoogleAISTTHandler);
// Get provider instance
const handler = STTHandlerManager.getHandler('google-ai');
Available background sound presets for realistic test environments:
| Sound | Description | Recommended Volume | Use Case |
|---|---|---|---|
| office | Office ambience with typing and quiet chatter | 0.15 | Business applications, productivity bots |
| cafe | Coffee shop atmosphere with ambient noise | 0.20 | Customer service, casual conversations |
| nature | Outdoor setting with birds and gentle wind | 0.18 | Wellness apps, meditation guides |
| rain | Gentle rainfall ambience | 0.12 | Calming applications, sleep aids |
| phone | Phone line static and connection noise | 0.08 | IVR testing, call center simulations |
| crowd | Distant crowd noise and murmurs | 0.10 | Public space simulations, event apps |
All audio files are located in the assets/ directory as WAV files.
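To show what a background volume such as 0.15 means in practice, the sketch below mixes a background track into speech by scaling each background sample and adding it to the matching speech sample. It assumes both signals are mono floating-point samples in the range -1 to 1 at the same sample rate. `mixWithBackground` is an illustrative function, not the AudioMixerService implementation.

```typescript
// Hypothetical mixer: speech plus a scaled, looped background track
function mixWithBackground(
  speech: Float32Array,
  background: Float32Array,
  backgroundVolume: number
): Float32Array {
  // The negated range check also rejects NaN
  if (!(backgroundVolume >= 0 && backgroundVolume <= 1)) {
    throw new RangeError("backgroundVolume must be between 0 and 1");
  }
  const mixed = new Float32Array(speech.length);
  if (background.length === 0) {
    // Nothing to mix in: return a copy of the speech
    mixed.set(speech);
    return mixed;
  }
  for (let i = 0; i < speech.length; i++) {
    // Loop the background when it is shorter than the speech
    const bg = background[i % background.length] * backgroundVolume;
    // Clamp so the sum stays inside the valid sample range
    mixed[i] = Math.max(-1, Math.min(1, speech[i] + bg));
  }
  return mixed;
}

const speechSamples = new Float32Array([0.5, -0.5, 0.9]);
const officeNoise = new Float32Array([1]);
const mixedSamples = mixWithBackground(speechSamples, officeNoise, 0.25);
console.log(Array.from(mixedSamples));
```

A lower volume, such as the 0.08 recommended for the phone preset, keeps the noise well below the speech level, while values near 1 risk clipping and are clamped here.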
Vokal follows security best practices:
- Input validation and sanitization via validation.ts
- Secure credential handling (no hardcoded secrets)
- Command injection prevention in secure-exec.ts
- Safe file path handling with path resolution
- API key validation
- Spawn-based command execution (no shell injection)
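As one illustration of the input validation mentioned above, the following sketch checks user-supplied values before they reach a synthesis call or a command line. The rules (an `xx-XX` style language code, a volume between 0 and 1, a 5000-character text cap) are assumptions chosen for the example, and `SpeechRequest` and `validateSpeechRequest` are hypothetical names; Vokal's validation.ts may enforce different rules.

```typescript
// Hypothetical request shape; field names are borrowed from the Quick Start examples
interface SpeechRequest {
  text: string;
  languageCode: string;
  backgroundVolume?: number;
}

const MAX_TEXT_LENGTH = 5000; // assumed cap for this example

// Returns a list of problems; an empty list means the request looks valid
function validateSpeechRequest(req: SpeechRequest): string[] {
  const errors: string[] = [];
  const text = req.text.trim();
  if (text.length === 0) {
    errors.push("text must not be empty");
  } else if (text.length > MAX_TEXT_LENGTH) {
    errors.push(`text must be at most ${MAX_TEXT_LENGTH} characters`);
  }
  // Anchored pattern: anything beyond "xx-XX", such as shell metacharacters, fails
  if (!/^[a-z]{2,3}-[A-Z]{2}$/.test(req.languageCode)) {
    errors.push("languageCode must look like en-US");
  }
  const volume = req.backgroundVolume;
  // The negated range check also rejects NaN
  if (volume !== undefined && !(volume >= 0 && volume <= 1)) {
    errors.push("backgroundVolume must be between 0 and 1");
  }
  return errors;
}

const problems = validateSpeechRequest({
  text: "Hello, world!",
  languageCode: "en-US; rm -rf /",
  backgroundVolume: 0.15,
});
console.log(problems);
```

Rejecting malformed values up front complements spawn-based execution: even if a value were passed to a command, it would arrive as a single argument rather than being interpreted by a shell.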
# Build the project
npm run build
# Run linting
npm run lint
# Format code
npm run format
# Type checking
npm run typecheck
| Script | Description |
|---|---|
| npm run build | Build TypeScript to JavaScript (dist/) |
| npm run dev | Build in watch mode |
| npm run clean | Clean build directory |
| npm run lint | Lint code with ESLint |
| npm run format | Format code with Prettier |
| npm run typecheck | Run TypeScript type checking |
| npm run prebuild | Format and lint before build |
Contributions are welcome! Please read our Contributing Guide for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes using Conventional Commits
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CODE_OF_CONDUCT.md for community guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
- @juspay/neurolink - TTS generation and audio synthesis
- Google Cloud Speech-to-Text - Streaming STT with voice activity detection
- Google Gemini - AI-powered semantic evaluation
- naudiodon - Native audio I/O for Node.js
- Documentation: Coming soon in /docs
- Issues: GitHub Issues
- Email: opensource@juspay.in
Made with ❤️ by the Breeze Team