
@BugorBN commented Nov 28, 2025

Summary

This PR implements audio-to-audio chat completion support for the gpt-4o-audio-preview model. Instead of chaining three sequential requests (STT→Chat→TTS), an app can send audio and receive audio in a single API call, which gives roughly 2-3x faster response times.

Features

  • Audio Chat API: Direct audio-to-audio conversations with audioChats() and audioChatsStream() methods
  • Type-safe enums: AudioFormat (wav, mp3, flac, opus, pcm16) and Voice (alloy, echo, fable, onyx, nova, shimmer)
  • Streaming support: Real-time audio chunk streaming with AsyncThrowingStream
  • AudioConversationManager: Actor-based utility for managing multi-turn conversations with automatic history tracking and pruning
  • Format requirements:
    • Input audio: wav and mp3 only
    • Output audio: wav, mp3, flac, opus, pcm16 (pcm16 recommended for streaming)
  • Relaxed parsing support: Handles missing fields gracefully, consistent with ChatResult behavior
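A minimal Swift sketch of the two enums and the format rules listed above. The case names come from the PR text. `isSupportedAsInput`, `validateInputFormat`, and `preferredOutputFormat` are hypothetical helpers for illustration and are not part of the PR's API.

```swift
import Foundation

// Hypothetical sketch: case lists mirror the PR description; helpers are illustrative only.
enum AudioFormat: String, Codable, CaseIterable {
    case wav, mp3, flac, opus, pcm16

    /// Per the stated requirements, only wav and mp3 are accepted as input audio.
    var isSupportedAsInput: Bool {
        switch self {
        case .wav, .mp3: return true
        case .flac, .opus, .pcm16: return false
        }
    }
}

enum Voice: String, Codable, CaseIterable {
    case alloy, echo, fable, onyx, nova, shimmer
}

/// Hypothetical error used to reject unsupported input formats before a request is sent.
enum AudioFormatError: Error, Equatable {
    case unsupportedInputFormat(AudioFormat)
}

/// Hypothetical validation helper: returns the format if it is a valid input format, else throws.
func validateInputFormat(_ format: AudioFormat) throws -> AudioFormat {
    guard format.isSupportedAsInput else {
        throw AudioFormatError.unsupportedInputFormat(format)
    }
    return format
}

/// Hypothetical helper: use pcm16 when streaming (the recommended format), otherwise the caller's choice.
func preferredOutputFormat(streaming: Bool, requested: AudioFormat = .mp3) -> AudioFormat {
    streaming ? .pcm16 : requested
}
```

Validating the input format on the client side surfaces a clear error early, instead of waiting for the API to reject a flac or opus upload.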

Implementation Details

  • New models: AudioChatQuery, AudioChatResult, AudioChatStreamResult
  • Protocol updates: OpenAIProtocol and OpenAIAsync, plus the matching Combine variants
  • Comprehensive documentation in README.md with usage examples
  • Full async/await and Combine publisher support
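The following is a hedged sketch of the kind of Chat Completions audio request body that a model like AudioChatQuery ultimately has to encode, built with Codable. The field names (`modalities`, `audio.voice`, `audio.format`, and an `input_audio` content part carrying base64 `data` plus a `format`) are assumed to follow OpenAI's public Chat Completions audio documentation. `SketchAudioRequest` and `makeAudioRequestBody` are hypothetical names, not the PR's AudioChatQuery.

```swift
import Foundation

// Hypothetical wire-format sketch; these are NOT the PR's AudioChatQuery types.
struct SketchAudioRequest: Encodable {
    struct AudioOutput: Encodable {
        let voice: String   // e.g. "alloy"
        let format: String  // output format, e.g. "wav" or "pcm16"
    }
    struct InputAudio: Encodable {
        let data: String    // base64-encoded audio bytes
        let format: String  // input format: "wav" or "mp3"
    }
    struct ContentPart: Encodable {
        let type = "input_audio"
        let inputAudio: InputAudio
        // Map the Swift property name to the snake_case key used on the wire.
        enum CodingKeys: String, CodingKey {
            case type
            case inputAudio = "input_audio"
        }
    }
    struct Message: Encodable {
        let role: String
        let content: [ContentPart]
    }

    let model: String
    let modalities: [String]
    let audio: AudioOutput
    let messages: [Message]
}

/// Hypothetical builder: wraps raw wav bytes in a single user message and encodes the request as JSON.
func makeAudioRequestBody(wavData: Data, voice: String = "alloy") throws -> Data {
    let request = SketchAudioRequest(
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"],
        audio: .init(voice: voice, format: "wav"),
        messages: [
            .init(role: "user",
                  content: [.init(inputAudio: .init(data: wavData.base64EncodedString(), format: "wav"))])
        ]
    )
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]  // stable key order, handy for tests and logging
    return try encoder.encode(request)
}
```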

Testing

  • 38 passing tests across 4 test suites with 100% coverage:
    • AudioChatQueryCodingTests (8 tests)
    • AudioChatResultTests (9 tests)
    • AudioChatStreamResultTests (10 tests)
    • AudioConversationManagerTests (11 tests)

Files Changed

  • 14 files changed, 2,072 insertions, 4 deletions

Documentation

Added comprehensive Audio Chat section to README.md including:

  • API overview and benefits
  • Format requirements and recommendations
  • Request/response structure documentation
  • Non-streaming and streaming usage examples
  • AudioConversationManager examples
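The streaming usage pattern can be sketched as accumulating audio deltas as they arrive. This sketch assumes each streamed chunk carries optional base64 `data` and `transcript` fields (as in `choices[0].delta.audio` on the wire). It also assumes pcm16 output is 24 kHz, mono, 16-bit. `SketchAudioDelta`, `StreamedAudioAccumulator`, and `collectAudio` are hypothetical names, not the PR's AudioChatStreamResult.

```swift
import Foundation

// Hypothetical shape of one streamed audio delta; not the PR's AudioChatStreamResult.
struct SketchAudioDelta {
    var data: String?        // base64-encoded pcm16 bytes, may be absent in a chunk
    var transcript: String?  // incremental transcript text, may be absent in a chunk
}

/// Accumulates streamed deltas into raw pcm16 audio plus the running transcript.
struct StreamedAudioAccumulator {
    private(set) var pcm16 = Data()
    private(set) var transcript = ""

    mutating func append(_ delta: SketchAudioDelta) {
        // Skip chunks without audio or with undecodable base64.
        if let base64 = delta.data, let bytes = Data(base64Encoded: base64) {
            pcm16.append(bytes)
        }
        if let text = delta.transcript {
            transcript += text
        }
    }

    /// Duration in seconds, assuming 24 kHz mono 16-bit samples (48,000 bytes per second).
    var estimatedDuration: Double {
        Double(pcm16.count) / Double(24_000 * 2)
    }
}

/// Drains a stream of deltas into one accumulator.
/// A real client would also hand each chunk to an audio player as it arrives.
func collectAudio(_ stream: AsyncThrowingStream<SketchAudioDelta, Error>) async throws -> StreamedAudioAccumulator {
    var accumulator = StreamedAudioAccumulator()
    for try await delta in stream {
        accumulator.append(delta)
    }
    return accumulator
}
```

Raw pcm16 has no container header, so a player can start on the first chunk. This is why it suits streaming better than formats like mp3 or flac.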

Breaking Changes

None - this is a purely additive change.

Related

- Add AudioChatQuery, AudioChatResult, AudioChatStreamResult models
- Implement audioChats() and audioChatsStream() methods
- Add type-safe enums: AudioFormat, Voice, Modality
- Add AudioConversationManager for multi-turn conversations
- Add support for gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview models
- Add dated variants: 2024-12-17 snapshots
- Comprehensive test coverage (38 tests across 4 test suites)
- Update README with audio chat documentation and examples
- Format requirements: wav/mp3 for input, pcm16 recommended for streaming output
- Relaxed parsing support for handling missing fields
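Here is a rough sketch of the actor-based history tracking and pruning idea behind AudioConversationManager. The pruning policy (keep the last `maxTurns` entries) and all type names (`SketchTurn`, `ConversationHistory`, `SketchConversationManager`) are assumptions for illustration, not the PR's implementation. The sketch also assumes, per OpenAI's documentation, that an earlier assistant reply can be referenced by its audio id instead of resending the audio.

```swift
import Foundation

// Hypothetical turn record; not the PR's AudioConversationManager API.
enum SketchTurn: Equatable {
    case userAudio(base64: String)
    case assistantAudio(id: String)  // earlier assistant replies referenced by audio id
}

/// Pure pruning policy: keeps only the most recent `maxTurns` entries.
struct ConversationHistory {
    let maxTurns: Int
    private(set) var turns: [SketchTurn] = []

    init(maxTurns: Int) {
        precondition(maxTurns > 0, "maxTurns must be positive")
        self.maxTurns = maxTurns
    }

    mutating func append(_ turn: SketchTurn) {
        turns.append(turn)
        // Drop the oldest entries once the limit is exceeded.
        if turns.count > maxTurns {
            turns.removeFirst(turns.count - maxTurns)
        }
    }

    mutating func reset() { turns.removeAll() }
}

/// Actor wrapper so concurrent tasks cannot interleave history updates.
actor SketchConversationManager {
    private var history: ConversationHistory

    init(maxTurns: Int = 10) {
        history = ConversationHistory(maxTurns: maxTurns)
    }

    func record(_ turn: SketchTurn) { history.append(turn) }
    func snapshot() -> [SketchTurn] { history.turns }
    func reset() { history.reset() }
}
```

Keeping the policy in a plain struct makes it testable without `await`, while the actor serializes access from concurrent tasks. A real manager would probably prune in user/assistant pairs so the history never starts with an orphaned assistant turn.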
@BugorBN force-pushed the feature/audio-chat-api branch from 7d5804b to 7d385dd on November 29, 2025 09:47