Add audio chat API support (gpt-4o-audio-preview, gpt-audio-1.5) by BugorBN · Pull Request #397 · MacPaw/OpenAI

BugorBN · 2025-11-28T22:19:18Z

Summary

This PR implements audio-to-audio chat completion support, replacing the traditional STT→Chat→TTS pipeline with a single API call for 2-3x faster response times.

It supports the preview models gpt-4o-audio-preview and gpt-4o-mini-audio-preview as well as the newer gpt-audio-1.5, which OpenAI recommends for best performance, citing improved instruction following, tool calling, and multilingual accuracy.

Features

- Audio Chat API: Direct audio-to-audio conversations with audioChats() and audioChatsStream() methods
- Type-safe enums: AudioFormat (wav, mp3, flac, opus, pcm16) and Voice (alloy, echo, fable, onyx, nova, shimmer)
- Streaming support: Real-time audio chunk streaming with AsyncThrowingStream
- AudioConversationManager: Actor-based utility for managing multi-turn conversations with automatic history tracking and pruning
- Format requirements:
  - Input audio: wav and mp3 only
  - Output audio: wav, mp3, flac, opus, pcm16 (pcm16 recommended for streaming)
- Relaxed parsing support: Handles missing fields gracefully, consistent with ChatResult behavior
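To illustrate what "handles missing fields gracefully" can mean in practice, here is a minimal sketch of tolerant decoding. It is not the PR's AudioChatResult: the type `SketchAudioPayload` and the helper `decodeAudioPayload` are hypothetical, and the field names follow the audio object OpenAI documents for Chat Completions responses. How the PR itself implements relaxed parsing is not shown in this description.

```swift
import Foundation

// Hypothetical type for illustration only; not the PR's AudioChatResult.
// Every property is Optional, so the synthesized Decodable conformance uses
// decodeIfPresent and a missing key becomes nil instead of a thrown error.
struct SketchAudioPayload: Decodable, Equatable {
    let id: String?
    let data: String?        // base64-encoded audio bytes
    let transcript: String?
    let expiresAt: Int?

    enum CodingKeys: String, CodingKey {
        case id, data, transcript
        case expiresAt = "expires_at"
    }
}

// Hypothetical helper: decodes a JSON string into the tolerant payload type.
func decodeAudioPayload(_ json: String) throws -> SketchAudioPayload {
    try JSONDecoder().decode(SketchAudioPayload.self, from: Data(json.utf8))
}
```

A payload that carries only `id` and `transcript` decodes successfully under this sketch, with `data` and `expiresAt` left as nil.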

Implementation Details

- New models: AudioChatQuery, AudioChatResult, AudioChatStreamResult
- Model constants: .gpt_4o_audio_preview, .gpt_4o_mini_audio_preview, .gpt_audio_1_5
- Protocol updates: OpenAIProtocol, OpenAIAsync, with Combine support
- Comprehensive documentation in README.md with usage examples
- Full async/await and Combine publisher support
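For orientation, the request such a query ultimately has to produce can be sketched with plain Codable types. Everything named `Sketch…` below, and the helper `makeQueryJSON`, is hypothetical and may differ from the PR's AudioChatQuery. The JSON keys (`modalities`, `audio.voice`, `audio.format`, and an `input_audio` content part) follow OpenAI's documented Chat Completions audio request. Separate input and output enums are one possible way to express the wav/mp3-only input rule in the type system; the PR lists a single AudioFormat enum.

```swift
import Foundation

// Hypothetical types for illustration; the PR's AudioChatQuery may be shaped differently.
enum SketchInputFormat: String, Codable { case wav, mp3 }                    // input: wav and mp3 only
enum SketchOutputFormat: String, Codable { case wav, mp3, flac, opus, pcm16 } // output formats
enum SketchVoice: String, Codable { case alloy, echo, fable, onyx, nova, shimmer }

struct SketchAudioQuery: Encodable {
    struct AudioConfig: Encodable {
        let voice: SketchVoice
        let format: SketchOutputFormat
    }
    struct InputAudio: Encodable {
        let data: String              // base64-encoded audio
        let format: SketchInputFormat
    }
    struct ContentPart: Encodable {
        let type = "input_audio"
        let inputAudio: InputAudio
        enum CodingKeys: String, CodingKey {
            case type
            case inputAudio = "input_audio"
        }
    }
    struct Message: Encodable {
        let role: String
        let content: [ContentPart]
    }

    let model: String
    let modalities: [String]
    let audio: AudioConfig
    let messages: [Message]
}

// Hypothetical helper: builds the JSON body for a single user audio message.
func makeQueryJSON(clip: Data, inputFormat: SketchInputFormat) throws -> String {
    let part = SketchAudioQuery.ContentPart(
        inputAudio: .init(data: clip.base64EncodedString(), format: inputFormat)
    )
    let query = SketchAudioQuery(
        model: "gpt-4o-audio-preview",
        modalities: ["text", "audio"],
        audio: .init(voice: .alloy, format: .wav),
        messages: [.init(role: "user", content: [part])]
    )
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]   // stable key order for inspection
    return String(decoding: try encoder.encode(query), as: UTF8.self)
}
```

With this split, passing flac as an input format is a compile-time error rather than a server-side rejection.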

Testing

38 passing tests across 4 test suites with 100% coverage:
- AudioChatQueryCodingTests (8 tests)
- AudioChatResultTests (9 tests)
- AudioChatStreamResultTests (10 tests)
- AudioConversationManagerTests (11 tests)
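Stream handling of the kind these suites cover can be exercised without network access by feeding a stubbed AsyncThrowingStream. The sketch below is not taken from the PR's tests: `SketchAudioChunk`, `makeStubStream`, and `collect` are hypothetical names, and it assumes each chunk carries an independently decodable base64 fragment plus an optional transcript delta.

```swift
import Foundation

// Hypothetical stand-in for one streamed chunk; the real AudioChatStreamResult
// is not shown in this description.
struct SketchAudioChunk {
    let base64Audio: String?
    let transcriptDelta: String?
}

// Builds a finite stream that yields the given chunks and then finishes.
func makeStubStream(_ chunks: [SketchAudioChunk]) -> AsyncThrowingStream<SketchAudioChunk, Error> {
    AsyncThrowingStream { continuation in
        for chunk in chunks { continuation.yield(chunk) }
        continuation.finish()
    }
}

// Consumes the stream, concatenating decoded audio bytes and transcript text.
func collect(
    _ stream: AsyncThrowingStream<SketchAudioChunk, Error>
) async throws -> (audio: Data, transcript: String) {
    var audio = Data()
    var transcript = ""
    for try await chunk in stream {
        // Chunks without audio, or with undecodable base64, contribute no bytes.
        if let encoded = chunk.base64Audio, let bytes = Data(base64Encoded: encoded) {
            audio.append(bytes)
        }
        transcript += chunk.transcriptDelta ?? ""
    }
    return (audio, transcript)
}
```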

Files Changed

14 files changed, 2,072 insertions, 4 deletions

Documentation

Added comprehensive Audio Chat section to README.md including:

- API overview and benefits
- Format requirements and recommendations
- Request/response structure documentation
- Non-streaming and streaming usage examples
- AudioConversationManager examples
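The README examples are not reproduced in this description. As a rough illustration of the actor-based history tracking and pruning listed under Features, a minimal sketch could look like the following. `SketchConversationHistory` and its `maxTurns` parameter are hypothetical; the PR's AudioConversationManager may prune by a different rule (for example by token count or audio size).

```swift
// Hypothetical sketch; not the PR's AudioConversationManager.
// An actor serializes access to the history, so concurrent callers cannot
// interleave an append with a prune.
actor SketchConversationHistory {
    struct Turn: Equatable {
        let role: String
        let text: String
    }

    private var turns: [Turn] = []
    private let maxTurns: Int

    init(maxTurns: Int) {
        precondition(maxTurns > 0, "maxTurns must be positive")
        self.maxTurns = maxTurns
    }

    // Appends a turn, then drops the oldest turns so that at most maxTurns remain.
    func append(role: String, text: String) {
        turns.append(Turn(role: role, text: text))
        if turns.count > maxTurns {
            turns.removeFirst(turns.count - maxTurns)
        }
    }

    // Returns a copy of the current history.
    func snapshot() -> [Turn] {
        turns
    }
}
```

Callers reach the actor with `await`, for example `await history.append(role: "user", text: "...")`, and read the pruned history through `await history.snapshot()`.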

Breaking Changes

None - this is a purely additive change.

- Add AudioChatQuery, AudioChatResult, AudioChatStreamResult models
- Implement audioChats() and audioChatsStream() methods
- Add type-safe enums: AudioFormat, Voice, Modality
- Add AudioConversationManager for multi-turn conversations
- Add support for gpt-4o-realtime-preview and gpt-4o-mini-realtime-preview models
- Add dated variants: 2024-12-17 snapshots
- Comprehensive test coverage (38 tests across 4 test suites)
- Update README with audio chat documentation and examples
- Format requirements: wav/mp3 for input, pcm16 recommended for streaming output
- Relaxed parsing support for handling missing fields

BugorBN force-pushed the feature/audio-chat-api branch from 7d5804b to 7d385dd on November 29, 2025 09:47

Add gpt-audio-1.5 model constant for Chat Completions

58b84a0

BugorBN changed the title from "Add audio chat API support for gpt-4o-audio-preview model" to "Add audio chat API support (gpt-4o-audio-preview, gpt-audio-1.5)" on Apr 3, 2026

BugorBN wants to merge 2 commits into MacPaw:main from BugorBN:feature/audio-chat-api
