feat(ai): Image and Audio generation APIs (DALL-E 3 + OpenAI TTS)#109
Open
bedus-creation wants to merge 2 commits into
Implements a Laravel-style fluent API for image generation (DALL-E 3), image editing (DALL-E 2 with attachments), and text-to-speech (OpenAI TTS).
- `Image.of(prompt)`: text-to-image via DALL-E 3; `.landscape()`, `.portrait()`, `.square()`, `.quality()`, `.model()` modifiers; `.attachments([…])` switches to DALL-E 2 image editing
- `Audio.of(text)`: TTS via OpenAI; `.female()` / `.male()` voice shortcuts, `.voice()`, `.speed()`, `.format()`, `.model()` modifiers
- `Files.Image.fromStorage/fromPath/fromUrl`: image attachment factories
- `ImageResponse` / `AudioResponse`: async `.store()`, `.storeAs()`, `.storePublicly()`, `.storePubliclyAs()` backed by Storage facade
- 45 new unit tests (all mocked, no real API calls)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bedus-creation
commented
Jun 10, 2026
Comment on lines +164 to +174
    def _generate_sync(self) -> AudioResponse:
        client = OpenAI(api_key=self._resolve_api_key())
        response = client.audio.speech.create(
            model=self._model,
            voice=self._voice,
            input=self._text,
            speed=self._speed,
            response_format=self._response_format,
        )
        data = response.read()
        return AudioResponse(data=data, fmt=self._response_format)
Contributor
Author
Can we use LangChain or another library so we can support multiple providers?
bedus-creation
commented
Jun 10, 2026
Comment on lines +9 to +23
class ImageAttachment:
    """Represents an image file to attach to an Image editing request.

    Instances are created via the :class:`Files.Image` factory, not directly::

        attachment = Files.Image.fromPath("/tmp/photo.jpg")
        attachment = Files.Image.fromStorage("photo.jpg")
        attachment = Files.Image.fromUrl("https://example.com/photo.jpg")
    """

    def __init__(
        self,
        data: bytes,
        name: str = "",
        media_type: str = "image/jpeg",
Contributor
Author
We already have a Document class in AI; can't we use that?
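To make the suggestion concrete, here is a minimal sketch of how a text-oriented document class might also carry binary image data. It follows the `UnicodeDecodeError` fallback and `to_bytes()` behavior described in the follow-up commit message in this conversation, but the `Document` class below is a hypothetical stand-in, not the project's actual class; its fields and signatures are assumptions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass
class Document:
    """Hypothetical stand-in for the project's Document class (assumed shape)."""

    content: Union[str, bytes]
    name: str = ""

    @classmethod
    def from_path(cls, path: str) -> "Document":
        p = Path(path)
        try:
            # Try UTF-8 text first; binary files such as JPEGs normally fail to decode.
            content: Union[str, bytes] = p.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            # Fall back to raw bytes for binary content.
            content = p.read_bytes()
        return cls(content=content, name=p.name)

    def to_bytes(self) -> bytes:
        # Return bytes regardless of whether the content was loaded as text or binary.
        if isinstance(self.content, bytes):
            return self.content
        return self.content.encode("utf-8")
```

With this shape, an image editing request could accept `Document` instances and call `to_bytes()` without a separate attachment type.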
bedus-creation
commented
Jun 10, 2026
Comment on lines +190 to +200
    def _create(self) -> ImageResponse:
        """Generate a new image from a text prompt."""
        client = OpenAI(api_key=self._resolve_api_key())
        params: dict = {
            "model": self._model,
            "prompt": self._prompt,
            "size": self._size,
            "n": self._n,
            "response_format": "b64_json",
        }
        if self._model == "dall-e-3":
Contributor
Author
Can we use another package so we can support multiple providers?
bedus-creation
commented
Jun 10, 2026
    assert isinstance(audio, Audio)
    assert audio._text == "Hello world"
Contributor
Author
Can we write class-based tests?
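As an illustration of the request, the assertions from the snippet above could be grouped into a pytest-style test class roughly as follows. The `Audio` class here is a hypothetical stand-in so the sketch runs on its own; the `"alloy"` default voice is an assumption, while the `nova` mapping for `.female()` comes from the PR summary.

```python
class Audio:
    """Hypothetical stand-in for the PR's Audio builder, for illustration only."""

    def __init__(self, text: str) -> None:
        self._text = text
        self._voice = "alloy"  # assumed default, not confirmed by the PR

    @classmethod
    def of(cls, text: str) -> "Audio":
        return cls(text)

    def female(self) -> "Audio":
        self._voice = "nova"
        return self


class TestAudioBuilder:
    """pytest collects classes named Test* and runs their test_* methods."""

    def test_of_returns_audio_instance(self):
        audio = Audio.of("Hello world")
        assert isinstance(audio, Audio)
        assert audio._text == "Hello world"

    def test_female_selects_nova_voice(self):
        audio = Audio.of("Hello world").female()
        assert audio._voice == "nova"
```

Grouping by class keeps related builder, generation, and result tests together and lets shared setup live in one place.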
… class-based tests

Comment 1: Document extended for binary image attachments:
- content field now accepts str | bytes
- from_path() auto-detects binary (UnicodeDecodeError fallback to rb mode)
- New async from_url() downloads bytes via httpx
- New async from_storage() reads binary via Storage facade (or direct path)
- New to_bytes() returns binary content regardless of how it was loaded
- files.py (ImageAttachment/Files) no longer exported; Document is the single type

Comment 2: Multi-provider support for Image:
- New ai/image_providers.py: ImageGenerationProvider ABC, OpenAIImageProvider (AsyncOpenAI), StabilityImageProvider (stub)
- Image.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.image_provider (AI_IMAGE_PROVIDER env var)

Comment 3: Multi-provider support for Audio:
- New ai/audio_providers.py: AudioSynthesisProvider ABC, OpenAIAudioProvider (AsyncOpenAI), ElevenLabsAudioProvider (stub)
- Audio.generate() is now truly async via provider abstraction
- Provider resolved from AIConfig.audio_provider (AI_AUDIO_PROVIDER env var)
- AIConfig gains image_provider and audio_provider fields

Comment 4: Class-based tests:
- test_image.py: TestDocumentImageAttachment, TestImageBuilder, TestImageGeneration, TestImageResult
- test_audio.py: TestAudioBuilder, TestAudioGeneration, TestAudioResult

All 156 AI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
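The provider abstraction described in this commit could look roughly like the sketch below. Only the `ImageGenerationProvider` name and the `AI_IMAGE_PROVIDER` variable come from the commit message; the `generate()` signature, the `FakeImageProvider` and `StubImageProvider` classes, the registry, and the `"fake"` default are assumptions made so the example runs offline.

```python
import asyncio
import os
from abc import ABC, abstractmethod
from typing import Optional


class ImageGenerationProvider(ABC):
    """Provider interface; the method signature is an assumption."""

    @abstractmethod
    async def generate(self, prompt: str, size: str) -> bytes:
        ...


class FakeImageProvider(ImageGenerationProvider):
    """Hypothetical offline provider that makes no network calls."""

    async def generate(self, prompt: str, size: str) -> bytes:
        return f"{size}:{prompt}".encode("utf-8")


class StubImageProvider(ImageGenerationProvider):
    """Hypothetical placeholder for a provider that is not implemented yet."""

    async def generate(self, prompt: str, size: str) -> bytes:
        raise NotImplementedError("stub provider")


# Hypothetical registry mapping config values to provider classes.
_PROVIDERS = {"fake": FakeImageProvider, "stub": StubImageProvider}


def resolve_image_provider(name: Optional[str] = None) -> ImageGenerationProvider:
    # An explicit name wins; otherwise fall back to the environment variable.
    key = (name or os.environ.get("AI_IMAGE_PROVIDER", "fake")).lower()
    try:
        return _PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown image provider: {key!r}") from None


if __name__ == "__main__":
    provider = resolve_image_provider("fake")
    print(asyncio.run(provider.generate("a lighthouse", "1024x1024")))
```

Keeping the builder dependent only on the abstract interface means a new backend needs a new subclass and a registry entry, not changes to `Image` itself. The same shape would presumably apply to the audio providers.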
Summary
- `Image.of(prompt).generate()`: text-to-image via OpenAI DALL-E 3; returns `ImageResponse` with raw PNG bytes
  - `.attachments([Files.Image.fromPath(...)])` switches to DALL-E 2 image editing
  - `.landscape()` (1792×1024), `.portrait()` (1024×1792), `.square()`, `.quality()`, `.model()`
- `Audio.of(text).generate()`: text-to-speech via OpenAI TTS; returns `AudioResponse`
  - `.female()` (nova), `.male()` (onyx), `.voice("shimmer")`, `.speed()`, `.format()`, `.model()`
- `Files.Image`: factory with `.fromStorage()`, `.fromPath()`, `.fromUrl()` for image attachment sources
- `ImageResponse` / `AudioResponse` expose async `.store()`, `.storeAs()`, `.storePublicly()`, `.storePubliclyAs()` backed by the Storage facade (with fallback to temp dir)

API examples
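The PR's own examples are not preserved in this view. As a rough illustration of the fluent style summarized above, the sketch below uses stand-in `Image` and `ImageResponse` classes that make no API calls. The landscape and portrait sizes come from the summary; the square size, the `"standard"` quality default, the async `generate()` signature, and the fake payload are assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ImageResponse:
    """Stand-in result type; the real class also offers store helpers."""

    data: bytes


class Image:
    """Hypothetical stand-in for the PR's Image builder (no network access)."""

    def __init__(self, prompt: str) -> None:
        self._prompt = prompt
        self._size = "1024x1024"    # assumed default
        self._quality = "standard"  # assumed default

    @classmethod
    def of(cls, prompt: str) -> "Image":
        return cls(prompt)

    def landscape(self) -> "Image":
        self._size = "1792x1024"
        return self

    def portrait(self) -> "Image":
        self._size = "1024x1792"
        return self

    def square(self) -> "Image":
        self._size = "1024x1024"  # assumed square size
        return self

    def quality(self, value: str) -> "Image":
        self._quality = value
        return self

    async def generate(self) -> ImageResponse:
        # A real implementation would call a provider; this returns a fake payload.
        payload = f"{self._size}|{self._quality}|{self._prompt}"
        return ImageResponse(data=payload.encode("utf-8"))


async def main() -> None:
    response = await Image.of("a lighthouse at dusk").landscape().quality("hd").generate()
    print(len(response.data), "bytes")


if __name__ == "__main__":
    asyncio.run(main())
```

Each modifier returns `self`, which is what allows the chained call style; the terminal `generate()` is the only step that would perform I/O.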
Files changed
- ai/files.py: ImageAttachment, Files.Image factory
- ai/image.py: Image builder, ImageResponse
- ai/audio.py: Audio builder, AudioResponse
- ai/__init__.py: Image, Audio, Files, ImageAttachment, ImageResponse, AudioResponse
- tests/ai/test_image.py
- tests/ai/test_audio.py

Test plan
🤖 Generated with Claude Code