Commit c840dad

feature #861 [Demo] Converting audio demo to speech-to-text-to-speech (chr-hertel)
This PR was merged into the main branch.

Discussion
----------

[Demo] Converting audio demo to speech-to-text-to-speech

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | no
| Issues        |
| License       | MIT

Converting the audio bot demo to speech-to-text-to-speech with subagent for RAG on Symfony Blog.

<img width="1479" height="1002" alt="image" src="https://github.com/user-attachments/assets/47cd9bdf-9038-416c-8ecb-16cff25fadc7" />

[Last Response](https://github.com/user-attachments/files/23511827/download.mp3)

Commits
-------

52c13d0 Converting audio demo to speech-to-text-to-speech
2 parents f7cc77c + 52c13d0 commit c840dad

3 files changed: +33 -7 lines changed
demo/config/packages/ai.yaml

Lines changed: 9 additions & 2 deletions
@@ -39,12 +39,19 @@ ai:
         audio:
             platform: 'ai.platform.openai'
             model: 'gpt-4o-mini?temperature=1.0'
-            prompt: 'You are a friendly chatbot that likes to have a conversation with users and asks them some questions.'
+            prompt: |
+                You are a friendly, positive and energetic voice assistant. You can engage in light discussions, ask
+                questions, and do general small-talk. If asked about the Symfony Framework or their community events,
+                you delegate to your subagent "symfony_blog" and use their answer for answering user's questions.
+                If you don't know the answer, say so. Keep in mind that you are in a spoken conversation, so keep your
+                answers concise and to the point. They will be read out loud to the user.
             tools:
                 # Agent in agent 🤯
                 - agent: 'blog'
                   name: 'symfony_blog'
-                  description: 'Can answer questions based on the Symfony blog.'
+                  description: |
+                      Subagent, that can answer questions about latest news around the Symfony Framework, like latest
+                      features, events or community news.
         orchestrator:
             platform: 'ai.platform.openai'
             model: 'gpt-4o-mini'

demo/src/Audio/Chat.php

Lines changed: 9 additions & 1 deletion
@@ -12,6 +12,7 @@
 namespace App\Audio;
 
 use Symfony\AI\Agent\AgentInterface;
+use Symfony\AI\Platform\Bridge\OpenAi\TextToSpeech\Voice;
 use Symfony\AI\Platform\Message\Content\Audio;
 use Symfony\AI\Platform\Message\Message;
 use Symfony\AI\Platform\Message\MessageBag;
@@ -58,7 +59,14 @@ public function submitMessage(string $message): void
 
         \assert($result instanceof TextResult);
 
-        $messages->add(Message::ofAssistant($result->getContent()));
+        $assistantMessage = Message::ofAssistant($result->getContent());
+        $messages->add($assistantMessage);
+
+        $result = $this->platform->invoke('tts-1', $result->getContent(), [
+            'voice' => Voice::CORAL,
+            'instructions' => 'Speak in a cheerful and positive tone.',
+        ]);
+        $assistantMessage->getMetadata()->add('audio', $result->asDataUri('audio/mpeg'));
 
         $this->saveMessages($messages);
     }
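
Note on the `asDataUri('audio/mpeg')` call in this hunk: the template change in this commit puts the stored value straight into an `<audio src="...">` attribute, so it is presumably a base64 data URI. The following is a minimal sketch of that encoding, assuming the usual RFC 2397 form. The helper name `toDataUri` and the placeholder bytes are hypothetical and are not taken from Symfony AI.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Symfony AI): wraps raw bytes in an
 * RFC 2397 data URI so they can be embedded directly in an HTML attribute.
 */
function toDataUri(string $binary, string $mimeType): string
{
    // Base64 keeps arbitrary binary data safe inside an attribute value.
    return sprintf('data:%s;base64,%s', $mimeType, base64_encode($binary));
}

// Placeholder bytes standing in for the MP3 payload of a text-to-speech result.
$fakeMp3 = "ID3\x03\x00";

echo toDataUri($fakeMp3, 'audio/mpeg'), \PHP_EOL;
```

Because the audio ends up inside the message metadata, whatever `saveMessages()` persists grows with every assistant reply. That is likely fine for a demo, but worth keeping in mind for longer conversations.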

demo/templates/components/audio.html.twig

Lines changed: 15 additions & 4 deletions
@@ -1,4 +1,4 @@
-{% import "_message.html.twig" as message %}
+{% import "_message.html.twig" as msg %}
 
 <div class="card mx-auto shadow-lg" {{ attributes.defaults(stimulus_controller('audio')) }}>
     <div class="card-header p-2">
@@ -8,7 +8,18 @@
     </div>
     <div id="chat-body" class="card-body p-4 overflow-auto">
         {% for message in this.messages %}
-            {% include '_message.html.twig' with { message, latest: loop.last } %}
+            {% if 'user' == message.role.value %}
+                {% include '_message.html.twig' with { message, latest: loop.last } %}
+            {% else %}
+                <div class="d-flex align-items-baseline mb-4">
+                    <div class="bot avatar rounded-3 shadow-sm">
+                        {{ ux_icon('fluent:bot-24-filled', { height: '45px', width: '45px' }) }}
+                    </div>
+                    <div class="ps-2">
+                        <audio class="pt-3" controls {{ loop.last ? 'autoplay' }} src="{{ message.metadata.get('audio') }}"></audio>
+                    </div>
+                </div>
+            {% endif %}
         {% else %}
             <div id="welcome" class="text-center mt-5 py-5 bg-white rounded-5 shadow-sm w-75 mx-auto">
                 {{ ux_icon('iconoir:microphone-solid', { height: '200px', width: '200px' }) }}
@@ -17,8 +28,8 @@
             </div>
         {% endfor %}
         <div id="loading-message" class="d-none">
-            {{ message.user([{text:'Converting your speech to text ...'}], true) }}
-            {{ message.bot('The Bot is looking for an answer ...', true) }}
+            {{ msg.user([{text:'Converting your speech to text ...'}], true) }}
+            {{ msg.bot('The Bot is looking for an answer ...', true) }}
         </div>
     </div>
     <div class="card-footer p-2 text-center">
