diff --git a/content/posts/2026-02-01-generating-ai-audio/agentic_edit_diagram.png b/content/posts/2026-02-01-generating-ai-audio/agentic_edit_diagram.png
new file mode 100644
index 0000000..992f97e
Binary files /dev/null and b/content/posts/2026-02-01-generating-ai-audio/agentic_edit_diagram.png differ
diff --git a/content/posts/2026-02-01-generating-ai-audio/audio_podcast.mp3 b/content/posts/2026-02-01-generating-ai-audio/audio_podcast.mp3
new file mode 100644
index 0000000..a409843
Binary files /dev/null and b/content/posts/2026-02-01-generating-ai-audio/audio_podcast.mp3 differ
diff --git a/content/posts/2026-02-01-generating-ai-audio/audry.png b/content/posts/2026-02-01-generating-ai-audio/audry.png
new file mode 100644
index 0000000..e9e0331
Binary files /dev/null and b/content/posts/2026-02-01-generating-ai-audio/audry.png differ
diff --git a/content/posts/2026-02-01-generating-ai-audio/index.md b/content/posts/2026-02-01-generating-ai-audio/index.md
new file mode 100644
index 0000000..26f85d2
--- /dev/null
+++ b/content/posts/2026-02-01-generating-ai-audio/index.md
@@ -0,0 +1,50 @@
+---
+author: "Gethin James"
+title: "Generating AI Audio"
+description: "Exploring the use of Generative AI to create accessible and engaging audio content from long-form documents"
+draft: false
+date: 2026-01-01
+tags: ["AI", "Generative AI", "Audio", "Accessibility"]
+categories: ["AI"]
+ShowToc: false
+TocOpen: false
+---
+[You may prefer to listen to the audio version of this blog post.](audio_podcast.mp3)
+
+At the DVLA Emerging Technology Lab, we wondered whether Generative AI could be used to make long-form documents more accessible and engaging.
+
+Many individuals find reading extensive written documents challenging. Recent technological advances have enabled the generation of "audio overviews" and "podcasts" from text content. Our idea was to explore how far this technology might assist neurodiverse individuals, or those for whom English is not a first language.
+
+As a government agency, we are committed to ensuring that content is handled securely and remains accessible only to authorised staff within the agency. To achieve this, we created a Microsoft Teams Bot called "Audry". Audry allows a user to upload a document and automatically transform it into a podcast or news briefing. We wanted to produce audio that features authentic regional UK accents.
+
+{{
}}
+
+## Agentic Review
+Many advances in Generative AI have originated in the United States, resulting in some technology displaying a US bias. For example, generated transcripts occasionally contained American expressions that are unsuitable for a UK audience, such as "DMV" instead of "DVLA". To address this, we adopted an agentic approach to reviewing the transcript. Using [LangGraph](https://www.langchain.com/langgraph), we created three personas to review the transcript, and a fourth expert to edit it based on the feedback.
+
+{{
}}
+
+- British Expert: Assessed grammar and verified the use of appropriate British cultural references.
+- Content Reviewer: Moderated content for compliance with UK government standards.
+- Expressive Delivery Advisor (think drama teacher): Suggested emotional and non-verbal sound cues, for example adding pauses or varying tone.
+- Editor: Incorporated feedback from the previous three experts and rewrote the transcript accordingly.
+
+
+## Challenges
+- Voice quality has improved significantly thanks to recent advances. However, accessing the latest multi-speaker voice models remains difficult. These models are often still in preview stages and provide limited support for British English.
+- Achieving consistent voice generation is challenging. Submitting the same parameters to a large language model (LLM) does not always produce identical results. While this makes generative AI powerful, it also impedes reliable and repeatable voice outputs. We experimented with dividing large transcripts (5 mins+) into smaller requests. However, combining these segments often resulted in noticeable changes in the voices during conversations.
+- Regional accents can be influenced through specific prompts, for example requesting a Welsh or Scottish accent. In our experience, this approach was not consistently reliable. Further work is needed to create uniform regional accents.
+
+
+## Technology
+Here are some of the technologies we used:
+- Microsoft Teams AI and Bot Framework
+- Azure Document Intelligence, Cosmos DB, Speech Service, App Service
+- Google Gemini 2.5 Flash and Text-to-Speech (TTS) models
+- ElevenLabs Text-to-Speech API
+- LangGraph for agentic review
+
+## Conclusions
+This technology is still emerging, and producing consistently accurate regional British audio content remains a challenge. However, the technology may already be sufficiently usable. [This podcast was generated by uploading this blog post through our system](audio_podcast.mp3). You can decide for yourself if we succeeded.
+
+Following Government Service Standards, the [code for Audry is open source](https://github.com/dvla/audry).
\ No newline at end of file