Fix OpenAI-compatible STT for "Speech to text selected lines" #11291

Open
dkakaie wants to merge 2 commits into SubtitleEdit:main from dkakaie:main

Contributor

dkakaie commented May 31, 2026

Two fixes to the OpenAI-compatible STT engine when transcribing selected lines:

  • Empty transcription text: the result was matched back to the audio clip by the transcoded temp file name instead of the original clip, so it never attached and the selected line came back empty. Now matched by _videoFileName, like the whisper engines.
  • Multi-segment results not split: a selected line whose audio the engine split into several segments was only split into multiple subtitle lines when the line was longer than 10 s; otherwise the segments were concatenated into one line. Removed the 10 s gate so any multi-segment result splits into one line per segment.
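
The first fix can be illustrated with a minimal sketch. It is written in Python rather than the project's C#, and every name in it (`Clip`, `attach_result`, the file names) is hypothetical. It only shows why a result keyed by the transcoded temp file name never finds the clip it belongs to, while a result keyed by the original clip name does.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical stand-in for the audio clip extracted for one selected line."""
    file_name: str   # name of the original extracted clip
    text: str = ""   # transcription, empty until a result is attached


def attach_result(clips, result_file_name, text):
    """Attach text to the clip whose file name matches; return True if one matched."""
    for clip in clips:
        if clip.file_name == result_file_name:
            clip.text = text
            return True
    return False


clips = [Clip("line_0001.wav")]
transcoded = "tmp_upload_ab12.mp3"  # assumed name of the temp file sent to the endpoint

# Keyed by the transcoded name: nothing matches, so the line stays empty.
matched_before = attach_result(clips, transcoded, "Hello there.")

# Keyed by the original clip name: the text attaches to the selected line.
matched_after = attach_result(clips, clips[0].file_name, "Hello there.")
```

In the PR itself the key is `_videoFileName`; the sketch above does not reproduce how SubtitleEdit stores or looks up clips.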

Contributor

Copilot AI left a comment

Pull request overview

This PR fixes OpenAI-compatible speech-to-text behavior when transcribing selected subtitle lines, ensuring transcriptions attach to the correct extracted clip and multi-segment results become multiple subtitle lines.

Changes:

  • Matches OpenAI-compatible STT results back to the original clip filename instead of the transcoded temporary upload file.
  • Splits selected-line transcription output whenever multiple paragraphs/segments are returned, rather than only for lines longer than 10 seconds.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/ui/Features/Video/SpeechToText/SpeechToTextViewModel.cs | Updates OpenAI-compatible transcription result assignment to use _videoFileName, aligning with other STT paths. |
| src/ui/Features/Main/MainViewModel.cs | Removes the duration gate so multi-paragraph transcription results replace the selected line with multiple timed lines. |

Member

niksedk commented Jun 1, 2026

Thanks for the PR! Just a quick heads up regarding the logic here:

The condition selectedLine.Duration.TotalSeconds > 10 is used to switch the logic from single line mode to one-selection-to-many-lines mode. Your changes might break this distinction I think...
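
As a rough illustration of the distinction being discussed, the gate could be modeled as below. This is a Python sketch under assumptions, not the code in MainViewModel.cs: `Segment`, `apply_transcription`, and `gate_seconds` are hypothetical names, and the single-line branch is assumed to join segment texts with a space, as the PR description suggests.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timed segment returned by the STT engine."""
    start: float
    end: float
    text: str


def apply_transcription(line_duration_seconds, segments, gate_seconds=10.0):
    """Return the subtitle texts that replace the selected line under the duration gate."""
    if len(segments) > 1 and line_duration_seconds > gate_seconds:
        # One-selection-to-many-lines mode: one subtitle line per segment.
        return [s.text for s in segments]
    # Single-line mode: all segments are merged back into the selected line.
    return [" ".join(s.text for s in segments)]


segments = [Segment(0.0, 2.0, "First part."), Segment(2.0, 4.0, "Second part.")]
short_line = apply_transcription(8.0, segments)
long_line = apply_transcription(12.0, segments)
```

With the gate in place, the same two segments yield one line for an 8 s selection and two lines for a 12 s selection.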

Contributor Author

dkakaie commented Jun 1, 2026

Thanks for the reply. What user scenario was the Duration.TotalSeconds > 10 check originally intended to handle? Was it mainly to avoid creating many short subtitle segments from a single selection?

My thinking was that modern ASR models are generally quite good at segmentation and timestamping. When a transcription contains multiple timestamped segments, concatenating them back into a single subtitle line—especially for longer clips—can discard some of that structure.

If the 10-second threshold was added as a UX choice rather than due to limitations of the transcription engine, perhaps we could consider relying on the model's segmentation directly and remove the single line mode branching. Another option would be to make the threshold configurable, although that may be unnecessary if the segmentation quality is consistently good.
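
Both options could sit behind one optional setting. The Python sketch below is only an illustration: `split_selected_line` and `min_split_seconds` are hypothetical names, and it assumes segment timestamps are relative to the start of the extracted clip, which may not hold for every engine.

```python
from typing import List, Optional, Tuple

# (start, end, text); for engine segments the times are assumed clip-relative seconds.
Segment = Tuple[float, float, str]


def split_selected_line(line_start: float, line_end: float,
                        segments: List[Segment],
                        min_split_seconds: Optional[float] = None) -> List[Segment]:
    """Replace one selected line with timed lines on the subtitle timeline.

    min_split_seconds=None trusts the model's segmentation entirely;
    a number restores a duration gate with a user-chosen threshold.
    """
    duration = line_end - line_start
    gated = min_split_seconds is not None and duration <= min_split_seconds
    if len(segments) <= 1 or gated:
        # Keep a single line spanning the original selection.
        text = " ".join(t for _, _, t in segments)
        return [(line_start, line_end, text)]
    # Shift clip-relative times onto the timeline and clamp ends to the selection.
    return [(line_start + s, min(line_start + e, line_end), t)
            for s, e, t in segments]


segments = [(0.0, 1.5, "Hi."), (1.5, 3.0, "Bye.")]
ungated = split_selected_line(20.0, 24.0, segments)
gated = split_selected_line(20.0, 24.0, segments, min_split_seconds=10.0)
```

Leaving the setting unset gives the behavior proposed in this PR, while a value of 10 reproduces the existing gate for short selections.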

I'm interested in the original rationale and would be curious to hear your thoughts on whether the model's segmentation should drive subtitle splitting here.
