[whisper] word timings in verbose_json

**Is your feature request related to a problem? Please describe.**
When calling the transcription endpoint with the format `verbose_json`, LocalAI currently returns only a list of segments with `text`, `start`, and `end`. The `words` attribute is always `None`.
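For reference, OpenAI's transcription API exposes word timings in a top-level `words` list alongside `segments`. The minimal sketch below assumes LocalAI would follow that layout (each item carrying `word`, `start`, and `end`, which is OpenAI's schema, not something LocalAI emits today) and shows how a client can tell the two cases apart. The sample response dicts are hypothetical.

```python
from typing import Any


def extract_word_timings(response: dict[str, Any]) -> list[tuple[str, float, float]]:
    """Return (word, start, end) tuples from a parsed verbose_json response.

    Assumes an OpenAI-style layout: a top-level "words" list whose items have
    "word", "start" and "end" keys. Returns an empty list when "words" is
    missing or null, which is the behaviour described in this issue.
    """
    words = response.get("words") or []
    return [(w["word"], float(w["start"]), float(w["end"])) for w in words]


# Hypothetical response shaped like the current output: segments only.
current = {
    "text": "hello world",
    "segments": [{"text": "hello world", "start": 0.0, "end": 1.2}],
    "words": None,
}

# Hypothetical response with the requested word-level timings added.
requested = dict(current, words=[
    {"word": "hello", "start": 0.0, "end": 0.5},
    {"word": "world", "start": 0.6, "end": 1.2},
])

print(extract_word_timings(current))    # []
print(extract_word_timings(requested))  # [('hello', 0.0, 0.5), ('world', 0.6, 1.2)]
```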

**Describe the solution you'd like**
LocalAI should also provide word-level timestamps when they are requested with `timestamp_granularities=["word"]`.
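A request for word timings could look like the standard-library sketch below. It mirrors how OpenAI's API takes the parameter in a multipart form (`timestamp_granularities[]`, one entry per granularity). The base URL, the `/v1/audio/transcriptions` route, and the model name `whisper-1` are assumptions for illustration; use whatever your LocalAI instance is configured with.

```python
import urllib.request
import uuid

# Assumed LocalAI address and OpenAI-compatible route; adjust to your setup.
BASE_URL = "http://localhost:8080"
ENDPOINT = "/v1/audio/transcriptions"


def build_transcription_request(audio: bytes, filename: str, model: str) -> urllib.request.Request:
    """Build a multipart/form-data POST asking for verbose_json with word timestamps."""
    boundary = uuid.uuid4().hex
    # Plain form fields; the array parameter uses the "[]" suffix as in OpenAI's API.
    fields = [
        ("model", model),
        ("response_format", "verbose_json"),
        ("timestamp_granularities[]", "word"),
    ]
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in fields
    ]
    # The audio file itself goes in a separate part with a filename.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Hypothetical model name; the audio bytes are a placeholder.
req = build_transcription_request(b"...", "clip.wav", "whisper-1")
# urllib.request.urlopen(req) would send it; json.loads() on the body yields verbose_json.
```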

**Describe alternatives you've considered**

**Additional context**
I'm trying to generate subtitles for videos with hardware acceleration (Vulkan). Requesting the `srt` format works, but the timestamps are inaccurate. I would like to use [stable-ts](https://github.com/jianfch/stable-ts) to improve the timestamps, but without word-level timestamps the results are still suboptimal.
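To illustrate why word timings matter for subtitles: with them, a client can build cues whose boundaries fall on actual word edges instead of inheriting coarse segment boundaries. The sketch below uses a naive fixed-size grouping (hypothetical helpers `srt_timestamp` and `words_to_srt`, not stable-ts's algorithm) over `(word, start, end)` tuples.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Group (word, start, end) tuples into numbered SRT cues of up to max_words words.

    Each cue starts at its first word's start time and ends at its last word's end time.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i : i + max_words]
        text = " ".join(w.strip() for w, _, _ in chunk)
        start, end = chunk[0][1], chunk[-1][2]
        cues.append(f"{len(cues) + 1}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # A blank line separates consecutive cues in SRT.
    return "\n".join(cues)


print(words_to_srt([("hello", 0.0, 0.5), ("world", 0.6, 1.2)], max_words=1))
```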

[whisper] word timings in verbose_json #9306