Opencast 3368 implement whisper transcription service#108
…ad workflows and obs workflow
files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg
- source-flavors: captions/trimmed
- target-flavor: captions/vtt
- target-tags: engage-download,generator-type:auto,generator:whisper-api,lang:en,trimmed
- encoding-profile: uct-mkv-to-vtt
I imagined the post-edit operation would cut the source captions to an output captions file - is this how the other WOHs handle it?
The editor doesn't actually output a VTT file here. After cutting, it returns the trimmed captions inside an MKV container with mimetype (text/s-matroska), even though they represent VTT content. As a result, the trimmed captions don't appear on playback, because their mimetype is not text/vtt. And if the trimmed captions are published without this step, downloading them from the Admin UI yields an .mkv file, which isn't playable.
import subprocess
import tempfile
import json
from openai import OpenAI
We need a requirements.txt somewhere that lists the Python dependencies for the worker nodes, so that we get Dependabot alerts for dependency vulnerabilities. @TurRil
Pull request overview
Implements a Whisper (OpenAI) transcription path for Opencast by adding a wfexec Python transcriber + wrapper, wiring it into the captioning include workflow, and updating multiple publish/process workflows to carry captions through publication and trimming.
Changes:
- Added `wrapper-whisper-api-transcribe.sh` and `whisper-api-transcribe.py` to run Whisper API transcription and produce VTT.
- Updated captioning/transcription workflow wiring to run Whisper before publishing and to propagate captions in publish operations.
- Added encoding profiles for Whisper audio extraction and MKV→VTT caption extraction; removed older Whisper-specific workflows.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| files/worker/wfexec/wrapper-whisper-api-transcribe.sh | New bash wrapper to execute the Whisper Python transcription script. |
| files/worker/wfexec/whisper-api-transcribe.py | New Python implementation to split audio and request VTT transcription via OpenAI. |
| files/reconfig.sh | Ensures wfexec Python scripts are executable after reconfig. |
| files/deploy.sh | Ensures wfexec Python scripts are executable after deploy. |
| files/config/default/etc/workflows/uct-whisper-attach-transcripts.yaml | Removed legacy Whisper attach workflow. |
| files/config/default/etc/workflows/uct-transcript-whisper.yaml | Removed legacy Whisper submission workflow. |
| files/config/default/etc/workflows/uct-test-whisper-api-transcription.yaml | Removed legacy test workflow for Whisper API transcription. |
| files/config/default/etc/workflows/uct-studio-upload.yaml | Enables captioning and loads metadata workflow properties; includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-publish-after-edit.yaml | Adjusts captioning include conditions and preserves caption flavors on cleanup. |
| files/config/default/etc/workflows/uct-process-upload.yaml | Adds transcription toggles and includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-process-obs.yaml | Adds metadata property loading and optional Whisper captioning; includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-process-before-edit.yaml | Loads metadata properties and includes captioning earlier in the pipeline. |
| files/config/default/etc/workflows/uct-partial-publish.yaml | Adds caption cleanup/retraction and trimming/extraction path for captions during publish. |
| files/config/default/etc/workflows/uct-include-transcription-whisper.yaml | Refactors Whisper include workflow to extract audio + run wfexec transcription wrapper. |
| files/config/default/etc/workflows/uct-include-captioning.yaml | Routes captioning between Whisper and Google; updates source flavor note/logic. |
| files/config/default/etc/workflows/uct-include-before-edit-auto-trim.yaml | Moves captioning include earlier and standardizes defaults. |
| files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg | Allows Whisper wrapper and script to be executed via Execute Service. |
| files/config/default/etc/encoding/uct.properties | Adds Whisper audio FLAC profile and MKV→VTT extraction profile. |
Comments suppressed due to low confidence (1)
files/config/default/etc/workflows/uct-include-captioning.yaml:27
- The second `defaults` operation no longer has a condition (the `if: NOT(${isUpload})` is commented out). That makes it run for uploads too, and it will override the earlier upload defaults (e.g., it resets `use_google` to false and changes `transcription_src_flavor`). Restore a mutually exclusive condition or restructure so upload/non-upload defaults can't override each other.
- audio_trim_lapel: false
- transcription_src_flavor: delivery
- id: defaults
# if: NOT(${isUpload})
description: Applying default configuration values for captions
configurations:
- use_watson: false
- use_nibity: false
- use_whisper: false
- use_google: false
- target-tags: +engage-download
- output-filename: captions.vtt
- expected-type: Track
- target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api
The generated captions/source track is tagged archive,original-captions,... but not engage-download. Several updated publish operations filter by download-source-tags: engage-download,..., so these captions won’t actually be distributed unless another step adds that tag. Add engage-download here (or adjust downstream publish filters) to match the publication intent.
Suggested change:
- - target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api
+ - target-tags: archive,original-captions,engage-download,lang:en,generator-type:auto,generator:whisper-api
fi

# ---- CALL PYTHON SCRIPT ----
python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
The Python script’s shebang points at /opt/opencast/wfexec/venv/bin/python, but the wrapper forces python3. If the openai dependency is only installed in the venv (likely), this will fail at runtime. Prefer executing the script directly ($SCRIPT ...) or invoking the venv interpreter explicitly.
Suggested change:
- python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
+ "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
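A complementary guard on the Python side could make the interpreter mismatch described above fail fast with a clear message instead of a later ImportError. This is only a sketch: `running_in_venv` and `require_venv` are hypothetical helpers, not part of the PR, and it assumes the `openai` dependency is installed only in the venv.

```python
import sys


def running_in_venv() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Inside a venv, sys.prefix points at the venv directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def require_venv() -> None:
    # Hypothetical guard: stop early with a readable message rather than
    # failing later on a missing third-party import.
    if not running_in_venv():
        sys.exit(f"Expected a venv interpreter, got {sys.executable}")
```

Calling `require_venv()` at the top of `main()` would surface the problem as soon as the wrapper launches the script with the system `python3`.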
python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
EXIT_CODE=$?

echo "Python script exited with code $EXIT_CODE"
echo "Last 20 lines of log:"
tail -n 20 /tmp/whisper.log
The wrapper writes all runs to a fixed /tmp/whisper.log and then tails it. Concurrent transcription jobs will clobber each other’s logs (and /tmp is often world-readable). Use a per-invocation temp file (e.g., mktemp under $TMPDIR), set restrictive permissions, and clean it up on exit.
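The wrapper itself is bash, where `mktemp` would be the natural fit. For consistency with the transcription script, the same idea is sketched here in Python: a per-invocation log file with restrictive permissions that is removed when the run ends. `run_with_private_log` is a hypothetical helper name, and the `whisper-` prefix is an assumption.

```python
import os
import subprocess
import sys
import tempfile
from typing import Sequence


def run_with_private_log(cmd: Sequence[str], tail_lines: int = 20) -> int:
    """Run cmd with stdout/stderr sent to a per-run log, print the log tail, delete the log."""
    # mkstemp creates a uniquely named file readable and writable only by
    # the current user, so concurrent jobs cannot clobber each other's logs.
    fd, log_path = tempfile.mkstemp(prefix="whisper-", suffix=".log")
    try:
        with os.fdopen(fd, "w+") as log:
            result = subprocess.run(list(cmd), stdout=log, stderr=subprocess.STDOUT)
            log.seek(0)
            lines = log.readlines()
        print(f"Command exited with code {result.returncode}")
        sys.stdout.write("".join(lines[-tail_lines:]))
        return result.returncode
    finally:
        # Clean up the log regardless of how the command ended.
        os.remove(log_path)
```

For example, `run_with_private_log([sys.executable, "whisper-api-transcribe.py", input_path, output_path])` would keep each job's output separate; the two path variables stand in for the arguments the workflow passes.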
if len(sys.argv) != 4:
    sys.exit(1)

input_file = sys.argv[1]
output_vtt = sys.argv[2]
# output_json = sys.argv[2]

print(f"Input file: {input_file}")
print(f"Output file: {output_vtt}")
# print(f"Output file: {output_json}")
main() requires 3 parameters (len==4) but only uses the first two; the third argument is currently ignored. Either consume it explicitly (e.g., for logging/traceability) or relax the arg-count check so callers don’t need to pass an unused value.
Suggested change:
- if len(sys.argv) != 4:
-     sys.exit(1)
- input_file = sys.argv[1]
- output_vtt = sys.argv[2]
- # output_json = sys.argv[2]
- print(f"Input file: {input_file}")
- print(f"Output file: {output_vtt}")
- # print(f"Output file: {output_json}")
+ # Expect at least input and output paths; optionally accept a third argument.
+ if len(sys.argv) < 3 or len(sys.argv) > 4:
+     sys.exit(1)
+ input_file = sys.argv[1]
+ output_vtt = sys.argv[2]
+ # Optionally capture and log a third CLI argument for traceability.
+ extra_arg = sys.argv[3] if len(sys.argv) == 4 else None
+ print(f"Input file: {input_file}")
+ print(f"Output file: {output_vtt}")
+ if extra_arg is not None:
+     print(f"Additional argument: {extra_arg}")
- target-tags: archive
# Start Whisper API transcription job
- id: whisper-start-transcription
- source-flavor: presenter/source
This workflow hard-codes source-flavor: presenter/source for audio extraction. Other transcription workflows in this repo use */${transcription_src_flavor} (+ process-first-match-only) so they work for different track layouts and configured source flavors. Consider switching to the same pattern to avoid failures when a mediapackage doesn’t have presenter/source (or when transcription_src_flavor is set differently).
Suggested change:
- - source-flavor: presenter/source
+ - source-flavor: "*/${transcription_src_flavor}"
+ - process-first-match-only: true
- thumbnailType: 0
- thumbnailPosition: 1
- doPictureInPicture: true
- doCaptioning: true
doCaptioning is set to true in the defaults here, but the only captioning include later is gated solely on ${use_whisper}. If the intent is to use doCaptioning as the master switch (as in other workflows), update the condition to incorporate it; otherwise consider removing doCaptioning from defaults to avoid an unused/misleading workflow property.
Suggested change:
- - doCaptioning: true
- Remove whisper-api-transcribe.py script from commands.allowed in executeServiceImpl.cfg
- Add 30 seconds overlapping audio chunks for better transcription context

- Fix argument handling and usage message logic
- Prevent premature exit in main() in whisper-api-transcribe.py
- Improve error handling for missing input file
- Minor code cleanup
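The overlapping chunks mentioned in the commit notes above could be planned roughly as follows. This is a sketch under assumptions, not the PR's code: `chunk_windows` is a hypothetical helper, the chunk length is a free parameter (the PR does not state one here), and how duplicated text in the overlap is merged afterwards is left out.

```python
def chunk_windows(duration_s: float, chunk_s: float, overlap_s: float = 30.0):
    """Return (start, end) windows in seconds covering [0, duration_s].

    Consecutive windows share overlap_s seconds, so each chunk after the
    first begins with audio the previous chunk already contained.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be greater than overlap_s")
    windows = []
    start = 0.0
    step = chunk_s - overlap_s  # distance between consecutive chunk starts
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return windows
```

With a 1000-second recording and an assumed 600-second chunk length, `chunk_windows(1000, 600)` yields two windows, the second starting 30 seconds before the first one ends.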