Opencast 3368 implement whisper transcription service #108

Open
npmali wants to merge 8 commits into r/19.x from OPENCAST-3368-Implement_Whisper_Transcription_Service

Conversation

Contributor

npmali commented Mar 26, 2026

No description provided.

npmali requested review from TurRil and smarquard March 26, 2026 15:15
- source-flavors: captions/trimmed
- target-flavor: captions/vtt
- target-tags: engage-download,generator-type:auto,generator:whisper-api,lang:en,trimmed
- encoding-profile: uct-mkv-to-vtt
Contributor

I imagined the post-edit operation would cut the source captions to an output captions file - is this how the other WOHs handle it?

Contributor Author

The editor doesn't actually output a VTT file here. After cutting, it returns the trimmed captions inside an MKV container with the mimetype text/s-matroska, even though they represent VTT content. As a result, the trimmed captions don't appear on playback because they are not text/vtt. If the trimmed captions are published without this step, downloading them from the Admin UI yields an .mkv file, which isn't playable.
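
To illustrate the conversion step described above, here is a minimal sketch of an ffmpeg invocation that extracts a subtitle stream from an MKV container as WebVTT. The contents of the `uct-mkv-to-vtt` encoding profile are not shown in this conversation, so the options below are an assumption about what such a profile could do, and the function name and file names are hypothetical.

```python
import shlex
from pathlib import Path


def build_mkv_to_vtt_command(mkv_path: str, vtt_path: str) -> list[str]:
    """Build an ffmpeg command that extracts the first subtitle stream as WebVTT."""
    if Path(vtt_path).suffix.lower() != ".vtt":
        raise ValueError("output file should end in .vtt")
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", mkv_path,     # MKV container holding the trimmed captions
        "-map", "0:s:0",    # select the first subtitle stream of the first input
        "-c:s", "webvtt",   # encode the subtitles as WebVTT
        vtt_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed as a shell-quoted command line.
    print(shlex.join(build_mkv_to_vtt_command("trimmed-captions.mkv", "captions.vtt")))
```

The sketch only builds the command; running it requires ffmpeg on the worker node.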

import subprocess
import tempfile
import json
from openai import OpenAI
Contributor

We need a requirements.txt somewhere that lists the Python dependencies for the worker nodes, so that we get Dependabot alerts for dependency vulnerabilities. @TurRil

Copilot AI left a comment

Pull request overview

Implements a Whisper (OpenAI) transcription path for Opencast by adding a wfexec Python transcriber + wrapper, wiring it into the captioning include workflow, and updating multiple publish/process workflows to carry captions through publication and trimming.

Changes:

  • Added wrapper-whisper-api-transcribe.sh and whisper-api-transcribe.py to run Whisper API transcription and produce VTT.
  • Updated captioning/transcription workflow wiring to run Whisper before publishing and to propagate captions in publish operations.
  • Added encoding profiles for Whisper audio extraction and MKV→VTT caption extraction; removed older Whisper-specific workflows.
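
As a rough illustration of the "request VTT transcription" step, the sketch below sends one audio chunk to the OpenAI transcription endpoint and asks for WebVTT output. It is not the code from whisper-api-transcribe.py, which is not shown here; the function name, the `whisper-1` default, and the chunk file name are assumptions.

```python
from typing import Any


def transcribe_chunk_to_vtt(client: Any, chunk_path: str, model: str = "whisper-1") -> str:
    """Send one audio chunk to the transcription endpoint and return the response."""
    with open(chunk_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=model,
            file=audio_file,
            response_format="vtt",  # ask for WebVTT instead of the default JSON
        )
    # With response_format="vtt" the response is expected to be plain WebVTT text.
    return result


# Usage (needs the openai package and an API key, so it is left commented out):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# vtt_text = transcribe_chunk_to_vtt(client, "chunk-000.flac")
```

Passing the client in as a parameter keeps the function easy to exercise with a stub, without network access.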

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| files/worker/wfexec/wrapper-whisper-api-transcribe.sh | New bash wrapper to execute the Whisper Python transcription script. |
| files/worker/wfexec/whisper-api-transcribe.py | New Python implementation to split audio and request VTT transcription via OpenAI. |
| files/reconfig.sh | Ensures wfexec Python scripts are executable after reconfig. |
| files/deploy.sh | Ensures wfexec Python scripts are executable after deploy. |
| files/config/default/etc/workflows/uct-whisper-attach-transcripts.yaml | Removed legacy Whisper attach workflow. |
| files/config/default/etc/workflows/uct-transcript-whisper.yaml | Removed legacy Whisper submission workflow. |
| files/config/default/etc/workflows/uct-test-whisper-api-transcription.yaml | Removed legacy test workflow for Whisper API transcription. |
| files/config/default/etc/workflows/uct-studio-upload.yaml | Enables captioning and loads metadata workflow properties; includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-publish-after-edit.yaml | Adjusts captioning include conditions and preserves caption flavors on cleanup. |
| files/config/default/etc/workflows/uct-process-upload.yaml | Adds transcription toggles and includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-process-obs.yaml | Adds metadata property loading and optional Whisper captioning; includes captions in publish flavors. |
| files/config/default/etc/workflows/uct-process-before-edit.yaml | Loads metadata properties and includes captioning earlier in the pipeline. |
| files/config/default/etc/workflows/uct-partial-publish.yaml | Adds caption cleanup/retraction and trimming/extraction path for captions during publish. |
| files/config/default/etc/workflows/uct-include-transcription-whisper.yaml | Refactors Whisper include workflow to extract audio + run wfexec transcription wrapper. |
| files/config/default/etc/workflows/uct-include-captioning.yaml | Routes captioning between Whisper and Google; updates source flavor note/logic. |
| files/config/default/etc/workflows/uct-include-before-edit-auto-trim.yaml | Moves captioning include earlier and standardizes defaults. |
| files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg | Allows Whisper wrapper and script to be executed via Execute Service. |
| files/config/default/etc/encoding/uct.properties | Adds Whisper audio FLAC profile and MKV→VTT extraction profile. |
Comments suppressed due to low confidence (1)

files/config/default/etc/workflows/uct-include-captioning.yaml:27

  • The second defaults operation no longer has a condition (the if: NOT(${isUpload}) is commented out). That makes it run for uploads too and it will override the earlier upload defaults (e.g., it resets use_google to false and changes transcription_src_flavor). Restore a mutually exclusive condition or restructure so upload/non-upload defaults can’t override each other.
      - audio_trim_lapel: false
      - transcription_src_flavor: delivery
  - id: defaults
    # if: NOT(${isUpload})
    description: Applying default configuration values for captions
    configurations:
      - use_watson: false
      - use_nibity: false
      - use_whisper: false
      - use_google: false

- target-tags: +engage-download
- output-filename: captions.vtt
- expected-type: Track
- target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api
Copilot AI Mar 26, 2026

The generated captions/source track is tagged archive,original-captions,... but not engage-download. Several updated publish operations filter by download-source-tags: engage-download,..., so these captions won’t actually be distributed unless another step adds that tag. Add engage-download here (or adjust downstream publish filters) to match the publication intent.

Suggested change
- target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api
- target-tags: archive,original-captions,engage-download,lang:en,generator-type:auto,generator:whisper-api

fi

# ---- CALL PYTHON SCRIPT ----
python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
Copilot AI Mar 26, 2026

The Python script’s shebang points at /opt/opencast/wfexec/venv/bin/python, but the wrapper forces python3. If the openai dependency is only installed in the venv (likely), this will fail at runtime. Prefer executing the script directly ($SCRIPT ...) or invoking the venv interpreter explicitly.

Suggested change
python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
"$SCRIPT" "$@" > /tmp/whisper.log 2>&1

Comment on lines +19 to +24
python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
EXIT_CODE=$?

echo "Python script exited with code $EXIT_CODE"
echo "Last 20 lines of log:"
tail -n 20 /tmp/whisper.log
Copilot AI Mar 26, 2026

The wrapper writes all runs to a fixed /tmp/whisper.log and then tails it. Concurrent transcription jobs will clobber each other’s logs (and /tmp is often world-readable). Use a per-invocation temp file (e.g., mktemp under $TMPDIR), set restrictive permissions, and clean it up on exit.
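
The per-invocation log idea can be sketched as follows. The wrapper in this PR is a bash script, so this Python version is only an illustration of the same pattern (unique file, owner-only permissions, cleanup on exit); the function name and the `whisper-` prefix are hypothetical.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_log(command: list[str], tail_lines: int = 20) -> tuple[int, str]:
    """Run a command with stdout/stderr captured in a unique, owner-only log file."""
    # mkstemp creates a uniquely named file that only the creating user can read or write.
    fd, log_path = tempfile.mkstemp(prefix="whisper-", suffix=".log")
    try:
        with os.fdopen(fd, "w+") as log_file:
            completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
            log_file.seek(0)  # rewind to read back what the child process wrote
            tail = "".join(log_file.readlines()[-tail_lines:])
        return completed.returncode, tail
    finally:
        os.remove(log_path)  # remove the log even if the command could not be started


if __name__ == "__main__":
    code, tail = run_with_private_log([sys.executable, "-c", "print('transcription done')"])
    print(f"exit code {code}; last lines of log:\n{tail}")
```

Because each run gets its own file, concurrent jobs cannot overwrite each other's output.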

Comment on lines +91 to +100
if len(sys.argv) != 4:
    sys.exit(1)

input_file = sys.argv[1]
output_vtt = sys.argv[2]
# output_json = sys.argv[2]

print(f"Input file: {input_file}")
print(f"Output file: {output_vtt}")
# print(f"Output file: {output_json}")
Copilot AI Mar 26, 2026

main() requires 3 parameters (len==4) but only uses the first two; the third argument is currently ignored. Either consume it explicitly (e.g., for logging/traceability) or relax the arg-count check so callers don’t need to pass an unused value.

Suggested change
if len(sys.argv) != 4:
    sys.exit(1)
input_file = sys.argv[1]
output_vtt = sys.argv[2]
# output_json = sys.argv[2]
print(f"Input file: {input_file}")
print(f"Output file: {output_vtt}")
# print(f"Output file: {output_json}")

# Expect at least input and output paths; optionally accept a third argument.
if len(sys.argv) < 3 or len(sys.argv) > 4:
    sys.exit(1)
input_file = sys.argv[1]
output_vtt = sys.argv[2]
# Optionally capture and log a third CLI argument for traceability.
extra_arg = sys.argv[3] if len(sys.argv) == 4 else None
print(f"Input file: {input_file}")
print(f"Output file: {output_vtt}")
if extra_arg is not None:
    print(f"Additional argument: {extra_arg}")

- target-tags: archive
# Start Whisper API transcription job
- id: whisper-start-transcription
- source-flavor: presenter/source
Copilot AI Mar 26, 2026

This workflow hard-codes source-flavor: presenter/source for audio extraction. Other transcription workflows in this repo use */${transcription_src_flavor} (+ process-first-match-only) so they work for different track layouts and configured source flavors. Consider switching to the same pattern to avoid failures when a mediapackage doesn’t have presenter/source (or when transcription_src_flavor is set differently).

Suggested change
- source-flavor: presenter/source
- source-flavor: "*/${transcription_src_flavor}"
- process-first-match-only: true
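
The difference between the hard-coded flavor and the wildcard pattern can be sketched with a simplified matcher. This is a toy model of `type/subtype` flavor matching with `*` wildcards and `${...}` property substitution, not Opencast's implementation; in particular, the order in which Opencast picks the first match is not modelled, and the function names and track list are hypothetical.

```python
from string import Template
from typing import Optional


def flavor_matches(pattern: str, flavor: str) -> bool:
    """Return True if a 'type/subtype' flavor matches a pattern; '*' matches any part."""
    pattern_type, pattern_subtype = pattern.split("/", 1)
    flavor_type, flavor_subtype = flavor.split("/", 1)
    return pattern_type in ("*", flavor_type) and pattern_subtype in ("*", flavor_subtype)


def first_matching_flavor(
    pattern_template: str, properties: dict[str, str], flavors: list[str]
) -> Optional[str]:
    """Substitute workflow properties into the pattern and return the first matching flavor."""
    pattern = Template(pattern_template).substitute(properties)
    return next((f for f in flavors if flavor_matches(pattern, f)), None)


if __name__ == "__main__":
    tracks = ["presentation/source", "presenter/delivery", "presentation/delivery"]
    props = {"transcription_src_flavor": "delivery"}
    print(first_matching_flavor("*/${transcription_src_flavor}", props, tracks))
    print(first_matching_flavor("presenter/source", props, tracks))
```

In this toy track list the wildcard pattern finds a track, while the hard-coded `presenter/source` finds none, which is the failure mode the comment warns about.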

- thumbnailType: 0
- thumbnailPosition: 1
- doPictureInPicture: true
- doCaptioning: true
Copilot AI Mar 26, 2026

doCaptioning is set to true in the defaults here, but the only captioning include later is gated solely on ${use_whisper}. If the intent is to use doCaptioning as the master switch (as in other workflows), update the condition to incorporate it; otherwise consider removing doCaptioning from defaults to avoid an unused/misleading workflow property.

Suggested change
- doCaptioning: true

npmali added 2 commits March 31, 2026 11:54
- Remove whisper-api-transcribe.py script from commands.allowed in executeServiceImpl.cfg
- Add 30 seconds overlapping audio chunks for better transcription context
- Fix argument handling and usage message logic
- Prevent premature exit in main() in whisper-api-transcribe.py
- Improve error handling for missing input file
- Minor code cleanup
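
The overlapping-chunk idea from the commit notes above can be sketched as a small window calculation. Only the 30-second overlap comes from the commit message; the 600-second chunk length and the function name are assumptions, and how the script merges the overlapping transcripts is not covered here.

```python
def overlapping_chunks(
    duration: float, chunk_length: float = 600.0, overlap: float = 30.0
) -> list[tuple[float, float]]:
    """Return (start, end) windows in seconds; each chunk starts `overlap` seconds
    before the end of the previous one."""
    if chunk_length <= overlap:
        raise ValueError("chunk_length must be greater than overlap")
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_length, duration)
        windows.append((start, end))
        if end >= duration:
            break  # the last chunk reaches the end of the audio
        start = end - overlap  # step back so the next chunk repeats the last 30 s
    return windows


if __name__ == "__main__":
    for start, end in overlapping_chunks(1500.0):
        print(f"{start:7.1f} -> {end:7.1f}")
```

For a 1500-second recording this yields three windows: 0 to 600, 570 to 1170, and 1140 to 1500.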
3 participants