Opencast 3368 implement whisper transcription service by npmali · Pull Request #108 · cilt-uct/oc-scripts

npmali · 2026-03-26T15:15:39Z

No description provided.

files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg

smarquard · 2026-03-26T16:06:47Z

files/config/default/etc/workflows/uct-partial-publish.yaml

+      - source-flavors: captions/trimmed
+      - target-flavor: captions/vtt
+      - target-tags: engage-download,generator-type:auto,generator:whisper-api,lang:en,trimmed
+      - encoding-profile: uct-mkv-to-vtt


I imagined the post-edit operation would cut the source captions to an output captions file - is this how the other WOHs handle it?

The editor doesn't actually output a VTT file here. After cutting, it returns the trimmed captions inside an MKV container with the mimetype text/s-matroska, even though the content is VTT. As a result, the trimmed captions don't appear on playback, because they are not text/vtt. If the trimmed captions are published without this step, downloading them from the Admin UI yields an .mkv file that isn't playable.
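To make the mismatch concrete, here is a minimal sketch that sniffs whether a caption file is plain WebVTT or a Matroska container by looking at its first bytes. The helper name and return labels are hypothetical and not part of this PR. It relies only on two facts: WebVTT files begin with the string WEBVTT (optionally after a UTF-8 byte order mark), and Matroska files begin with the EBML magic bytes 1A 45 DF A3.

```python
EBML_MAGIC = bytes.fromhex("1a45dfa3")  # leading bytes of a Matroska/EBML file
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_caption_format(head: bytes) -> str:
    """Classify a caption file from its leading bytes (hypothetical helper)."""
    if head.startswith(EBML_MAGIC):
        return "matroska"
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.startswith(b"WEBVTT"):
        return "webvtt"
    return "unknown"


if __name__ == "__main__":
    print(sniff_caption_format(b"WEBVTT\n\n00:00.000 --> 00:01.000\nHello\n"))
    print(sniff_caption_format(EBML_MAGIC + b"\x00" * 8))
```

A check like this could run on the editor output before publishing, to confirm that the extraction step is actually needed.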

smarquard · 2026-03-26T16:11:24Z

files/worker/wfexec/whisper-api-transcribe.py

+import subprocess
+import tempfile
+import json
+from openai import OpenAI


We need a requirements.txt somewhere that lists the Python dependencies for the worker nodes, so that we get Dependabot alerts for dependency vulnerabilities. @TurRil

files/worker/wfexec/whisper-api-transcribe.py

Copilot

Pull request overview

Implements a Whisper (OpenAI) transcription path for Opencast by adding a wfexec Python transcriber + wrapper, wiring it into the captioning include workflow, and updating multiple publish/process workflows to carry captions through publication and trimming.

Changes:

Added wrapper-whisper-api-transcribe.sh and whisper-api-transcribe.py to run Whisper API transcription and produce VTT.
Updated captioning/transcription workflow wiring to run Whisper before publishing and to propagate captions in publish operations.
Added encoding profiles for Whisper audio extraction and MKV→VTT caption extraction; removed older Whisper-specific workflows.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 13 comments.

File	Description
files/worker/wfexec/wrapper-whisper-api-transcribe.sh	New bash wrapper to execute the Whisper Python transcription script.
files/worker/wfexec/whisper-api-transcribe.py	New Python implementation to split audio and request VTT transcription via OpenAI.
files/reconfig.sh	Ensures wfexec Python scripts are executable after reconfig.
files/deploy.sh	Ensures wfexec Python scripts are executable after deploy.
files/config/default/etc/workflows/uct-whisper-attach-transcripts.yaml	Removed legacy Whisper attach workflow.
files/config/default/etc/workflows/uct-transcript-whisper.yaml	Removed legacy Whisper submission workflow.
files/config/default/etc/workflows/uct-test-whisper-api-transcription.yaml	Removed legacy test workflow for Whisper API transcription.
files/config/default/etc/workflows/uct-studio-upload.yaml	Enables captioning and loads metadata workflow properties; includes captions in publish flavors.
files/config/default/etc/workflows/uct-publish-after-edit.yaml	Adjusts captioning include conditions and preserves caption flavors on cleanup.
files/config/default/etc/workflows/uct-process-upload.yaml	Adds transcription toggles and includes captions in publish flavors.
files/config/default/etc/workflows/uct-process-obs.yaml	Adds metadata property loading and optional Whisper captioning; includes captions in publish flavors.
files/config/default/etc/workflows/uct-process-before-edit.yaml	Loads metadata properties and includes captioning earlier in the pipeline.
files/config/default/etc/workflows/uct-partial-publish.yaml	Adds caption cleanup/retraction and trimming/extraction path for captions during publish.
files/config/default/etc/workflows/uct-include-transcription-whisper.yaml	Refactors Whisper include workflow to extract audio + run wfexec transcription wrapper.
files/config/default/etc/workflows/uct-include-captioning.yaml	Routes captioning between Whisper and Google; updates source flavor note/logic.
files/config/default/etc/workflows/uct-include-before-edit-auto-trim.yaml	Moves captioning include earlier and standardizes defaults.
files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg	Allows Whisper wrapper and script to be executed via Execute Service.
files/config/default/etc/encoding/uct.properties	Adds Whisper audio FLAC profile and MKV→VTT extraction profile.

Comments suppressed due to low confidence (1)

files/config/default/etc/workflows/uct-include-captioning.yaml:27

The second defaults operation no longer has a condition (the if: NOT(${isUpload}) is commented out). That makes it run for uploads too and it will override the earlier upload defaults (e.g., it resets use_google to false and changes transcription_src_flavor). Restore a mutually exclusive condition or restructure so upload/non-upload defaults can’t override each other.

      - audio_trim_lapel: false
      - transcription_src_flavor: delivery
  - id: defaults
    # if: NOT(${isUpload})
    description: Applying default configuration values for captions
    configurations:
      - use_watson: false
      - use_nibity: false
      - use_whisper: false
      - use_google: false

Copilot · 2026-03-26T19:29:57Z

files/config/default/etc/workflows/uct-include-transcription-whisper.yaml

-      - target-tags: +engage-download
+      - output-filename: captions.vtt
+      - expected-type: Track
+      - target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api


The generated captions/source track is tagged archive,original-captions,... but not engage-download. Several updated publish operations filter by download-source-tags: engage-download,..., so these captions won’t actually be distributed unless another step adds that tag. Add engage-download here (or adjust downstream publish filters) to match the publication intent.

Suggested change

-      - target-tags: archive,original-captions,lang:en,generator-type:auto,generator:whisper-api
+      - target-tags: archive,original-captions,engage-download,lang:en,generator-type:auto,generator:whisper-api
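A rough sketch of why the missing tag matters, assuming the publish step keeps only elements that carry at least one of the configured download-source-tags. The exact matching rules of the real operation are not shown in this PR, so the `Track` class and the filter below are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    flavor: str
    tags: set = field(default_factory=set)


def select_for_download(tracks, source_tags):
    """Keep tracks that carry at least one of the requested tags (assumed rule)."""
    wanted = set(source_tags)
    return [t for t in tracks if t.tags & wanted]


# Tags as written in the PR, and as proposed in the suggested change.
original = Track("captions/source", {"archive", "original-captions", "lang:en"})
suggested = Track("captions/source", original.tags | {"engage-download"})

print(len(select_for_download([original], ["engage-download"])))
print(len(select_for_download([suggested], ["engage-download"])))
```

Under that assumption the originally tagged track is filtered out and the track with the suggested tags is selected.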

Copilot · 2026-03-26T19:29:58Z

files/worker/wfexec/wrapper-whisper-api-transcribe.sh

+fi
+
+# ---- CALL PYTHON SCRIPT ----
+python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1


The Python script’s shebang points at /opt/opencast/wfexec/venv/bin/python, but the wrapper forces python3. If the openai dependency is only installed in the venv (likely), this will fail at runtime. Prefer executing the script directly ($SCRIPT ...) or invoking the venv interpreter explicitly.
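One way to confirm which interpreter actually runs the script is to have it report on itself at startup. The sketch below is an assumption about how such a check could look, not code from this PR. `interpreter_report` is a hypothetical helper that uses only the standard library: `sys.prefix` differs from `sys.base_prefix` inside a virtual environment, and `importlib.util.find_spec` returns `None` when a top-level module cannot be found.

```python
import importlib.util
import sys


def interpreter_report(module: str = "openai") -> dict:
    """Describe the running interpreter and whether `module` is importable from it."""
    return {
        "executable": sys.executable,
        # Inside a venv, sys.prefix points at the venv, base_prefix at the system install.
        "in_venv": sys.prefix != sys.base_prefix,
        "module_available": importlib.util.find_spec(module) is not None,
    }


if __name__ == "__main__":
    report = interpreter_report()
    print(report)
    if not report["module_available"]:
        print("openai is not importable from this interpreter", file=sys.stderr)
```

If the wrapper launches the system python3 and the dependency lives only in the venv, this would print `module_available: False` in the log instead of failing later with an ImportError.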

Suggested change

-python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
+"$SCRIPT" "$@" > /tmp/whisper.log 2>&1

Copilot · 2026-03-26T19:29:59Z

files/worker/wfexec/wrapper-whisper-api-transcribe.sh

+python3 "$SCRIPT" "$@" > /tmp/whisper.log 2>&1
+EXIT_CODE=$?
+
+echo "Python script exited with code $EXIT_CODE"
+echo "Last 20 lines of log:"
+tail -n 20 /tmp/whisper.log


The wrapper writes all runs to a fixed /tmp/whisper.log and then tails it. Concurrent transcription jobs will clobber each other’s logs (and /tmp is often world-readable). Use a per-invocation temp file (e.g., mktemp under $TMPDIR), set restrictive permissions, and clean it up on exit.
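The wrapper in this PR is a bash script, but the per-invocation log idea can be sketched in Python as follows. This is an illustration under assumptions, not the PR's implementation: `run_with_private_log` is a hypothetical name, and the command in the usage example is a stand-in for the real transcription call. `tempfile.mkstemp` creates a uniquely named file that only the creating user can read and write.

```python
import os
import subprocess
import sys
import tempfile


def run_with_private_log(cmd, tail_lines=20):
    """Run cmd with output in a per-invocation log; return (exit code, log tail)."""
    # mkstemp gives every invocation its own file, readable only by this user.
    fd, log_path = tempfile.mkstemp(prefix="whisper-", suffix=".log")
    try:
        with os.fdopen(fd, "w+") as log:
            result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
            log.seek(0)
            tail = log.read().splitlines()[-tail_lines:]
        return result.returncode, tail
    finally:
        os.remove(log_path)  # clean up even if the command could not be started


if __name__ == "__main__":
    code, tail = run_with_private_log([sys.executable, "-c", "print('transcribing')"])
    print(f"Exit code: {code}")
    print("\n".join(tail))
```

Because each run has its own file, concurrent jobs can no longer overwrite each other's output, and nothing is left behind in the temp directory.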

Copilot · 2026-03-26T19:29:59Z

files/worker/wfexec/whisper-api-transcribe.py

+    if len(sys.argv) != 4:
+        sys.exit(1)
+
+    input_file = sys.argv[1]
+    output_vtt = sys.argv[2]
+    # output_json = sys.argv[2]
+
+    print(f"Input file: {input_file}")
+    print(f"Output file: {output_vtt}")
+    # print(f"Output file: {output_json}")


main() requires 3 parameters (len==4) but only uses the first two; the third argument is currently ignored. Either consume it explicitly (e.g., for logging/traceability) or relax the arg-count check so callers don’t need to pass an unused value.

Suggested change

-    if len(sys.argv) != 4:
-        sys.exit(1)
-
-    input_file = sys.argv[1]
-    output_vtt = sys.argv[2]
-    # output_json = sys.argv[2]
-
-    print(f"Input file: {input_file}")
-    print(f"Output file: {output_vtt}")
-    # print(f"Output file: {output_json}")
+    # Expect at least input and output paths; optionally accept a third argument.
+    if len(sys.argv) < 3 or len(sys.argv) > 4:
+        sys.exit(1)
+
+    input_file = sys.argv[1]
+    output_vtt = sys.argv[2]
+    # Optionally capture and log a third CLI argument for traceability.
+    extra_arg = sys.argv[3] if len(sys.argv) == 4 else None
+
+    print(f"Input file: {input_file}")
+    print(f"Output file: {output_vtt}")
+    if extra_arg is not None:
+        print(f"Additional argument: {extra_arg}")
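An alternative to counting `sys.argv` by hand is to let `argparse` enforce the two required paths and the optional third value, which also produces a usage message on bad input. This is a sketch, not the script's actual code. The argument name `extra_arg` and the help texts are assumptions, since the PR does not say what the third argument represents.

```python
import argparse


def parse_args(argv):
    """Parse input/output paths plus an optional third value (hypothetical name)."""
    parser = argparse.ArgumentParser(
        prog="whisper-api-transcribe.py",
        description="Transcribe an audio file to WebVTT.",
    )
    parser.add_argument("input_file", help="path to the audio file to transcribe")
    parser.add_argument("output_vtt", help="path of the VTT file to write")
    parser.add_argument("extra_arg", nargs="?", default=None,
                        help="optional value, logged for traceability only")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["in.flac", "out.vtt"])
    print(args.input_file, args.output_vtt, args.extra_arg)
```

With too few or too many arguments, `argparse` prints the usage line to stderr and exits with status 2 rather than exiting silently.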

Copilot · 2026-03-26T19:29:59Z

files/config/default/etc/workflows/uct-include-transcription-whisper.yaml

-      - target-tags: archive
-  # Start Whisper API transcription job
-  - id: whisper-start-transcription
+      - source-flavor: presenter/source


This workflow hard-codes source-flavor: presenter/source for audio extraction. Other transcription workflows in this repo use */${transcription_src_flavor} (+ process-first-match-only) so they work for different track layouts and configured source flavors. Consider switching to the same pattern to avoid failures when a mediapackage doesn’t have presenter/source (or when transcription_src_flavor is set differently).

Suggested change

-      - source-flavor: presenter/source
+      - source-flavor: "*/${transcription_src_flavor}"
+      - process-first-match-only: true
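The difference between the hard-coded flavor and the wildcard pattern can be sketched as below. The matching rule is a simplification assumed for illustration (either half of `type/subtype` may be `*`), and both helper names are hypothetical. The real selection is done by the workflow operation, not by this code.

```python
def flavor_matches(pattern: str, flavor: str) -> bool:
    """Match 'type/subtype' against a pattern where either part may be '*'."""
    p_type, p_sub = pattern.split("/", 1)
    f_type, f_sub = flavor.split("/", 1)
    return p_type in ("*", f_type) and p_sub in ("*", f_sub)


def first_match(pattern: str, flavors):
    """Return the first flavor matching the pattern, or None if there is none."""
    return next((f for f in flavors if flavor_matches(pattern, f)), None)


# A mediapackage without presenter/source, with transcription_src_flavor = delivery.
tracks = ["presentation/delivery", "presenter/delivery"]
print(first_match("presenter/source", tracks))  # None: no track to extract audio from
print(first_match("*/delivery", tracks))        # presentation/delivery
```

Taking only the first match mirrors the intent of process-first-match-only: one audio source is enough for transcription even when several tracks match.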

Copilot · 2026-03-26T19:29:59Z

files/config/default/etc/workflows/uct-process-obs.yaml

      - thumbnailType: 0
      - thumbnailPosition: 1
      - doPictureInPicture: true
+      - doCaptioning: true


doCaptioning is set to true in the defaults here, but the only captioning include later is gated solely on ${use_whisper}. If the intent is to use doCaptioning as the master switch (as in other workflows), update the condition to incorporate it; otherwise consider removing doCaptioning from defaults to avoid an unused/misleading workflow property.

Suggested change

-      - doCaptioning: true

npmali added 5 commits March 23, 2026 06:59

OPENCAST-3368: Enable Whisper transcription in process-before-edit workflow

8988865

Merge branch 'r/19.x' into OPENCAST-3368-Implement_Whisper_Transcription_Service

7aa328a

Merge branch 'r/19.x' into OPENCAST-3368-Implement_Whisper_Transcription_Service

042e002

OPENCAST-3368: Include whisper transription in studio and manual upload workflows and obs workflow

fa087c4

OPENCAST-3368: Removed unused workflows

738337a

npmali requested review from TurRil and smarquard March 26, 2026 15:15

Merge branch 'r/19.x' into OPENCAST-3368-Implement_Whisper_Transcription_Service

f60abdb

smarquard reviewed Mar 26, 2026

files/config/default/etc/org.opencastproject.execute.impl.ExecuteServiceImpl.cfg (outdated, resolved)

smarquard reviewed Mar 26, 2026

files/config/default/etc/workflows/uct-partial-publish.yaml (resolved)

smarquard reviewed Mar 26, 2026

files/config/default/etc/workflows/uct-studio-upload.yaml (outdated, resolved)

smarquard reviewed Mar 26, 2026

files/worker/wfexec/whisper-api-transcribe.py (outdated, resolved)

TurRil requested a review from Copilot March 26, 2026 19:22

Copilot started reviewing on behalf of TurRil March 26, 2026 19:22

Copilot AI reviewed Mar 26, 2026

smarquard assigned npmali Mar 27, 2026

npmali added 2 commits March 31, 2026 11:54

OPENCAST-3368:Workflow fixes and cleanup based on PR review

fc3cb9a

- Remove whisper-api-transcribe.py script from commands.allowed in executeServiceImpl.cfg
- Add 30 seconds overlapping audio chunks for better transcription context

OPENCAST-3368:Address some Copilot review comments

fc17cb4

- Fix argument handling and usage message logic
- Prevent premature exit in main() in whisper-api-transcribe.py
- Improve error handling for missing input file
- Minor code cleanup
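The 30-second overlap mentioned in commit fc3cb9a can be illustrated with a small boundary calculation. This is only a sketch of the general idea: the 600-second chunk length and the function name are assumptions, and the script's actual splitting logic is not shown on this page. Each chunk after the first starts 30 seconds before the previous one ends, so speech at a cut point appears in both chunks.

```python
def chunk_bounds(duration, chunk_len=600.0, overlap=30.0):
    """Return (start, end) pairs in seconds, each starting `overlap` before the previous end."""
    if chunk_len <= overlap:
        raise ValueError("chunk_len must exceed overlap")
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_len, duration)
        bounds.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so adjacent chunks share context
    return bounds


print(chunk_bounds(1500.0))
# [(0.0, 600.0), (570.0, 1170.0), (1140.0, 1500.0)]
```

Overlapping chunks produce duplicated cues in the shared region, so the per-chunk VTT output still has to be offset and de-duplicated when it is merged. That step is outside this sketch.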
