Skip to content

SysAdminDoc/VideoSubtitleRemover

Repository files navigation

Video Subtitle Remover

Version License Platform

Video Subtitle Remover Pro

Version Platform License Python

Professional AI-powered tool for removing hard-coded subtitles from videos and images

Features | Installation | Usage | Configuration | CLI | Troubleshooting


Overview

Video Subtitle Remover Pro uses real AI neural networks to remove hard-coded subtitles and text watermarks from videos and images. Unlike simple blur or crop methods, it intelligently fills in removed areas with content that matches the surrounding video.

Based on YaoFANGUK/video-subtitle-remover, enhanced with a professional interface, real LaMa inpainting, multi-engine detection, and 12-language support.

Features

  • Real Video Inpainting -- Temporal Background Exposure (TBE) reconstructs the true background from neighbouring frames where the subtitle is absent. No external model weight downloads required.
  • Real AI Inpainting -- LaMa neural network for still-frame and residual refinement (via simple-lama-inpainting)
  • AUTO Inpaint Routing -- Per-batch routing between TBE and LaMa based on exposure score
  • Multi-Engine Detection -- RapidOCR (ONNX PP-OCR, 4-5x faster, leak-free) > PaddleOCR > Surya (GPL opt-in) > EasyOCR > OpenCV fallback chain (automatic)
  • Lossless Pipeline -- FFV1 lossless intermediate (only the final encode is lossy) for noticeably cleaner outputs than the legacy mp4v intermediate
  • HEVC + AV1 Output -- Pick H.264 / H.265 / AV1 from a dropdown; NVENC/QSV/AMF for HW encoding, libx265 / libsvtav1 software fallback
  • Multi-region Masks -- Draw multiple subtitle rects on a scrubbable video frame; backend honours every rect
  • Inpaint Preview -- "Preview cleanup" button runs detect + inpaint on the selected frame so you can A/B settings before committing
  • Seamless Boundaries -- Gaussian alpha feathering at every inpaint boundary, no visible cut lines
  • ~50 Language Support -- English / Chinese / Japanese / Korean / European, plus Thai, Vietnamese, Polish, Greek, Ukrainian, Filipino, Hebrew, Czech, and more
  • GPU Acceleration -- NVIDIA CUDA, AMD/Intel DirectML through ONNX Runtime, hardware-decode hints (D3D11 / VAAPI / MFX), CPU fallback
  • Subtitle Region Selector -- Scrub to any frame and draw one or more rectangles
  • Batch Processing -- Queue files or drag entire folders; per-item cancellation
  • Multi-track Audio + Loudness Normalisation -- Pass through every audio track on Bluray rips; optional per-stream EBU R128 normalisation to LUFS targets (YouTube -14, Apple -16, broadcast -23)
  • Quality Self-Test -- PSNR / SSIM report, optional FFmpeg/libvmaf VMAF score, ROI-cropped metrics for the inpaint region, and an optional side-by-side comparison PNG
  • CLI + Presets -- python -m backend.processor --pattern ... --preset "YouTube (default)"; six built-in presets + user presets persisted to %APPDATA%
  • Chyron vs Subtitle Filter -- Keep persistent text (logos, lower-thirds) and remove dialogue, or vice versa
  • Karaoke Grouping -- Per-syllable boxes fuse into a single line mask so highlighted lyrics do not leak through the gaps
  • Live Preview During Processing -- 15 FPS throttled preview piped from the backend worker
  • Pre-batch ETA Estimate -- 30-frame detect probe seeds the ETA so users see "about X left" from the very first frame
  • Crash-Resume Checkpointing -- SHA-256 input fingerprint per file; re-running a glob skips finished work
  • Premium Dark UI -- Cohesive design system with custom sliders, toggles, status chips, taskbar progress, onboarding modal
  • Settings Persistence -- All knobs saved/restored between sessions; versioned schema with backfill migration
  • CI/CD Releases -- Automated Windows builds via GitHub Actions, pip-audit scan, strict artifact/version/dependency verification, and winget submission support

System Requirements

Component Minimum Recommended
OS Windows 10 Windows 11
CPU Intel i5 / AMD Ryzen 5 Intel i7 / AMD Ryzen 7
RAM 8 GB 16+ GB
GPU Any (CPU mode) NVIDIA RTX 2060+ (RTX 50-series supported via CUDA 12.8)
VRAM - 6+ GB
Python 3.10 3.12 or 3.13 for CUDA

Installation

Quick Install

  1. Download or clone this repository
  2. Double-click Run_VSR_Pro.bat — first run automatically:
    • Creates a virtual environment
    • Detects your GPU and installs appropriate packages
    • Installs PaddleOCR, EasyOCR, and LaMa inpainting
    • Launches the application
    • Use Run_VSR_Pro_Debug.bat if you want the same bootstrap flow with a visible console for troubleshooting

After the Windows Package Manager manifest is accepted, signed release installers can also be installed with:

winget install SysAdminDoc.VideoSubtitleRemoverPro

Manual Install

cd VideoSubtitleRemover

# Create virtual environment
python -m venv venv
.\venv\Scripts\activate

# Install PyTorch (choose one -- torch 2.7+ supports Python 3.9-3.13):
# NVIDIA RTX 20/30/40-series (Turing/Ampere/Ada):
pip install torch>=2.10.0 torchvision>=0.25.0 --index-url https://download.pytorch.org/whl/cu118
# NVIDIA RTX 50-series (Blackwell -- 5070/5080/5090, needs CUDA 12.8):
pip install torch>=2.10.0 torchvision>=0.25.0 --index-url https://download.pytorch.org/whl/cu128
# CPU:
pip install torch>=2.10.0 torchvision>=0.25.0 --index-url https://download.pytorch.org/whl/cpu

# Install dependencies
pip install -r requirements.txt

# Run
python VideoSubtitleRemover.py

FFmpeg (Required for audio)

winget install ffmpeg

Validation

python -m unittest discover -s tests -v

Usage

  1. Launch via Run_VSR_Pro.bat
  2. Add files — Click to browse, press Ctrl+O, right-click for folders, or drag & drop
  3. Select algorithm — LAMA (recommended), STTN, or ProPainter
  4. Set language if subtitles are non-English
  5. Optionally set region — Click "Set Region" to draw a rectangle on the subtitle area
  6. Start Processing and monitor progress
  7. Select a queue item to preview it, use Review mask to confirm detection, and double-click the preview for a larger source frame

Algorithm Comparison

Algorithm Inpainting Engine Speed Quality Best For
STTN Temporal Background Exposure Fastest Great Live-action video with changing subtitles (default)
LAMA Neural (LaMa) Medium Best still-frame Images, animations, static backgrounds
ProPainter TBE + LaMa refinement Slowest Best motion Motion-heavy footage, thick/decorative text

All three modes now do real inpainting. STTN recovers the literal background from adjacent frames where the subtitle is absent -- this works because hard-coded subtitles are sparse in time, and the pixels behind them are revealed whenever the text changes or disappears. LAMA is a single-frame neural fill. ProPainter is a hybrid: TBE reconstructs the background, then LaMa refines any residual.

Detection Engines

The app automatically selects the best available engine:

Priority Engine Install Languages Notes
1 RapidOCR (ONNX PP-OCR) pip install rapidocr 100+ 4-5x faster than PaddleOCR, leak-free (default)
2 PaddleOCR (PP-OCRv5) pip install paddleocr>=3.0.0 106 High accuracy reference implementation
3 Surya pip install surya-ocr 90+ Layout-aware (GPL)
4 EasyOCR pip install easyocr 80+ Legacy fallback
5 OpenCV fallback Built-in Any Threshold-based

On AMD/Intel systems, setup installs onnxruntime-directml. When ONNX Runtime reports DmlExecutionProvider, RapidOCR is initialized with its DirectML provider settings; unsupported RapidOCR versions or missing providers fall back to CPU automatically. RapidOCR legacy tuple output and current structured object/dict output are both normalized to the same axis-aligned detector boxes. Opt-in ONNX inpainters inspect their model opset_import metadata before creating a DirectML session; if the default ONNX opset is newer than DirectML's supported ceiling, VSR uses the CPU provider instead of failing at session creation.

Optional model paths such as VSR_LAMA_ONNX, VSR_MIGAN_ONNX, VSR_FASTDVDNET, and VSR_TRANSNETV2 are checked against a local adapter manifest before loading. Known SHA-256 mismatches fall back instead of deserializing the file. Legacy adapters without a pinned hash still run, but new strict adapters can require a known hash unless VSR_ALLOW_UNVERIFIED_MODELS=1 is set and recorded in release evidence.

CLI Usage

Process files from the command line:

python -m backend.processor -i input.mp4 -o output.mp4 -m lama --lang en --crf 20

For OCR-empty frames with speech, the optional Whisper fallback can mask the bottom subtitle band. The default backend is faster-whisper; FFmpeg 8 builds that include the whisper filter can instead use a local whisper.cpp ggml model without Python ML dependencies:

python -m backend.processor -i input.mp4 -o output.mp4 --whisper-fallback --whisper-backend ffmpeg --ffmpeg-whisper-model C:\models\ggml-base.en.bin

Embedded subtitle tracks can be inspected or remuxed without OCR, frame decode, inpainting, or video re-encode:

python -m backend.processor -i input.mkv --soft-subtitle-dry-run
python -m backend.processor --pattern "inputs/*.mkv" --soft-subtitle-dry-run --soft-subtitle-plan-json soft-plan.json
python -m backend.processor -i input.mkv -o stripped.mkv --strip-soft-subtitles

When the input is a directory of images, --output-frames writes the cleaned frames as individual PNGs instead of encoding a video:

python -m backend.processor -i frames_dir/ -o cleaned_dir/ --output-frames

In the GUI, queued videos with embedded subtitle tracks show a track summary; right-click the item to fast strip, fast remux/keep, or continue with burned-in cleanup.

Pattern batches and GUI batches write vsr-batch-summary.json and vsr-batch-summary.md next to their outputs when they finish. The report records each input, selected output path, codec/duration/subtitle preflight data, planned action, final status, and elapsed time for skipped, checkpointed, remuxed, processed, or failed files. When quality reports are enabled, batch summaries also include a passed, review, or unknown quality gate using ROI metrics, a cheap residual-text score, and an adjacent-frame temporal flicker score, plus any quality-sheet preview path for review-needed outputs. A failed gate changes the batch row status to review-needed; skipped and remux-only rows are marked not_applicable.

Flag Description Default
-i, --input Input file path Required
-o, --output Output file path Required
--pattern Glob pattern for batch (e.g. inputs/*.mp4) -
--out-dir Output directory for batch mode -
--config JSON config overlay -
--preset NAME Apply a built-in or user preset by name -
--list-presets List every preset and exit -
-m, --mode Algorithm (sttn/lama/propainter/auto) sttn
--codec Output codec (h264/h265/av1) h264
-g, --gpu GPU device ID (-1 for CPU) 0
-l, --lang Detection language en
--crf Output quality (15-35, lower=better) 23
--skip-detection Use manual region only Off
--fast LAMA fast mode Off
--no-audio Strip audio Off
--single-audio Mux only first audio stream Off
--loudnorm <LUFS> EBU R128 loudness target (0 disables) 0
--frame-skip N Reuse mask for N frames (0=every frame) 0
--mask-dilate N Expand masks by N pixels 8
--no-hw-encode Force software encoding Off
--decode-accel HW decode hint (off/auto/d3d11/vaapi/mfx) off
--keep-chyrons Leave persistent text (logos / lower-thirds) Off
--keep-subtitles Leave dialogue subtitles Off
--karaoke-grouping Fuse per-syllable boxes on the same line Off
--whisper-fallback Use Whisper timing to mask OCR-empty speech frames Off
--whisper-backend Whisper backend (faster-whisper or ffmpeg) faster-whisper
--whisper-model faster-whisper model size tiny
--ffmpeg-whisper-model Local whisper.cpp ggml model for FFmpeg Whisper -
--ffmpeg-whisper-queue FFmpeg whisper queue size in seconds 3.0
--soft-subtitle-dry-run Print embedded subtitle tracks and planned action without loading OCR Off
--soft-subtitle-plan-json Write soft-subtitle dry-run preflight details as JSON -
--strip-soft-subtitles Stream-copy remux that removes embedded subtitle tracks Off
--keep-soft-subtitles Stream-copy remux that keeps embedded subtitle tracks Off
--burned-in-only Ignore embedded tracks and run visual cleanup normally Off
--quality-report Compute PSNR/SSIM and VMAF when libvmaf is available Off
--quality-sheet Side-by-side comparison PNG Off
--validate-config Print resolved config and exit Off
--skip-existing Skip files whose output already exists Off
--no-prefetch Disable worker-thread frame prefetcher Off
--output-frames Write cleaned frames as individual PNGs instead of a video Off
--json-log PATH Append a structured JSON-line log -

Configuration

Settings are stored in %APPDATA%\VideoSubtitleRemoverPro\settings.json and persist across sessions.

Advanced Settings

Setting Description Default Range
Neighbor Stride STTN temporal window 10 5-30
Reference Length STTN reference frames 10 5-30
Max Load Frames Batch size 30 10-100
CRF Quality Output quality (lower=better) 23 15-35
Output Codec H.264 / H.265 / AV1 h264 h264/h265/av1
Frame Skip Reuse detection mask for N frames 0 0-10
Mask Dilate Expand detected regions (px) 8 0-20
Mask Feather Soft alpha-blend at boundary (px) 4 0-15
TBE Coverage Min frames a pixel must be unmasked to trust its exposure 3 1-10
HW Encoding Use NVENC/QSV/AMF if available On On/Off
HW Decode Hint cv2 HW-accel hint with software fallback off off/auto/d3d11/vaapi/mfx
Loudness Target EBU R128 LUFS target (0 = off) 0 0 or -70..-5
Multi-track Audio Pass through every audio stream On On/Off
Quality Sheet Side-by-side PNG next to output Off On/Off

Troubleshooting

RTX 50-series (Blackwell): "no kernel image is available" or CPU-only

RTX 50-series cards (5070 / 5080 / 5090, compute capability sm_120) need CUDA 12.8 wheels, i.e. PyTorch 2.7 or newer from the cu128 index. The older cu118 / cu121 builds contain no Blackwell kernels and will either raise no kernel image is available for execution on the device or silently fall back to CPU.

Run_VSR_Pro.bat / setup.py now auto-detect 50-series cards and install the cu128 build. To fix an existing environment manually:

.\venv\Scripts\activate
pip uninstall -y torch torchvision
pip install torch>=2.10.0 torchvision>=0.25.0 --index-url https://download.pytorch.org/whl/cu128

torch 2.7+ supports Python 3.9-3.13, so a recent Python is fine. If PaddleOCR fails to load on Blackwell, detection automatically falls back to RapidOCR (ONNX Runtime), which is GPU-generation agnostic.

Python 3.14 installs but NVIDIA CUDA is unavailable

PyTorch does not publish Windows CUDA wheels for Python 3.14 yet. If you run setup with Python 3.14 and an NVIDIA GPU, setup stops before silently installing a CPU-only torch build and recommends Python 3.12 or 3.13 for GPU acceleration.

CPU-only use is still possible. Set VSR_ALLOW_PY314_CPU=1 before running setup if you explicitly accept slower CPU inference.

Colors shift / look washed out (TV vs full color range)

The upstream project re-encodes the output without carrying the source's color signalling, so a limited / TV-range (BT.601/709) clip can come back looking washed out or with shifted colors. This fork preserves the source's color_primaries, color_transfer, color_space, and color_range tags onto the final encode (preserve_color_metadata, on by default; CLI --no-color-preserve to disable). Decoding is handled by OpenCV's FFmpeg backend, which applies the correct YUV->RGB conversion for the signalled range, and the same tags are re-applied on write so players interpret the result the same way as the source.

Note: the internal pixel pipeline is still 8-bit BGR, so true 10-bit HDR sources are tone-mapped to SDR (the output is tagged correctly but not 10-bit). For standard SDR limited-range content, colors are preserved. If you still see a mismatch, attach the ffprobe color fields of your source to a bug report.

CUDA out of memory
  • Reduce Max Load Frames in Advanced Settings
  • Switch to LAMA mode (lower VRAM)
  • Use CPU mode as fallback
No audio in output
  • Install FFmpeg: winget install ffmpeg
  • Ensure "Preserve original audio" is checked
Poor detection accuracy
  • Try changing the detection language to match your subtitles
  • Use "Set Region" to manually define the subtitle area
  • Install PaddleOCR for best detection accuracy
Application won't start
  • Ensure Python 3.10+ is installed; use Python 3.12 or 3.13 for NVIDIA CUDA
  • Delete venv folder and re-run setup
  • Try Run_VSR_Pro_Debug.bat to keep the console open during startup
  • Check the log file: %APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log
  • If the log warns that OpenCV bundles libpng older than 1.6.54, avoid opening untrusted PNG files until opencv-python ships a fixed wheel

Log Files

  • GUI log panel (collapsible, click "Open Log File" for full log)
  • File log: %APPDATA%\VideoSubtitleRemoverPro\vsr_pro.log (5MB rotating)

Project Structure

VideoSubtitleRemover/
|-- VideoSubtitleRemover.py   # Main GUI application
|-- backend/
|   |-- __init__.py           # Module exports
|   |-- processor.py          # Legacy import/CLI compatibility shim
|   |-- detection.py          # OCR cascade and detector routing
|   |-- tracking.py           # Kalman, pHash, karaoke helpers
|   |-- io.py                 # Capture, ffprobe, intermediate writers
|   |-- cli.py                # Command-line entry point
|   |-- inpainters/           # Built-in STTN/LaMa/ProPainter/AUTO paths
|   |-- presets.py            # Shared preset library (GUI + CLI)
|   |-- adapter_manifest.py   # Optional model provenance and hash policy
|   `-- model_hashes.py       # Vendored SHA-256 weight hashes
|-- docs/
|   |-- architecture.md       # Pipeline map for new contributors
|   |-- edge_case_corpus.md   # Community regression-corpus guide
|   `-- archive/              # Retired audits and completed checklists
|-- ROADMAP.md                # Active incomplete work
|-- RESEARCH.md               # Current research synthesis
|-- setup.py                  # First-time environment setup
|-- Run_VSR_Pro.bat           # Windows launcher
|-- Run_VSR_Pro_Debug.bat     # Windows launcher with a visible console
|-- build_exe.bat             # PyInstaller build script
|-- requirements.txt          # Python dependencies
|-- tests/                    # Focused regression coverage for hardened paths
|-- .github/workflows/
|   `-- build.yml             # CI/CD release workflow + pip-audit
|-- assets/                   # Application assets
|-- models/                   # AI model weights (auto-downloaded)
`-- output/                   # Default output location

See docs/architecture.md for a walkthrough of the detect -> tracker -> mask -> TBE -> refine -> mux pipeline and the "add a new feature" checklist.

Planning entry points: ROADMAP.md for active incomplete work and RESEARCH.md for current research synthesis. Retired audits and completed checklists live under docs/archive/.

Credits

License

This project is licensed under the MIT License.


Video Subtitle Remover Pro -- Built by SysAdminDoc

Report Bug | Request Feature

About

AI-powered Python GUI for removing hard-coded subtitles and text watermarks from videos using STTN, LAMA, and ProPainter inpainting with GPU acceleration.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages