Diffio-AI/VFC-Audio-Restoration-Benchmark

Audio Restoration Benchmark by Diffio AI

Self-contained benchmark for speech restoration on archival Voices for Christ audio.

Benchmark metadata lives in benchmark_manifest.json. Current versions:

  • benchmark: 0.3.0
  • dataset: 0.1.0

Overview

  • task: restore each archival clip to improve intelligibility and perceptual quality
  • dataset: 100 clips, 30 seconds each, 50 minutes total
  • layout: benchmark_data/original plus one benchmark_data/<mode> directory per system
  • current public baselines:
    • adobe_podcast: Adobe Podcast
    • diffio_3_5: Diffio.ai 3.5

Current working assumption: the source Voices for Christ material is pre-1990 and open-domain or public-domain compatible, but that has not been independently verified.

Metrics

  • SCOREQ: primary no-reference quality score, higher is better
  • WER: proxy intelligibility score against frozen transcripts, lower is better
  • DNSMOS P.835: no-reference perceptual quality score, higher is better

Removed from the default benchmark:

  • speaker preservation
  • NISQA
  • SRMR

Run

Audio should be stored with Git LFS.

git lfs install
git lfs pull
python score_benchmark.py
python score_benchmark.py --metrics wer
python score_benchmark.py --metrics scoreq dnsmos
python score_benchmark.py --list-metrics
python plot_results.py

score_benchmark.py runs the full benchmark by default. Use --metrics to run only the stages you want.
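As an illustration of the flag behaviour described above, here is a minimal sketch of a multi-value `--metrics` option that falls back to all stages when omitted. It is not the parser in score_benchmark.py: the metric names are taken from the commands shown, while `AVAILABLE_METRICS`, `build_parser`, and `select_metrics` are hypothetical names.

```python
import argparse

# Metric names as they appear in the commands above; the real script may define more.
AVAILABLE_METRICS = ("scoreq", "wer", "dnsmos")


def build_parser():
    """Build a parser with a multi-value --metrics flag and a --list-metrics switch."""
    parser = argparse.ArgumentParser(description="Sketch of a benchmark scoring CLI")
    parser.add_argument(
        "--metrics",
        nargs="+",                        # accepts one or more metric names
        choices=AVAILABLE_METRICS,        # rejects unknown names
        default=list(AVAILABLE_METRICS),  # no flag means run every stage
    )
    parser.add_argument("--list-metrics", action="store_true")
    return parser


def select_metrics(argv):
    """Return the metric stages requested on the command line."""
    args = build_parser().parse_args(argv)
    return list(args.metrics)


if __name__ == "__main__":
    print(select_metrics(["--metrics", "scoreq", "dnsmos"]))
```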

Full benchmark outputs:

  • scores.csv
  • scores_metadata.json
  • reference_transcripts.csv
  • reference_transcripts_metadata.json
  • transcripts.csv
  • wer.csv
  • wer_metadata.json
  • dnsmos.csv
  • dnsmos_metadata.json

plot_results.py reads the benchmark CSVs and writes images into ./plots/.

Submission

Submit a pull request that adds a new benchmark_data/<mode> directory.

Each submission should:

  • keep the exact same filenames as benchmark_data/original
  • include one output file per original file
  • avoid truncation, silence padding, or file count mismatches
  • describe the method, version, and inference settings in the pull request
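Before opening a pull request, the filename and file-count requirements can be checked locally. The sketch below is an assumption about how such a check might look, not the repository's own validation code; `diff_names` and `compare_layout` are hypothetical helpers, and they compare filenames only (they do not detect truncation or silence padding).

```python
from pathlib import Path


def diff_names(original_names, submitted_names):
    """Return (missing, extra) filenames of a submission relative to the originals."""
    original = set(original_names)
    submitted = set(submitted_names)
    missing = sorted(original - submitted)  # originals with no restored output
    extra = sorted(submitted - original)    # outputs that match no original
    return missing, extra


def compare_layout(original_dir, mode_dir):
    """Compare the files in two directories by name only."""
    original = [p.name for p in Path(original_dir).iterdir() if p.is_file()]
    submitted = [p.name for p in Path(mode_dir).iterdir() if p.is_file()]
    return diff_names(original, submitted)


if __name__ == "__main__":
    # The second path is a placeholder for your own benchmark_data/<mode> directory.
    missing, extra = compare_layout("benchmark_data/original", "benchmark_data/my_system_1_0")
    print("missing:", missing)
    print("extra:", extra)
```

A submission passes this check when both lists are empty.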

Recommended naming:

  • benchmark_data/<system_name>_<system_version>
  • example: benchmark_data/diffio_3_5

Plots

Leaderboard Summary

  • compares mean SCOREQ, mean WER, and mean DNSMOS OVR
  • WER omits original because the original clips are the frozen reference source and score zero by construction
  • good for a fast headline comparison between systems like Adobe Podcast and Diffio.ai 3.5
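The per-system means behind a summary like this can be reproduced from a per-file score table. The following is a sketch under assumptions: the column names `mode`, `file`, and `scoreq` are hypothetical and may not match the actual schema of scores.csv, and the sample values are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_by_mode(rows, value_column):
    """Average one metric column per mode (system) over all files."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["mode"]].append(float(row[value_column]))
    return {mode: mean(values) for mode, values in grouped.items()}


if __name__ == "__main__":
    # Made-up rows with hypothetical system names, for illustration only.
    sample_csv = io.StringIO(
        "mode,file,scoreq\n"
        "system_a,clip_001.wav,3.0\n"
        "system_a,clip_002.wav,4.0\n"
        "system_b,clip_001.wav,2.0\n"
        "system_b,clip_002.wav,2.5\n"
    )
    print(mean_by_mode(csv.DictReader(sample_csv), "scoreq"))
```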

Metric Distributions

  • shows per-file spread for SCOREQ, WER, and DNSMOS OVR
  • WER omits original for the same reason as above
  • useful for checking consistency, not just averages

Relative Improvement Heatmap

  • shows mean improvement relative to the degraded original clips
  • positive values are better in every column
  • useful for seeing where Diffio.ai 3.5 and Adobe Podcast differ
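For positive values to mean "better" in every column, lower-is-better metrics such as WER need their sign flipped. The sketch below shows one way to do that with a signed difference. This is an assumption, not the formula used by plot_results.py, which may normalise differently or use a different baseline; `HIGHER_IS_BETTER` and `signed_improvement` are hypothetical names, and the metric directions come from the Metrics section.

```python
# Direction of each metric, as stated in the Metrics section.
HIGHER_IS_BETTER = {"scoreq": True, "wer": False, "dnsmos_ovr": True}


def signed_improvement(metric, baseline, restored):
    """Return the change from baseline, oriented so that positive means better."""
    delta = restored - baseline
    return delta if HIGHER_IS_BETTER[metric] else -delta


if __name__ == "__main__":
    # Made-up values: a WER drop from 0.5 to 0.25 counts as a positive improvement.
    print(signed_improvement("wer", 0.5, 0.25))
    print(signed_improvement("scoreq", 2.0, 3.0))
```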

Methods

WER

There is no human transcript ground truth, so WER is a proxy metric.

  • freeze one strong ASR decode of benchmark_data/original into reference_transcripts.csv
  • score restored outputs with a weaker ASR decode against those frozen transcripts

Scoring with a weaker decoder makes the benchmark more sensitive to restoration gains, and freezing the reference transcripts keeps the target stable across reruns.
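For reference, WER is the word-level edit distance between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below is a plain-Python version of that definition, not the benchmark's scoring code. Lowercasing and whitespace splitting are assumed normalisation steps, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # previous[j] holds the distance between the processed reference prefix and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick fox"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.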

DNSMOS

DNSMOS P.835 is run locally through TorchMetrics.

  • audio is resampled to 16 kHz mono
  • reported outputs are p808_mos, sig, bak, and ovr
  • no external scoring API is used

Notes

  • score_benchmark.py is the normal entrypoint
  • audio/layout validation requires every non-empty mode directory to match the original filenames exactly
  • the benchmark currently assumes a CUDA-capable environment for the default scoring path
