Removes use of torchaudio and moves transforms inside of NeMo #15211

blisc · 2025-12-19T20:17:43Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Removes use of torchaudio.transforms and moves transforms inside of NeMo.
NOTE: we will still use torchaudio in nemo/collections/audio/metrics/squim.py and nemo/collections/tts/models/magpietts_preference_optimization.py

Collection: audio, asr, tts

Changelog

Move frequently used torchaudio transforms into NeMo

PR Type:

New Feature
Bugfix
Documentation

Signed-off-by: Jason <jasoli@nvidia.com>

Signed-off-by: blisc <blisc@users.noreply.github.com>

nithinraok

Can we update or remove torchaudio references from the following files as well:

nemo/collections/tts/models/magpietts_preference_optimization.py
docker/Dockerfile.speech
scripts/installers/install_torchaudio_latest.sh
docs/source/speechlm2/intro.rst
nemo/collections/audio/metrics/squim.py

In the following files, torchaudio is installed but not used. We can update them too.

tutorials/00_NeMo_Primer.ipynb
tutorials/01_NeMo_Models.ipynb
tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb
tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb
tutorials/asr/Streaming_Multitalker_ASR.ipynb
tutorials/audio/speech_enhancement/BNR_Speech_enhancement_with_NeMo.ipynb
tutorials/audio/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb
tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb
tutorials/speaker_tasks/End_to_End_Diarization_Inference.ipynb
tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb
tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
tutorials/speaker_tasks/Streaming_End_to_End_Diarization_Inference.ipynb

nithinraok · 2026-01-03T18:45:06Z

nemo/collections/asr/modules/audio_preprocessing.py

        nb_max_freq (int) : Frequency above which all frequencies will be masked for narrowband augmentation.
            Defaults to 4000
-        use_torchaudio: Whether to use the `torchaudio` implementation.
+        use_torchaudio: Whether to use the FilterbankFeatures or FilterbankFeaturesTA class


Can we remove this option altogether?

nithinraok · 2026-01-03T18:47:02Z

nemo/collections/asr/modules/audio_preprocessing.py

-            featurizer_class = FilterbankFeatures
-        else:
-            featurizer_class = FilterbankFeaturesTA
+        featurizer_class = FilterbankFeaturesTA if use_torchaudio else FilterbankFeatures


I see from features.py that FilterbankFeaturesTA doesn't use torchaudio. I think we can remove this condition and default to FilterbankFeatures.
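
A minimal sketch of what this suggestion could look like, assuming the flag has to stay accepted so existing configs do not break: the value is ignored with a deprecation warning and the selection always resolves to `FilterbankFeatures`. The `select_featurizer_class` helper and the stand-in class are hypothetical names for illustration, not the NeMo code.

```python
import warnings


class FilterbankFeatures:
    """Stand-in for the real featurizer class; illustration only."""


def select_featurizer_class(use_torchaudio=None):
    """Hypothetical helper: always returns FilterbankFeatures.

    `use_torchaudio` is accepted only for backward compatibility and
    has no effect on the returned class.
    """
    if use_torchaudio is not None:
        warnings.warn(
            "`use_torchaudio` is deprecated and has no effect.",
            DeprecationWarning,
            stacklevel=2,
        )
    return FilterbankFeatures
```

Whether to warn or drop the argument outright depends on how many saved configs still set it.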

nithinraok · 2026-01-03T18:49:13Z

nemo/collections/audio/models/__init__.py

 # See the License for the specific language governing permissions and
 # limitations under the License.
-
-from nemo.collections.audio.models.audio_to_audio import AudioToAudioModel


Why remove these?

Causes a circular dependency if this stays in the init
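
For context, a different technique from the one used in this PR: a package can keep exposing a name without importing it eagerly by using a module-level `__getattr__` (PEP 562), which defers the import until first access and so avoids the cycle at package import time. The sketch below is an assumption-laden illustration; `make_lazy_package` is a hypothetical helper and a stdlib class stands in for `AudioToAudioModel`.

```python
import importlib
import types


def make_lazy_package(name, lazy_attrs):
    """Build a module whose listed attributes are imported on first access.

    lazy_attrs maps attribute name -> (module path, attribute name there).
    """
    package = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in lazy_attrs:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, target = lazy_attrs[attr]
        value = getattr(importlib.import_module(module_path), target)
        setattr(package, attr, value)  # cache so the hook is not hit again
        return value

    package.__getattr__ = __getattr__  # PEP 562 hook, stored in the module dict
    return package


# Demo: a stdlib class stands in for the lazily exported model class.
models = make_lazy_package("models_demo", {"Decoder": ("json.decoder", "JSONDecoder")})
```

In a real `__init__.py` the same effect comes from defining `def __getattr__(name): ...` directly at module level.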

blisc · 2026-01-06T16:22:03Z

Can we update or remove torchaudio references from the following files as well:

nemo/collections/tts/models/magpietts_preference_optimization.py

docker/Dockerfile.speech

scripts/installers/install_torchaudio_latest.sh

docs/source/speechlm2/intro.rst

nemo/collections/audio/metrics/squim.py

We do not have a replacement for SQUIM-MOS. I think we still need to keep the import check there. But we can remove it from the Dockerfile and tutorials.
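
A rough sketch of the kind of import check meant here, assuming torchaudio is treated as an optional dependency: availability is probed once without importing the package, and the error is raised only when a torchaudio-only feature is actually requested. `is_module_available`, `HAVE_TORCHAUDIO`, and `require_torchaudio` are hypothetical names, not the ones in squim.py.

```python
import importlib.util


def is_module_available(name):
    """Return True if `name` can be found for import, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


# Probed once at module import time; True only when torchaudio is installed.
HAVE_TORCHAUDIO = is_module_available("torchaudio")


def require_torchaudio(feature):
    """Raise a clear error when an optional torchaudio-only feature is used."""
    if not HAVE_TORCHAUDIO:
        raise ModuleNotFoundError(
            f"{feature} requires torchaudio, which is not installed."
        )
```

With this shape, importing the metrics module never fails; only constructing the SQUIM-based metric would.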

remove use of torchaudio.transforms; SQUIM todo

9a46c09

Signed-off-by: Jason <jasoli@nvidia.com>

blisc requested a review from pzelasko December 19, 2025 20:17

github-actions bot added TTS ASR audio labels Dec 19, 2025

blisc requested a review from nithinraok December 19, 2025 20:17

blisc added the Run CICD label Dec 19, 2025

Apply isort and black reformatting

f84393d

Signed-off-by: blisc <blisc@users.noreply.github.com>