
@blisc (Collaborator) commented Dec 19, 2025

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

Removes the use of torchaudio.transforms and moves these transforms into NeMo.
NOTE: we will still use torchaudio's SQUIM in nemo/collections/audio/metrics/squim.py and nemo/collections/tts/models/magpietts_preference_optimization.py
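As an illustration of what moving a transform into NeMo involves, the triangular mel filterbank that torchaudio builds (for example via `torchaudio.functional.melscale_fbanks` with the HTK mel scale and no normalization) can be written without torchaudio at all. The sketch below is stdlib-only and assumes those settings; `hz_to_mel`, `mel_to_hz`, and `mel_filterbank` are hypothetical names, and this is not the code added in this PR.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """HTK mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(
    n_freqs: int, f_min: float, f_max: float, n_mels: int, sample_rate: int
) -> List[List[float]]:
    """Return an n_freqs x n_mels matrix of unnormalized triangular filters."""
    # Frequencies of the STFT bins, evenly spaced from 0 Hz to Nyquist.
    nyquist = sample_rate / 2
    bin_freqs = [i * nyquist / (n_freqs - 1) for i in range(n_freqs)]
    # n_mels + 2 points equally spaced on the mel scale, mapped back to Hz:
    # filter m rises from f_pts[m], peaks at f_pts[m + 1], falls to f_pts[m + 2].
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    f_pts = [mel_to_hz(m_min + (m_max - m_min) * k / (n_mels + 1)) for k in range(n_mels + 2)]
    fb = [[0.0] * n_mels for _ in range(n_freqs)]
    for m in range(n_mels):
        left, center, right = f_pts[m], f_pts[m + 1], f_pts[m + 2]
        for i, f in enumerate(bin_freqs):
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            # Clamp the triangle at zero outside [left, right].
            fb[i][m] = max(0.0, min(rising, falling))
    return fb
```

Multiplying a power spectrogram of shape `[n_frames][n_freqs]` by this matrix yields mel energies; for example, `mel_filterbank(201, 0.0, 8000.0, 40, 16000)` matches a 400-point FFT at 16 kHz.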

Collection: audio, asr, tts

Changelog

  • Move frequently used torchaudio transforms into NeMo

PR Type:

  • New Feature
  • Bugfix
  • Documentation

Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: blisc <blisc@users.noreply.github.com>

@nithinraok (Member) left a comment

Can we update or remove the torchaudio references in the following files as well:

  • nemo/collections/tts/models/magpietts_preference_optimization.py
  • docker/Dockerfile.speech
  • scripts/installers/install_torchaudio_latest.sh
  • docs/source/speechlm2/intro.rst
  • nemo/collections/audio/metrics/squim.py

In the following files, torchaudio is installed but never used, so we can update them too.

  • tutorials/00_NeMo_Primer.ipynb
  • tutorials/01_NeMo_Models.ipynb
  • tutorials/asr/Online_Offline_Microphone_VAD_Demo.ipynb
  • tutorials/asr/Online_Offline_Speech_Commands_Demo.ipynb
  • tutorials/asr/Streaming_Multitalker_ASR.ipynb
  • tutorials/audio/speech_enhancement/BNR_Speech_enhancement_with_NeMo.ipynb
  • tutorials/audio/speech_enhancement/Speech_Enhancement_with_NeMo.ipynb
  • tutorials/speaker_tasks/ASR_with_SpeakerDiarization.ipynb
  • tutorials/speaker_tasks/End_to_End_Diarization_Inference.ipynb
  • tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb
  • tutorials/speaker_tasks/Speaker_Identification_Verification.ipynb
  • tutorials/speaker_tasks/Streaming_End_to_End_Diarization_Inference.ipynb
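To find notebooks like these, which install torchaudio but never import it, a small stdlib-only scan over the `.ipynb` JSON could help. This is a rough sketch: the regexes are assumptions (they only catch a `pip install` line that names torchaudio and top-level `import torchaudio` / `from torchaudio` statements), and `installs_but_never_imports` is a hypothetical helper, not part of NeMo.

```python
import json
import re
from pathlib import Path
from typing import Iterable, List

# A pip install command naming torchaudio on the same line.
INSTALL_RE = re.compile(r"pip\s+install\b.*\btorchaudio\b")
# A top-level import of torchaudio at the start of any line.
IMPORT_RE = re.compile(r"^\s*(import\s+torchaudio\b|from\s+torchaudio\b)", re.MULTILINE)


def notebook_source(path: Path) -> str:
    """Concatenate the source of every cell in a Jupyter notebook."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    parts = []
    for cell in nb.get("cells", []):
        src = cell.get("source", "")
        # nbformat stores "source" either as a list of lines or a single string.
        parts.append("".join(src) if isinstance(src, list) else src)
    return "\n".join(parts)


def installs_but_never_imports(paths: Iterable[Path]) -> List[Path]:
    """Return the notebooks that install torchaudio without importing it."""
    flagged = []
    for path in paths:
        text = notebook_source(path)
        if INSTALL_RE.search(text) and not IMPORT_RE.search(text):
            flagged.append(path)
    return flagged
```

From the repository root, `installs_but_never_imports(sorted(Path("tutorials").rglob("*.ipynb")))` would list candidates to review by hand.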

nb_max_freq (int) : Frequency above which all frequencies will be masked for narrowband augmentation.
Defaults to 4000
use_torchaudio: Whether to use the `torchaudio` implementation.
use_torchaudio: Whether to use the FilterbankFeatures or FilterbankFeaturesTA class
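The `nb_max_freq` docstring in the diff context above describes narrowband augmentation: everything above a cutoff frequency is masked. A minimal sketch of that masking, assuming it is applied to a linear-frequency spectrogram laid out as `[n_freqs][n_frames]` with bins evenly spaced from 0 Hz to Nyquist; `narrowband_mask` is a hypothetical function, not NeMo's implementation.

```python
from typing import List


def narrowband_mask(
    spec: List[List[float]], sample_rate: int, nb_max_freq: float = 4000.0
) -> List[List[float]]:
    """Zero every frequency bin whose frequency lies above nb_max_freq."""
    n_freqs = len(spec)
    # Spacing between adjacent bins when they cover 0 Hz .. sample_rate / 2.
    bin_hz = (sample_rate / 2) / (n_freqs - 1)
    return [
        list(row) if i * bin_hz <= nb_max_freq else [0.0] * len(row)
        for i, row in enumerate(spec)
    ]
```

With 16 kHz audio and 5 bins (0, 2000, 4000, 6000, 8000 Hz), the default cutoff of 4000 keeps the first three bins and zeroes the last two, simulating band-limited (e.g. 8 kHz telephone) audio.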
Review comment (Member):

Can we remove this option altogether?

featurizer_class = FilterbankFeatures
else:
featurizer_class = FilterbankFeaturesTA
featurizer_class = FilterbankFeaturesTA if use_torchaudio else FilterbankFeatures
Review comment (Member):

I see from features.py that FilterbankFeaturesTA doesn't use torchaudio. I think we can remove this condition and default to FilterbankFeatures.
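If the `use_torchaudio` option is dropped as suggested, existing configs that still set it could keep loading if the flag is accepted but ignored with a deprecation warning. A minimal sketch of that pattern; `select_featurizer_class` is a hypothetical helper and the `FilterbankFeatures` class here is an empty stand-in, not NeMo's.

```python
import warnings
from typing import Type


class FilterbankFeatures:  # hypothetical stand-in for NeMo's featurizer class
    pass


def select_featurizer_class(use_torchaudio: bool = False) -> Type[FilterbankFeatures]:
    """Always return FilterbankFeatures, still accepting the old flag."""
    if use_torchaudio:
        # Warn instead of failing so older configs keep working.
        warnings.warn(
            "`use_torchaudio` is deprecated and has no effect; "
            "FilterbankFeatures is always used.",
            DeprecationWarning,
            stacklevel=2,
        )
    return FilterbankFeatures
```

Warning rather than raising keeps old checkpoints and YAML configs usable; the flag could then be removed entirely in a later release.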

# See the License for the specific language governing permissions and
# limitations under the License.

from nemo.collections.audio.models.audio_to_audio import AudioToAudioModel
Review comment (Member):

Why remove these?

Reply (Collaborator, Author):

This causes a circular dependency if the import stays in `__init__.py`.
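If the re-export is still wanted, one common workaround for this kind of cycle (not necessarily what this PR should do) is a lazy re-export through a module-level `__getattr__` (PEP 562), so `__init__.py` imports the model module only on first access. The self-contained demo below builds a throwaway package in a temp directory; the package layout and the names `AudioModel`, `Features`, and `imports_cleanly` are all hypothetical.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical cycle: PKG/features.py imports PKG.models, and
# PKG/models/audio_model.py imports PKG.features back.
LAZY_INIT = textwrap.dedent('''
    import importlib

    # Public name -> submodule defining it; nothing is imported up front.
    _LAZY_EXPORTS = {"AudioModel": "PKG.models.audio_model"}

    def __getattr__(name):  # PEP 562 module-level __getattr__
        if name in _LAZY_EXPORTS:
            module = importlib.import_module(_LAZY_EXPORTS[name])
            return getattr(module, name)
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')
EAGER_INIT = "from PKG.models.audio_model import AudioModel\n"
FEATURES = "import PKG.models  # runs PKG/models/__init__.py\n\nclass Features:\n    pass\n"
AUDIO_MODEL = "from PKG.features import Features\n\nclass AudioModel:\n    features_cls = Features\n"


def write_package(root: Path, name: str, lazy: bool) -> None:
    models = root / name / "models"
    models.mkdir(parents=True)
    files = {
        root / name / "__init__.py": "",
        models / "__init__.py": LAZY_INIT if lazy else EAGER_INIT,
        root / name / "features.py": FEATURES,
        models / "audio_model.py": AUDIO_MODEL,
    }
    for path, source in files.items():
        path.write_text(source.replace("PKG", name))


def imports_cleanly(name: str, lazy: bool) -> bool:
    """Import PKG.features first, then resolve PKG.models.AudioModel."""
    with tempfile.TemporaryDirectory() as tmp:
        write_package(Path(tmp), name, lazy)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            features = importlib.import_module(f"{name}.features")
            model_cls = importlib.import_module(f"{name}.models").AudioModel
            return model_cls.features_cls is features.Features
        except ImportError:  # "partially initialized module" from the cycle
            return False
        finally:
            sys.path.remove(tmp)
```

With the eager `__init__.py`, importing `features` first fails because `audio_model` asks for `Features` before it exists; with the lazy version the same import order succeeds.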

@blisc (Collaborator, Author) commented Jan 6, 2026

> Can we update or remove torchaudio references from following files as well:
>
>   • nemo/collections/tts/models/magpietts_preference_optimization.py
>   • docker/Dockerfile.speech
>   • scripts/installers/install_torchaudio_latest.sh
>   • docs/source/speechlm2/intro.rst
>   • nemo/collections/audio/metrics/squim.py

We do not have a replacement for SQUIM-MOS, so I think we still need to keep the import check there. But we can remove torchaudio from the Dockerfile and the tutorials.
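Keeping only an import check for SQUIM-MOS could follow the usual optional-dependency pattern: probe for torchaudio when the metric is constructed and raise a clear error if it is missing. A minimal sketch; `is_module_available` and `SquimMOSMetric` are hypothetical names, not the actual code in squim.py, and the `backend_module` parameter exists only so the check can be exercised without torchaudio installed.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


class SquimMOSMetric:  # hypothetical stand-in for the metric in squim.py
    def __init__(self, backend_module: str = "torchaudio"):
        # Fail at construction time with an actionable message, so the rest
        # of the collection can be imported without torchaudio.
        if not is_module_available(backend_module):
            raise ImportError(
                f"{backend_module} is required for SQUIM-MOS; "
                f"install it to use this metric."
            )
        self.backend_module = backend_module
```

Using `find_spec` avoids paying the import cost of torchaudio (and torch) just to check availability.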
