@ppadjinTT
What does this PR do?

This PR removes conditional (boolean-mask) indexing from the SpeechT5 RelativePositionalEncoding forward pass. As explained in #42087, this kind of advanced indexing can produce zero-shaped tensors, which can be problematic for custom AI accelerators and torch backends.
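
To illustrate where a zero-shaped tensor can come from, here is a minimal sketch. The short `seq_len` is a hypothetical value picked so that no relative offset reaches the limit; `max_length` matches the value used in the quick test in this PR. Selecting with an all-False boolean mask yields an empty tensor, whereas `torch.where` always keeps the input shape.

```python
import torch

max_length = 160  # same value as the quick test in this PR
seq_len = 8       # hypothetical short input: no offset reaches +/- max_length

pos = torch.arange(seq_len, dtype=torch.long)
pos_seq = pos[:, None] - pos[None, :]  # relative offsets, shape (8, 8)

# Boolean-mask indexing: the result size depends on how many entries match.
mask = pos_seq < -max_length
selected = pos_seq[mask]
print(selected.shape)  # empty when nothing matches

# torch.where: the result always has the same shape as the input.
clamped = torch.where(mask, -max_length, pos_seq)
print(clamped.shape)
```

The sketch only shows the shape behaviour on a stock PyTorch build; how a given accelerator backend handles the empty tensor is outside what it demonstrates.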

This PR provides a simple and equivalent alternative. Instead of using:

        pos_seq[pos_seq < -self.max_length] = -self.max_length
        pos_seq[pos_seq >= self.max_length] = self.max_length - 1

the following torch.where calls can be used:

        pos_seq = torch.where(pos_seq < -self.max_length, -self.max_length, pos_seq)
        pos_seq = torch.where(pos_seq >= self.max_length, self.max_length - 1, pos_seq)

Fixes #42087

Quick test of equivalence

The following script verifies that the two approaches produce identical results:

import torch

def method1(pos_seq, max_length):
    pos_seq = torch.where(pos_seq < -max_length, -max_length, pos_seq)
    pos_seq = torch.where(pos_seq >= max_length, max_length - 1, pos_seq)
    return pos_seq

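def method2(pos_seq, max_length):
    # Clone first: masked assignment modifies the tensor in place and would
    # otherwise overwrite the input that is also passed to method1.
    pos_seq = pos_seq.clone()
    pos_seq[pos_seq < -max_length] = -max_length
    pos_seq[pos_seq >= max_length] = max_length - 1
    return pos_seq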

seq_len = 512
max_length = 160
device = 'cpu'
pos_seq = torch.arange(0, seq_len).to(device=device, dtype=torch.long)
pos_seq = pos_seq[:, None] - pos_seq[None, :]

print(torch.allclose(method1(pos_seq, max_length), method2(pos_seq, max_length)))
print(torch.max(torch.abs(method1(pos_seq, max_length) - method2(pos_seq, max_length))))

Output of the script above:

True
tensor(0)
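
As a further sanity check, the comparison can be repeated over several sequence lengths, including short ones where neither mask matches anything. This is only a sketch: the function names and the chosen lengths are hypothetical, and `torch.clamp` is not part of this PR. It is included purely as an independent cross-check, since both snippets limit the values to the range [-max_length, max_length - 1].

```python
import torch

def clamp_with_where(pos_seq, max_length):
    # Approach proposed in this PR: shape-preserving torch.where calls.
    pos_seq = torch.where(pos_seq < -max_length, -max_length, pos_seq)
    return torch.where(pos_seq >= max_length, max_length - 1, pos_seq)

def clamp_with_mask(pos_seq, max_length):
    # Original approach: boolean-mask assignment, applied to a copy so the
    # caller's tensor is left untouched.
    pos_seq = pos_seq.clone()
    pos_seq[pos_seq < -max_length] = -max_length
    pos_seq[pos_seq >= max_length] = max_length - 1
    return pos_seq

max_length = 160
results = {}
for seq_len in (1, 8, 160, 161, 512):  # hypothetical lengths around the limit
    pos = torch.arange(seq_len, dtype=torch.long)
    pos_seq = pos[:, None] - pos[None, :]
    a = clamp_with_where(pos_seq, max_length)
    b = clamp_with_mask(pos_seq, max_length)
    c = torch.clamp(pos_seq, -max_length, max_length - 1)  # cross-check only
    results[seq_len] = torch.equal(a, b) and torch.equal(a, c)

print(results)
```

torch.equal is used here instead of torch.allclose because the tensors are integer-valued, so an exact comparison is the natural check.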

Who can review?

@eustlb @ebezzam

github-actions bot commented Nov 7, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: speecht5

@eustlb eustlb left a comment

LGTM! thanks for the PR 🤗

@eustlb eustlb enabled auto-merge (squash) November 7, 2025 16:11
Development

Successfully merging this pull request may close these issues.

SpeechT5 RelativePositionalEncoding can create empty tensors