Contributor

@scotts scotts commented Nov 18, 2025

I've thought this was strange for a long time now - on main, in the public VideoDecoder and AudioDecoder, we add a stream before getting the metadata. This was not the originally intended order, as evidenced by some of the error checking we do:

if stream_index is None:
    if (stream_index := container_metadata.best_video_stream_index) is None:
        raise ValueError(
            "The best video stream is unknown and there is no specified stream. "
            + ERROR_REPORTING_INSTRUCTIONS
        )

We should never hit that error condition, because we add the stream before this check runs. And if the video file has no best video stream, the C++ layer would have thrown before we ever had a chance to reach this condition. I feel that it's more natural to do things in the order in this PR: first get the metadata from the file, then add the stream if the metadata is valid.
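
A minimal sketch of the metadata-first ordering described above. `ContainerMetadata`, `FakeDecoder`, and `open_video_stream` are hypothetical stand-ins made up for illustration, not TorchCodec's actual classes or functions; the point is only that the validation now runs before any stream is added, so the error path is reachable again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContainerMetadata:
    # Hypothetical stand-in for container-level metadata read from file headers.
    best_video_stream_index: Optional[int]
    num_streams: int


class FakeDecoder:
    """Hypothetical decoder core that records the order of operations."""

    def __init__(self, metadata: ContainerMetadata) -> None:
        self._metadata = metadata
        self.calls = []

    def get_container_metadata(self) -> ContainerMetadata:
        self.calls.append("get_metadata")
        return self._metadata

    def add_video_stream(self, stream_index: int) -> None:
        self.calls.append(f"add_stream({stream_index})")


def open_video_stream(decoder: FakeDecoder, stream_index: Optional[int] = None) -> int:
    # Step 1: read metadata from the headers, before any stream is added.
    metadata = decoder.get_container_metadata()

    # Step 2: resolve and validate the stream index against that metadata.
    if stream_index is None:
        if (stream_index := metadata.best_video_stream_index) is None:
            raise ValueError(
                "The best video stream is unknown and there is no specified stream."
            )
    if not 0 <= stream_index < metadata.num_streams:
        raise ValueError(f"Invalid stream index: {stream_index}")

    # Step 3: only now add the stream.
    decoder.add_video_stream(stream_index)
    return stream_index
```

With this ordering, a file without a best video stream fails in the Python-level check and `add_video_stream` is never called.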

The reason why I'm doing this now is that this should simplify the decoder-native transforms. We'll want to know a video stream's height and width when pre-processing the transforms before adding a stream. And that means getting that metadata before adding a stream. In the C++ layer, this does mean accessing values in the headers in initializeDecoder() through AVCodecParameters that we didn't before.
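
To illustrate why header-derived dimensions are useful before a stream exists, here is a hedged sketch of pre-processing a center crop from metadata alone. `StreamMetadata` and `center_crop_origin` are hypothetical names chosen for this example, and the center crop is just one assumed transform; the actual decoder-native transforms may look quite different.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreamMetadata:
    # Hypothetical header-derived metadata; either value may be missing.
    height: Optional[int]
    width: Optional[int]


def center_crop_origin(
    metadata: StreamMetadata, crop_height: int, crop_width: int
) -> Tuple[int, int]:
    """Return the (x, y) top-left corner of a centered crop.

    Everything here uses header metadata only, so it can run before a
    stream is added to the decoder.
    """
    if metadata.height is None or metadata.width is None:
        raise ValueError("Stream dimensions are unknown; cannot pre-process the crop.")
    if crop_height > metadata.height or crop_width > metadata.width:
        raise ValueError("Crop is larger than the video frame.")
    x = (metadata.width - crop_width) // 2
    y = (metadata.height - crop_height) // 2
    return x, y
```

For a 1920x1080 stream and a 224x224 crop, this yields the origin `(848, 428)`.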

@scotts scotts marked this pull request as ready for review November 18, 2025 03:00
Comment on lines 527 to 530
// This metadata was already set in initializeDecoder() from the
// AVCodecParameters that are part of the AVStream. But we consider the
// AVCodecContext to be more authoritative, so we use that for our decoding
// stream.
Contributor

From what I understand, the AVCodecContext fields were set to those of the AVCodecParameters when we called avcodec_parameters_to_context just above in addStream:

int retVal = avcodec_parameters_to_context(
    streamInfo.codecContext.get(), streamInfo.stream->codecpar);

I think it's best to remove the lines below and trust that avcodec_parameters_to_context is doing what we expect it to do. Right now, we are setting the streamMetadata in a lot of different places, which makes it harder to reason about.

Contributor Author

Oh, good call! Yup, I'm happy to remove more code. :)


int getNumChannels(const UniqueAVFrame& avFrame);
int getNumChannels(const SharedAVCodecContext& avCodecContext);
int getNumChannels(const AVCodecParameters* codecpar);
Contributor

We may not need the one above anymore, I'm not sure.

Contributor Author

We're using it in CPUDeviceInterface:

int srcNumChannels = getNumChannels(codecContext_);

And:
audioStreamOptions_.numChannels.value_or(getNumChannels(codecContext_));

Maybe we could instead go to the metadata, but I think it's better to get it through the codec context there. I think it's possible for the codec context to have more accurate info here while decoding. At least I believe that is the case for video dimensions, as we can have variable resolution streams. I'm unsure if that's also true for the number of channels for audio. If you can say definitively that it's okay to just use the header-based metadata in both of these places, I can make the change. But if you're unsure, let's handle that later - I can create an issue for follow-up.
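
For readers less familiar with the `value_or` fallback in the second call site, here is a rough Python analogue. `AudioStreamOptions`, `CodecContext`, and `resolve_num_channels` are hypothetical stand-ins for this sketch, not the real C++ types: an explicitly requested channel count wins, and otherwise the value comes from the live codec context rather than from header-based metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioStreamOptions:
    # Hypothetical mirror of an optional user-requested channel count.
    num_channels: Optional[int] = None


@dataclass
class CodecContext:
    # Hypothetical stand-in for the channel count reported by the codec context.
    num_channels: int


def resolve_num_channels(options: AudioStreamOptions, codec_context: CodecContext) -> int:
    # Same idea as std::optional::value_or: prefer the requested value,
    # fall back to what the codec context reports.
    if options.num_channels is not None:
        return options.num_channels
    return codec_context.num_channels
```

Switching the fallback to header-based metadata would only change the second `return`; whether that is safe is exactly the open question in this thread.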

@NicolasHug
Contributor

oh, I approved but I wonder if the docs failure is real 🤔

 File "/__w/_temp/conda_environment_19470203429/lib/python3.10/site-packages/torchcodec/decoders/_video_decoder.py", line 423, in _get_and_validate_stream_metadata
    raise ValueError(
ValueError: The minimum pts value in seconds is unknown. 
This should never happen. Please report an issue following the steps in
