Refactor order of getting metadata and adding a stream #1060
// This metadata was already set in initializeDecoder() from the
// AVCodecParameters that are part of the AVStream. But we consider the
// AVCodecContext to be more authoritative, so we use that for our decoding
// stream.
From what I understand, the AVCodecContext fields were set to those of the AVCodecParameters when we called avcodec_parameters_to_context just above in addStream:
torchcodec/src/torchcodec/_core/SingleStreamDecoder.cpp
Lines 462 to 463 in 22bcf4d
int retVal = avcodec_parameters_to_context(
    streamInfo.codecContext.get(), streamInfo.stream->codecpar);
I think it's best to remove the lines below and trust that avcodec_parameters_to_context is doing what we expect it to do. Right now, we are setting the streamMetadata in a lot of different places and it makes it harder to reason about.
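To make the redundancy argument concrete, here is a minimal sketch. It does not use FFmpeg: `FakeCodecParameters`, `FakeCodecContext`, `StreamMetadataSketch`, and `parametersToContext` are hypothetical stand-ins for `AVCodecParameters`, `AVCodecContext`, our stream metadata, and `avcodec_parameters_to_context`. It assumes the copy transfers the header values unchanged, which is the behavior the comment above relies on.

```cpp
#include <optional>

// Hypothetical stand-in for AVCodecParameters (header-level values).
struct FakeCodecParameters {
  int width = 0;
  int height = 0;
};

// Hypothetical stand-in for AVCodecContext.
struct FakeCodecContext {
  int width = 0;
  int height = 0;
};

// Hypothetical stand-in for the stream metadata set in initializeDecoder().
struct StreamMetadataSketch {
  std::optional<int> width;
  std::optional<int> height;
};

// Stand-in for avcodec_parameters_to_context(): copies header values into the
// context. Assumption: the values are transferred unchanged.
void parametersToContext(
    FakeCodecContext& ctx, const FakeCodecParameters& par) {
  ctx.width = par.width;
  ctx.height = par.height;
}

// Returns true if re-reading from the context would change the metadata.
bool contextDiffersFromMetadata(
    const FakeCodecContext& ctx, const StreamMetadataSketch& meta) {
  return meta.width != ctx.width || meta.height != ctx.height;
}

// Walks through the order of operations discussed in the comment above.
bool overwriteIsRedundantAfterCopy() {
  FakeCodecParameters par{1920, 1080};
  StreamMetadataSketch meta{par.width, par.height}; // set from the header
  FakeCodecContext ctx;
  parametersToContext(ctx, par); // what addStream does just above
  return !contextDiffersFromMetadata(ctx, meta);
}
```

Under that assumption, writing the context values back into the metadata right after the copy changes nothing, which is why removing those lines should be safe.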
Oh, good call! Yup, I'm happy to remove more code. :)
int getNumChannels(const UniqueAVFrame& avFrame);
int getNumChannels(const SharedAVCodecContext& avCodecContext);
int getNumChannels(const AVCodecParameters* codecpar);
We may not need the one above anymore, I'm not sure.
We're using it in CPUDeviceInterface:
int srcNumChannels = getNumChannels(codecContext_);
And:
audioStreamOptions_.numChannels.value_or(getNumChannels(codecContext_));
Maybe we could instead go to the metadata, but I think it's better to get it through the codec context there. I think it's possible for the codec context to have more accurate info here while decoding. At least I believe that is the case for video dimensions, as we can have variable resolution streams. I'm unsure if that's also true for the number of channels for audio. If you can say definitively that it's okay to just use the header-based metadata in both of these places, I can make the change. But if you're unsure, let's handle that later - I can create an issue for follow-up.
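For reference, the fallback in the second call site can be sketched on its own. This is only an illustration of `std::optional::value_or`: `AudioStreamOptionsSketch` and `FakeAudioCodecContext` are hypothetical stand-ins for `audioStreamOptions_` and the real codec context, and `resolveOutputChannels` is a name made up for this example.

```cpp
#include <optional>

// Hypothetical stand-in for the user-facing audio stream options.
struct AudioStreamOptionsSketch {
  std::optional<int> numChannels; // empty when the user did not ask for one
};

// Hypothetical stand-in for the codec context; the real one is an
// AVCodecContext wrapped in SharedAVCodecContext.
struct FakeAudioCodecContext {
  int numChannels = 0;
};

int getNumChannels(const FakeAudioCodecContext& ctx) {
  return ctx.numChannels;
}

// Use the requested channel count if there is one; otherwise fall back to
// whatever the codec context currently reports.
int resolveOutputChannels(
    const AudioStreamOptionsSketch& options,
    const FakeAudioCodecContext& ctx) {
  return options.numChannels.value_or(getNumChannels(ctx));
}
```

Swapping the fallback to header-based metadata would only change the argument passed to `value_or`; whether that is safe depends on the open question above about the channel count changing mid-stream.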
oh, I approved but I wonder if the docs failure is real 🤔
I've thought this was strange for a long time now - on main, in the public VideoDecoder and AudioDecoder, we add a stream before getting the metadata. This was not the originally intended order, as evidenced by some of the error checking we do:
torchcodec/src/torchcodec/decoders/_video_decoder.py
Lines 409 to 414 in 22bcf4d
We should never hit that error condition, as before we call it, we add the stream. And if the video file has no best video stream, the C++ layer would have thrown before we ever had a chance to reach this condition. I feel that it's more natural to do things in the order in this PR: first get the metadata from the file, then add the stream if the metadata is valid.
The reason why I'm doing this now is that this should simplify the decoder-native transforms. We'll want to know a video stream's height and width when pre-processing the transforms before adding a stream. And that means getting that metadata before adding a stream. In the C++ layer, this does mean accessing values in the headers in initializeDecoder() through AVCodecParameters that we didn't before.
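As a rough sketch of the intended order (metadata first, then add the stream only if the metadata is valid), consider the following. None of these names come from torchcodec: `ContainerMetadataSketch`, `SketchDecoder`, and `openVideo` are hypothetical, and the real public decoders are Python classes on top of the C++ layer. The sketch only shows the control flow.

```cpp
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical header-level info for one video stream.
struct VideoStreamInfo {
  int height;
  int width;
};

// Hypothetical container metadata, available before any stream is added.
struct ContainerMetadataSketch {
  std::vector<VideoStreamInfo> streams;
  std::optional<int> bestVideoStreamIndex;
};

// Hypothetical decoder: metadata is readable without an active stream.
class SketchDecoder {
 public:
  explicit SketchDecoder(ContainerMetadataSketch metadata)
      : metadata_(std::move(metadata)) {}

  const ContainerMetadataSketch& getMetadata() const { return metadata_; }
  void addVideoStream(int streamIndex) { activeStream_ = streamIndex; }
  std::optional<int> activeStream() const { return activeStream_; }

 private:
  ContainerMetadataSketch metadata_;
  std::optional<int> activeStream_;
};

// Step 1: read metadata. Step 2: validate it. Step 3: add the stream.
VideoStreamInfo openVideo(SketchDecoder& decoder) {
  const ContainerMetadataSketch& metadata = decoder.getMetadata();
  if (!metadata.bestVideoStreamIndex.has_value()) {
    // With this ordering the check is reachable: no stream was added yet.
    throw std::runtime_error("no best video stream found");
  }
  int index = *metadata.bestVideoStreamIndex;
  VideoStreamInfo info = metadata.streams.at(index);
  // Height and width are known here, so transforms could be pre-processed
  // before the stream is added.
  decoder.addVideoStream(index);
  return info;
}
```

With this ordering, a file without a usable video stream fails at the validation step and no stream is ever added, so the error check is no longer dead code.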