[FFmpeg-user] How does the decoder know to make a frame ready for receiving?

Wed Feb 14 03:37:40 EET 2024

Let's say I have a video with I-, P- and B-frames. My understanding is
that the packets are stored in the file in DTS order (but not
necessarily in PTS order).

I go through the packets in the file (which I assume are in DTS
order) and feed them to the decoder. If a packet holds an I-frame, the
decoder can make it available right away to the caller of
avcodec_receive_frame().

However, if it's a B-frame, how does the decoder know whether to
buffer the frame internally or make it available to the user? Since
the decoder has to make frames available to the user in PTS order, how
does it know that this frame has the *lowest PTS of all frames that
have not been returned yet*, including ones it has not seen?

A trivial example: let's say we have the following frames in DTS order:

DTS:  1  2  3  4  5  10
PTS:  1  9  7  5  6  10
TYPE: I  P  B  B  B  I

We are feeding them to the decoder in DTS order by calling
avcodec_send_packet(). Note that we don't have uniform gaps between
DTS or PTS because the stream has variable frame rate.

We need to display the frames in the following order:
PTS=1, 5, 6, 7, 9, 10, ...
That means avcodec_receive_frame() needs to return frames in that
order. IIUC, avcodec_receive_frame() returns frames in increasing PTS
order.
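
To make the reordering in this example concrete, here is a small
Python sketch. It only works on the (DTS, PTS, TYPE) table above and
does not touch libavcodec; the helper name reorder_depth is made up
for illustration. It sorts the frames by PTS to get the display order
and, for each frame, counts how many later-decoded frames have to be
shown before it.

```python
# Frames from the example table, listed in decode (DTS) order.
# Each entry is (dts, pts, frame_type).
frames = [
    (1, 1, "I"),
    (2, 9, "P"),
    (3, 7, "B"),
    (4, 5, "B"),
    (5, 6, "B"),
    (10, 10, "I"),
]

# Display order is simply the frames sorted by PTS.
display_order = [pts for _, pts, _ in sorted(frames, key=lambda f: f[1])]


def reorder_depth(frames):
    """For each frame, count later-decoded frames with a lower PTS.

    Hypothetical helper: the count is how many frames that arrive
    afterwards must be displayed before this one.
    """
    depths = []
    for i, (_, pts, _) in enumerate(frames):
        later = frames[i + 1:]
        depths.append(sum(1 for _, other_pts, _ in later if other_pts < pts))
    return depths


depths = reorder_depth(frames)

print(display_order)  # [1, 5, 6, 7, 9, 10]
print(depths)         # [0, 3, 2, 0, 0, 0]
```

In this example the P-frame with PTS=9 has to wait for three
later-decoded frames, so anything that outputs frames in PTS order
must be able to hold back at least that many frames here.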

Now when the decoder gets fed the {DTS=3,PTS=7} frame, it somehow
needs to withhold this frame rather than make it available. When the
user then calls avcodec_receive_frame(), the call must return
AVERROR(EAGAIN).

When the decoder gets fed the {DTS=4,PTS=5} frame, it somehow needs to
realize that this frame has a lower PTS than any frame it will
encounter in the future. And it needs to make it available to the user
when the user calls avcodec_receive_frame().

My questions:

1. How does the decoder guarantee that the frame it is returning has a
lower PTS than any frame it will encounter in the future?
Does it rely on monotonically increasing DTS, keep the pending frames
sorted by PTS, and release the frame with the lowest PTS once that PTS
is <= the DTS of the most recently fed packet (since PTS >= DTS for
every frame, no later packet can have a lower PTS)?
Could the decoder not run into a situation where it has to hold a lot
of frames in memory because it can't determine that it is safe to make
any of them available to avcodec_receive_frame()?

2. Is it possible to know the PTS of the next frame without reading it
from the stream (assuming the stream is not corrupted, etc.)?
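
The rule guessed at in question 1 can be sketched in Python as a
min-heap keyed on PTS. I am reading "PTS <= DTS" as "lowest pending
PTS <= DTS of the most recently sent packet". This only models that
guess; it is not a description of what libavcodec actually does, and
the function name simulate is made up for illustration.

```python
import heapq


def simulate(packets):
    """Model the DTS-based release rule from question 1.

    packets: list of (dts, pts) tuples in DTS order.
    Returns (log, drained, peak): log holds (dts, released_pts_list)
    per sent packet, drained holds frames flushed at end of stream,
    and peak is the largest number of frames held at once.
    """
    pending = []  # min-heap of PTS values not yet released
    log = []
    peak = 0
    for dts, pts in packets:
        heapq.heappush(pending, pts)
        peak = max(peak, len(pending))
        released = []
        # Later packets have DTS > dts and PTS >= DTS, so any pending
        # PTS <= dts can no longer be undercut by a future frame.
        while pending and pending[0] <= dts:
            released.append(heapq.heappop(pending))
        # An empty list here stands for the EAGAIN case.
        log.append((dts, released))
    # End of stream: flush whatever is left, lowest PTS first.
    drained = [heapq.heappop(pending) for _ in range(len(pending))]
    return log, drained, peak


# (dts, pts) pairs from the example above.
packets = [(1, 1), (2, 9), (3, 7), (4, 5), (5, 6), (10, 10)]
log, drained, peak = simulate(packets)

for dts, released in log:
    print(dts, released)
print("peak frames held:", peak)
```

On the example this rule returns the frames in the right order
(1, 5, 6, 7, 9, 10), but it holds PTS=5 until the DTS=5 packet
arrives, one packet later than the walk-through above expects, and it
holds up to four frames at once. So it is safe but adds latency.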