[FFmpeg-devel] [PATCH 3/4 v2] ffmpeg: move A/V non-streamcopy initialization to a later point

Jan Ekström jeebjp at gmail.com
Mon Sep 21 00:59:46 EEST 2020

On Wed, Sep 16, 2020 at 1:20 PM Jan Ekström <jeebjp at gmail.com> wrote:
> On Wed, Sep 16, 2020 at 12:24 AM Michael Niedermayer
> <michael at niedermayer.cc> wrote:
> >
> > On Mon, Sep 14, 2020 at 12:33:14AM +0300, Jan Ekström wrote:
> > > - For video, this means a single initialization point in do_video_out.
> > > - For audio we unfortunately need to do it in two places just
> > >   before the buffer sink is utilized (if av_buffersink_get_samples
> > >   would still work according to its specification after a call to
> > >   avfilter_graph_request_oldest was made, we could at least remove
> > >   the one in transcode_step).
> > >
> > > Other adjustments to make things work:
> > > - As the AVFrame PTS adjustment to encoder time base needs the encoder
> > >   to be initialized, so it is now moved to do_{video,audio}_out,
> > >   right after the encoder has been initialized. Due to this,
> > >   the additional parameter in do_video_out is removed as it is no
> > >   longer necessary.
> > > ---
> > >  fftools/ffmpeg.c | 112 ++++++++++++++++++++++++++++++++---------------
> > >  1 file changed, 77 insertions(+), 35 deletions(-)
> >
> > breaks this:
> > ./ffmpeg -ss 30.0 -i ~/tickets/1745/1745-Sample.mkv -f vob -c:a copy  -bitexact -t 1 -f framecrc -
> > (sample file is linked in the ticket https://trac.ffmpeg.org/ticket/1745)
> >
> > (Too many packets buffered for output stream 0:1. Conversion failed!)
> >
> > thx
> With an initial look with -debug_ts -v verbose -max_muxing_queue_size
> 10000 , it appears that audio packets start at about -5.5 seconds, and
> video is getting skipped until an exact zero point is hit.
> So either the offset is incorrect, or we should also be dropping the
> audio packets as well until zero point is found.
> Jan

So, on a further look, this stems from a difference in how the stream
copy and non-stream copy cases are handled:
- For stream copy, by default any packets received after the seek are
passed to the muxer, even if the demuxer lands way before the requested
seek point. There is `-copypriorss 0` to control that behavior.
- For re-encoding use cases we wait until we hit the wanted seek point.
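
As a rough illustration of the two policies above, here is a simplified
model of whether something timestamped before the seek point reaches the
output. This is not the actual ffmpeg.c logic; the function and parameter
names are made up for this sketch, and timestamps are in seconds relative
to the requested seek point (so the seek point itself is 0.0).

```python
def reaches_output(ts, seek_point, stream_copy, copy_prior_start=True):
    """Return True if a packet/frame with timestamp ts is output.

    Hypothetical model: copy_prior_start=True stands for the default
    stream copy behavior, False stands for `-copypriorss 0`.
    """
    if ts >= seek_point:
        return True
    # Before the seek point, only stream copy with the default setting
    # lets the packet through; re-encoding waits for the seek point.
    return stream_copy and copy_prior_start


# The demuxer lands about 5.5 s before the requested seek point.
print(reaches_output(-5.5, 0.0, stream_copy=True))    # True  (-c:a copy)
print(reaches_output(-5.5, 0.0, stream_copy=False))   # False (re-encode)
print(reaches_output(-5.5, 0.0, stream_copy=True,
                     copy_prior_start=False))         # False
```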

In this specific example, seeking gets us to 5 or so seconds before
the wanted point.

Before this patch:
- Audio stream packets would get thrown to the muxer, as both streams get
initialized at a similar point. Any buffering/sync issues are left to
lavf/the muxer.
- Video stream packets would get thrown out until the seek point was hit.
- So you have ~5.5 seconds of audio only, and then video.

With this patch:
- Audio stream packets are passed into ffmpeg.c's muxing buffer, because
the video stream has not yet been initialized: the first video frame
has not been decoded yet.
- Video stream packets still get thrown out until the seek point is hit.
- Since each audio packet contains 1536/48000 seconds (32 ms) of audio,
quite a few have to be buffered (187 according to my testing) for the
mux to succeed. This is way larger than the default muxing queue.
- Thus, the remux part fails.
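
To make the numbers above concrete, here is a small back-of-the-envelope
check. The packet size, sample rate and packet count are taken from this
mail; the queue limit of 128 packets is an assumption about the default
of -max_muxing_queue_size, not something verified here.

```python
from fractions import Fraction

SAMPLES_PER_PACKET = 1536   # from the mail above
SAMPLE_RATE = 48000         # from the mail above
BUFFERED_PACKETS = 187      # count observed in testing, from the mail above
ASSUMED_QUEUE_LIMIT = 128   # assumption: default -max_muxing_queue_size

# Duration of one audio packet, in seconds (exact rational arithmetic).
packet_duration = Fraction(SAMPLES_PER_PACKET, SAMPLE_RATE)

# Total audio that has to sit in the queue before video shows up.
buffered_audio = BUFFERED_PACKETS * packet_duration

print(float(packet_duration))                  # 0.032
print(float(buffered_audio))                   # 5.984
print(BUFFERED_PACKETS > ASSUMED_QUEUE_LIMIT)  # True
```

So 187 packets is roughly six seconds of audio, which is in the same
ballpark as the gap before the seek point and, under the assumed limit,
more than the queue can hold.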

- Alternative A: To keep the current behavior, we would have to start
decoding those frames, but drop them later in the logic. That
way the output stream may be initialized, but no video output would
- Alternative B: We normalize the behavior according to how stream
copy works by default. Video also gets output if the packets are
- Alternative C: We normalize the behavior according to how the
re-encoding logic works by default. `-copypriorss 0` would effectively
become the default.

I think in a way I would prefer Alternative C, as I'm not sure how
many people expect to get packets from way before their requested seek
point. Of course, in a perfect world we would do such things with
indexing a la ffms2, where we know exactly how long and how structured
an input is - and to which packet to seek and how much to decode or

