[FFmpeg-devel] mpegts encoder not interleaving audio/video packets well

Tony Strauss tony at animoto.com
Tue Jun 14 19:48:14 CEST 2011


This issue exists in the most recent versions of both ffmpeg and libav.
Here's how to reproduce:

wget http://s3.amazonaws.com/tony-strauss-public/180p_d736.mp4
ffmpeg -y -i 180p_d736.mp4 -acodec copy -vcodec copy -vbsf h264_mp4toannexb 180p.ts
ffprobe -show_packets 180p.ts | grep pts_time | less -i

Scan through the output; at some point, you should see a jump like:
pts_time=23.800000
pts_time=23.866667
pts_time=23.933333
pts_time=24.000000
pts_time=24.066667
pts_time=20.231378 # JUMP back 4 seconds!
pts_time=20.254589
pts_time=20.277800
pts_time=20.301011
pts_time=20.324222
pts_time=20.347433

It turns out that video frames are written up to pts 24.066667, and only
then do audio frames starting at pts 20.231378 appear, a backward jump of
about 3.8 seconds.  This large jump is causing issues in a downstream player.
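
To check for this without scrolling through the output, the pts_time values can be scanned for the largest backward step. The following is only a minimal sketch: the helper name max_backward_jump is made up for illustration, and it assumes the timestamps have already been parsed into an array of doubles in file order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the largest backward step (in seconds)
 * between consecutive timestamps, or 0.0 if the sequence never goes
 * backwards. */
static double max_backward_jump(const double *pts, size_t n)
{
    double worst = 0.0;

    assert(pts != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        double back = pts[i - 1] - pts[i];  /* positive when time goes back */
        if (back > worst)
            worst = back;
    }
    return worst;
}
```

Fed the values listed above, this reports a step of about 3.835 seconds between 24.066667 and 20.231378.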

Why aren't the audio and video frames interleaved better?  The cause seems
to be the following code in libavformat/mpegtsenc.c (around line 950):

    if (st->codec->codec_type != AVMEDIA_TYPE_AUDIO) {
        // for video and subtitle, write a single pes packet
        mpegts_write_pes(s, st, buf, size, pts, dts);
        av_free(data);
        return 0;
    }

    if (ts_st->payload_index + size > DEFAULT_PES_PAYLOAD_SIZE) {
        mpegts_write_pes(s, st, ts_st->payload, ts_st->payload_index,
                         ts_st->payload_pts, ts_st->payload_dts);
        ts_st->payload_index = 0;
    }

Basically, the mpegts encoder seems to write each video frame as soon as it
is encountered, but buffers audio frames until they would exceed
DEFAULT_PES_PAYLOAD_SIZE.  I can fix my issue by replacing
DEFAULT_PES_PAYLOAD_SIZE with 0 in the above conditional, so that the audio
buffer is flushed on every audio frame.  Does the encoder buffer audio frames
because they are generally smaller than video frames, which makes the
per-packet PES overhead more of an issue?  I did see that my file size
increased a bit with my "fix".
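
The overhead question can be made concrete with a rough model. This is only a sketch under stated assumptions: every PES packet starts in a fresh 188-byte TS packet with a 4-byte header, the PES header is taken to be 14 bytes (9 bytes plus a 5-byte PTS), and PCR, adaptation fields other than stuffing, and PAT/PMT are ignored. The function names and the 200-byte frame size used below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TS_PACKET_SIZE  188  /* fixed by the MPEG-TS format */
#define TS_HEADER_SIZE    4  /* minimal TS header, no adaptation field */
#define PES_HEADER_SIZE  14  /* assumption: 9-byte PES header + 5-byte PTS */

/* TS bytes needed to carry one PES packet holding `payload` bytes.
 * The last TS packet is padded, so the result is a multiple of 188. */
static size_t ts_bytes_for_pes(size_t payload)
{
    size_t per_packet = TS_PACKET_SIZE - TS_HEADER_SIZE;     /* 184 */
    size_t total = PES_HEADER_SIZE + payload;
    size_t packets = (total + per_packet - 1) / per_packet;  /* round up */

    assert(payload > 0);
    return packets * TS_PACKET_SIZE;
}

/* Cost of writing n frames of frame_size bytes, one PES packet per frame. */
static size_t ts_bytes_unbuffered(size_t n, size_t frame_size)
{
    return n * ts_bytes_for_pes(frame_size);
}

/* Cost of writing the same n frames inside a single PES packet. */
static size_t ts_bytes_buffered(size_t n, size_t frame_size)
{
    return ts_bytes_for_pes(n * frame_size);
}
```

Under this model, ten 200-byte audio frames cost 3760 bytes when written one per PES packet and 2068 bytes when combined into one, which is consistent with the file size growing once buffering is disabled.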

I'd like to create a long-term fix for this, but I'm not sure of the right
move.  One thought is to keep track of the pts of the first audio frame in
the buffer; if the pts of a later audio frame exceeds it by more than some
threshold (say 0.5 sec), then flush the buffer.

What does everyone think?

Thank you!

Tony

