[FFmpeg-trac] #3354(avcodec:new): enhancement: Zero latency av_read_frame()

FFmpeg trac at avcodec.org
Mon Jan 27 15:40:46 CET 2014

#3354: enhancement: Zero latency av_read_frame()
               Reporter:  pjw
                   Type:  enhancement
                 Status:  new
               Priority:  normal
              Component:  avcodec
                Version:  git-master
               Keywords:  mpegts, latency
               Blocking:
             Blocked By:
  Analyzed by developer:  0
Reproduced by developer:  0
 == Summary ==
 I'm using FFmpeg to set up a (very) low latency video streaming application
 (in C++). FFmpeg is used on both the server and client side. The video is
 encoded as H.264 (i.e. "libx264") and transported in a transport stream
 (mpegts) over UDP.

 Up to now I've been able to reduce the latency to: encoding + transport +
 decoding + one-frame-time. I'd like it to be just encoding + transport +
 decoding.

 The problem seems to be that {{{av_read_frame()}}} always holds back one
 video frame, i.e. frame ''n'' is returned only when {{{av_read_frame()}}}
 for ''n+1'' is called. I'd like {{{av_read_frame()}}} to return a frame as
 soon as possible, without any delay.

 == How to test ==
 Besides the video stream, I've added an extra data stream. It is used to
 transport a timestamp alongside each video frame. The timestamp
 represents the point at which the data is sent. A data packet and a video
 packet have the same PTS. In the client code, I can synchronize the data
 stream packets with the video stream packets using PTS. Now I know when
 the data packet and video frame were sent and I also know when they arrived.
 This allows me to calculate the transport delay. This is of course only
 true when the server and client use the same clock. In my tests I executed
 the server and client code on the same host.
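
The bookkeeping described above can be sketched roughly as follows. This is a minimal, self-contained model with no FFmpeg calls; the names `DelayTracker`, `onData` and `onVideo` are invented for illustration and are not part of the code in this ticket. It assumes data packets arrive in PTS order and that sender and receiver share a clock.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical helper: remembers the send time carried by each data packet
// and matches video packets to it by PTS.
struct DelayTracker {
  struct Entry { int64_t pts; double tSent; };
  std::deque<Entry> pending;  // data packets not yet matched, in PTS order

  // Call for every packet of the data stream.
  void onData(int64_t pts, double tSent) {
    assert(pending.empty() || pending.back().pts <= pts);
    pending.push_back({pts, tSent});
  }

  // Call for every packet of the video stream. On success, stores the
  // send-to-receive delay (in seconds) in `delay` and returns true.
  // Returns false if no data packet with the same PTS was seen.
  bool onVideo(int64_t pts, double tRecv, double& delay) {
    while (!pending.empty() && pending.front().pts < pts)
      pending.pop_front();                 // stale entry, drop it
    if (pending.empty() || pending.front().pts != pts)
      return false;                        // no matching data packet
    delay = tRecv - pending.front().tSent;
    pending.pop_front();
    return true;
  }
};
```

In a receive loop, `onData()` would be called with the timestamp read from the data packet payload and `onVideo()` with the arrival time of the video packet.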

 The transport delay of the data packet is sub-millisecond, as expected.
 However, the delay of the video frame is ~25ms (presumably 20ms = one
 frame time at 50 Hz + 5ms for encoding/decoding). I expected it to be just
 ~5ms, i.e. encoding/decoding time.
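
For reference, the arithmetic behind that estimate as a small sketch (the helper names are illustrative only): at 50 Hz one frame lasts 20 ms, so a measured delay of about 25 ms leaves about 5 ms once one frame time is subtracted.

```cpp
#include <cassert>

// Duration of one frame in milliseconds for a given frame rate.
constexpr double frameTimeMs(double fps) { return 1000.0 / fps; }

// Hypothetical helper: what remains of a measured delay after subtracting
// one full frame time.
double residualMs(double measuredMs, double fps) {
  assert(fps > 0.0);
  return measuredMs - frameTimeMs(fps);
}
```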

 Wireshark shows that both the video data packets and the private data packet
 are sent at the same time. So the delay of one frame (20ms) is introduced
 by the client code. I think it is caused by (a combination of) mpegts
 and h264_parser.
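
To illustrate how a parser can end up holding back one frame, here is a toy model. It is not FFmpeg's actual mpegts or h264_parser code; `ToyParser` and the `'|'` start marker are invented for this sketch. The assumption is a stream in which only the *start* of a frame is marked: such a parser cannot know that frame n is complete until the start of frame n+1 arrives, so its output lags by one frame.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model: frames are byte strings that begin with the marker '|'.
// There is no end marker and no length field, so a frame is known to be
// complete only when the next marker shows up.
class ToyParser {
 public:
  // Feed received bytes; returns the frames that became complete.
  std::vector<std::string> feed(const std::string& bytes) {
    std::vector<std::string> done;
    for (char c : bytes) {
      if (c == '|') {
        if (!current_.empty()) done.push_back(current_);  // previous frame ends
        current_.assign(1, c);                            // new frame starts
      } else if (!current_.empty()) {
        current_ += c;  // payload of the frame in progress
      }                 // else: bytes before the first marker are skipped
    }
    assert(current_.empty() || current_[0] == '|');
    return done;
  }

  // Bytes of the frame that is still being held back.
  const std::string& pending() const { return current_; }

 private:
  std::string current_;
};
```

Feeding a complete frame 0 returns nothing; frame 0 is only returned once the first byte of frame 1 has been fed, which matches the behaviour observed with {{{av_read_frame()}}}.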

 == Server code ==
 This is what I did on the server side. Note that this is not a working
 example. It has been stripped to keep it short(-ish):
 avformat_alloc_output_context2(&mFormatContext, nullptr,
    "mpegts", "udp://");

 // These flags don't seem to help.
 // mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;
 // mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;

 // Add video stream:
 mCodec = avcodec_find_encoder_by_name("libx264");
 mVidStream = avformat_new_stream(mFormatContext, mCodec);
 mVidStream->id = mFormatContext->nb_streams - 1;
 mCodecContext = mVidStream->codec;
 mCodecContext->codec_id = mCodec->id;
 mCodecContext->bit_rate = mBitrate;
 mCodecContext->width    = 1280;
 mCodecContext->height   = 720;
 mCodecContext->gop_size = 1;
 mCodecContext->pix_fmt  = AV_PIX_FMT_YUV420P;
 mCodecContext->time_base.den = 50; // 50 Hz
 mCodecContext->time_base.num = 1;
 mCodecContext->max_b_frames  = 0;
 mCodecContext->thread_count  = mThreads; // tried 1 - 4. All have same
 mCodecContext->thread_type   = FF_THREAD_SLICE;

 // These options also don't have the desired effect.
 // mCodecContext->flags |= CODEC_FLAG_LOW_DELAY;
 // mCodecContext->flags2 |= CODEC_FLAG2_FAST;

 // Add (private) data stream. Will be used to send a packet containing a
 // timestamp alongside each video frame.
 mDataStream = avformat_new_stream(mFormatContext, nullptr);
 mDataStream->id                = mFormatContext->nb_streams - 1;
 mDataStream->codec             = avcodec_alloc_context3(nullptr);
 mDataStream->codec->codec_type = AVMEDIA_TYPE_DATA;
 mDataStream->codec->codec_id   = AV_CODEC_ID_SMPTE_KLV;
 mDataCodecContext              = mDataStream->codec;

 if (mFormatContext->oformat->flags & AVFMT_GLOBALHEADER) {
   mCodecContext->flags     |= CODEC_FLAG_GLOBAL_HEADER;
   mDataCodecContext->flags |= CODEC_FLAG_GLOBAL_HEADER;
 }

 // No delay please! :-)
 av_opt_set(mCodecContext->priv_data, "preset", "ultrafast", 0);
 av_opt_set(mCodecContext->priv_data, "tune", "zerolatency", 0);

 avcodec_open2(mCodecContext, mCodec, nullptr);
 avformat_write_header(mFormatContext, nullptr);

 // encode loop:

 fill_yuv_image(&mDstPicture, frame_count++,
    mCodecContext->width, mCodecContext->height);

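 AVPacket pkt;
 av_init_packet(&pkt);
 pkt.data = nullptr; // let the encoder allocate the output buffer
 pkt.size = 0;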

 int got_packet;
 int err = avcodec_encode_video2(mCodecContext, &pkt, mFrame, &got_packet);
 if (err < 0) return;
 if (!err && got_packet && pkt.size) {
   pkt.stream_index = mVidStream->index;
   pkt.duration     = 0;

   // Send a data packet alongside each video frame.
   AVPacket dataPkt;
   av_init_packet(&dataPkt);
   dataPkt.stream_index = mDataStream->index;
   dataPkt.pts          = pkt.pts; // Same as video
   dataPkt.dts          = pkt.dts;
   double tNow = get_system_time_since_epoch_in_seconds();
   dataPkt.data = (unsigned char*) &tNow;
   dataPkt.size = sizeof(double);

   // Write the data packet.
   av_write_frame(mFormatContext, &dataPkt);
   // Flush; probably not necessary but should not hurt.
   av_write_frame(mFormatContext, nullptr);

   // Write video frame.
   err = av_write_frame(mFormatContext, &pkt);
   // Flush again; probably not necessary but should not hurt.
   err = av_write_frame(mFormatContext, nullptr);
 }

 // PTS for next frame
 mFrame->pts += av_rescale_q(1, mVidStream->codec->time_base,


 == Client code ==
 This is the client code. Also stripped in an attempt to keep it short:
 avformat_open_input(&mFormatContext, "udp://",
    nullptr, nullptr);

 // These flags work! But this negates the use of av_read_frame() as it now
 // does not guarantee to return one frame.
 // mFormatContext->flags |= AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN;

 // These flags seem to have no effect.
 // mFormatContext->flags |= AVFMT_FLAG_NOBUFFER;
 // mFormatContext->flags |= AVFMT_FLAG_FLUSH_PACKETS;
 // mFormatContext->avio_flags |= AVIO_FLAG_DIRECT;

 avformat_find_stream_info(mFormatContext, nullptr);

 // Video stream.
 mVideoStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_VIDEO,
 -1, -1, nullptr, 0);

 AVStream* st = mFormatContext->streams[mVideoStreamIdx];
 // This flag works! But this negates the use of av_read_frame() as it now
 // does not guarantee to return one frame.
 // st->need_parsing = AVSTREAM_PARSE_NONE;

 // find decoder for the stream
 AVCodecContext* dec_ctx = st->codec;
 AVCodec* dec = avcodec_find_decoder(dec_ctx->codec_id);
 if (dec->capabilities & CODEC_CAP_TRUNCATED)
   dec_ctx->flags |= CODEC_FLAG_TRUNCATED;
 dec_ctx->thread_type  = FF_THREAD_SLICE;
 dec_ctx->thread_count = mThreads;

 // These don't have the desired effect:
 //    dec_ctx->flags |= CODEC_FLAG_LOW_DELAY;
 //    dec_ctx->flags2 |= CODEC_FLAG2_FAST;
 //    dec_ctx->flags2 |= CODEC_FLAG2_CHUNKS;
 //    dec_ctx->refcounted_frames = 1;

 avcodec_open2(dec_ctx, dec, nullptr);
 mVideoStream        = mFormatContext->streams[mVideoStreamIdx];
 mVideoDecodeContext = mVideoStream->codec;

 mDataStreamIdx = av_find_best_stream(mFormatContext, AVMEDIA_TYPE_DATA,
 -1, -1, nullptr, 0);

 // Decoding loop:

 static std::queue<std::pair<int64_t, double> > ptsDb;

 AVPacket pkt;
 pkt.data = nullptr;
 pkt.size = 0;

 // wait for data.
 if (av_read_frame(mFormatContext, &pkt) < 0)
   return;

 double tRecv = get_system_time_since_epoch_in_seconds();
 if (pkt.stream_index == mDataStreamIdx) {
   double tData = *(double*) (pkt.data);
   printf("DAT PTS %li\trecv'd @ %.2lf [ms], trans delay %.4lf [ms]\n",
          pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3);

   ptsDb.emplace(std::make_pair(pkt.pts, tRecv));
 }
 else if (pkt.stream_index == mVideoStreamIdx) {
   // Quick hack to sync data stream packets with video packets.
   // (The pop/continue/break statements are reconstructed from context.)
   std::pair<int64_t, double> elem {0, 0};
   while (!ptsDb.empty()) {
     elem = ptsDb.front();
     if (elem.first < pkt.pts) {   // stale entry: drop it and keep looking
       ptsDb.pop();
       continue;
     }
     if (elem.first == pkt.pts) {  // matching data packet found
       ptsDb.pop();
       break;
     }
     if (elem.first > pkt.pts) {   // no data packet for this video packet
       elem.second = 0;
       break;
     }
   }

   double tData = elem.second;
   printf("VID PTS %li\trecv'd @ %.2lf [ms], delta with data %.2lf [ms] "
          "(%i)\n", pkt.pts, tRecv * 1e3, (tRecv - tData) * 1e3,
          (int) ptsDb.size());

   // decode video frame
   int got_frame = 0;
   mFrame = avcodec_alloc_frame();
   double t1  = get_system_time_since_epoch_in_seconds();
   avcodec_decode_video2(mVideoDecodeContext, mFrame, &got_frame, &pkt);
   double t2  = get_system_time_since_epoch_in_seconds();
   if (got_frame)
     printf("[DEBUG] Got frame VID PTS %lli\tdecoding time %.2lf [ms]\n",
            mFrame->pkt_pts, (t2 - t1) * 1e3);
 }


 As noted in the code above, the flags {{{AVFMT_FLAG_NOPARSE |
 AVFMT_FLAG_NOFILLIN}}} and/or {{{AVSTREAM_PARSE_NONE}}} seem to (almost) do
 what I want. My understanding of these flags is that they essentially
 disable the functionality which ensures one frame is available. So once
 these flags are used there is no way of knowing when a frame is ready to
 be decoded, in which case they are not usable.

Ticket URL: <https://trac.ffmpeg.org/ticket/3354>
FFmpeg <http://ffmpeg.org>
FFmpeg issue tracker
