[FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace current parser with MFXVideoDECODE_DecodeHeader()

Rogozhkin, Dmitry V dmitry.v.rogozhkin at intel.com
Tue Mar 12 01:37:18 EET 2019


On Mon, 2019-03-11 at 17:23 +0800, Li, Zhong wrote:
> > -----Original Message-----
> > From: Rogozhkin, Dmitry V
> > Sent: Saturday, March 9, 2019 8:48 AM
> > To: ffmpeg-devel at ffmpeg.org
> > Cc: Li, Zhong <zhong.li at intel.com>
> > Subject: Re: [FFmpeg-devel] [PATCH v3 3/6] lavc/qsvdec: Replace
> > current
> > parser with MFXVideoDECODE_DecodeHeader()
> > 
> > On Fri, 2019-03-08 at 15:40 +0800, Zhong Li wrote:
> > > Using the MSDK parser can improve the qsv decoder pass rate in
> > > some cases (e.g. an SPS declares a wrong level_idc, smaller than
> > > it should be). It is also necessary for adding new qsv decoders
> > > such as MJPEG and VP9, since the current parser can't provide
> > > enough information.
> > > Actually, using MFXVideoDECODE_DecodeHeader() was discussed at
> > > https://ffmpeg.org/pipermail/ffmpeg-devel/2015-July/175734.html
> > > and merged as commit 1acb19d, but it was overwritten when the
> > > libav patches were merged (commit: 1f26a23) without any
> > > explanation.
> > > 
> > > v2: split decode header from decode_init, and call it for every
> > > frame to detect format/resolution changes. This fixes some
> > > regression issues such as hevc 10bit decoding.
> > > 
> > > Signed-off-by: Zhong Li <zhong.li at intel.com>
> > > ---
> > >  libavcodec/qsvdec.c       | 172 ++++++++++++++++++++------------------
> > >  libavcodec/qsvdec.h       |   2 +
> > >  libavcodec/qsvdec_h2645.c |   1 +
> > >  libavcodec/qsvdec_other.c |   1 +
> > >  4 files changed, 93 insertions(+), 83 deletions(-)
> > > 
> > > diff --git a/libavcodec/qsvdec.c b/libavcodec/qsvdec.c
> > > index 4a0be811fb..b78026e14d 100644
> > > --- a/libavcodec/qsvdec.c
> > > +++ b/libavcodec/qsvdec.c
> > > @@ -120,19 +120,17 @@ static inline unsigned int qsv_fifo_size(const AVFifoBuffer* fifo)
> > >      return av_fifo_size(fifo) / qsv_fifo_item_size();
> > >  }
> > > 
> > > -static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> > > +static int qsv_decode_preinit(AVCodecContext *avctx, QSVContext *q, enum AVPixelFormat *pix_fmts, mfxVideoParam *param)
> > >  {
> > > -    const AVPixFmtDescriptor *desc;
> > >      mfxSession session = NULL;
> > >      int iopattern = 0;
> > > -    mfxVideoParam param = { 0 };
> > > -    int frame_width  = avctx->coded_width;
> > > -    int frame_height = avctx->coded_height;
> > >      int ret;
> > > 
> > > -    desc = av_pix_fmt_desc_get(avctx->sw_pix_fmt);
> > > -    if (!desc)
> > > -        return AVERROR_BUG;
> > > +    ret = ff_get_format(avctx, pix_fmts);
> > > +    if (ret < 0) {
> > > +        q->orig_pix_fmt = avctx->pix_fmt = AV_PIX_FMT_NONE;
> > > +        return ret;
> > > +    }
> > > 
> > >      if (!q->async_fifo) {
> > >          q->async_fifo = av_fifo_alloc(q->async_depth * qsv_fifo_item_size());
> > > @@ -170,48 +168,72 @@ static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q)
> > >          return ret;
> > >      }
> > > 
> > > -    ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > > -    if (ret < 0)
> > > -        return ret;
> > > +    param->IOPattern   = q->iopattern;
> > > +    param->AsyncDepth  = q->async_depth;
> > > +    param->ExtParam    = q->ext_buffers;
> > > +    param->NumExtParam = q->nb_ext_buffers;
> > > 
> > > -    param.mfx.CodecId      = ret;
> > > -    param.mfx.CodecProfile = ff_qsv_profile_to_mfx(avctx->codec_id, avctx->profile);
> > > -    param.mfx.CodecLevel   = avctx->level == FF_LEVEL_UNKNOWN ? MFX_LEVEL_UNKNOWN : avctx->level;
> > > -
> > > -    param.mfx.FrameInfo.BitDepthLuma   = desc->comp[0].depth;
> > > -    param.mfx.FrameInfo.BitDepthChroma = desc->comp[0].depth;
> > > -    param.mfx.FrameInfo.Shift          = desc->comp[0].depth > 8;
> > > -    param.mfx.FrameInfo.FourCC         = q->fourcc;
> > > -    param.mfx.FrameInfo.Width          = frame_width;
> > > -    param.mfx.FrameInfo.Height         = frame_height;
> > > -    param.mfx.FrameInfo.ChromaFormat   = MFX_CHROMAFORMAT_YUV420;
> > > -
> > > -    switch (avctx->field_order) {
> > > -    case AV_FIELD_PROGRESSIVE:
> > > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_PROGRESSIVE;
> > > -        break;
> > > -    case AV_FIELD_TT:
> > > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_TFF;
> > > -        break;
> > > -    case AV_FIELD_BB:
> > > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_FIELD_BFF;
> > > -        break;
> > > -    default:
> > > -        param.mfx.FrameInfo.PicStruct = MFX_PICSTRUCT_UNKNOWN;
> > > -        break;
> > > -    }
> > > +    return 0;
> > > + }
> > > 
> > > -    param.IOPattern   = q->iopattern;
> > > -    param.AsyncDepth  = q->async_depth;
> > > -    param.ExtParam    = q->ext_buffers;
> > > -    param.NumExtParam = q->nb_ext_buffers;
> > > +static int qsv_decode_init(AVCodecContext *avctx, QSVContext *q, mfxVideoParam *param)
> > > +{
> > > +    int ret;
> > > 
> > > -    ret = MFXVideoDECODE_Init(q->session, &param);
> > > +    avctx->width        = param->mfx.FrameInfo.CropW;
> > > +    avctx->height       = param->mfx.FrameInfo.CropH;
> > > +    avctx->coded_width  = param->mfx.FrameInfo.Width;
> > > +    avctx->coded_height = param->mfx.FrameInfo.Height;
> > > +    avctx->level        = param->mfx.CodecLevel;
> > > +    avctx->profile      = param->mfx.CodecProfile;
> > > +    avctx->field_order  = ff_qsv_map_picstruct(param->mfx.FrameInfo.PicStruct);
> > > +    avctx->pix_fmt      = ff_qsv_map_fourcc(param->mfx.FrameInfo.FourCC);
> > > +
> > > +    ret = MFXVideoDECODE_Init(q->session, param);
> > >      if (ret < 0)
> > >          return ff_qsv_print_error(avctx, ret,
> > >                                    "Error initializing the MFX video decoder");
> > > 
> > > -    q->frame_info = param.mfx.FrameInfo;
> > > +    q->frame_info = param->mfx.FrameInfo;
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static int qsv_decode_header(AVCodecContext *avctx, QSVContext *q, AVPacket *avpkt, enum AVPixelFormat *pix_fmts, mfxVideoParam *param)
> > > +{
> > > +    int ret;
> > > +
> > > +    mfxBitstream bs = { { { 0 } } };
> > > +
> > > +    if (avpkt->size) {
> > > +        bs.Data       = avpkt->data;
> > > +        bs.DataLength = avpkt->size;
> > > +        bs.MaxLength  = bs.DataLength;
> > > +        bs.TimeStamp  = avpkt->pts;
> > > +        if (avctx->field_order == AV_FIELD_PROGRESSIVE)
> > > +            bs.DataFlag   |= MFX_BITSTREAM_COMPLETE_FRAME;
> > > +    } else
> > > +        return AVERROR_INVALIDDATA;
> > > +
> > > +
> > > +    if(!q->session) {
> > > +        ret = qsv_decode_preinit(avctx, q, pix_fmts, param);
> > > +        if (ret < 0)
> > > +            return ret;
> > > +    }
> > > +
> > > +    ret = ff_qsv_codec_id_to_mfx(avctx->codec_id);
> > > +    if (ret < 0)
> > > +        return ret;
> > > +
> > > +    param->mfx.CodecId = ret;
> > > +    ret = MFXVideoDECODE_DecodeHeader(q->session, &bs, param);
> > > +    if (MFX_ERR_MORE_DATA == ret) {
> > > +       return AVERROR(EAGAIN);
> > > +    }
> > > +    if (ret < 0)
> > > +        return ff_qsv_print_error(avctx, ret,
> > > +                "Error decoding stream header");
> > > 
> > >      return 0;
> > >  }
> > > @@ -494,7 +516,10 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > >      uint8_t *dummy_data;
> > >      int dummy_size;
> > >      int ret;
> > > -    const AVPixFmtDescriptor *desc;
> > > +    mfxVideoParam param = { 0 };
> > > +    enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > > +                                       AV_PIX_FMT_NV12,
> > > +                                       AV_PIX_FMT_NONE };
> > > 
> > >      if (!q->avctx_internal) {
> > >          q->avctx_internal = avcodec_alloc_context3(NULL);
> > > @@ -508,7 +533,6 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > >              return AVERROR(ENOMEM);
> > > 
> > >          q->parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
> > > -        q->orig_pix_fmt   = AV_PIX_FMT_NONE;
> > >      }
> > > 
> > >      if (!pkt->size)
> > > @@ -521,15 +545,17 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > >                       pkt->data, pkt->size, pkt->pts, pkt->dts,
> > >                       pkt->pos);
> > > 
> > > -    avctx->field_order  = q->parser->field_order;
> > >      /* TODO: flush delayed frames on reinit */
> > > -    if (q->parser->format       != q->orig_pix_fmt    ||
> > > -        FFALIGN(q->parser->coded_width, 16)  != FFALIGN(avctx->coded_width, 16) ||
> > > -        FFALIGN(q->parser->coded_height, 16) != FFALIGN(avctx->coded_height, 16)) {
> > > 
> > > -        enum AVPixelFormat pix_fmts[3] = { AV_PIX_FMT_QSV,
> > > -                                           AV_PIX_FMT_NONE,
> > > -                                           AV_PIX_FMT_NONE };
> > > -        enum AVPixelFormat qsv_format;
> > > +
> > > +    // sw_pix_fmt was initialized as NV12, will be updated after
> > > header decoded if not true.
> > > +    if (q->orig_pix_fmt != AV_PIX_FMT_NONE)
> > > +        pix_fmts[1] = q->orig_pix_fmt;
> > 
> > Well, it still seems non self-explanatory to me why we have
> > pix_fmts[1] here.
> 
> I think it has been explained and described in my comment. It means
> sw_pix_fmt.
> For more detail, you probably want to check the implementation of
> ff_get_format().
> 
> > > +
> > > +    ret = qsv_decode_header(avctx, q, pkt, pix_fmts, &param);
> > 
> > Parsing can be expensive to perform on each frame. See more below.
> 
> 1. Parsing just reads the header bits; I don't think it is a
> CPU-heavy task compared with slice-level decoding.
> 2. Mostly only key frames have a complete header. For a non-key
> frame, it just returns MFX_ERR_MORE_DATA instead of doing real
> parsing work.


It has been a while since I looked into the details of DecodeHeader
myself, but I think the following will happen if a frame missing a
header is fed into mediasdk: mediasdk will read the whole bitstream to
the end trying to find a header. Once it fails to find one, it will
return MFX_ERR_MORE_DATA. You should be able to see this if you print
mfxBitstream::DataOffset and mfxBitstream::DataLength before and after
your call to MFXVideoDECODE_DecodeHeader(). Mediasdk actually changes
DataOffset, pointing it either to the beginning of the header or right
after the header (I don't remember exactly, sorry). If the header is
not found, it sets DataOffset=DataLength and DataLength=0, requesting
a new bitstream portion. The reason why you think everything is OK is
that you construct the mfxBitstream structure anew each time in
qsv_decode_header. Hence, you just don't notice that mediasdk performs
a massive amount of parsing.

I am afraid that, as is, your change will result in massive CPU%
regressions.
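To make the concern concrete, here is a small self-contained sketch of
the behaviour described above. The types and the start-code scan are
hypothetical stand-ins (the real structures live in mfxstructures.h);
the point is only that a header search over a packet without a header
consumes the whole packet (DataOffset moves to the end, DataLength
drops to zero) before MFX_ERR_MORE_DATA comes back, so rebuilding the
mfxBitstream on every call hides that full-packet scan:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for mfxBitstream and mfxStatus. */
typedef struct {
    uint8_t  *Data;
    uint32_t  DataOffset;
    uint32_t  DataLength;
} FakeBitstream;

enum { FAKE_ERR_NONE = 0, FAKE_ERR_MORE_DATA = -10 };

/* Models the DecodeHeader behaviour described above: scan the whole
 * buffer for an (illustrative) H.264 SPS start code.  If none is
 * found, consume everything (DataOffset = end, DataLength = 0) and
 * ask for more data -- i.e. the full packet was parsed, not skipped. */
static int fake_decode_header(FakeBitstream *bs)
{
    const uint8_t sps[4] = { 0x00, 0x00, 0x01, 0x67 };
    for (uint32_t i = bs->DataOffset; i + 4 <= bs->DataOffset + bs->DataLength; i++) {
        if (!memcmp(bs->Data + i, sps, 4)) {
            bs->DataLength -= i - bs->DataOffset; /* keep header onwards */
            bs->DataOffset  = i;
            return FAKE_ERR_NONE;
        }
    }
    bs->DataOffset += bs->DataLength; /* everything was scanned... */
    bs->DataLength  = 0;              /* ...and discarded */
    return FAKE_ERR_MORE_DATA;
}
```

Printing the real DataOffset/DataLength around MFXVideoDECODE_DecodeHeader()
should show the same pattern on headerless packets.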

> 
> > > +
> > > +    if (ret >= 0 && (q->orig_pix_fmt !=
> > > ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC) ||
> > > +        avctx->coded_width  != param.mfx.FrameInfo.Width ||
> > > +        avctx->coded_height != param.mfx.FrameInfo.Height)) {
> > >          AVPacket zero_pkt = {0};
> > 
> > You are actually trying to perform a task which is already done by
> > mediasdk. Moreover, you don't detect all the cases in which the
> > mediasdk decoder will need to be reset. For example, you may have
> > two concatenated streams with exactly the same resolution, but a
> > decoder reset will still be required because of profile/level/GOP
> > structure changes, and memory reallocation (or rather additional
> > allocation) could also be required, because mediasdk may need a
> > different (bigger or smaller) number of surfaces to properly
> > handle the new bitstream. See more below.
> 
> Which case of GOP structure change needs a reset? Could you please
> give an example?
> 
> > > 
> > >          if (q->buffered_count) {
> > > @@ -538,45 +564,24 @@ int ff_qsv_process_data(AVCodecContext *avctx, QSVContext *q,
> > >              q->buffered_count--;
> > >              return qsv_decode(avctx, q, frame, got_frame, &zero_pkt);
> > >          }
> > > -
> > >          q->reinit_flag = 0;
> > > 
> > > -        qsv_format = ff_qsv_map_pixfmt(q->parser->format, &q->fourcc);
> > > 
> > > -        if (qsv_format < 0) {
> > > -            av_log(avctx, AV_LOG_ERROR,
> > > -                   "Decoding pixel format '%s' is not supported\n",
> > > -                   av_get_pix_fmt_name(q->parser->format));
> > > -            ret = AVERROR(ENOSYS);
> > > -            goto reinit_fail;
> > > -        }
> > > +        q->orig_pix_fmt = avctx->pix_fmt = pix_fmts[1] =
> > > ff_qsv_map_fourcc(param.mfx.FrameInfo.FourCC);
> > > 
> > > -        q->orig_pix_fmt     = q->parser->format;
> > > -        avctx->pix_fmt      = pix_fmts[1] = qsv_format;
> > > -        avctx->width        = q->parser->width;
> > > -        avctx->height       = q->parser->height;
> > > -        avctx->coded_width  = FFALIGN(q->parser->coded_width, 16);
> > > -        avctx->coded_height = FFALIGN(q->parser->coded_height, 16);
> > > -        avctx->level        = q->avctx_internal->level;
> > > -        avctx->profile      = q->avctx_internal->profile;
> > > +        avctx->coded_width  = param.mfx.FrameInfo.Width;
> > > +        avctx->coded_height = param.mfx.FrameInfo.Height;
> > > 
> > > -        ret = ff_get_format(avctx, pix_fmts);
> > > +        ret = qsv_decode_preinit(avctx, q, pix_fmts, &param);
> > >          if (ret < 0)
> > >              goto reinit_fail;
> > > +        q->initialized = 0;
> > > +    }
> > > 
> > > -        avctx->pix_fmt = ret;
> > > -
> > > -        desc = av_pix_fmt_desc_get(avctx->pix_fmt);
> > > -        if (!desc)
> > > -            goto reinit_fail;
> > > -
> > > -         if (desc->comp[0].depth > 8) {
> > > -            avctx->coded_width  = FFALIGN(q->parser->coded_width, 32);
> > > -            avctx->coded_height = FFALIGN(q->parser->coded_height, 32);
> > > -        }
> > > -
> > > -        ret = qsv_decode_init(avctx, q);
> > > +    if (!q->initialized) {
> > > +        ret = qsv_decode_init(avctx, q, &param);
> > >          if (ret < 0)
> > >              goto reinit_fail;
> > > +        q->initialized = 1;
> > >      }
> > > 
> > >      return qsv_decode(avctx, q, frame, got_frame, pkt);
> > 
> > So, from my perspective you should not attempt to call decode
> > header on each incoming data portion and interpret the results.
> > Instead you should respect what mediasdk's DecodeFrameAsync
> > returns. Basically, it will return
> > MFX_ERR_INCOMPATIBLE_VIDEO_PARAM if a decoder reset is required.
> > This gives you 2 situations in which you should call DecodeHeader:
> > 1. You call it if the decoder is not initialized at all
> > 2. Or you call it upon receiving MFX_ERR_INCOMPATIBLE_VIDEO_PARAM
> > 
> > Once you have received MFX_ERR_INCOMPATIBLE_VIDEO_PARAM, the steps
> > to perform are:
> > 1. Flush cached frames from the mediasdk decoder by passing a NULL
> > bitstream to DecodeFrameAsync
> > 2. Run decode header and get the updated parameters
> > 3. Check (via QueryIOSurf) and potentially reallocate surfaces to
> > satisfy the new bitstream requirements
> > 4. Reset the decoder with the new parameters
> 
> 1. This patch just replaces the original parser with the MSDK
> parser. I don't want to mix other things in here (that would likely
> introduce new bugs; full round testing has been done for this
> patch).
>   If it is really necessary, it should be done in an additional
> patch.
> 2. Unfortunately, if we don't reset and set the correct
> resolution/pix_fmt, FFmpeg-qsv already fails during MSDK
> initialization (e.g. passing NV12 format when it is actually P010,
> see: https://github.com/FFmpeg/FFmpeg/blob/master/libavcodec/qsv.c#L424
> )
>   There is no opportunity to get the status of DecodeFrameAsync in
> these cases. I haven't compared the MSDK sample decoder with current
> FFmpeg (will take a look later). But IIRC, the MSDK sample decoder
> isn't doing a better job than ffmpeg-qsv for the reinit case.
>   (One example: it hangs when running sample_decode h264 -i /fate-
> suite/h264/reinit-large_420_8-to-small_420_8.h264 -r, and this clip
> changes from a larger resolution to a smaller one. I wonder what
> will happen if the resolution changes from smaller to larger, e.g.
> 720x480 to 1920x1080.)
> 3. IMHO, reinit work is better done before decoding, instead of
> decoding, failing, and then recovering.

In my opinion you don't actually miss the resolution change if you
follow the algorithm which I described. Secondly, you don't actually
recover from a decoding failure, since there was no decoding failure
in the first place. Mediasdk parses the bitstream, and if it detects
that decoding cannot continue without user intervention, it signals
this to you via a warning or error. This is not a fatal error, just a
signal: "I can't continue as is - handle this". So, the mediasdk
parser stops at the point of the stream change. Mediasdk's internal
state is correct, not corrupted, and corresponds to the bitstream
which was handled before the incompatible bitstream portion was
detected. You need to call DecodeHeader with the same mfxBitstream
you just passed to DecodeFrameAsync. Mind that you may find
mfxBitstream::DataOffset changed after DecodeFrameAsync, pointing to
the beginning of the new header.
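As a rough illustration of why this matters for CPU usage, the scheme
above can be reduced to counting header parses. This is only a sketch
with made-up status codes (stand-ins for the mfxStatus values, which
live in mfxdefs.h), not the real qsvdec control flow: DecodeHeader
runs once at initial init and once per mid-stream change, instead of
once per packet:

```c
#include <assert.h>

/* Hypothetical status codes standing in for the mediasdk ones
 * discussed above. */
enum {
    ERR_NONE                     =   0,
    ERR_MORE_DATA                = -10,
    ERR_INCOMPATIBLE_VIDEO_PARAM = -14
};

/* Counts how many DecodeHeader calls the suggested scheme performs
 * for a sequence of DecodeFrameAsync-style results: one at initial
 * init, plus one per mid-stream parameter change.  On each change the
 * full sequence would be: flush cached frames (NULL bitstream to
 * DecodeFrameAsync), DecodeHeader on the same mfxBitstream,
 * QueryIOSurf plus surface reallocation if needed, then Reset. */
static int count_header_parses(const int *sts, int n)
{
    int parses = 1; /* initial DecodeHeader before the decoder exists */
    for (int i = 0; i < n; i++)
        if (sts[i] == ERR_INCOMPATIBLE_VIDEO_PARAM)
            parses++;
    return parses;
}
```

With one mid-stream change in a long sequence this yields 2 parses,
versus one parse per packet with the patch as posted.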

> 4. This is a common function for all decoders; I would like to see
> it under the control of ffmpeg. MSDK may have no unified
> implementation across all codecs (e.g. https://github.com/Intel-Media-
> SDK/MediaSDK/issues/333 ).
>   Your suggestion may be addressed when I see any new issues or
> bugs, but currently I can't see the necessity.
> 
> > The application may attempt to optimize step 3 above and avoid
> > surface reallocation if possible. For example, if you have
> > allocated 1920x1080 surfaces and the resolution changes to
> > 720x480, you may actually stay on 1920x1080 if you are OK with
> > using more memory, hoping to receive a 1920x1080 stream again in
> > the future. Eventually this works if you have a sufficient number
> > of 1920x1080 surfaces.
> 
> Hm, this is a simpler topic.
> https://patchwork.ffmpeg.org/patch/12257/ prepares for this, though
> the decoder should be updated too. But as discussed previously, that
> should be a separate patch and needs pre-testing.
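The pool-reuse optimization described in the quoted paragraph boils
down to one comparison. A minimal sketch, with hypothetical names (the
required surface count would come from QueryIOSurf in a real decoder):

```c
#include <assert.h>

/* Returns 1 if an existing surface pool (alloc_w x alloc_h, nalloc
 * surfaces) can be kept across a stream change, 0 if reallocation is
 * needed.  The pool is reusable when each surface is at least as
 * large as the new frames and there are at least as many surfaces as
 * the new stream requires. */
static int pool_reusable(int alloc_w, int alloc_h, int nalloc,
                         int req_w, int req_h, int nreq)
{
    return alloc_w >= req_w && alloc_h >= req_h && nalloc >= nreq;
}
```

So a 1920x1080 pool survives a switch to 720x480, but a switch to a
larger resolution, or a stream needing more surfaces, forces
reallocation.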


More information about the ffmpeg-devel mailing list