[FFmpeg-devel] Added - HW accelerated H.264 and HEVC encoding for AMD GPUs based on AMF SDK

Mironov, Mikhail Mikhail.Mironov at amd.com
Sun Oct 29 20:39:21 EET 2017


> >      REGISTER_ENCODER(H263_V4L2M2M,      h263_v4l2m2m);
> >      REGISTER_ENCDEC (LIBOPENH264,       libopenh264);
> > +    REGISTER_ENCODER(H264_AMF,          h264_amf);
> > +	REGISTER_ENCODER(H264_AMF,          h264_amf_d3d11va);
> 
> No tabs.  Why is the d3d11 version separate?  The encoder should be able to
> accept multiple pixfmts.
> 

It does accept multiple formats, but there is code that searches for the 
accelerator name in the encoder name and, unless it is present, disables 
passing the accelerator to the encoder. See hw_device_setup_for_encode().

> > +    { AV_PIX_FMT_NV12,       AMF_SURFACE_NV12 },
> > +    { AV_PIX_FMT_BGRA,       AMF_SURFACE_BGRA },
> > +    { AV_PIX_FMT_ARGB,       AMF_SURFACE_ARGB },
> > +    { AV_PIX_FMT_RGBA,       AMF_SURFACE_RGBA },
> > +    { AV_PIX_FMT_GRAY8,      AMF_SURFACE_GRAY8 },
> > +    { AV_PIX_FMT_YUV420P,    AMF_SURFACE_YUV420P },
> > +    { AV_PIX_FMT_BGR0,       AMF_SURFACE_BGRA },
> > +    { AV_PIX_FMT_YUV420P,    AMF_SURFACE_YV12 },
> > +    { AV_PIX_FMT_YUYV422,    AMF_SURFACE_YUY2 },
> 
> Do all of these formats actually work?

This is just a translation table. Actual support is declared in AVCodec::pix_fmts.

> 
> > +    { AV_PIX_FMT_D3D11,      AMF_SURFACE_NV12 },
> 
> D3D11 surfaces need not be NV12.  The actual format is in
> AVHWFramesContext.sw_format - if you only support 8-bit then something
> nasty probably happens if you give it P010 surfaces.
> 

Agreed, but how should I define D3D11 NV12 as an input format if I only have AV_PIX_FMT_D3D11?

> > +};
> > +
> > +static enum AMF_SURFACE_FORMAT amf_av_to_amf_format(enum AVPixelFormat fmt)
> > +{
> > +    for (int i = 0; i < amf_countof(format_map); i++) {
> > +        if (format_map[i].av_format == fmt) {
> > +            return format_map[i].amf_format;
> > +        }
> > +    }
> > +    return AMF_SURFACE_UNKNOWN;
> > +}
> > +
> > +// virtual functions decalred
> 
> What does this comment mean?
> 

These functions are virtual functions placed in a real virtual table.

> > +
> > +    version_fun(&ctx->version);
> > +    init_fun(AMF_FULL_VERSION, &ctx->factory);
> > +    ctx->factory->pVtbl->GetTrace(ctx->factory, &ctx->trace);
> > +    ctx->factory->pVtbl->GetDebug(ctx->factory, &ctx->debug);
> 
> Do all of these functions necessarily succeed?
> 

Yes.

 
> > +
> > +    // try to reuse existing DX device
> > +
> > +    if (avctx->hw_frames_ctx) {
> > +        AVHWFramesContext *device_ctx = (AVHWFramesContext*)avctx->hw_frames_ctx->data;
> > +        if (device_ctx->device_ctx->type == AV_HWDEVICE_TYPE_D3D11VA){
> > +            if (device_ctx->device_ctx->hwctx) {
> > +                AVD3D11VADeviceContext *device_d3d11 = (AVD3D11VADeviceContext *)device_ctx->device_ctx->hwctx;
> > +                res = ctx->context->pVtbl->InitDX11(ctx->context, device_d3d11->device, AMF_DX11_1);
> > +                if (res == AMF_OK) {
> > +                    ctx->hw_frames_ctx = av_buffer_ref(avctx->hw_frames_ctx);
> > +                } else {
> > +                    av_log(avctx, AV_LOG_INFO, "amf_shared: avctx->hw_frames_ctx has non-AMD device, switching to default");
> 
> I'm not sure this is going to act sensibly - if the user has D3D11 frames input
> on another device, does it work?
> 

If the device is not AMD's, the code tries to create another, compatible device. 
In that case submission will be via system memory.

> > +    surface->pVtbl->SetPts(surface, frame->pts);
> 
> Does this accept the same range as frame->pts, including AV_NOPTS_VALUE?
> 

Yes, the encoder doesn't use pts; it just passes the value through for convenience.

> > +
> > +int ff_amf_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
> > +                        const AVFrame *frame, int *got_packet)
> > +{
> > +    int             ret = 0;
> > +    AMF_RESULT      res = AMF_OK;
> > +    AmfContext     *ctx = avctx->priv_data;
> > +    AMFSurface     *surface = 0;
> > +    AMFData        *data = 0;
> > +    amf_bool       submitted = false;
> > +
> > +    while (!submitted) {
> > +        if (!frame) { // submit drain
> > +            if (!ctx->eof) { // submit drain onre time only
> > +                res = ctx->encoder->pVtbl->Drain(ctx->encoder);
> > +                if (res == AMF_INPUT_FULL) {
> > +                    av_usleep(1000); // input queue is full: wait, poll and submit Drain again
> > +                                     // need to get some output and try again
> > +                } else if (res == AMF_OK) {
> > +                    ctx->eof = true; // drain started
> > +                    submitted = true;
> > +                }
> > +            }
> > +        } else { // submit frame
> > +            if (surface == 0) { // prepare surface from frame one time only
> > +                if (frame->hw_frames_ctx && ( // HW frame detected
> > +                                              // check if the same hw_frames_ctx as used in initialization
> > +                    (ctx->hw_frames_ctx && frame->hw_frames_ctx->data == ctx->hw_frames_ctx->data) ||
> > +                    // check if the same hw_device_ctx as used in initialization
> > +                    (ctx->hw_device_ctx && ((AVHWFramesContext*)frame->hw_frames_ctx->data)->device_ctx ==
> > +                    (AVHWDeviceContext*)ctx->hw_device_ctx->data)
> > +                )) {
> > +                    ID3D11Texture2D* texture = (ID3D11Texture2D*)frame->data[0]; // actual texture
> > +                    int index = (int)(size_t)frame->data[1]; // index is a slice in texture array - set to tell AMF which slice to use
> > +                    texture->lpVtbl->SetPrivateData(texture, &AMFTextureArrayIndexGUID, sizeof(index), &index);
> > +
> > +                    res = ctx->context->pVtbl->CreateSurfaceFromDX11Native(ctx->context, texture, &surface, NULL); // wrap to AMF surface
> > +                    surface->pVtbl->SetCrop(surface, 0, 0, frame->width, frame->height); // decode surfaces are vertically aligned by 16; tell AMF the real size
> > +                    surface->pVtbl->SetPts(surface, frame->pts);
> > +                } else {
> > +                    res = ctx->context->pVtbl->AllocSurface(ctx->context, AMF_MEMORY_HOST, ctx->format, avctx->width, avctx->height, &surface);
> > +                    AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_BUG, "AllocSurface() failed with error %d", res);
> > +                    amf_copy_surface(avctx, frame, surface);
> > +                }
> > +            }
> > +            // encode
> > +            res = ctx->encoder->pVtbl->SubmitInput(ctx->encoder, (AMFData*)surface);
> > +            if (res == AMF_INPUT_FULL) { // handle full queue
> > +                av_usleep(1000); // input queue is full: wait, poll and submit surface again
> 
> Is there really no way in the API to wait for this properly?
> 

The AMF runtime is designed without threads or sleeps inside; it is up to
the application to poll the output and thereby make space in the input HW queue.
But if the input queue is really full, it does make sense to sleep and
continue polling to avoid unnecessary CPU burn.


> > +            } else {
> > +                surface->pVtbl->Release(surface);
> > +                surface = NULL;
> > +                AMF_RETURN_IF_FALSE(ctx, res == AMF_OK, AVERROR_UNKNOWN, "SubmitInput() failed with error %d", res);
> > +                submitted = 1;
> > +            }
> > +        }
> > +        // poll results
> > +        if (!data) {
> > +            res = ctx->encoder->pVtbl->QueryOutput(ctx->encoder, &data);
> > +            if (data) {
> > +                AMFBuffer* buffer;
> > +                AMFGuid guid = IID_AMFBuffer();
> > +                data->pVtbl->QueryInterface(data, &guid, (void**)&buffer); // query for buffer interface
> > +                ret = amf_copy_buffer(avctx, pkt, buffer);
> > +                if (!ret)
> > +                    *got_packet = 1;
> > +                buffer->pVtbl->Release(buffer);
> > +                data->pVtbl->Release(data);
> > +                if (ctx->eof) {
> > +                    submitted = true; // we are in the drain state - no submissions
> > +                }
> > +            } else if (res == AMF_EOF) {
> > +                submitted = true; // drain complete
> > +            } else {
> > +                if (!submitted) {
> > +                    av_usleep(1000); // wait and poll again
> > +                }
> > +            }
> > +        }
> > +    }
> 
> I suspect this setup is not actually going to follow the constraints of the
> deprecated encode2().  Given the API here, I think you would be much better
> off writing with send_frame()/receive_packet().

I considered this, but without a thread calling receive_packet() the implementation
would fall into the same pattern as now, only with an additional queue of ready outputs.

> > +    AMFSize             framesize = AMFConstructSize(avctx->width, avctx->height);
> > +    AMFRate             framerate = AMFConstructRate(avctx->time_base.den, avctx->time_base.num * avctx->ticks_per_frame);
> 
> Is VFR encoding not supported?
> 

The encoder uses the frame rate only for rate control. It does not take 
frame duration into account in the case of VFR.

> > +    if (avctx->rc_initial_buffer_occupancy != 0) {
> > +        int percent = avctx->rc_buffer_size * 64 / avctx->rc_initial_buffer_occupancy;
> > +        if (percent > 64)
> > +            percent = 64;
> 
> ???
> 

This is an attempt to translate the occupancy to the 1-64 range exposed by the encoder.

> > +    // Bitrate
> > +    AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_TARGET_BITRATE, avctx->bit_rate);
> > +    if (ctx->rate_control_mode == AMF_VIDEO_ENCODER_RATE_CONTROL_METHOD_CBR) {
> > +        AMF_ASSIGN_PROPERTY_INT64(res, ctx->encoder, AMF_VIDEO_ENCODER_PEAK_BITRATE, avctx->bit_rate);
> > +    } else {
> > +        int rc_max_rate = avctx->rc_max_rate >= avctx->bit_rate ? avctx->rc_max_rate : avctx->bit_rate * 13 / 10;
> 
> Where does 13/10 come from?
> 

For best results rc_max_rate should be bigger than bit_rate for VBR; for CBR it is ignored. 
What is a better way to correct an unset parameter?

> > +
> > +    // fill extradata
> > +    AMFVariantInit(&var);
> 
> Can this fail?
> 
Not if var is cleared to 0 first.

> > +// encoder connected with D3D11 HW accelerator
> > +AVCodec ff_h264_amf_d3d11va_encoder = {
> > +    .name = "h264_amf_d3d11va",
> > +    .long_name = NULL_IF_CONFIG_SMALL("AMD AMF H.264 Encoder with d3d11va"),
> > +    .type = AVMEDIA_TYPE_VIDEO,
> > +    .id = AV_CODEC_ID_H264,
> > +    .init = amf_encode_init_h264,
> > +    .encode2 = ff_amf_encode_frame,
> > +    .close = ff_amf_encode_close,
> > +    .priv_data_size = sizeof(AmfContext),
> > +    .priv_class = &h264_amf_d3d11va_class,
> > +    .defaults = defaults,
> > +    .capabilities = AV_CODEC_CAP_DELAY,
> > +    .caps_internal = FF_CODEC_CAP_INIT_CLEANUP,
> > +    .pix_fmts = ff_amf_pix_fmts,
> > +};
> 
> As above, why does this separate (identical) instance exist?

See explanation about accelerator handling for encoders above.

> > +static const AVOption options[] = {
> > +    { "usage",          "Set the encoding usage",             OFFSET(usage),          AV_OPT_TYPE_INT,   { .i64 = AMF_VIDEO_ENCODER_HEVC_USAGE_TRANSCONDING }, AMF_VIDEO_ENCODER_HEVC_USAGE_TRANSCONDING, AMF_VIDEO_ENCODER_HEVC_USAGE_WEBCAM, VE, "usage" },
> > +    { "transcoding",    "", 0, AV_OPT_TYPE_CONST, { .i64 = AMF_VIDEO_ENCODER_HEVC_USAGE_TRANSCONDING },         0, 0, VE, "usage" },
> > +    { "ultralowlatency","", 0, AV_OPT_TYPE_CONST, { .i64 = AMF_VIDEO_ENCODER_HEVC_USAGE_ULTRA_LOW_LATENCY },    0, 0, VE, "usage" },
> > +    { "lowlatency",     "", 0, AV_OPT_TYPE_CONST, { .i64 = AMF_VIDEO_ENCODER_HEVC_USAGE_LOW_LATENCY },          0, 0, VE, "usage" },
> > +    { "webcam",         "", 0, AV_OPT_TYPE_CONST, { .i64 = AMF_VIDEO_ENCODER_HEVC_USAGE_WEBCAM },               0, 0, VE, "usage" },
> 
> Could some of this be in common with the H.264 encoder?  (Maybe in the
> header?)
> 

I tried to keep the H.264 and HEVC parameters completely separate. I was asked by the 
codec team to do so in the AMF API and did the same here.


> >
> 
> Thanks,
> 
> - Mark
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

I tried to answer all the questions. Sorry if I missed something. The rest is clear.
Thanks, Mikhail
