[FFmpeg-devel] [PATCH] hwcontext_vaapi: use the special UC copy for downloading frames.

Mark Thompson sw at jkqxz.net
Wed Apr 12 00:00:16 EEST 2017


On 11/04/17 12:26, Mark Thompson wrote:
> On 11/04/17 08:30, Jun Zhao wrote:
>> From 9bab458006369f427fa2f4c6248ee89329e81067 Mon Sep 17 00:00:00 2001
>> From: Jun Zhao <jun.zhao at intel.com>
>> Date: Tue, 11 Apr 2017 14:37:07 +0800
>> Subject: [PATCH] hwcontext_vaapi: use the special UC copy for downloading
>>  frames.
>>
>> used SSE4 UC function for copying image data from GPU mapped memory,
>> see https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
>>
>> before this change, VA-API HWAccel decoder copy image data from GPU
>> mapped memory used vaCreateImage/vaGetImage/av_frame_copy, now use
>> vaDeriveImage/av_image_copy_uc_from.
>>
>> decoding a 3000 frames 1080p h264 stream in Intel(R) Core(TM)
>> i5-6260U CPU @ 1.80GHz, the CPU usage and decode fps as follow:
>>
>> 1. Software decoder.
>> ./ffmpeg -i ./skyfall2-trailer.mp4 -f null /dev/null
>>
>> CPU: 80%, fps: 334fps
>>
>> 2a. vaCreateImage/vaGetImage/av_frame_copy
>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i skyfall2-trailer.mp4 -f null /dev/null
>>
>> CPU: 12%, fps: 147fps
>>
>> 2b. vaDeriveImage/av_image_copy_uc_from
>> ./ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i skyfall2-trailer.mp4 -f null /dev/null
>>
>> CPU: 23%, fps: 628fps
>>
>> Signed-off-by: Jun Zhao <jun.zhao at intel.com>
>> ---
> 
> This change was considered in libav when the UC copy function was introduced (<https://lists.libav.org/pipermail/libav-devel/2016-August/078826.html>, <https://lists.libav.org/pipermail/libav-devel/2016-August/078825.html>), but was not in the end applied.
> 
> The reasons for this were:
> 
> * It had much worse performance on the low-power cores - try your benchmark above on Braswell.

Running on a Braswell N3700 with 38072 frames of 1920x1080 H.264 as input (throughput, then total CPU time):

No download at all:        520fps,   52s CPU
Before patch, 4 threads:   107fps,  237s CPU
Before patch, 1 thread:     90fps,  233s CPU
After patch, 4 threads:     30fps, 1294s CPU
After patch, 1 thread:      28fps, 1305s CPU
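
To put those figures in relative terms, here is a small sketch of the arithmetic. It is only illustrative: the dictionary keys and helper names are made up for this example, and the numbers are copied straight from the table above.

```python
# Figures copied from the Braswell N3700 table: (fps, total CPU seconds).
# Names below are hypothetical labels for this sketch, not anything in FFmpeg.
FRAMES = 38072

results = {
    "no download":       (520, 52),
    "before, 4 threads": (107, 237),
    "before, 1 thread":  (90, 233),
    "after, 4 threads":  (30, 1294),
    "after, 1 thread":   (28, 1305),
}


def compare(before, after):
    """Return (throughput ratio, CPU-time ratio) of `after` relative to `before`."""
    b_fps, b_cpu = results[before]
    a_fps, a_cpu = results[after]
    return a_fps / b_fps, a_cpu / b_cpu


def cpu_ms_per_frame(name):
    """Average CPU time spent per frame, in milliseconds."""
    return results[name][1] * 1000.0 / FRAMES


for threads in ("4 threads", "1 thread"):
    fps_ratio, cpu_ratio = compare("before, " + threads, "after, " + threads)
    print("%s: throughput x%.2f, CPU time x%.2f, %.1f ms CPU per frame after patch"
          % (threads, fps_ratio, cpu_ratio, cpu_ms_per_frame("after, " + threads)))
```

On these numbers the patched path runs at roughly 0.3x the throughput of the unpatched one while using about 5.5x the CPU time, for both thread counts.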


- Mark

