[FFmpeg-trac] #7690(undetermined:new): FFmpeg QSV decode + VPP performance is just a fraction of what one gets with VA-API and MediaSDK
FFmpeg
trac at avcodec.org
Wed Jan 23 17:35:00 EET 2019
#7690: FFmpeg QSV decode + VPP performance is just a fraction of what one gets
with VA-API and MediaSDK
-------------------------------------+-------------------------------------
Reporter: eero-t | Owner:
Type: defect | Status: new
Priority: normal | Component: undetermined
Version: git-master |
Keywords: qsv | Resolution:
Blocking: | Blocked By:
Analyzed by developer: 0 | Reproduced by developer: 0
-------------------------------------+-------------------------------------
Comment (by eero-t):
More testing with KBL-i7 GT2 (and latest Git versions of everything).
With 8-bit 1920x540 HEVC decode, QSV is clearly faster than VA-API:
{{{
ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v hevc_qsv -i 1920x540_60_yuv420p_4800.h265 -f null -
...
ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i 1920x540_60_yuv420p_4800.h265 -y -f null -
}}}
With 1920x1080 HEVC decode (both 8-bit and 10-bit), however, QSV was
clearly slower than VA-API.
In both cases, this holds also when the GPU is forced to run at max speed.
In the 1920x540 case, QSV gets (clearly) slower than VA-API when something
is done to the decoded data. If I do a VPP upscale from the decoded 1920x540
to 1920x1080 then, unlike plain 1920x540 decoding, QSV perf is also impacted
by kernel power management (perf is much lower when the GPU isn't forced to
max).
To summarize findings so far:
* Resolution impacts whether (HEVC) decoding is slower with the QSV or the
VA-API backend
* At larger resolutions, VPP operations are slower with the QSV backend than
with VA-API
* Depending on what the pipeline does and at what resolutions, the QSV backend
can fool the kernel into lowering GPU speed so that it's '''much''' slower
than with VA-API (when there's only a single pipeline running). IMHO
this is the larger of the issues, but it could be related to the VPP issue
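
One way to check the last point is to watch the GPU clock while each pipeline runs. The following is only a sketch, under the assumption that the GPU is driven by i915 and shows up as /sys/class/drm/card0 with the usual gt_*_freq_mhz files. The function names sample_gpu_freq and gpu_freq_limits are made up for illustration, and this is not necessarily how the numbers above were obtained.

```shell
#!/bin/sh
# Sketch: sample the i915 GPU frequency while a command runs.
# Assumption: the sysfs directory passed in (e.g. /sys/class/drm/card0)
# contains the i915 files gt_act_freq_mhz, gt_min_freq_mhz,
# gt_max_freq_mhz and gt_RP0_freq_mhz.

# sample_gpu_freq SYSFS_DIR INTERVAL_SECONDS COMMAND [ARGS...]
# Prints one actual-frequency reading (MHz) per interval on stdout until
# COMMAND exits, then returns COMMAND's exit status.
sample_gpu_freq() {
    dir=$1
    interval=$2
    shift 2

    # Run the pipeline in the background. Its stdout goes to stderr so
    # that stdout carries only the frequency samples, and stdin is
    # detached so the command cannot block on the terminal.
    "$@" </dev/null 1>&2 &
    pid=$!

    # Sample for as long as the background process is alive.
    while kill -0 "$pid" 2>/dev/null; do
        cat "$dir/gt_act_freq_mhz"
        sleep "$interval"
    done

    wait "$pid"
}

# gpu_freq_limits SYSFS_DIR
# Prints the current soft min/max limits and the hardware maximum (RP0).
gpu_freq_limits() {
    dir=$1
    printf 'min=%s max=%s rp0=%s\n' \
        "$(cat "$dir/gt_min_freq_mhz")" \
        "$(cat "$dir/gt_max_freq_mhz")" \
        "$(cat "$dir/gt_RP0_freq_mhz")"
}

# Example (not run here), using the QSV decode command from above:
#   sample_gpu_freq /sys/class/drm/card0 0.5 \
#       ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v hevc_qsv \
#       -i 1920x540_60_yuv420p_4800.h265 -f null -
#
# One common way to force the GPU to max on i915 (an assumption, needs
# root) is to write the RP0 value into gt_min_freq_mhz.
```

If the QSV run mostly reports frequencies near the minimum while the VA-API run sits near RP0, that would support the power management theory. If both show similar clocks, the slowdown is more likely in the pipeline itself.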
> I suppose this is an issue of qsv hwdownloading. FFmpeg-vaapi can get an
> image directly via vaDeriveImage()
> (https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/hwcontext_vaapi.c#L788).
> But possibly MSDK goes via vaGetImage()
> (https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/hwcontext_vaapi.c#L815),
> thus making a copy and causing the performance drop
...
> I will take a deeper look and go back to this issue.
Any conclusions? Something like that might explain operations being
synchronous enough that kernel power management doesn't consider the
use-case to be GPU bound (enough).
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7690#comment:12>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker