[FFmpeg-trac] #7690(undetermined:new): FFmpeg QSV decode + VPP performance is just a fraction of what one gets with VA-API and MediaSDK
FFmpeg
trac at avcodec.org
Wed Oct 16 11:18:03 EEST 2019
#7690: FFmpeg QSV decode + VPP performance is just a fraction of what one gets
with VA-API and MediaSDK
-------------------------------------+-------------------------------------
 Reporter:  eero-t        |  Owner:
     Type:  defect        |  Status:  new
 Priority:  normal        |  Component:  undetermined
  Version:  git-master    |  Resolution:
 Keywords:  qsv           |  Blocked By:
 Blocking:                |  Reproduced by developer:  0
 Analyzed by developer:  0 |
-------------------------------------+-------------------------------------
Comment (by fulinjie):
Hi eero-t:
The performance evaluation may be somewhat misleading. There is a related
discussion in the MediaSDK (MSDK) tracker about this performance issue:
https://github.com/Intel-Media-SDK/MediaSDK/issues/1550
> With 8-bit 1920x540 HEVC decode, QSV is clearly faster than VA-API:
> {{{
> ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v hevc_qsv -i 1920x540_60_yuv420p_4800.h265 -f null -
> ...
> ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 -i 1920x540_60_yuv420p_4800.h265 -y -f null -
> }}}
The command lines above are not a fair comparison.
For VAAPI, "-f null -" means there is no copy from the video surface to
system memory.
For QSV, even when "-f null -" is set, MSDK internally copies from the video
surface to system memory:
1) The app initializes MSDK to produce system memory output.
2) MSDK internally decodes to video memory and then copies from video
memory to system memory. The copy can be done by a software copy
("vaDeriveImage->vaMapBuffer->memcpy->vaUnmapBuffer->vaDestroyImage")
or by GPUCopy.
3) The application gets the frame in system memory.
That is the root cause of:
> * Resolution impacts whether doing (HEVC) decoding is slower with QSV or
> VA-API backends
> * In larger resolutions, VPP operations with QSV backend are slower than
> with VA-API
(VAAPI runs without the copy, QSV with it.)
The performance gap comes from copying video memory to system memory.
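
To put rough numbers on that copy, here is a minimal Python sketch of how
the amount of data moved per frame grows with resolution. It assumes 8-bit
4:2:0 output in NV12 (1.5 bytes per pixel), even dimensions, and no pitch or
alignment padding, so real surfaces will be somewhat larger. The function
names, the 3840x2160 size and the 60 fps figure are illustrative assumptions,
not measurements from this ticket.

```python
def nv12_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit NV12 frame: a full-size luma plane plus a
    half-size interleaved chroma plane (assumes even dimensions and
    no row padding)."""
    return width * height * 3 // 2


def copy_mb_per_s(width: int, height: int, fps: float) -> float:
    """Data copied from video memory to system memory per second, in MB
    (10**6 bytes), if every decoded frame is copied exactly once."""
    return nv12_frame_bytes(width, height) * fps / 1e6


# Resolution from the quoted test clip, and a larger one for comparison.
for w, h in ((1920, 540), (3840, 2160)):
    print(f"{w}x{h}: {nv12_frame_bytes(w, h)} bytes/frame, "
          f"{copy_mb_per_s(w, h, 60):.1f} MB/s at 60 fps")
```

The larger frame carries 8 times the data of the 1920x540 one, so a fixed
per-frame copy takes a correspondingly larger share of the total time as
the resolution goes up.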
For QSV, the best performance may come from:
1. GPUCopy for tiled surface data (such as NV12):
http://git.ffmpeg.org/gitweb/ffmpeg.git/commit/5345965b3f088ad5acd5151bec421c97470675a4
2. hwmap=mode=direct (if possible) for linear surface data: derive the data
in the surface and use it directly, avoiding any memory copy.
3. hwdownload
To evaluate the performance, you can compare the results when the output is
actually written to /dev/null.
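
One way to set up such a comparison is sketched below in Python. It builds
two command lines that both write raw frames to /dev/null, so each backend
has to deliver frames in system memory. This is only one reading of the
suggestion above. The helper names are hypothetical; the device path and
ffmpeg options are taken from the commands quoted earlier, with the output
changed to "-f rawvideo /dev/null".

```python
import shlex
import subprocess
import time

DEVICE = "/dev/dri/renderD128"  # render node from the quoted commands


def qsv_cmd(src: str, device: str = DEVICE) -> list[str]:
    """QSV decode, raw frames written to /dev/null."""
    return ["ffmpeg", "-y", "-hwaccel", "qsv", "-qsv_device", device,
            "-c:v", "hevc_qsv", "-i", src, "-f", "rawvideo", "/dev/null"]


def vaapi_cmd(src: str, device: str = DEVICE) -> list[str]:
    """VA-API decode, raw frames written to /dev/null."""
    return ["ffmpeg", "-y", "-hwaccel", "vaapi", "-vaapi_device", device,
            "-i", src, "-f", "rawvideo", "/dev/null"]


def time_cmd(cmd: list[str]) -> float:
    """Run one command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


# Print the two command lines; nothing is executed here.
SRC = "1920x540_60_yuv420p_4800.h265"
for cmd in (qsv_cmd(SRC), vaapi_cmd(SRC)):
    print(shlex.join(cmd))
```

Calling time_cmd(qsv_cmd(SRC)) and time_cmd(vaapi_cmd(SRC)) on a machine
with both backends available gives wall-clock times that include the copy
to system memory on both sides.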
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7690#comment:15>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker