[FFmpeg-trac] #7797(undetermined:new): AVC->MPEG-2 transcoding with VA-API 2-3x slower than with QSV
FFmpeg
trac at avcodec.org
Thu Mar 14 18:41:25 EET 2019
#7797: AVC->MPEG-2 transcoding with VA-API 2-3x slower than with QSV
-------------------------------------+-------------------------------------
 Reporter:  eero-t                   |  Type:  defect
 Status:  new                        |  Priority:  normal
 Component:  undetermined            |  Version:  git-master
 Keywords:                           |  Blocked By:
 Blocking:                           |  Reproduced by developer:  0
 Analyzed by developer:  0           |
-------------------------------------+-------------------------------------
Setup:
- Ubuntu 18.04 with drm-tip 5.x kernel from Git
- iHD media driver, MediaSDK and FFmpeg built from Git
Summary of the bug:
- VA-API transcoding performance is poor: AVC -> MPEG-2 transcoding
with QSV is 2-3x faster than with VA-API.
How to reproduce:
VA-API:
{{{
$ export LIBVA_DRIVER_NAME=iHD
$ ffmpeg -hwaccel vaapi -vaapi_device /dev/dri/renderD128 \
    -hwaccel_output_format vaapi -i 720x480p_30.00_4mb_h264_cabac_180s.264 \
    -c:v mpeg2_vaapi -b:v 2000K -compression_level 4 -y output.mpg
ffmpeg version N-93330-g7ff89574c7 Copyright (c) 2000-2019 the FFmpeg
developers
built with gcc 7 (Ubuntu 7.3.0-27ubuntu1~18.04)
...
Input #0, h264, from 'input/720x480p_30.00_4mb_h264_cabac_180s.264':
Duration: N/A, bitrate: N/A
Stream #0:0: Video: h264 (High), 1 reference frame, yuv420p(tv,
smpte170m, progressive, left), 720x480 [SAR 10:11 DAR 15:11], 30 fps, 30
tbr, 1200k tbn, 60 tbc
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> mpeg2video (mpeg2_vaapi))
Press [q] to stop, [?] for help
[h264 @ 0x55a62883c380] Reinit context to 720x480, pix_fmt: vaapi_vld
[graph 0 input from stream 0:0 @ 0x55a628872f40] w:720 h:480
pixfmt:vaapi_vld tb:1/1200000 fr:30/1 sar:10/11 sws_param:flags=2
[mpeg2_vaapi @ 0x55a62883eb80] Input surface format is nv12.
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI profile VAProfileMPEG2Main (1).
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI entrypoint VAEntrypointEncSlice
(6).
[mpeg2_vaapi @ 0x55a62883eb80] Using VAAPI render target format YUV420
(0x1).
[mpeg2_vaapi @ 0x55a62883eb80] RC mode: VBR.
[mpeg2_vaapi @ 0x55a62883eb80] RC target: 50% of 4000000 bps over 500 ms.
[mpeg2_vaapi @ 0x55a62883eb80] RC buffer: 2000000 bits, initial fullness
1500000 bits.
[mpeg2_vaapi @ 0x55a62883eb80] RC framerate: 30/1 (30.00 fps).
[mpeg2_vaapi @ 0x55a62883eb80] Using intra, P- and B-frames (supported
references: 1 / 1).
[mpeg2_vaapi @ 0x55a62883eb80] Driver does not support some wanted packed
headers (wanted 0x3, found 0x10).
[mpeg2_vaapi @ 0x55a62883eb80] Sample aspect ratio 10:11 is not
representable, signalling square pixels instead.
[mpeg @ 0x55a62883a580] VBV buffer size not set, using default size of
230KB
If you want the mpeg file to be compliant to some specification
Like DVD, VCD or others, make sure you set the correct buffer size
Output #0, mpeg, to 'output/0039_SD03MP2_1.0.mpg':
Metadata:
encoder : Lavf58.26.101
Stream #0:0: Video: mpeg2video (mpeg2_vaapi) (Main), vaapi_vld,
720x480 [SAR 10:11 DAR 15:11], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc
Metadata:
encoder : Lavc58.47.103 mpeg2_vaapi
...
}}}
And QSV:
{{{
$ ffmpeg -hwaccel qsv -qsv_device /dev/dri/renderD128 -c:v h264_qsv \
    -i 720x480p_30.00_4mb_h264_cabac_180s.264 -c:v mpeg2_qsv -b:v 2000K \
    -compression_level 4 -y output.mpg
...
[AVHWDeviceContext @ 0x565392bd4280] Initialize MFX session: API version
is 1.28, implementation version is 1.28
[AVHWDeviceContext @ 0x565392bd4280] MFX compile/runtime API: 1.28/1.28
[AVHWDeviceContext @ 0x565392bf2f00] VAAPI driver: Intel iHD driver -
1.0.0.
[AVHWDeviceContext @ 0x565392bf2f00] Driver not found in known nonstandard
list, using standard behaviour.
[graph 0 input from stream 0:0 @ 0x565392d785c0] w:720 h:480 pixfmt:qsv
tb:1/1200000 fr:30/1 sar:10/11 sws_param:flags=2
[mpeg2_qsv @ 0x565392bd1f40] Using the variable bitrate (VBR) ratecontrol
method
[AVHWDeviceContext @ 0x565392cfc340] VAAPI driver: Intel iHD driver -
1.0.0.
[AVHWDeviceContext @ 0x565392cfc340] Driver not found in known nonstandard
list, using standard behaviour.
[mpeg2_qsv @ 0x565392bd1f40] profile: main; level: 8
[mpeg2_qsv @ 0x565392bd1f40] GopPicSize: 250; GopRefDist: 4; GopOptFlag:
closed ; IdrInterval: 0
[mpeg2_qsv @ 0x565392bd1f40] TargetUsage: 4; RateControlMethod: VBR
[mpeg2_qsv @ 0x565392bd1f40] BufferSizeInKB: 500; InitialDelayInKB: 500;
TargetKbps: 2000; MaxKbps: 2000; BRCParamMultiplier: 1
[mpeg2_qsv @ 0x565392bd1f40] NumSlice: 30; NumRefFrame: 0
[mpeg2_qsv @ 0x565392bd1f40] RateDistortionOpt: unknown
[mpeg2_qsv @ 0x565392bd1f40] RecoveryPointSEI: unknown IntRefType: 0;
IntRefCycleSize: 0; IntRefQPDelta: 0
[mpeg2_qsv @ 0x565392bd1f40] MaxFrameSize: 0; MaxSliceSize: 0;
[mpeg2_qsv @ 0x565392bd1f40] BitrateLimit: unknown; MBBRC: unknown;
ExtBRC: unknown
[mpeg2_qsv @ 0x565392bd1f40] Trellis: auto
[mpeg2_qsv @ 0x565392bd1f40] VDENC: OFF
[mpeg2_qsv @ 0x565392bd1f40] RepeatPPS: unknown; NumMbPerSlice: 0;
LookAheadDS: unknown
[mpeg2_qsv @ 0x565392bd1f40] AdaptiveI: unknown; AdaptiveB: unknown;
BRefType: auto
[mpeg2_qsv @ 0x565392bd1f40] MinQPI: 0; MaxQPI: 0; MinQPP: 0; MaxQPP: 0;
MinQPB: 0; MaxQPB: 0
[mpeg2_qsv @ 0x565392bd1f40] FrameRateExtD: 1; FrameRateExtN: 30
[mpeg @ 0x565392bd1500] VBV buffer size not set, using default size of
230KB
If you want the mpeg file to be compliant to some specification
Like DVD, VCD or others, make sure you set the correct buffer size
Output #0, mpeg, to 'output/0039_SD03MP2_1.0.mpg':
Metadata:
encoder : Lavc58.47.103 mpeg2_qsv
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 0 vbv_delay: -1
...
}}}
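One way to put a number on the gap is to time both command lines on the
same input and divide the elapsed times. The following is only a minimal
sketch, assuming a POSIX shell with awk and a `date` that supports `%s`;
`time_cmd` and `speedup` are hypothetical helper names, not part of FFmpeg.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two transcode runs; not part of FFmpeg.

# Print the wall-clock seconds taken by the command given as arguments.
# Resolution is one second, so this only suits runs lasting many seconds.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Given elapsed seconds of the slower run and of the faster run,
# print how many times faster the second run was.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        if (fast <= 0) { print "n/a"; exit }
        printf "%.2f\n", slow / fast
    }'
}
```

Usage would be along the lines of storing `$(time_cmd ffmpeg ...)` for each
of the two command lines above and passing both values to `speedup`.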
The GPU is running at full speed in both cases, so this isn't related to
ticket #7690. It could be related to regression #7706, but I can't test
that, because ticket #7650 ("invalid RC mode") was fixed only after that
regression was introduced.
Looking at CPU utilization and power usage, QSV uses more CPU but also
shows more iowait, and correspondingly it draws more CPU and GPU power
than VA-API. Maybe VA-API isn't running asynchronously enough?
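The iowait share mentioned above can be estimated on Linux from two
snapshots of the aggregate "cpu" line in /proc/stat, taken before and
after a run. This is a rough sketch only: `cpu_snapshot` and `iowait_pct`
are hypothetical names, and the field positions assume the layout
documented in proc(5) (user, nice, system, idle, iowait, irq, softirq,
steal).

```shell
#!/bin/sh
# Hypothetical helpers; field layout assumed from proc(5), Linux only.

# Print the aggregate "cpu" line from /proc/stat.
cpu_snapshot() {
    head -n 1 /proc/stat
}

# Given two snapshot lines (before, after), print the percentage of
# total CPU time that was spent in iowait between them.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            # Fields 2..9 are user..steal; guest time is already in user.
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            iowait[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt <= 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * (iowait[2] - iowait[1]) / dt
        }'
}
```

The figure is system-wide, so it is only meaningful when nothing else
significant is running during the transcode.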
There are also single-stream (AVC) transcode cases where VA-API is slower,
but there the gap is much smaller, and if one runs multiple processes in
parallel, VA-API is actually slightly faster. In this case, VA-API is
slower even with multiple parallel transcode processes.
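A parallel run of the kind described can be sketched as launching N
copies of a command in the background and waiting for all of them.
`run_parallel` is a hypothetical helper, not an FFmpeg feature, and this
sketch passes identical arguments to every copy, so a real test would
need a wrapper that gives each process its own output file.

```shell
#!/bin/sh
# Hypothetical helper: run N copies of a command concurrently and
# print how many of them exited with a non-zero status.
run_parallel() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        # wait returns the exit status of that background process.
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}
```

For example, `run_parallel 4 ./transcode.sh` would start four copies of
an assumed wrapper script `transcode.sh` and print the number of failures.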
I'm seeing a similar performance gap on all the Core devices [1] currently
supported by iHD: BDW, SKL, KBL & CFL, on both GT2 & GT3e devices, i.e.
the issue isn't platform specific.
[1] This test-case doesn't work on the only GEN9+ non-Core device I have
(BXT/APL).
Extra info:
* With a larger 1280x720p_29.97_10mb_h264_cabac input, the performance gap
was still about the same, >2x.
* With an even larger 1920x1080i_29.97_20mb_mpeg2_high input, the gap
decreased to ~25%, but performance with both APIs had also dropped to a
fraction of what it was with the smaller inputs.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7797>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker