[FFmpeg-trac] #7706(avcodec:open): 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI
FFmpeg
trac at avcodec.org
Fri Nov 8 17:37:47 EET 2019
#7706: 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI
-------------------------------------+-------------------------------------
Reporter: eero-t              | Owner:
Type: defect                  | Status: open
Priority: important           | Component: avcodec
Version: git-master           | Resolution:
Keywords: vaapi regression    | Blocked By:
Blocking:                     | Reproduced by developer: 1
Analyzed by developer: 0      |
-------------------------------------+-------------------------------------
Comment (by eero-t):
Replying to [comment:12 fulinjie]:
> So based on the test results, does this regression affect single process
> only?
Judging by the old results, yes. Runs with multiple (5-50) parallel
transcode processes weren't affected by the regression, even when they
were not TDP limited.
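
A minimal sketch of how a parallel-throughput run like the one described
above could be scripted. The device path, the ffmpeg options and the helper
names are assumptions for illustration, not the actual test setup used for
these results:

```python
import subprocess
import time


def build_cmd(src, device="/dev/dri/renderD128"):
    """Build an ffmpeg command for a VAAPI decode + H.264 VAAPI encode.

    Output goes to the null muxer so disk writes do not affect timing.
    The device path and option set are assumptions (hypothetical setup).
    """
    return [
        "ffmpeg", "-nostdin", "-loglevel", "error",
        "-hwaccel", "vaapi",
        "-hwaccel_device", device,
        "-hwaccel_output_format", "vaapi",
        "-i", src,
        "-an", "-c:v", "h264_vaapi",
        "-f", "null", "-",
    ]


def run_parallel(cmds):
    """Start all commands at once, wait for all; return (seconds, exit codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in cmds]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes


def aggregate_fps(frames_per_clip, n_procs, elapsed):
    """Average throughput summed over all processes, in frames per second."""
    return frames_per_clip * n_procs / elapsed
```

For example, `run_parallel([build_cmd("input.mp4")] * 5)` would start five
transcodes of a (hypothetical) input.mp4 at once, and `aggregate_fps()`
turns the elapsed time into the average throughput figure.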
> If that's true, one possible reason is that multiple process encoding
> makes full use of the resource (hardware/encoder maybe), while single
> process seems to keep the encoding procedure idle or waiting for some time.
Yes, a single 8-bit AVC transcode doesn't keep any of the GPU engines 100%
busy; that happens only with multiple parallel transcode operations (it's
easy to see from the IGT "intel_gpu_top" output).
If the GPU is "full" all the time, work has to be queued. For average
throughput (which is what I'm measuring) it doesn't matter when the work
enters the queue, because the GPU is fully utilized anyway. Your patch
might help a bit with VA-API latency in the multiple parallel transcode
case, but I don't measure that.
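
The queuing argument can be illustrated with a toy model. This is only a
sketch under simplifying assumptions: each process strictly alternates
between a CPU-side gap (during which it has nothing queued on the GPU) and
GPU work for one frame, and the millisecond values are made up, not
measurements:

```python
def gpu_utilization(gpu_ms, gap_ms, n_procs=1):
    """Fraction of time the GPU is busy in the toy model.

    gpu_ms: GPU time per frame; gap_ms: per-frame time in which a process
    has no work queued on the GPU. Both values are hypothetical.
    """
    per_proc = gpu_ms / (gpu_ms + gap_ms)
    # Several processes can fill each other's gaps, up to saturation.
    return min(1.0, n_procs * per_proc)


def throughput_fps(gpu_ms, gap_ms, n_procs=1):
    """Average frames per second over all processes in the toy model."""
    return 1000.0 * gpu_utilization(gpu_ms, gap_ms, n_procs) / gpu_ms


# Single process: halving the gap raises the average throughput.
single_before = throughput_fps(gpu_ms=2.0, gap_ms=1.0)
single_after = throughput_fps(gpu_ms=2.0, gap_ms=0.5)

# Five processes: the GPU is saturated either way, so throughput is equal.
multi_before = throughput_fps(gpu_ms=2.0, gap_ms=1.0, n_procs=5)
multi_after = throughput_fps(gpu_ms=2.0, gap_ms=0.5, n_procs=5)
```

In this model the same gap also costs less when each frame takes longer on
the GPU: `gpu_utilization(10.0, 1.0)` is higher than
`gpu_utilization(2.0, 1.0)`.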
> And it's kind of weird that it benefits HEVC little since the
> modification is in the general vaapi encode code path.
My HEVC test-case is 10-bit instead of 8-bit, and 4K instead of 2K or
smaller. Therefore, it's processing >4x more data than my AVC test-cases.
HEVC encoding is also heavier.
=> As each frame takes longer, feeding the GPU in time is less of a problem
for keeping average GPU utilization high.
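
As a rough check of the ">4x" figure, here is a small calculation of raw
frame sizes. It assumes "2K" means 1920x1080, "4K" means 3840x2160, and
4:2:0 chroma subsampling; none of these are stated in the ticket:

```python
def bits_per_frame(width, height, bits_per_sample, samples_per_pixel=1.5):
    """Raw frame size in bits; 4:2:0 video has 1.5 samples per pixel."""
    return width * height * samples_per_pixel * bits_per_sample


avc_frame = bits_per_frame(1920, 1080, 8)     # assumed 2K 8-bit case
hevc_frame = bits_per_frame(3840, 2160, 10)   # assumed 4K 10-bit case

pixel_ratio = (3840 * 2160) / (1920 * 1080)   # 4.0
bit_ratio = hevc_frame / avc_frame            # 5.0

# If 10-bit samples are stored in 16-bit words (as in the P010 pixel
# format), the in-memory size ratio is larger still.
memory_ratio = bits_per_frame(3840, 2160, 16) / avc_frame  # 8.0
```

Under these assumptions a frame carries 4x the pixels and 5x the sample
bits, and occupies 8x the memory when stored in 16-bit words.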
(I was a bit worried about potential extra CPU usage, because that eats
into the power/temperature "budget" shared with the iGPU, but it seems to
be low enough not to be a problem.)
> And since the test covers the whole transcoding procedure, how about the
> performance of decode/encode separately?
I did some decode tests with HEVC (for 2K 10-bit data) with and without
hwdownload, and as expected, decode performance wasn't impacted.
(Raw data encoding is of less interest to me, because the input data is so
large that end-to-end performance can be bottlenecked more by disk/network
data transfer than by GPU usage.)
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7706#comment:13>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker