[FFmpeg-trac] #7706(avcodec:open): 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI

FFmpeg trac at avcodec.org
Fri Nov 8 17:37:47 EET 2019

#7706: 20-30% perf drop in FFmpeg (H264) transcode performance with VAAPI
             Reporter:  eero-t       |                    Owner:
                 Type:  defect       |                   Status:  open
             Priority:  important    |                Component:  avcodec
              Version:  git-master   |               Resolution:
             Keywords:  vaapi        |               Blocked By:
  regression                         |
             Blocking:               |  Reproduced by developer:  1
Analyzed by developer:  0            |

Comment (by eero-t):

 Replying to [comment:12 fulinjie]:
 > So based on the test results, does this regression affect single process

 Looking at the old results, correct.  Running multiple (5-50) parallel
 transcode processes wasn't affected by the regression, even when they were
 not TDP limited.
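
 A parallel run of this kind could be scripted roughly as below.  This is
 only a sketch: the render node path, the output file names and the exact
 option set are assumptions, not the command lines used for the results
 above.

```python
import subprocess
import time

def build_cmd(src, dst, device="/dev/dri/renderD128"):
    """Build one VAAPI decode + H.264 encode command (device path is an assumption)."""
    return ["ffmpeg", "-hide_banner", "-y",
            "-hwaccel", "vaapi", "-hwaccel_device", device,
            "-hwaccel_output_format", "vaapi",   # keep frames on the GPU
            "-i", src, "-c:v", "h264_vaapi", "-an", dst]

def run_parallel(src, n):
    """Start n identical transcodes at once; return (wall seconds, exit codes)."""
    start = time.monotonic()
    procs = [subprocess.Popen(build_cmd(src, f"out_{i}.mp4"),  # hypothetical names
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
             for i in range(n)]
    codes = [p.wait() for p in procs]
    return time.monotonic() - start, codes
```

 Calling run_parallel("input.mp4", 5) and dividing the total number of
 transcoded frames by the returned wall time gives the average throughput
 for that process count.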

 > If that's true, one possible reason is that multiple process encoding
 > makes full use of the resource (hardware/encoder maybe), while single
 > process seems to keep the encoding procedure idle or waiting for sometime.

 Yes, a single 8-bit AVC transcode doesn't fill any of the GPU engines to
 100%; that happens only with multiple parallel transcode operations (it's
 easy to see from the IGT "intel_gpu_top" output).

 If the GPU is "full" all the time, work needs to be queued.  For average
 throughput (= what I'm measuring) it doesn't matter when work enters the
 queue, as the GPU is fully utilized anyway.  Your patch might help a bit
 with VA-API latency in multiple parallel transcode cases, but I don't
 measure that.
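
 The queueing argument can be illustrated with a toy model.  It is a
 sketch under stated assumptions, not a model of the real driver: one GPU
 engine served in FIFO order, each process submitting a frame and waiting
 for it synchronously, and made-up per-frame times (2 ms of GPU work, 1 ms
 vs. 2 ms of CPU-side delay before the next submit).

```python
import heapq

def throughput(n_procs, service_ms, gap_ms, frames=10_000):
    """Frames/s for n_procs synchronous submitters sharing one FIFO engine.

    service_ms: GPU time per frame; gap_ms: CPU-side delay before a process
    submits its next frame.  Both values are hypothetical.
    """
    # (time at which the process submits its next frame, process id)
    ready = [(gap_ms, pid) for pid in range(n_procs)]
    heapq.heapify(ready)
    gpu_free = 0.0
    finish = 0.0
    for _ in range(frames):
        t_ready, pid = heapq.heappop(ready)   # earliest submission goes first
        start = max(t_ready, gpu_free)        # wait in the queue if GPU is busy
        finish = start + service_ms
        gpu_free = finish
        heapq.heappush(ready, (finish + gap_ms, pid))
    return 1000.0 * frames / finish

for n in (1, 5):
    fast = throughput(n, service_ms=2.0, gap_ms=1.0)
    slow = throughput(n, service_ms=2.0, gap_ms=2.0)
    print(f"{n} process(es): {fast:.0f} fps -> {slow:.0f} fps")
```

 With these numbers a single process drops from about 333 to 250 fps,
 while five processes stay at about 500 fps in both cases, because the
 engine never goes idle and the extra delay is hidden in the queue.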

 > And it's kind of weird that it benefits HEVC little since the
 > modification is in the general vaapi encode code path.

 My HEVC test-case is 10-bit instead of 8, and 4K instead of 2K or smaller.
 Therefore, it's processing >4x more data than my AVC test-cases.  HEVC
 encoding is also heavier.
 => As each frame takes longer, feeding the GPU in time is less of a
 problem for keeping average GPU utilization high.
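
 As a rough check of the ">4x" figure, the per-frame data ratio can be
 worked out as below.  The resolutions (3840x2160 vs. 1920x1080) and the
 4:2:0 chroma sampling are assumptions; the exact test clips are not
 stated here.

```python
def frame_bits(width, height, bits_per_sample):
    """Bits in one 4:2:0 frame: luma plus two quarter-size chroma planes."""
    samples = width * height * 3 // 2
    return samples * bits_per_sample

avc_8bit = frame_bits(1920, 1080, 8)     # assumed "2K" AVC case
hevc_10bit = frame_bits(3840, 2160, 10)  # assumed 4K HEVC case
hevc_p010 = frame_bits(3840, 2160, 16)   # 10-bit samples stored in 16 bits

print(hevc_10bit / avc_8bit)  # ratio by sample precision
print(hevc_p010 / avc_8bit)   # ratio by in-memory size
```

 Under these assumptions the ratio is 5x counting 10 bits per sample, and
 8x if the 10-bit samples occupy 16 bits each in memory, as in the P010
 surface format.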

 (I was a bit worried about potential extra CPU usage, because that takes
 away from the power/temperature "budget" shared with the iGPU, but it
 seems to be low enough not to be a problem.)

 > And since the test covers the whole transcoding procedure, how about the
 > performance of decode/encode separately?

 I did some decode tests with HEVC (for 2K 10-bit data) with and without
 hwdownload, and as expected, perf of that wasn't impacted.
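
 A decode-only comparison of this kind might be set up as follows.  This
 is a sketch: the render node path and the p010 download format (a guess
 suited to 10-bit content) are assumptions, not the exact commands used.

```python
def build_decode_cmd(src, download, device="/dev/dri/renderD128"):
    """Build a VAAPI decode-only command; optionally copy frames to system memory."""
    cmd = ["ffmpeg", "-hide_banner",
           "-hwaccel", "vaapi", "-hwaccel_device", device,
           "-hwaccel_output_format", "vaapi",   # decoded frames stay on the GPU
           "-i", src, "-an"]
    if download:
        # hwdownload copies each GPU surface into a normal software frame
        cmd += ["-vf", "hwdownload,format=p010"]
    cmd += ["-f", "null", "-"]                  # discard output, decode only
    return cmd

print(" ".join(build_decode_cmd("input.mkv", download=False)))
print(" ".join(build_decode_cmd("input.mkv", download=True)))
```

 Timing both variants on the same input separates the decode cost from the
 cost of copying frames back from the GPU.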

 (RAW data encoding is less of interest for me as the input data is so
 large that end-to-end perf can be bottlenecked more by disk/network data
 transfer, rather than GPU usage.)

Ticket URL: <https://trac.ffmpeg.org/ticket/7706#comment:13>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker
