[FFmpeg-trac] #7582(undetermined:new): hwaccel cuvid/nvenc performance degradation when using aq (temporal-aq or spatial-aq) with multiple concurrent encodes
FFmpeg
trac at avcodec.org
Sun Dec 2 15:36:45 EET 2018
#7582: hwaccel cuvid/nvenc performance degradation when using aq (temporal-aq or
spatial-aq) with multiple concurrent encodes
-------------------------------------+-------------------------------------
 Reporter:  malakudi                 |       Type:  defect
   Status:  new                      |   Priority:  normal
Component:  undetermined             |    Version:  git-master
 Keywords:                           | Blocked By:
 Blocking:                           |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Running multiple concurrent hwaccel cuvid/nvenc sessions that use
temporal-aq or spatial-aq AND 3 or more reference frames shows a
performance degradation since the following commits:
https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/9b82e333b7c4235a3de7ce8d8fe115c53c11f50c
https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/93d1756af2908150f7c8c0590b9ed246951d474a
Those commits switched from cuMemcpy2D to cuMemcpy2DAsync.
With this change, aq enabled, and 3 or more reference frames, throughput
appears to drop to around 50% of the nvenc capacity. It may be a driver
problem, but it still makes ffmpeg problematic for scenarios with multiple
realtime encodes. With -hwaccel nvdec this doesn't happen, but since
-hwaccel nvdec uses much more VRAM, I cannot run the same number of
concurrent sessions.
To reproduce, I use the following file as input:
https://download.blender.org/demo/movies/BBB/bbb_sunflower_1080p_30fps_normal.mp4
I run it with the following bash script:
{{{
#!/bin/bash
# Start 16 concurrent realtime (-re) transcodes in the background.
for i in $(seq 1 16); do
    ./ffmpeg-git -nostdin -loglevel error -hwaccel cuvid -c:v h264_cuvid -re \
        -i bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 \
        -c:v h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 \
        -acodec copy -f mpegts -y /dev/null &
done
# Block until all background sessions have exited.
wait
echo done
}}}
Checking with nvidia-smi shows very low utilization, and if you run one
more session interactively you will see that it cannot keep encoding at
30 fps, even though nvenc utilization is very low. If you set
-temporal-aq 0 in the same script, utilization is much higher.
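To compare utilization between runs as a single number, the dmon samples can be averaged. The following is a minimal sketch, not part of the original report: the helper name dmon_avg is hypothetical, and it assumes the dmon output has a "# gpu ..." header line naming "enc" and "dec" columns, as in the samples in this ticket.

```shell
#!/bin/bash
# Hypothetical helper: average the "enc" and "dec" columns of
# `nvidia-smi dmon` output read from stdin.
# Column positions are taken from the "# gpu ..." header line, so the
# function does not depend on which other metrics dmon prints.
dmon_avg() {
    LC_ALL=C awk '
        # Header line: the leading "#" is field 1, so a data column
        # index is the header field index minus 1.
        $1 == "#" && $2 == "gpu" {
            for (i = 2; i <= NF; i++) {
                if ($i == "enc") enc = i - 1
                if ($i == "dec") dec = i - 1
            }
            next
        }
        # Skip the units line and any other comment line.
        $1 == "#" { next }
        # Data line: accumulate once both columns are known.
        enc && dec && NF >= enc && NF >= dec {
            e += $enc; d += $dec; n++
        }
        END {
            if (n) printf "enc=%.1f dec=%.1f samples=%d\n", e / n, d / n, n
        }
    '
}

# Usage (requires an NVIDIA GPU and driver):
#   nvidia-smi dmon -c 10 | dmon_avg
```

Reading the column positions from the header rather than hard-coding them is a deliberate choice, since the set of columns dmon prints can differ between driver versions and -s selections.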
Sample output of an interactive encoding session started while 16 sessions
were already running, followed by nvidia-smi dmon output:
{{{
./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -re -i
bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
mpegts -y /dev/null
ffmpeg version N-92462-g529debc987 Copyright (c) 2000-2018 the FFmpeg
developers
built with gcc 8 (Debian 8.2.0-9)
configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
--disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
--disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
--disable-msa --disable-libopencv --disable-podpages --disable-sndio
--disable-debug --enable-libaom --enable-avfilter --enable-gcrypt
--enable-gnutls --enable-gpl --enable-libass --enable-libbluray
--enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
--enable-libfdk-aac --enable-libfontconfig --enable-libfreetype
--enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc
--enable-libkvazaar --enable-libmp3lame --enable-libopencore-amrnb
--enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg
--enable-libopenmpt --enable-libopus --enable-libpulse
--enable-librubberband --enable-libshine --enable-libsnappy
--enable-libsoxr --enable-libspeex --enable-libtesseract
--enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc
--enable-libvorbis --enable-libvpx --enable-libx265 --enable-libxvid
--enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-nonfree
--enable-opencl --enable-opengl --enable-postproc --enable-pthreads
--enable-static --disable-shared --enable-version3 --enable-libwebp
--incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64-linux-gnu
--prefix=/usr --toolchain=hardened --enable-frei0r --enable-chromaprint
--enable-libx264 --enable-libiec61883 --enable-libdc1394
--enable-vaapi --enable-libmfx --disable-altivec
--shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 56. 23.101 / 56. 23.101
libavcodec 58. 39.100 / 58. 39.100
libavformat 58. 22.100 / 58. 22.100
libavdevice 58. 6.100 / 58. 6.100
libavfilter 7. 44.100 / 7. 44.100
libswscale 5. 4.100 / 5. 4.100
libswresample 3. 4.100 / 3. 4.100
libpostproc 55. 4.100 / 55. 4.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'bbb_sunflower_1080p_30fps_normal.mp4':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
creation_time : 2013-12-16T17:44:39.000000Z
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
composer : Sacha Goedegebure
Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
(default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 160 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Stream #0:2 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, mpegts, to '/dev/null':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
composer : Sacha Goedegebure
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
encoder : Lavf58.22.100
Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
encoder : Lavc58.39.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
vbv_delay: -1
Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
frame= 1372 fps= 14 q=21.0 Lsize= 14369kB time=00:00:45.73
bitrate=2573.8kbits/s speed=0.483x
video:11382kB audio:1781kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 9.155905%
nvidia-smi dmon
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    56    49     -    11     4    30    37  6800  1560
    0    54    49     -    10     4    30    39  6800  1560
    0    54    49     -    10     4    32    43  6800  1590
    0    55    49     -    10     4    31    40  6800  1515
    0    57    49     -    10     4    31    40  6800  1635
}}}
This session reaches only about half the target rate: 14 fps
(speed=0.483x) instead of 30.
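When many runs have to be compared, the speed value can be pulled out of the ffmpeg log automatically. This is a rough sketch under stated assumptions: the function names ffmpeg_speed and keeps_realtime are hypothetical, and the 0.95 threshold for "keeps up with realtime" is an arbitrary choice, not something taken from the report.

```shell
#!/bin/bash
# Hypothetical helper: print the last "speed=" value (without the
# trailing "x") found in an ffmpeg log read from stdin.
# ffmpeg rewrites its progress line using carriage returns, so these are
# converted to newlines before matching.
ffmpeg_speed() {
    tr '\r' '\n' |
        sed -n 's/.*speed= *\([0-9.][0-9.]*\)x.*/\1/p' |
        tail -n 1
}

# Hypothetical helper: succeed if the given speed is at or above a
# threshold. $1: speed value, $2: threshold (default 0.95, an assumption).
keeps_realtime() {
    awk -v s="$1" -v t="${2:-0.95}" 'BEGIN { exit !(s + 0 >= t + 0) }'
}

# Usage (ffmpeg writes its status line to stderr):
#   speed=$(./ffmpeg-git ... 2>&1 | ffmpeg_speed)
#   keeps_realtime "$speed" || echo "session fell behind: ${speed}x"
```

Applied to the status line above, ffmpeg_speed prints 0.483, which keeps_realtime rejects with the default threshold.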
With ffmpeg-4.0.3 (which does not contain the commits mentioned above),
utilization is correct even with -temporal-aq 1.
The problem also doesn't happen with -hwaccel nvdec, or without -hwaccel
at all (software decoding and scaling).
It also doesn't show when nvidia-cuda-mps handles the encodes.
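To switch between these variants without editing the script each time, the ffmpeg binary and the -temporal-aq value can be made parameters. This is only a sketch: the function name run_sessions and the DRY_RUN variable are hypothetical, and the command line is otherwise the one from the reproduction script in this report. The -hwaccel nvdec and software decoding variants are not covered, since they need a different command line.

```shell
#!/bin/bash
# Hypothetical wrapper around the reproduction command.
# $1: ffmpeg binary, $2: -temporal-aq value, $3: session count (default 16).
# With DRY_RUN=1 the command is printed once per session instead of run.
run_sessions() {
    ffmpeg_bin=$1
    aq=$2
    sessions=${3:-16}
    # Reuse the positional parameters as the argument list, which keeps
    # the function portable to shells without arrays.
    set -- "$ffmpeg_bin" -nostdin -loglevel error \
        -hwaccel cuvid -c:v h264_cuvid -re \
        -i bbb_sunflower_1080p_30fps_normal.mp4 \
        -vf scale_npp=w=1280:h=720 \
        -c:v h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq "$aq" \
        -acodec copy -f mpegts -y /dev/null
    i=1
    while [ "$i" -le "$sessions" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$*"
        else
            "$@" &
        fi
        i=$((i + 1))
    done
    # Block until all background sessions have exited.
    wait
}

# Usage:
#   run_sessions ./ffmpeg-git 1        # 16 sessions with temporal-aq
#   run_sessions ./ffmpeg-git 0        # same, temporal-aq disabled
#   run_sessions ./ffmpeg-4.0.3 1 16   # older release for comparison
```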
Finally, sample output from running ffmpeg-4.0.3 interactively while 16
sessions were already running, followed by nvidia-smi dmon output:
{{{
./ffmpeg-4.0.3 -hwaccel cuvid -c:v h264_cuvid -re -i
bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
mpegts -y /dev/null
ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 8 (Debian 8.2.0-9)
configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
--disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
--disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
--disable-msa --disable-libopencv --disable-podpages --disable-sndio
--disable-stripping --enable-libaom --enable-avfilter --enable-gcrypt
--enable-gnutls --enable-gpl --enable-libass --enable-libbluray
--enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
--enable-libfdk-aac --enable-libfontconfig --enable-libfreetype
--enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc
--enable-libkvazaar --enable-libmp3lame --enable-libopencore-amrnb
--enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg
--enable-libopenmpt --enable-libopus --enable-libpulse
--enable-librubberband --enable-libshine --enable-libsnappy
--enable-libsoxr --enable-libspeex --enable-libtesseract
--enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc
--enable-libvorbis --enable-libvpx --enable-libx265 --enable-libxvid
--enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-nonfree
--enable-opencl --enable-opengl --enable-postproc --enable-pthreads
--enable-static --disable-shared --enable-version3 --enable-libwebp
--incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64-linux-gnu
--prefix=/usr --toolchain=hardened --enable-frei0r --enable-chromaprint
--enable-libx264 --enable-libiec61883 --enable-libdc1394
--enable-vaapi --enable-libmfx --disable-altivec
--shlibdir=/usr/lib/x86_64-linux-gnu
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
'bbb_sunflower_1080p_30fps_normal.mp4':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
creation_time : 2013-12-16T17:44:39.000000Z
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
composer : Sacha Goedegebure
Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
(default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
fltp, 160 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
Stream mapping:
Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
Stream #0:2 -> #0:1 (copy)
Press [q] to stop, [?] for help
Output #0, mpegts, to '/dev/null':
Metadata:
major_brand : isom
minor_version : 1
compatible_brands: isomavc1
composer : Sacha Goedegebure
title : Big Buck Bunny, Sunflower version
artist : Blender Foundation 2008, Janus Bager Kristensen 2013
comment : Creative Commons Attribution 3.0 -
http://bbb3d.renderfarming.net
genre : Animation
encoder : Lavf58.12.100
Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
Metadata:
creation_time : 2013-12-16T17:44:39.000000Z
handler_name : GPAC ISO Video Handler
encoder : Lavc58.18.100 h264_nvenc
Side data:
cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
vbv_delay: -1
Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
fltp, 320 kb/s (default)
Metadata:
creation_time : 2013-12-16T17:44:42.000000Z
handler_name : GPAC ISO Audio Handler
Side data:
audio service type: main
frame= 1106 fps= 30 q=26.0 Lsize= 11893kB time=00:00:36.92
bitrate=2638.3kbits/s speed=0.999x
video:9454kB audio:1444kB subtitle:0kB other streams:0kB global
headers:0kB muxing overhead: 9.133109%
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0    90    55     -    21     9    52    73  6800  1950
    0    90    55     -    20     8    53    73  6800  1950
    0    91    55     -    20     9    52    78  6800  1950
    0    90    55     -    20     8    52    75  6800  1950
    0    88    55     -    20     8    51    75  6800  1950
    0    87    55     -    21     8    53    71  6800  1950
    0    88    55     -    20     8    49    75  6800  1950
    0    85    55     -    21     8    53    74  6800  1950
    0    87    55     -    20     8    49    74  6800  1950
    0    85    55     -    21     9    54    74  6800  1950
    0    87    55     -    20     8    49    75  6800  1950
    0    84    55     -    21     9    54    70  6800  1950
}}}
--
Ticket URL: <https://trac.ffmpeg.org/ticket/7582>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker