[FFmpeg-trac] #7582(undetermined:new): hwaccel cuvid/nvenc performance degredation when using aq (temporal-aq or spatial-aq) with multiple concurrent encodes

FFmpeg trac at avcodec.org
Sun Dec 2 15:41:49 EET 2018


#7582: hwaccel cuvid/nvenc performance degredation when using aq (temporal-aq or
spatial-aq) with multiple concurrent encodes
-------------------------------------+-------------------------------------
             Reporter:  malakudi     |                    Owner:
                 Type:  defect       |                   Status:  new
             Priority:  important    |                Component:
              Version:  git-master   |  undetermined
             Keywords:  regresssion  |               Resolution:
  cuda nvenc                         |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Changes (by cehoyos):

 * keywords:   => regresssion cuda nvenc
 * priority:  normal => important


Old description:

> Running multiple hwaccel cuvid/nvenc sessions that utilise temporal-aq or
> spatial-aq AND 3 or more reference frames results in a performance
> degradation since following commits:
> https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/9b82e333b7c4235a3de7ce8d8fe115c53c11f50c
> https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/93d1756af2908150f7c8c0590b9ed246951d474a
> Those commits enabled the use of cuMemcpy2DAsync instead of cuMemcpy2D.
> With this and aq enabled and 3 or more reference frames, performance
> seems to be degraded at around 50% of the nvenc capacity. Maybe it could
> be a driver problem but still, makes ffmpeg problematic on multiple
> realtime encodes scenario. With -hwaccel nvdec this doesn't happen, but
> since -hwaccel nvdec utilises much more VRAM, I cannot run the same
> amount of concurrent sessions.
>
> To reproduce, I use as input the following file:
> https://download.blender.org/demo/movies/BBB/bbb_sunflower_1080p_30fps_normal.mp4
>
> Running with following bash script:
> {{{
> #!/bin/bash
> for i in `seq 1 16` ;
> do
> ./ffmpeg-git -nostdin -loglevel error -hwaccel cuvid -c:v h264_cuvid -re
> -i bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null &
> done
> wait
> echo done
>
> }}}
> Checking utilization with nvidia-smi you will see very low utilization,
> and if you run one more session interactively you will see that it cannot
> keep encoding at 30 fps, although the utilization of nvenc is very low.
> If you set temporal-aq 0 on same script, you will see much higher
> utilization.
>
> Sample output of interactive encoding session while already running 16
> sessions and nvidia-smi dmon output:
> {{{
> ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -re -i
> bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null
> ffmpeg version N-92462-g529debc987 Copyright (c) 2000-2018 the FFmpeg
> developers
>   built with gcc 8 (Debian 8.2.0-9)
>   configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
> --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
> --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-
> mipsfpu --disable-msa --disable-libopencv --disable-podpages --disable-
> sndio --disable-debug --enable-libaom --enable-avfilter --enable-gcrypt
> --enable-gnutls --enable-gpl --enable-libass --enable-libbluray --enable-
> libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-
> libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-
> libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-
> libkvazaar --enable-libmp3lame --enable-libopencore-amrnb --enable-
> libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-
> libopenmpt --enable-libopus --enable-libpulse --enable-librubberband
> --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex
> --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-
> libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265
> --enable-libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk
> --enable-nonfree --enable-opencl --enable-opengl --enable-postproc
> --enable-pthreads --enable-static --disable-shared --enable-version3
> --enable-libwebp --incdir=/usr/include/x86_64-linux-gnu
> --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --toolchain=hardened
> --enable-frei0r --enable-chromaprint --enable-libx264 --enable-
> libiec61883 --enable-libdc1394 --enable-vaapi --enable-libmfx --disable-
> altivec --shlibdir=/usr/lib/x86_64-linux-gnu
>   libavutil      56. 23.101 / 56. 23.101
>   libavcodec     58. 39.100 / 58. 39.100
>   libavformat    58. 22.100 / 58. 22.100
>   libavdevice    58.  6.100 / 58.  6.100
>   libavfilter     7. 44.100 /  7. 44.100
>   libswscale      5.  4.100 /  5.  4.100
>   libswresample   3.  4.100 /  3.  4.100
>   libpostproc    55.  4.100 / 55.  4.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> 'bbb_sunflower_1080p_30fps_normal.mp4':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 1
>     compatible_brands: isomavc1
>     creation_time   : 2013-12-16T17:44:39.000000Z
>     title           : Big Buck Bunny, Sunflower version
>     artist          : Blender Foundation 2008, Janus Bager Kristensen
> 2013
>     comment         : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
>     genre           : Animation
>     composer        : Sacha Goedegebure
>   Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
>     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
> 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
> (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:39.000000Z
>       handler_name    : GPAC ISO Video Handler
>     Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 160 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Side data:
>       audio service type: main
> Stream mapping:
>   Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
>   Stream #0:2 -> #0:1 (copy)
> Press [q] to stop, [?] for help
> Output #0, mpegts, to '/dev/null':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 1
>     compatible_brands: isomavc1
>     composer        : Sacha Goedegebure
>     title           : Big Buck Bunny, Sunflower version
>     artist          : Blender Foundation 2008, Janus Bager Kristensen
> 2013
>     comment         : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
>     genre           : Animation
>     encoder         : Lavf58.22.100
>     Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720
> [SAR 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:39.000000Z
>       handler_name    : GPAC ISO Video Handler
>       encoder         : Lavc58.39.100 h264_nvenc
>     Side data:
>       cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
> vbv_delay: -1
>     Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Side data:
>       audio service type: main
> frame= 1372 fps= 14 q=21.0 Lsize=   14369kB time=00:00:45.73
> bitrate=2573.8kbits/s speed=0.483x
> video:11382kB audio:1781kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 9.155905%
>
> nvidia-smi dmon
> # gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
> # Idx     W     C     C     %     %     %     %   MHz   MHz
>     0    56    49     -    11     4    30    37  6800  1560
>     0    54    49     -    10     4    30    39  6800  1560
>     0    54    49     -    10     4    32    43  6800  1590
>     0    55    49     -    10     4    31    40  6800  1515
>     0    57    49     -    10     4    31    40  6800  1635
> }}}
> getting just near 15 fps instead of 30.
>
> If you check with ffmpeg-4.0.3 (that doesn't have the above mentioned
> commits) you will also see correct utilization even when using temporal-
> aq 1.
> If you use -hwaccel nvdec or don't use -hwaccel at all (software decoding
> and scaling) the problem also doesn't happen.
> Finally, if you use nvidia-cuda-mps to handle the encodes, the problem
> also doesn't show.
>
> Finally, a sample output of running with ffmpeg-4.0.3 interactively while
> already running 16 sessions AND nvidia-smi dmon output
>
> {{{
> ./ffmpeg-4.0.3 -hwaccel cuvid -c:v h264_cuvid -re -i
> bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
> h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
> mpegts -y /dev/null
> ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
>   built with gcc 8 (Debian 8.2.0-9)
>   configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
> --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
> --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-
> mipsfpu --disable-msa --disable-libopencv --disable-podpages --disable-
> sndio --disable-stripping --enable-libaom --enable-avfilter --enable-
> gcrypt --enable-gnutls --enable-gpl --enable-libass --enable-libbluray
> --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
> --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype --enable-
> libfribidi --enable-libgme --enable-libgsm --enable-libilbc --enable-
> libkvazaar --enable-libmp3lame --enable-libopencore-amrnb --enable-
> libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-
> libopenmpt --enable-libopus --enable-libpulse --enable-librubberband
> --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex
> --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-
> libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libx265
> --enable-libxvid --enable-libzvbi --enable-libnpp --enable-cuda-sdk
> --enable-nonfree --enable-opencl --enable-opengl --enable-postproc
> --enable-pthreads --enable-static --disable-shared --enable-version3
> --enable-libwebp --incdir=/usr/include/x86_64-linux-gnu
> --libdir=/usr/lib/x86_64-linux-gnu --prefix=/usr --toolchain=hardened
> --enable-frei0r --enable-chromaprint --enable-libx264 --enable-
> libiec61883 --enable-libdc1394 --enable-vaapi --enable-libmfx --disable-
> altivec --shlibdir=/usr/lib/x86_64-linux-gnu
>   libavutil      56. 14.100 / 56. 14.100
>   libavcodec     58. 18.100 / 58. 18.100
>   libavformat    58. 12.100 / 58. 12.100
>   libavdevice    58.  3.100 / 58.  3.100
>   libavfilter     7. 16.100 /  7. 16.100
>   libswscale      5.  1.100 /  5.  1.100
>   libswresample   3.  1.100 /  3.  1.100
>   libpostproc    55.  1.100 / 55.  1.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
> 'bbb_sunflower_1080p_30fps_normal.mp4':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 1
>     compatible_brands: isomavc1
>     creation_time   : 2013-12-16T17:44:39.000000Z
>     title           : Big Buck Bunny, Sunflower version
>     artist          : Blender Foundation 2008, Janus Bager Kristensen
> 2013
>     comment         : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
>     genre           : Animation
>     composer        : Sacha Goedegebure
>   Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
>     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
> 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
> (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:39.000000Z
>       handler_name    : GPAC ISO Video Handler
>     Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
> fltp, 160 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Side data:
>       audio service type: main
> Stream mapping:
>   Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
>   Stream #0:2 -> #0:1 (copy)
> Press [q] to stop, [?] for help
> Output #0, mpegts, to '/dev/null':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 1
>     compatible_brands: isomavc1
>     composer        : Sacha Goedegebure
>     title           : Big Buck Bunny, Sunflower version
>     artist          : Blender Foundation 2008, Janus Bager Kristensen
> 2013
>     comment         : Creative Commons Attribution 3.0 -
> http://bbb3d.renderfarming.net
>     genre           : Animation
>     encoder         : Lavf58.12.100
>     Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720
> [SAR 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:39.000000Z
>       handler_name    : GPAC ISO Video Handler
>       encoder         : Lavc58.18.100 h264_nvenc
>     Side data:
>       cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
> vbv_delay: -1
>     Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz,
> 5.1(side), fltp, 320 kb/s (default)
>     Metadata:
>       creation_time   : 2013-12-16T17:44:42.000000Z
>       handler_name    : GPAC ISO Audio Handler
>     Side data:
>       audio service type: main
> frame= 1106 fps= 30 q=26.0 Lsize=   11893kB time=00:00:36.92
> bitrate=2638.3kbits/s speed=0.999x
> video:9454kB audio:1444kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 9.133109%
>
> # gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
> # Idx     W     C     C     %     %     %     %   MHz   MHz
>     0    90    55     -    21     9    52    73  6800  1950
>     0    90    55     -    20     8    53    73  6800  1950
>     0    91    55     -    20     9    52    78  6800  1950
>     0    90    55     -    20     8    52    75  6800  1950
>     0    88    55     -    20     8    51    75  6800  1950
>     0    87    55     -    21     8    53    71  6800  1950
>     0    88    55     -    20     8    49    75  6800  1950
>     0    85    55     -    21     8    53    74  6800  1950
>     0    87    55     -    20     8    49    74  6800  1950
>     0    85    55     -    21     9    54    74  6800  1950
>     0    87    55     -    20     8    49    75  6800  1950
>     0    84    55     -    21     9    54    70  6800  1950
> }}}

New description:

 Running multiple hwaccel cuvid/nvenc sessions that use temporal-aq or
 spatial-aq AND 3 or more reference frames results in a performance
 degradation since the following commits:
 9b82e333b7c4235a3de7ce8d8fe115c53c11f50c
 93d1756af2908150f7c8c0590b9ed246951d474a
 Those commits enabled the use of cuMemcpy2DAsync instead of cuMemcpy2D.
 With this change, aq enabled, and 3 or more reference frames, throughput
 seems to drop to around 50% of the nvenc capacity. It may be a driver
 problem, but it still makes ffmpeg problematic when running multiple
 realtime encodes. With -hwaccel nvdec this doesn't happen, but since
 -hwaccel nvdec uses much more VRAM, I cannot run the same number of
 concurrent sessions.

 To reproduce, I use the following file as input:
 https://download.blender.org/demo/movies/BBB/bbb_sunflower_1080p_30fps_normal.mp4

 I run it with the following bash script:
 {{{
 #!/bin/bash
 # Start 16 concurrent cuvid -> scale_npp -> nvenc transcodes in the background.
 for i in $(seq 1 16)
 do
     ./ffmpeg-git -nostdin -loglevel error -hwaccel cuvid -c:v h264_cuvid -re \
         -i bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 \
         -c:v h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 \
         -acodec copy -f mpegts -y /dev/null &
 done
 # Block until all background sessions have exited.
 wait
 echo done
 }}}
 Checking with nvidia-smi, you will see very low utilization, and if you
 run one more session interactively you will see that it cannot sustain
 30 fps even though nvenc utilization is very low. If you set
 -temporal-aq 0 in the same script, you will see much higher utilization.
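
 The comparison between -temporal-aq 1 and -temporal-aq 0 could be scripted
 along the following lines. This is only a sketch, assuming the same binary
 name and input file as in the script above; the function names run_session
 and launch and the FFMPEG override variable are hypothetical and not part
 of the original report.

```shell
#!/bin/sh
# Hypothetical helper: run one transcode with the given -temporal-aq value.
# FFMPEG can be overridden (e.g. FFMPEG=echo) to print the arguments instead.
run_session() {
    aq="$1"
    "${FFMPEG:-./ffmpeg-git}" -nostdin -loglevel error \
        -hwaccel cuvid -c:v h264_cuvid -re \
        -i bbb_sunflower_1080p_30fps_normal.mp4 \
        -vf scale_npp=w=1280:h=720 \
        -c:v h264_nvenc -preset medium -refs 4 -bf 3 \
        -temporal-aq "$aq" -acodec copy -f mpegts -y /dev/null
}

# Hypothetical helper: start N background sessions and wait for all of them.
#   $1 = number of sessions, $2 = -temporal-aq value (0 or 1)
launch() {
    n="$1"
    aq="$2"
    i=0
    while [ "$i" -lt "$n" ]; do
        run_session "$aq" &
        i=$((i + 1))
    done
    wait
}
```

 Running `launch 16 1` and then `launch 16 0` while watching nvidia-smi dmon
 in another terminal would reproduce the two cases described above.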

 Sample output of an interactive encoding session started while 16 sessions
 were already running, followed by nvidia-smi dmon output:
 {{{
 ./ffmpeg-git -hwaccel cuvid -c:v h264_cuvid -re -i
 bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
 h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
 mpegts -y /dev/null
 ffmpeg version N-92462-g529debc987 Copyright (c) 2000-2018 the FFmpeg
 developers
   built with gcc 8 (Debian 8.2.0-9)
   configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
 --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
 --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
 --disable-msa --disable-libopencv --disable-podpages --disable-sndio
 --disable-debug --enable-libaom --enable-avfilter --enable-gcrypt
 --enable-gnutls --enable-gpl --enable-libass --enable-libbluray
 --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
 --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype
 --enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc
 --enable-libkvazaar --enable-libmp3lame --enable-libopencore-amrnb
 --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg
 --enable-libopenmpt --enable-libopus --enable-libpulse
 --enable-librubberband --enable-libshine --enable-libsnappy
 --enable-libsoxr --enable-libspeex --enable-libtesseract
 --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc
 --enable-libvorbis --enable-libvpx --enable-libx265 --enable-libxvid
 --enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-nonfree
 --enable-opencl --enable-opengl --enable-postproc --enable-pthreads
 --enable-static --disable-shared --enable-version3 --enable-libwebp
 --incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64-linux-gnu
 --prefix=/usr --toolchain=hardened --enable-frei0r --enable-chromaprint
 --enable-libx264 --enable-libiec61883 --enable-libdc1394 --enable-vaapi
 --enable-libmfx --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
   libavutil      56. 23.101 / 56. 23.101
   libavcodec     58. 39.100 / 58. 39.100
   libavformat    58. 22.100 / 58. 22.100
   libavdevice    58.  6.100 / 58.  6.100
   libavfilter     7. 44.100 /  7. 44.100
   libswscale      5.  4.100 /  5.  4.100
   libswresample   3.  4.100 /  3.  4.100
   libpostproc    55.  4.100 / 55.  4.100
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
 'bbb_sunflower_1080p_30fps_normal.mp4':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     creation_time   : 2013-12-16T17:44:39.000000Z
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     composer        : Sacha Goedegebure
   Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
 (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
     Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
 fltp, 160 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
 fltp, 320 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Side data:
       audio service type: main
 Stream mapping:
   Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
   Stream #0:2 -> #0:1 (copy)
 Press [q] to stop, [?] for help
 Output #0, mpegts, to '/dev/null':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     composer        : Sacha Goedegebure
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     encoder         : Lavf58.22.100
     Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
       encoder         : Lavc58.39.100 h264_nvenc
     Side data:
       cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
 vbv_delay: -1
     Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
 fltp, 320 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Side data:
       audio service type: main
 frame= 1372 fps= 14 q=21.0 Lsize=   14369kB time=00:00:45.73
 bitrate=2573.8kbits/s speed=0.483x
 video:11382kB audio:1781kB subtitle:0kB other streams:0kB global
 headers:0kB muxing overhead: 9.155905%

 nvidia-smi dmon
 # gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
 # Idx     W     C     C     %     %     %     %   MHz   MHz
     0    56    49     -    11     4    30    37  6800  1560
     0    54    49     -    10     4    30    39  6800  1560
     0    54    49     -    10     4    32    43  6800  1590
     0    55    49     -    10     4    31    40  6800  1515
     0    57    49     -    10     4    31    40  6800  1635
 }}}
 The interactive session reaches just under 15 fps instead of 30.
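
 To compare runs without eyeballing the samples, the enc and dec columns of
 the dmon output can be averaged. The sketch below assumes the column layout
 shown in the header above (enc in the 7th column, dec in the 8th); other
 nvidia-smi versions or dmon options may print different columns. The
 function name dmon_avg is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `nvidia-smi dmon` output on stdin and print the
# average of the enc (field 7) and dec (field 8) columns.
# Header lines (first field starts with '#') and short lines are skipped.
dmon_avg() {
    awk '$1 !~ /^#/ && NF >= 8 { enc += $7; dec += $8; n++ }
         END { if (n) printf "enc=%.1f dec=%.1f\n", enc / n, dec / n }'
}
```

 For example, `nvidia-smi dmon -c 5 | dmon_avg` averages five samples. Fed the
 five samples listed above, it prints enc=30.8 dec=39.8.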

 With ffmpeg-4.0.3 (which doesn't have the above-mentioned commits),
 utilization is correct even when using temporal-aq 1.
 If you use -hwaccel nvdec, or don't use -hwaccel at all (software decoding
 and scaling), the problem doesn't happen either.
 Finally, if you use nvidia-cuda-mps to handle the encodes, the problem
 also doesn't show.
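
 The report does not say how MPS was set up. As a rough sketch, the MPS
 control daemon is usually started with `nvidia-cuda-mps-control -d` and
 stopped by sending it `quit`; the wrapper functions and the MPS_CONTROL
 override variable below are hypothetical, and starting the daemon may need
 elevated privileges or further configuration depending on the system.

```shell
#!/bin/sh
# Hypothetical wrappers around the MPS control binary.
# MPS_CONTROL can be overridden for a dry run (e.g. MPS_CONTROL=echo).

# Start the MPS control daemon in the background (-d = daemon mode).
mps_start() {
    "${MPS_CONTROL:-nvidia-cuda-mps-control}" -d
}

# Ask the running control daemon to shut down.
mps_stop() {
    echo quit | "${MPS_CONTROL:-nvidia-cuda-mps-control}"
}
```

 With the daemon running, the same 16-session script can be started as
 before to check whether the slowdown still appears.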

 For comparison, sample output of ffmpeg-4.0.3 run interactively while 16
 sessions are already running, followed by nvidia-smi dmon output:

 {{{
 ./ffmpeg-4.0.3 -hwaccel cuvid -c:v h264_cuvid -re -i
 bbb_sunflower_1080p_30fps_normal.mp4 -vf scale_npp=w=1280:h=720 -c:v
 h264_nvenc -preset medium -refs 4 -bf 3 -temporal-aq 1 -acodec copy -f
 mpegts -y /dev/null
 ffmpeg version 4.0.3 Copyright (c) 2000-2018 the FFmpeg developers
   built with gcc 8 (Debian 8.2.0-9)
   configuration: --enable-runtime-cpudetect --disable-decoder=amrnb
 --disable-decoder=libopenjpeg --disable-mips32r2 --disable-mips32r6
 --disable-mips64r6 --disable-mipsdsp --disable-mipsdspr2 --disable-mipsfpu
 --disable-msa --disable-libopencv --disable-podpages --disable-sndio
 --disable-stripping --enable-libaom --enable-avfilter --enable-gcrypt
 --enable-gnutls --enable-gpl --enable-libass --enable-libbluray
 --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2
 --enable-libfdk-aac --enable-libfontconfig --enable-libfreetype
 --enable-libfribidi --enable-libgme --enable-libgsm --enable-libilbc
 --enable-libkvazaar --enable-libmp3lame --enable-libopencore-amrnb
 --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg
 --enable-libopenmpt --enable-libopus --enable-libpulse
 --enable-librubberband --enable-libshine --enable-libsnappy
 --enable-libsoxr --enable-libspeex --enable-libtesseract
 --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc
 --enable-libvorbis --enable-libvpx --enable-libx265 --enable-libxvid
 --enable-libzvbi --enable-libnpp --enable-cuda-sdk --enable-nonfree
 --enable-opencl --enable-opengl --enable-postproc --enable-pthreads
 --enable-static --disable-shared --enable-version3 --enable-libwebp
 --incdir=/usr/include/x86_64-linux-gnu --libdir=/usr/lib/x86_64-linux-gnu
 --prefix=/usr --toolchain=hardened --enable-frei0r --enable-chromaprint
 --enable-libx264 --enable-libiec61883 --enable-libdc1394 --enable-vaapi
 --enable-libmfx --disable-altivec --shlibdir=/usr/lib/x86_64-linux-gnu
   libavutil      56. 14.100 / 56. 14.100
   libavcodec     58. 18.100 / 58. 18.100
   libavformat    58. 12.100 / 58. 12.100
   libavdevice    58.  3.100 / 58.  3.100
   libavfilter     7. 16.100 /  7. 16.100
   libswscale      5.  1.100 /  5.  1.100
   libswresample   3.  1.100 /  3.  1.100
   libpostproc    55.  1.100 / 55.  1.100
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from
 'bbb_sunflower_1080p_30fps_normal.mp4':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     creation_time   : 2013-12-16T17:44:39.000000Z
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     composer        : Sacha Goedegebure
   Duration: 00:10:34.53, start: 0.000000, bitrate: 3481 kb/s
     Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p,
 1920x1080 [SAR 1:1 DAR 16:9], 2998 kb/s, 30 fps, 30 tbr, 30k tbn, 60 tbc
 (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
     Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 48000 Hz, stereo,
 fltp, 160 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Stream #0:2(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
 fltp, 320 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Side data:
       audio service type: main
 Stream mapping:
   Stream #0:0 -> #0:0 (h264 (h264_cuvid) -> h264 (h264_nvenc))
   Stream #0:2 -> #0:1 (copy)
 Press [q] to stop, [?] for help
 Output #0, mpegts, to '/dev/null':
   Metadata:
     major_brand     : isom
     minor_version   : 1
     compatible_brands: isomavc1
     composer        : Sacha Goedegebure
     title           : Big Buck Bunny, Sunflower version
     artist          : Blender Foundation 2008, Janus Bager Kristensen 2013
     comment         : Creative Commons Attribution 3.0 -
 http://bbb3d.renderfarming.net
     genre           : Animation
     encoder         : Lavf58.12.100
     Stream #0:0(und): Video: h264 (h264_nvenc) (Main), cuda, 1280x720 [SAR
 1:1 DAR 16:9], q=-1--1, 2000 kb/s, 30 fps, 90k tbn, 30 tbc (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:39.000000Z
       handler_name    : GPAC ISO Video Handler
       encoder         : Lavc58.18.100 h264_nvenc
     Side data:
       cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000
 vbv_delay: -1
     Stream #0:1(und): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side),
 fltp, 320 kb/s (default)
     Metadata:
       creation_time   : 2013-12-16T17:44:42.000000Z
       handler_name    : GPAC ISO Audio Handler
     Side data:
       audio service type: main
 frame= 1106 fps= 30 q=26.0 Lsize=   11893kB time=00:00:36.92
 bitrate=2638.3kbits/s speed=0.999x
 video:9454kB audio:1444kB subtitle:0kB other streams:0kB global
 headers:0kB muxing overhead: 9.133109%

 # gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
 # Idx     W     C     C     %     %     %     %   MHz   MHz
     0    90    55     -    21     9    52    73  6800  1950
     0    90    55     -    20     8    53    73  6800  1950
     0    91    55     -    20     9    52    78  6800  1950
     0    90    55     -    20     8    52    75  6800  1950
     0    88    55     -    20     8    51    75  6800  1950
     0    87    55     -    21     8    53    71  6800  1950
     0    88    55     -    20     8    49    75  6800  1950
     0    85    55     -    21     8    53    74  6800  1950
     0    87    55     -    20     8    49    74  6800  1950
     0    85    55     -    21     9    54    74  6800  1950
     0    87    55     -    20     8    49    75  6800  1950
     0    84    55     -    21     9    54    70  6800  1950
 }}}

--
Ticket URL: <https://trac.ffmpeg.org/ticket/7582#comment:1>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

