[FFmpeg-user] ffmpeg GPU selection issue

Dennis Mungai dmngaie at gmail.com
Thu Sep 5 01:25:16 EEST 2019


On Wed, 4 Sep 2019 at 18:28, Dennis Mungai <dmngaie at gmail.com> wrote:
>
> On Wed, 4 Sep 2019 at 10:39, Matthew Reus <matthew.reus01 at gmail.com> wrote:
> >
> > Hello
> > I have an Ubuntu 18.04 server where I have installed ffmpeg and compiled
> > the SDK, along with everything the NVIDIA Tesla M60 driver requires.
> >
> >
> > *1. The issue is that whenever I select a GPU, both GPU 1 and GPU 2 take the process.*
> >
> > *2. ffmpeg shows frame drops and video buffering most of the time.*
> >
> > *Here I have attached all the output and the script.*
> >
> > ffmpeg version N-94423-ga0c1970 Copyright (c) 2000-2019 the FFmpeg
> > developers
> >   built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
> >   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static
> > --extra-cflags=-I/root/ffmpeg_build/include
> > --extra-ldflags=-L/root/ffmpeg_build/lib --extra-libs='-lpthread -lm'
> > --bindir=/root/bin --enable-cuda --enable-cuvid --enable-libnpp
> > --extra-cflags=-I../nv_sdk --extra-ldflags=-L../nv_sdk --enable-cuda-nvcc
> > --enable-nvenc --extra-cflags=-I/usr/local/cuda/include/
> > --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libaom
> > --enable-libass --enable-libfdk-aac --enable-vaapi --enable-libfreetype
> > --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx
> > --enable-libx264 --enable-libx265 --enable-nonfree
> >   libavutil      56. 32.100 / 56. 32.100
> >   libavcodec     58. 55.100 / 58. 55.100
> >   libavformat    58. 30.100 / 58. 30.100
> >   libavdevice    58.  9.100 / 58.  9.100
> >   libavfilter     7. 58.100 /  7. 58.100
> >   libswscale      5.  6.100 /  5.  6.100
> >   libswresample   3.  6.100 /  3.  6.100
> >   libpostproc    55.  6.100 / 55.  6.100
> > Hyper fast Audio and Video encoder
> > usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options]
> > outfile}...
> >
> >
> > *My test script is *
> > *ffmpeg -hwaccel_device 1 -hwaccel auto  -i
> > 'udp://@224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal
> > <http://224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal>' -vf
> > "hwupload_cuda,format=yuv420p|cuda,yadif_cuda=0:-1:0,scale_npp=-1:720" -c:v
> > h264_nvenc -gpu 1  -b:v 1800k -c:a aac  -aspect 16:9  -g 50 -b:a 64k -ar
> > 44100 -ac 2 -f flv
> > 'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream
> > <http://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream>'
> > </dev/null >/dev/null 2>/var/log/BBs1.log  &*
> >
> >
> > ffmpeg version N-94423-ga0c1970 Copyright (c) 2000-2019 the FFmpeg
> > developers
> >   built with gcc 7 (Ubuntu 7.4.0-1ubuntu1~18.04.1)
> >   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static
> > --extra-cflags=-I/root/ffmpeg_build/include
> > --extra-ldflags=-L/root/ffmpeg_build/lib --extra-libs='-lpthread -lm'
> > --bindir=/root/bin --enable-cuda --enable-cuvid --enable-libnpp
> > --extra-cflags=-I../nv_sdk --extra-ldflags=-L../nv_sdk --enable-cuda-nvcc
> > --enable-nvenc --extra-cflags=-I/usr/local/cuda/include/
> > --extra-ldflags=-L/usr/local/cuda/lib64/ --enable-gpl --enable-libaom
> > --enable-libass --enable-libfdk-aac --enable-vaapi --enable-libfreetype
> > --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx
> > --enable-libx264 --enable-libx265 --enable-nonfree
> >   libavutil      56. 32.100 / 56. 32.100
> >   libavcodec     58. 55.100 / 58. 55.100
> >   libavformat    58. 30.100 / 58. 30.100
> >   libavdevice    58.  9.100 / 58.  9.100
> >   libavfilter     7. 58.100 /  7. 58.100
> >   libswscale      5.  6.100 /  5.  6.100
> >   libswresample   3.  6.100 /  3.  6.100
> >   libpostproc    55.  6.100 / 55.  6.100
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] decode_slice_header error
> > [h264 @ 0x559b7b7248c0] no frame!
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] decode_slice_header error
> > [h264 @ 0x559b7b7248c0] no frame!
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] SPS unavailable in decode_picture_timing
> > [h264 @ 0x559b7b7248c0] non-existing PPS 0 referenced
> > [h264 @ 0x559b7b7248c0] decode_slice_header error
> > [h264 @ 0x559b7b7248c0] no frame!
> > [h264 @ 0x559b7b7248c0] mmco: unref short failure
> >     Last message repeated 1 times
> > [h264 @ 0x559b7b7248c0] number of reference frames (0+4) exceeds max (3;
> > probably corrupt input), discarding one
> > Input #0, mpegts, from 'udp://@
> > 224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal':
> >   Duration: N/A, start: 7352.806033, bitrate: N/A
> >   Program 60
> >     Metadata:
> >       service_name    : BBS TV 1
> >       service_provider:
> >     Stream #0:0[0x3d]: Video: h264 (Main) ([27][0][0][0] / 0x001B),
> > yuv420p(tv, bt470bg, top first), 704x576 [SAR 12:11 DAR 4:3], 25 fps, 50
> > tbr, 90k tbn, 50 tbc
> >     Stream #0:1[0x3e](eng): Audio: mp2 ([4][0][0][0] / 0x0004), 48000 Hz,
> > stereo, s16p, 128 kb/s
> > [rtmp @ 0x559b7b7241c0] Ignoring unsupported var reason
> > [h264 @ 0x559b7b73eb80] Using auto hwaccel type cuda with new device
> > created from 1.
> > Stream mapping:
> >   Stream #0:0 -> #0:0 (h264 (native) -> h264 (h264_nvenc))
> >   Stream #0:1 -> #0:1 (mp2 (native) -> aac (native))
> > Press [q] to stop, [?] for help
> > [h264 @ 0x559b7c007e00] reference picture missing during reorder
> > [h264 @ 0x559b7c007e00] Missing reference picture, default is 65297
> > [h264 @ 0x559b7c024680] mmco: unref short failure
> >     Last message repeated 1 times
> > [h264 @ 0x559b7c024680] number of reference frames (0+4) exceeds max (3;
> > probably corrupt input), discarding one
> > [h264 @ 0x559b7c05d780] mmco: unref short failure
> > [h264 @ 0x559b7c108940] mmco: unref short failure
> > Output #0, flv, to 'rtmp://
> > admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream':
> >   Metadata:
> >     encoder         : Lavf58.30.100
> >     Stream #0:0: Video: h264 (h264_nvenc) (Main) ([7][0][0][0] / 0x0007),
> > cuda, 880x720 [SAR 16:11 DAR 16:9], q=-1--1, 1800 kb/s, 25 fps, 1k tbn, 25
> > tbc
> >     Metadata:
> >       encoder         : Lavc58.55.100 h264_nvenc
> >     Side data:
> >       cpb: bitrate max/min/avg: 0/0/1800000 buffer size: 3600000 vbv_delay:
> > -1
> >     Stream #0:1(eng): Audio: aac (LC) ([10][0][0][0] / 0x000A), 44100 Hz,
> > stereo, fltp, 64 kb/s
> >     Metadata:
> >       encoder         : Lavc58.55.100 aac
> > root@ubuntu:/var/log# 17.0 size=   68167kB time=00:04:54.44
> > bitrate=1896.6kbits/s speed=1.02x
>
>
> Hello there,
>
> Please try this. I've simplified your command a bit for legibility.
> These `<>` have been dropped.
>
> ffmpeg -fflags +genpts -vsync 1 -threads 4 \
> -hwaccel nvdec -hwaccel_device 1 -hwaccel_output_format cuda \
> -i 'udp://@224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal' \
> -vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
> -c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu 1 -b:v 1800k
> -maxrate:v 1800k -bufsize:v 1800k -r:v 25 -g:v 50 \
> -c:a aac -b:a 64k -ar 44100 -ac 2 \
> -f flv -flags +global_header -map 0 \
> 'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream'
>
> A few notes:
>
> 1. Note how we call up the hwaccel method. Please don't set this to
> auto. Judging by your console output, you definitely have nvdec
> available. Use it.
>
> 2. See how we request a specific texture format output from the
> decoder tied to the hwaccel method. In this case we ask for cuda. That
> way you can skip the unnecessary hwupload parts in your previous
> script. These extra bits will definitely slow you down.
>
> 3. The thread count (-threads 4) is explicitly set to a low value, 4.
> For hwaccels such as nvdec, this is ideal. Very high numbers (~16+)
> may result in decoder initialization failure, with warnings.
>
> 4. On encoder presets: You're using a Maxwell Gen 2 GPU (a Tesla M60).
> Based on your previous command line, I assumed you're targeting
> constant bitrate output. With that in mind, the command above selects
> the low latency high quality preset (-preset:v llhq) whose rate
> control method is overridden to constant bitrate, low latency high
> quality mode (-rc:v cbr_ld_hq) while adapting your selected GOP size
> and frame rate.
>
> 5. On device selection: This is governed by the -hwaccel_device
> arguments passed to the underlying hwaccel, and for the encoder, the
> -gpu argument takes precedence. Your mistake in the previous command
> was calling up hwupload_cuda without specifying a device to use. Your
> previous arguments resulted in the creation of a random CUDA device in
> the middle of a filter chain, invoking expensive copies to and from
> system memory. And that will definitely slow down the encoder.
>
> As an example, with a single RTX 2080 on my laptop encoding one of the
> C-band satellite capture samples from https://kodi.wiki/view/Samples :
>
> cd ~/test
> time ffmpeg -fflags +genpts -vsync 1 -threads 4 \
> -hwaccel nvdec -hwaccel_device 0 -hwaccel_output_format cuda \
> -i 'test.mkv' \
> -vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
> -c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu 0 -b:v 1800k \
> -maxrate:v 1800k -bufsize:v 1800k -r:v 59.94 -g:v 59.98 \
> -c:a aac -b:a 64k -ar 44100 -ac 2 \
> -f flv -flags +global_header -map 0 'test.flv'
>
> And this runs at a sweet, sweet ~11x speed:
>
> frame=17791 fps=657 q=27.0 Lsize=   68121kB time=00:04:56.92
> bitrate=1879.4kbits/s dup=8895 drop=0 speed=  11x
> video:65225kB audio:2336kB subtitle:0kB other streams:0kB global
> headers:0kB muxing overhead: 0.829441%
> [aac @ 0x561db5b25440] Qavg: 153.261
>
> real    0m27.461s
> user    0m15.690s
> sys    0m1.404s
>
>
> At the very least, on your hardware, you should be getting throughput
> speeds in multiples of ~1x with no drops whatsoever.
>
> Test and report back.
>
> Warm regards,
>
> Dennis.

With proper escapes:

ffmpeg -fflags +genpts -vsync 1 -threads 4 \
-hwaccel nvdec -hwaccel_device 1 -hwaccel_output_format cuda \
-i 'udp://@224.2.2.21:5008?fifo_size=1000000\&overrun_nonfatal' \
-vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
-c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu 1 -b:v 1800k \
-maxrate:v 1800k -bufsize:v 1800k -r:v 25 -g:v 50 \
-c:a aac -b:a 64k -ar 44100 -ac 2 \
-f flv -flags +global_header -map 0 \
'rtmp://admin:netaccess@192.168.0.44:1935/nettv/netBBS11500.stream'
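
Since the decoder and the encoder each take their own device argument,
the two can drift apart when the command gets copied around. As a rough
sketch (this is not part of the original script), a small POSIX shell
helper can derive both -hwaccel_device and -gpu from a single index. The
function name build_ffmpeg_args and its three parameters are made up for
illustration, and it only prints the arguments, it does not run ffmpeg:

```shell
#!/bin/sh
# Hypothetical helper: builds the ffmpeg argument list for one GPU index so
# that the decoder (-hwaccel_device) and the encoder (-gpu) always agree.
# It prints the arguments one per line and does not invoke ffmpeg.
build_ffmpeg_args() {
    gpu="$1"      # GPU index to use for both decode and encode
    input="$2"    # input URL or file
    output="$3"   # output URL or file

    # Reuse the function's positional parameters as the argument list.
    set -- -fflags +genpts -vsync 1 -threads 4 \
        -hwaccel nvdec -hwaccel_device "$gpu" -hwaccel_output_format cuda \
        -i "$input" \
        -vf "yadif_cuda=0:-1:0,scale_npp=-1:720" \
        -c:v h264_nvenc -preset:v llhq -rc:v cbr_ld_hq -gpu "$gpu" \
        -b:v 1800k -maxrate:v 1800k -bufsize:v 1800k -r:v 25 -g:v 50 \
        -c:a aac -b:a 64k -ar 44100 -ac 2 \
        -f flv -flags +global_header -map 0 "$output"

    printf '%s\n' "$@"
}
```

Calling it with index 1 plus the input and output URLs from the command
above prints the same options, one per line. Swapping the final printf
for ffmpeg "$@" would run them instead.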

Test and report back, thanks.
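
While testing, one way to confirm which GPU actually picks up the decode
and encode load is to sample nvidia-smi while ffmpeg is running. This is
only a sketch and assumes the driver's nvidia-smi tool is on PATH; the
function name check_gpu_load and its sample-count parameter are
hypothetical:

```shell
#!/bin/sh
# Sketch: sample per-GPU utilisation while the ffmpeg job is running.
# Assumes nvidia-smi (shipped with the NVIDIA driver) is on PATH.
check_gpu_load() {
    samples="${1:-5}"   # hypothetical parameter: number of samples to take

    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi

    # "dmon -s u" prints one row per GPU per sample with the sm, mem, enc
    # and dec utilisation columns; "-c" stops after the given sample count.
    nvidia-smi dmon -s u -c "$samples"
}
```

If the device selection works as intended, the enc and dec columns
should show activity only on the GPU index passed to -hwaccel_device and
-gpu.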

