[FFmpeg-user] Trouble transcoding with cuda

Dennis Mungai dmngaie at gmail.com
Wed Sep 4 18:35:31 EEST 2019


On Wed, 4 Sep 2019 at 07:38, Ray Randomnic <randomnicode at gmail.com> wrote:
>
> Hey,
>
> Sure, any video taken by a Samsung device (such as Note or Galaxy S9 or
> S10) with the HDR10+ setting will do. A sample is posted here:
> http://awakeman.redirectme.net/web/testvideo/sample.mp4
>
> Thanks.
>
> On Tue, Sep 3, 2019 at 10:07 PM Dennis Mungai <dmngaie at gmail.com> wrote:
>
> > On Wed, 4 Sep 2019 at 04:32, Ray Randomnic <randomnicode at gmail.com> wrote:
> > >
> > > Hey folks,
> > >
> > > I'm trying to transcode an HEVC (yuv420p10le) encoded file to H264 using a
> > > GTX 1650 nvenc and having issues with what I assume are the pixel format
> > > conversions on hardware. My encode speed (in fps) is pretty low (see
> > > below), far lower than I get when transcoding HEVC -> HEVC. ffmpeg version
> > > is N-94578-gd6bd902599-gcff309097a+3 (on a Windows 10 OS, though I don't
> > > think this is relevant). For the purposes of this experiment, let's say I'm
> > > not concerned with lossiness with format conversions.
> > >
> > > I'd like to know what I'm doing wrong and what commands I can issue for the
> > > following:
> > > decode on GPU -> format conversion (if necessary) on GPU -> encode on GPU.
> > > I might not be understanding a few concepts.
> > >
> > > The combination of options that I thought were available and I tried out
> > > are:
> > > - decoder (I mostly left this blank for auto) and encoder (always
> > > h264_nvenc)
> > > - hwaccel
> > > - hwaccel_output_format
> > > - filters (vf):
> > >   - format
> > >   - scale_npp (for format conversion on gpu)
> > >
> > > I have no idea what the options pix_fmt or other filters like colorspace do
> > > for hardware (how is pix_fmt different from hwaccel_output_format?). At
> > > this point I'm kind of stuck. Don't know how to convert formats on the GPU
> > > (I assume the format conversion is happening on the CPU).
> > >
> > > Input details:
> > > ffprobe input.mp4
> > >
> > > Stream #0:0(eng): Video: hevc (Main 10) (hvc1 / 0x31637668),
> > > yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 1920x1080, 24886 kb/s, SAR 1:1
> > > DAR 16:9, 29.99 fps, ...
> > >
> > > Summary of various combinations (- indicates left blank):
> > > test | hwaccel | hwaccel_output_format | filter (vf)              | encode fps | note
> > > 1    | cuda    | -                     | -                        | X          | Failed
> > > 2    | cuda    | cuda                  | -                        | X          | Failed
> > > 3    | cuda    | yuv420p               | -                        | 361        | Video messed up
> > > 4    | cuda    | cuda                  | format=yuv420p           | X          | Failed
> > > 5    | cuvid   | cuda                  | format=yuv420p           | 91         | Not using GPU decode
> > > 6    | cuda    | -                     | format=yuv420p           | 161        | Not using GPU format conversion
> > > 7    | cuvid   | -                     | format=yuv420p           | 91         | Not using GPU decode
> > > 8    | cuda    | -                     | scale_npp=format=yuv420p | X          | Failed
> > > 9    | cuda    | cuda                  | scale_npp=format=yuv420p | X          | Failed
> > >
> > > I would expect a speed of around test 3 (without the screwed up video). Is
> > > there any way to convert the pixel formats on the hardware without screwing
> > > up the video? On a similar note, I'd love for someone to explain the
> > > failing encodes.
> > >
> > > Here are the details for corresponding encodes:
> > >
> > >    1. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -c:v h264_nvenc
> > >    output.mp4
> > >
> > >    Fails with the following:
> > >
> > >    [graph_1_in_0_1 @ 000001cc9670e4c0] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 000001cc8740fc00] NVDEC capabilities:
> > >    [hevc @ 000001cc8740fc00] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 000001cc8740fc00] min_width: 144, max_width: 8192
> > >    [hevc @ 000001cc8740fc00] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 000001cc87420840] w:1920 h:1080
> > >    pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [h264_nvenc @ 000001cc8747fbc0] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 000001cc8747fbc0] Nvenc initialized successfully
> > >    [h264_nvenc @ 000001cc8747fbc0] 1 CUDA capable devices found
> > >    [h264_nvenc @ 000001cc8747fbc0] [ GPU #0 - < GeForce GTX 1650 > has
> > >    Compute SM 7.5 ]
> > >    [h264_nvenc @ 000001cc8747fbc0] 10 bit encode not supported
> > >    [h264_nvenc @ 000001cc8747fbc0] No NVENC capable devices found
> > >    [h264_nvenc @ 000001cc8747fbc0] Nvenc unloaded
> > >    Error initializing output stream 0:0 -- Error while opening encoder for
> > >    output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
> > >    width or height
> > >
> > >    2. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
> > >    input.mp4 -c:v h264_nvenc output.mp4
> > >
> > >    Fails with the following:
> > >
> > >    [graph_1_in_0_1 @ 00000240b7932340] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 00000240b79e37c0] NVDEC capabilities:
> > >    [hevc @ 00000240b79e37c0] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 00000240b79e37c0] min_width: 144, max_width: 8192
> > >    [hevc @ 00000240b79e37c0] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 00000240b7937e00] w:1920 h:1080
> > >    pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [h264_nvenc @ 00000240b7483700] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 00000240b7483700] Nvenc initialized successfully
> > >    [h264_nvenc @ 00000240b7483700] 10 bit encode not supported
> > >    [h264_nvenc @ 00000240b7483700] Provided device doesn't support required
> > >    NVENC features
> > >    [h264_nvenc @ 00000240b7483700] Nvenc unloaded
> > >    Error initializing output stream 0:0 -- Error while opening encoder for
> > >    output stream #0:0 - maybe incorrect parameters such as bit_rate, rate,
> > >    width or height
> > >
> > >    Alright, so it seems that the hardware h264 encoder doesn't support 10
> > >    bit encodes (that's coming from the decoder). So let's try changing the
> > >    format:
> > >
> > >
> > >    3. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format yuv420p
> > >    -i input.mp4 -c:v h264_nvenc output.mp4
> > >
> > >    Pretty decent encode at ~ 360 fps. Alas, the video is screwed up. Colors
> > >    are weird:
> > >
> > >    [graph_1_in_0_1 @ 00000256c9ac7b40] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 00000256cbb737c0] NVDEC capabilities:
> > >    [hevc @ 00000256cbb737c0] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 00000256cbb737c0] min_width: 144, max_width: 8192
> > >    [hevc @ 00000256cbb737c0] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 00000256cbac7e00] w:1920 h:1080
> > >    pixfmt:yuv420p tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [h264_nvenc @ 00000256cb693700] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 00000256cb693700] Nvenc initialized successfully
> > >    [h264_nvenc @ 00000256cb693700] 1 CUDA capable devices found
> > >    [h264_nvenc @ 00000256cb693700] [ GPU #0 - < GeForce GTX 1650 > has
> > >    Compute SM 7.5 ]
> > >    [h264_nvenc @ 00000256cb693700] supports NVENC
> > >
> > >    Let's use a format filter to change format:
> > >
> > >    4. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
> > >    input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
> > >
> > >    Fails with the following:
> > >
> > >    [graph_1_in_0_1 @ 0000019390de5c80] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 00000193908675c0] NVDEC capabilities:
> > >    [hevc @ 00000193908675c0] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 00000193908675c0] min_width: 144, max_width: 8192
> > >    [hevc @ 00000193908675c0] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 00000193a031ee80] w:1920 h:1080
> > >    pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [auto_scaler_0 @ 00000193b7aee780] w:iw h:ih flags:'bicubic' interl:0
> > >    [Parsed_format_0 @ 00000193908eee80] auto-inserting filter
> > >    'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
> > >    filter 'Parsed_format_0'
> > >    Impossible to convert between the formats supported by the filter 'graph
> > >    0 input from stream 0:0' and the filter 'auto_scaler_0'
> > >    Error reinitializing filters!
> > >    Failed to inject frame into filter network: Function not implemented
> > >    Error while processing the decoded data for stream #0:0
> > >
> > >    5. ffmpeg -loglevel verbose -hwaccel cuvid -hwaccel_output_format cuda
> > >    -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
> > >
> > >    Succeeds, but only encodes at around 91 fps, due to, I assume, not using
> > >    GPU decoder. What is the difference between cuvid and cuda hwaccel (why
> > >    did the previous fail and this succeed)? Here is the relevant output:
> > >
> > >    [graph_1_in_0_1 @ 000002152cc3cc00] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 000002152ac33700] Initializing cuvid hwaccel
> > >    [AVHWFramesContext @ 000002152cc3f0c0] Pixel format 'yuv420p10le' is not
> > >    supported
> > >    [hevc @ 000002152ac33700] Error initializing a CUDA frame pool
> > >    cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
> > >    [hevc @ 000002152ac33700] Error parsing NAL unit #2.
> > >    [hevc @ 000002152ac79180] Could not find ref with POC 0
> > >    Error while decoding stream #0:0: Operation not permitted
> > >    [graph 0 input from stream 0:0 @ 000002152d638b80] w:1920 h:1080
> > >    pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [auto_scaler_0 @ 000002152ca176c0] w:iw h:ih flags:'bicubic' interl:0
> > >    [Parsed_format_0 @ 000002152d3fee40] auto-inserting filter
> > >    'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
> > >    filter 'Parsed_format_0'
> > >    [auto_scaler_0 @ 000002152ca176c0] w:1920 h:1080 fmt:yuv420p10le sar:1/1
> > >    -> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
> > >    [h264_nvenc @ 000002152ac31800] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 000002152ac31800] Nvenc initialized successfully
> > >    [h264_nvenc @ 000002152ac31800] 1 CUDA capable devices found
> > >    [h264_nvenc @ 000002152ac31800] [ GPU #0 - < GeForce GTX 1650 > has
> > >    Compute SM 7.5 ]
> > >    [h264_nvenc @ 000002152ac31800] supports NVENC
> > >
> > >    Take out hwaccel_output_format:
> > >
> > >    6. ffmpeg -loglevel verbose -hwaccel cuda -i in.mp4 -vf format=yuv420p
> > >    -c:v h264_nvenc out.mp4
> > >
> > >    Succeeds, encodes at 161 fps (using both hardware GPU decoder and
> > >    encoder, but I believe the changing of format is happening on the CPU
> > >    between the two stages).
> > >
> > >    [graph_1_in_0_1 @ 0000025491bf2b00] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 0000025491b84900] NVDEC capabilities:
> > >    [hevc @ 0000025491b84900] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 0000025491b84900] min_width: 144, max_width: 8192
> > >    [hevc @ 0000025491b84900] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 0000025491c0eec0] w:1920 h:1080
> > >    pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [auto_scaler_0 @ 00000254b747cfc0] w:iw h:ih flags:'bicubic' interl:0
> > >    [Parsed_format_0 @ 000002549203d840] auto-inserting filter
> > >    'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
> > >    filter 'Parsed_format_0'
> > >    [auto_scaler_0 @ 00000254b747cfc0] w:1920 h:1080 fmt:p010le sar:1/1 ->
> > >    w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
> > >    [h264_nvenc @ 00000254920a0f40] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 00000254920a0f40] Nvenc initialized successfully
> > >    [h264_nvenc @ 00000254920a0f40] 1 CUDA capable devices found
> > >    [h264_nvenc @ 00000254920a0f40] [ GPU #0 - < GeForce GTX 1650 > has
> > >    Compute SM 7.5 ]
> > >    [h264_nvenc @ 00000254920a0f40] supports NVENC
> > >
> > >
> > >    7. ffmpeg -loglevel verbose -hwaccel cuvid -i in.mp4 -vf format=yuv420p
> > >    -c:v h264_nvenc out.mp4
> > >
> > >    Only encoding on GPU, not decoding (91 fps).
> > >
> > >    [graph_1_in_0_1 @ 000002163875b5c0] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 00000216380c3c00] Initializing cuvid hwaccel
> > >    [AVHWFramesContext @ 00000216387fc300] Pixel format 'yuv420p10le' is not
> > >    supported
> > >    [hevc @ 00000216380c3c00] Error initializing a CUDA frame pool
> > >    cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
> > >    [hevc @ 00000216380c3c00] Error parsing NAL unit #2.
> > >    [hevc @ 000002163813d300] Could not find ref with POC 0
> > >    Error while decoding stream #0:0: Operation not permitted
> > >    [graph 0 input from stream 0:0 @ 00000216387594c0] w:1920 h:1080
> > >    pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [auto_scaler_0 @ 000002164f8a0c40] w:iw h:ih flags:'bicubic' interl:0
> > >    [Parsed_format_0 @ 00000216387593c0] auto-inserting filter
> > >    'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
> > >    filter 'Parsed_format_0'
> > >    [auto_scaler_0 @ 000002164f8a0c40] w:1920 h:1080 fmt:yuv420p10le sar:1/1
> > >    -> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
> > >    [h264_nvenc @ 0000021638590f40] Loaded Nvenc version 9.0
> > >    [h264_nvenc @ 0000021638590f40] Nvenc initialized successfully
> > >    [h264_nvenc @ 0000021638590f40] 1 CUDA capable devices found
> > >    [h264_nvenc @ 0000021638590f40] [ GPU #0 - < GeForce GTX 1650 > has
> > >    Compute SM 7.5 ]
> > >    [h264_nvenc @ 0000021638590f40] supports NVENC
> > >
> > >    Let's see if I can do format conversion on the GPU (instead of GPU -> CPU
> > >    -> GPU), by using the scale_npp filter.
> > >
> > >    8. ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -vf
> > >    scale_npp=format=yuv420p -c:v h264_nvenc output.mp4
> > >
> > >    Fails
> > >
> > >    [graph_1_in_0_1 @ 0000022f3001e080] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 0000022f207d7f40] NVDEC capabilities:
> > >    [hevc @ 0000022f207d7f40] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 0000022f207d7f40] min_width: 144, max_width: 8192
> > >    [hevc @ 0000022f207d7f40] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 0000022f3034ee80] w:1920 h:1080
> > >    pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [auto_scaler_0 @ 0000022f47b2d300] w:iw h:ih flags:'bicubic' interl:0
> > >    [Parsed_scale_npp_0 @ 0000022f20c49b40] auto-inserting filter
> > >    'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the
> > >    filter 'Parsed_scale_npp_0'
> > >    Impossible to convert between the formats supported by the filter 'graph
> > >    0 input from stream 0:0' and the filter 'auto_scaler_0'
> > >    Error reinitializing filters!
> > >    Failed to inject frame into filter network: Function not implemented
> > >    Error while processing the decoded data for stream #0:0
> > >
> > >
> > >    9. ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i
> > >    in.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc out.mp4
> > >
> > >    Fails:
> > >
> > >    [graph_1_in_0_1 @ 00000200040adac0] tb:1/48000 samplefmt:fltp
> > >    samplerate:48000 chlayout:0x3
> > >    [hevc @ 00000200747b65c0] NVDEC capabilities:
> > >    [hevc @ 00000200747b65c0] format supported: yes, max_mb_count: 262144
> > >    [hevc @ 00000200747b65c0] min_width: 144, max_width: 8192
> > >    [hevc @ 00000200747b65c0] min_height: 144, max_height: 8192
> > >    [graph 0 input from stream 0:0 @ 00000200040aa8c0] w:1920 h:1080
> > >    pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
> > >    [Parsed_scale_npp_0 @ 0000020074c75b80] Unsupported input format: p010le
> > >    [Parsed_scale_npp_0 @ 0000020074c75b80] Failed to configure output pad
> > >    on Parsed_scale_npp_0
> > >    Error reinitializing filters!
> > >    Failed to inject frame into filter network: Function not implemented
> > >    Error while processing the decoded data for stream #0:0
> > >
> > >
> > > I'd appreciate any help or pointer in the right direction (even an
> > > alternate mailing list).
> >
> >
> > Hey there,
> >
> > Could you kindly provide a download link to the sample of the input
> > file you're working on?
> > That way we can reproduce what you're seeing here, thanks!

The link you provided for the sample is dead.
I'll try to reproduce this on my end with clips recorded from a Samsung S8.
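
For anyone else who wants to reproduce this with their own clip, the nine
combinations from the quoted table can be regenerated with a small dry-run
script. This is only a sketch: the flags are taken verbatim from the report
above, while the INPUT variable, the out_N.mp4 output names, and the helper
names build_cmd and all_cmds are placeholders of mine and not part of the
original commands. It only prints the command lines and does not run ffmpeg.

```shell
#!/bin/sh
# Dry-run helper: prints the nine ffmpeg invocations from the quoted table.
# INPUT and the out_N.mp4 names are placeholders, not from the original report.
INPUT="${INPUT:-input.mp4}"

# Print the command for one table row.
# Arguments: test number, hwaccel, hwaccel_output_format, -vf filter.
# A "-" means the option was left blank in that test.
build_cmd() {
    n=$1; hwaccel=$2; outfmt=$3; vf=$4
    cmd="ffmpeg -loglevel verbose -hwaccel $hwaccel"
    if [ "$outfmt" != "-" ]; then
        cmd="$cmd -hwaccel_output_format $outfmt"
    fi
    cmd="$cmd -i $INPUT"
    if [ "$vf" != "-" ]; then
        cmd="$cmd -vf $vf"
    fi
    echo "$cmd -c:v h264_nvenc out_$n.mp4"
}

# Walk the table rows (test, hwaccel, hwaccel_output_format, filter).
all_cmds() {
    while read -r n hwaccel outfmt vf; do
        build_cmd "$n" "$hwaccel" "$outfmt" "$vf"
    done <<EOF
1 cuda - -
2 cuda cuda -
3 cuda yuv420p -
4 cuda cuda format=yuv420p
5 cuvid cuda format=yuv420p
6 cuda - format=yuv420p
7 cuvid - format=yuv420p
8 cuda - scale_npp=format=yuv420p
9 cuda cuda scale_npp=format=yuv420p
EOF
}

all_cmds
```

Each printed line can be pasted into a terminal one at a time, which keeps
the verbose log of each test separate for comparison with the logs quoted
above.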

