#11655(avcodec:new): Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video
#11655: Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video
-------------------------------------+-------------------------------------
             Reporter:  nyanmisaka   |                     Type:  defect
               Status:  new          |                 Priority:  normal
            Component:  avcodec      |                  Version:  git-master
             Keywords:  cuda nvdec   |               Blocked By:
  nvidia hwaccel                     |
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Summary of the bug:

Cuda/nvdec hwaccel outputs P016LE instead of P010LE on 10bit video.

How to reproduce:
{{{
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
    -i /path/to/10bit-video -an -sn -dn -vf hwdownload,format=p010le -f null -
...
[hwdownload @ 0000020612888240] Invalid output format p010le for hwframe download.
[Parsed_hwdownload_0 @ 00000206106e7140] Failed to configure output pad on Parsed_hwdownload_0
[vf#0:0 @ 0000020610752240] Error reinitializing filters!
}}}

Regression caused by
https://git.ffmpeg.org/gitweb/ffmpeg.git/commit/30e6effff94c6f4310aa2db57191...

For correctness and consistency with the old behavior, a change like this is required in the above commit:
{{{
+    } else {
+        frames_ctx->sw_format = sw_desc->comp[0].depth == 10 ? AV_PIX_FMT_P010LE : AV_PIX_FMT_P016LE;
+    }
}}}

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker
Comment (by Timo R.):

That is because nvdec does not output P010. P010 has the pixel data in the least significant bits, while nvdec outputs it in the most significant bits. I.e. it outputs P016 with the 6 lowest bits zeroed out. FFmpeg does not have a "P010 but in the MSB" format, hence P016 is used.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:1>
Comment (by nyanmisaka):

I thought CUVID and NVDEC differed mainly in the bitstream parser, because I still see P010LE in CUVID.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:2>
Comment (by Balling):

First of all there are two different types of P010LE. Little endian can be of Intel style and can be of Nvidia style.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:3>
Comment (by Timo R.):

No, this is actually just a bug in that commit. The situation I described only exists for AV_PIX_FMT_YUV444P16. There, the data cannot simply be used as AV_PIX_FMT_YUV444P10, since the layout of the bits is different. But P010 can totally receive P016 data, and likewise P210 can receive P216 data for 422 output. So only 444 is a problem.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:4>
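The bit-layout point in the comment above can be illustrated with a small C sketch. This is not FFmpeg code: the helper names are hypothetical and exist only for illustration. It assumes the layouts documented for FFmpeg's pixel formats, where P010 keeps its 10 bits in the high bits of each 16-bit word with zeros in the low bits, while the planar 10-bit formats such as YUV444P10 keep them in the low bits.

```c
#include <stdint.h>

/* Hypothetical helper: store a 10-bit sample MSB-aligned in a 16-bit word,
 * i.e. value in bits 15..6 and zeros in bits 5..0 (the P010-style layout). */
static uint16_t pack_msb_aligned(uint16_t v10)
{
    return (uint16_t)((v10 & 0x3FF) << 6);
}

/* Hypothetical helper: store a 10-bit sample LSB-aligned in a 16-bit word,
 * i.e. value in bits 9..0 (the layout used by planar formats like YUV444P10). */
static uint16_t pack_lsb_aligned(uint16_t v10)
{
    return (uint16_t)(v10 & 0x3FF);
}

/* Hypothetical helper: convert an MSB-aligned word to the LSB-aligned layout.
 * This shift is the extra step that 4:4:4 output would need before it could
 * be labelled as a planar 10-bit format. */
static uint16_t msb_to_lsb_aligned(uint16_t word)
{
    return (uint16_t)(word >> 6);
}
```

Under these assumptions, an MSB-aligned word is bit-for-bit a valid 16-bit sample whose low 6 bits happen to be zero, which is why a P016 surface holding 10-bit content can be labelled P010 without touching the data. The same relabelling does not work for the planar 4:4:4 case, because the LSB-aligned layout puts the bits in a different place.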
Comment (by nyanmisaka):

> And this was finally fixed just recently: #11369 and #11235

What we are discussing here does not involve swscale conversion, but whether the pixel format output by NVDEC hardware is correctly defined.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:5>
Comment (by Balling):

420 10 bit video must decode as P010, not P012 or P016. 422 must decode as P012 and 444 must decode as P016. You see, 10 bit 420 video works as follows: 10 bits are used for the Y plane and then half as many bits are used for Cb and Cr together. So together it adds up to 10 + 5 = 15 bits. That is P010, that is 16 bits.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:6>
Changes (by Timo Rothenpieler <timo@…>):

 * owner:  (none) => Timo Rothenpieler <timo@…>
 * resolution:  => fixed
 * status:  new => closed

Comment:

In [changeset:"bf5f3f1f2e6ef56b060c454de9d27c6aabf30b78/ffmpeg" bf5f3f1f/ffmpeg]:
{{{#!CommitTicketReference repository="ffmpeg" revision="bf5f3f1f2e6ef56b060c454de9d27c6aabf30b78"
avcodec/nvdec: fix 10bit output pixel formats

Fixes #11655
}}}

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:7>
Comment (by Timo R.):

nvenc/nvdec use P016/P216 for historical reasons. When they were implemented, P012 and P212 didn't exist. So now migration to those is a bit complicated, since changing the output format is a breaking change.

And there still is a total lack of support for the 10 and 12 bit formats that nvdec outputs (and nvenc accepts for input) for 4:4:4, so that's always mapped to AV_PIX_FMT_YUV444P16, even though the LSB are always 0 (or ignored in case of nvenc).

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:8>
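The mapping described in the comments so far can be summarised as a small lookup. The C sketch below is only a reading of this thread, not the code from the fixing commit: the `chroma` enum and `pick_sw_format` are hypothetical names, the function returns pixel format names as strings rather than `AV_PIX_FMT_*` values, and only the 10 and 12 bit cases discussed here are covered.

```c
#include <stddef.h>

/* Hypothetical stand-in for the stream's chroma subsampling. */
enum chroma { CHROMA_420, CHROMA_422, CHROMA_444 };

/* Hypothetical helper: name of the software pixel format that the thread
 * describes for a given chroma layout and bit depth.
 * Returns NULL for depths that the thread does not discuss. */
static const char *pick_sw_format(enum chroma c, int depth)
{
    if (depth != 10 && depth != 12)
        return NULL;

    switch (c) {
    case CHROMA_420:
        /* 10 bit can be labelled P010; 12 bit stays P016 for historical reasons. */
        return depth == 10 ? "p010le" : "p016le";
    case CHROMA_422:
        /* Same idea for 4:2:2: P210 for 10 bit, P216 for 12 bit. */
        return depth == 10 ? "p210le" : "p216le";
    case CHROMA_444:
        /* No matching MSB-aligned planar format exists, so both depths
         * are mapped to the 16 bit planar format. */
        return "yuv444p16le";
    }
    return NULL;
}
```

For example, `pick_sw_format(CHROMA_420, 10)` yields `"p010le"`, the format the reporter expected, while both 4:4:4 depths yield `"yuv444p16le"`.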
Comment (by nyanmisaka):

Thanks for the quick update. I think the CUVID decoder needs a similar update, as it still outputs P016 for 10-bit video when you specify "-c:v hevc_cuvid". Around line ~200 of "libavcodec/cuviddec.c".

> And there still is a total lack of support for the 10 and 12 bit formats that nvdec outputs (and nvenc accepts for input) for 4:4:4, so that's always mapped to AV_PIX_FMT_YUV444P16, even though the LSB are always 0 (or ignored in case of nvenc).

I'm aware of these issues as well. Especially for 4:4:4 formats, NVDEC/NVENC don't use the packed Y410/Y416 (AV_PIX_FMT_{XV30,XV36}) formats that D3D11VA/VAAPI use. But since NVIDIA's Windows driver recently added support for them, I guess it shouldn't be too hard for them to add the corresponding formats in CUDA?

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:9>
Comment (by Timo R.):

I won't be adding a whole scale filter into the decoder, that's a horrible hack that simply won't happen.

Last time we attempted to add those special pix_fmts it was rejected for being an Nvidia special. Will try again though, but no promises.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:10>
Comment (by Timo R.):

And I can't see anything wrong in cuviddec.c, it correctly outputs P010 for 10 bit 420.

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:11>
Comment (by Balling):

> I won't be adding a whole scale filter into the decoder, that's a horrible hack that simply won't happen.

That would just slow it down, with a bit-exact result and no benefit, right?

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:12>
Comment (by nyanmisaka):

Replying to [comment:11 Timo R.]:
> And I can't see anything wrong in cuviddec.c, it correctly outputs P010 for 10 bit 420.

Hi Timo, you can use this command line to reproduce the problem in CUVID. And here is the log:

{{{
ffmpeg -v quiet -f lavfi -i nullsrc=s=1920x1080,format=p010le \
    -c:v hevc_nvenc -vframes 1 -f nut - | ffmpeg -c:v hevc_cuvid -i - -f null -

ffmpeg version N-120169-g0fe9f25e76-20250704 Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 15.1.0 (crosstool-NG 1.27.0.42_35c1e72)
  configuration: --prefix=/ffbuild/prefix --pkg-config-flags=--static --pkg-config=pkg-config --cross-prefix=x86_64-w64-mingw32- --arch=x86_64 --target-os=mingw32 --enable-gpl --enable-version3 --disable-debug --disable-w32threads --enable-pthreads --enable-iconv --enable-zlib --enable-libfribidi --enable-gmp --enable-libxml2 --enable-lzma --enable-fontconfig --enable-libharfbuzz --enable-libfreetype --enable-libvorbis --enable-opencl --disable-libpulse --enable-libvmaf --disable-libxcb --disable-xlib --enable-amf --enable-libaom --enable-libaribb24 --enable-avisynth --enable-chromaprint --enable-libdav1d --enable-libdavs2 --enable-libdvdread --enable-libdvdnav --disable-libfdk-aac --enable-ffnvcodec --enable-cuda-llvm --enable-frei0r --enable-libgme --enable-libkvazaar --enable-libaribcaption --enable-libass --enable-libbluray --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librist --enable-libssh --enable-libtheora --enable-libvpx --enable-libwebp --enable-libzmq --enable-lv2 --enable-libvpl --enable-openal --enable-liboapv --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenh264 --enable-libopenjpeg --enable-libopenmpt --enable-librav1e --enable-librubberband --enable-schannel --enable-sdl2 --enable-libsnappy --enable-libsoxr --enable-libsrt --enable-libsvtav1 --enable-libtwolame --enable-libuavs3d --disable-libdrm --enable-vaapi --enable-libvidstab --enable-vulkan --enable-libshaderc --enable-libplacebo --enable-libvvenc --enable-libx264 --enable-libx265 --enable-libxavs2 --enable-libxvid --enable-libzimg --enable-libzvbi --extra-cflags=-DLIBTWOLAME_STATIC --extra-cxxflags= --extra-libs=-lgomp --extra-ldflags=-pthread --extra-ldexeflags= --cc=x86_64-w64-mingw32-gcc --cxx=x86_64-w64-mingw32-g++ --ar=x86_64-w64-mingw32-gcc-ar --ranlib=x86_64-w64-mingw32-gcc-ranlib --nm=x86_64-w64-mingw32-gcc-nm --extra-version=20250704
  libavutil      60.  4.101 / 60.  4.101
  libavcodec     62.  5.100 / 62.  5.100
  libavformat    62.  1.101 / 62.  1.101
  libavdevice    62.  0.100 / 62.  0.100
  libavfilter    11.  1.100 / 11.  1.100
  libswscale      9.  0.100 /  9.  0.100
  libswresample   6.  0.100 /  6.  0.100
Input #0, nut, from 'fd:':
  Metadata:
    encoder         : Lavf62.1.101
  Duration: N/A, bitrate: N/A
  Stream #0:0: Video: hevc (Main 10) (HEVC / 0x43564548), yuv420p10le(tv), 1920x1080 [SAR 1:1 DAR 16:9], 25 tbr, 51200 tbn
    Metadata:
      encoder         : Lavc62.5.100 hevc_nvenc
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (hevc_cuvid) -> wrapped_avframe (native))
Output #0, null, to 'pipe:':
  Metadata:
    encoder         : Lavf62.1.101
  Stream #0:0: Video: wrapped_avframe, p016le(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn
    Metadata:
      encoder         : Lavc62.5.100 wrapped_avframe
[out#0/null @ 000001474eee9ec0] video:0KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
frame=    1 fps=0.0 q=-0.0 Lsize=N/A time=00:00:00.16 bitrate=N/A speed=7.97x elapsed=0:00:00.02
}}}

As you can see, the CUVID decoder is outputting P016 format.
{{{
Stream #0:0: Video: wrapped_avframe, p016le(tv, progressive), 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 200 kb/s, 25 fps, 25 tbn
}}}

--
Ticket URL: <https://trac.ffmpeg.org/ticket/11655#comment:13>