[FFmpeg-devel] [PATCH] swscale/input: add rgbaf16 input support
Mark Reid
mindmark at gmail.com
Tue Aug 9 01:07:03 EEST 2022
On Mon, Aug 8, 2022 at 1:59 PM Timo Rothenpieler <timo at rothenpieler.org>
wrote:
> On 08.08.2022 21:39, Mark Reid wrote:
> > On Mon, Aug 8, 2022 at 11:24 AM Timo Rothenpieler <timo at rothenpieler.org>
> > wrote:
> >
> >> This is by no means perfect, since at least ddagrab will return scRGB
> >> data with values outside of 0.0f to 1.0f for HDR values.
> >> Its primary purpose is to be able to work with the format at all.
> >>
> >> _Float16 support was available on arm/aarch64 for a while, and with gcc
> >> 12 was enabled on x86 as long as SSE2 is supported.
> >>
> >> If the target arch supports f16c, gcc emits fairly efficient assembly,
> >> taking advantage of it. This is the case on x86-64-v3 or higher.
> >> Without f16c, it emulates it in software using sse2 instructions.
> >> ---
> >>
> >> I am by no means certain this is the correct way to implement this.
> >> Tested it with ddagrab output in that format, and it looks like what I'd
> >> expect.
> >>
> >> Especially the order of arguments is a bit of a mystery. I'd have
> >> expected them to be in order of the planes, so for packed formats, only
> >> the first one would matter.
> >> But a bunch of other packed formats left the first src unused, and so I
> >> followed along, and it ended up working fine.
> >>
> >>
> > Have you looked at the exr decoder half2float.h? It already has f16 to
> > f32 decoding functions.
> >
>
> For performance, using the compiler's native, and potentially
> hardware-accelerated, support is probably preferable.
> Though as a no-float16-fallback it's probably not too horrible.
> Just not sure if it's worth the extra effort, given that by the time
> this sees any use at all, gcc 12 will be very common.
>
> Might even think about _Float16 support for exr in that case.
> Would be an interesting benchmark.
>
Having the fallback will likely be required to have this patch accepted.
This will also need FATE tests.
> +static void rgbaf16ToUV_half_c(uint8_t *_dstU, uint8_t *_dstV,
> +                               const uint8_t *unused0, const uint8_t *src1, const uint8_t *src2,
> +                               int width, uint32_t *_rgb2yuv)
> +{
> +#if HAVE_FLOAT16
> +    const _Float16 *src = (const _Float16*)src1;
> +    uint16_t *dstU = (uint16_t*)_dstU;
> +    uint16_t *dstV = (uint16_t*)_dstV;
> +    int32_t *rgb2yuv = (int32_t*)_rgb2yuv;
> +    int32_t ru = rgb2yuv[RU_IDX], gu = rgb2yuv[GU_IDX], bu = rgb2yuv[BU_IDX];
> +    int32_t rv = rgb2yuv[RV_IDX], gv = rgb2yuv[GV_IDX], bv = rgb2yuv[BV_IDX];
> +    int i;
> +    av_assert1(src1==src2);
> +    for (i = 0; i < width; i++) {
> +        int r = (lrintf(av_clipf(65535.0f * src[i*8+0], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+4], 0.0f, 65535.0f))) >> 1;
> +        int g = (lrintf(av_clipf(65535.0f * src[i*8+1], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+5], 0.0f, 65535.0f))) >> 1;
> +        int b = (lrintf(av_clipf(65535.0f * src[i*8+2], 0.0f, 65535.0f)) +
> +                 lrintf(av_clipf(65535.0f * src[i*8+6], 0.0f, 65535.0f))) >> 1;
> +
> +        dstU[i] = (ru*r + gu*g + bu*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +        dstV[i] = (rv*r + gv*g + bv*b + (0x10001<<(RGB2YUV_SHIFT-1))) >> RGB2YUV_SHIFT;
> +    }
> +#endif
> +}
Using #if to compile out the core of the function is not the best approach
here, specifically for platforms without HAVE_FLOAT16.
I would probably try and put the accelerated half2float conversion in
half2float.h and move that header to libavutil instead.
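
To make the suggestion concrete, here is a rough sketch of what such a shared helper could look like. The names half_bits_to_float and half_to_u16 are made up for illustration and are not the existing half2float.h API. The fallback branch is a plain bit-manipulation conversion, not the table-based code the exr decoder uses. HAVE_FLOAT16 is the macro from the patch and is assumed to be 0 or undefined where _Float16 is unavailable. Rounding here is a simple add-0.5-and-truncate, which is an assumption and differs slightly from the lrintf() used in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert raw IEEE 754 binary16 bits to float.
 * Only this helper depends on HAVE_FLOAT16; callers stay unconditional. */
static inline float half_bits_to_float(uint16_t h)
{
#if defined(HAVE_FLOAT16) && HAVE_FLOAT16
    /* Native path: let the compiler pick f16c or its own emulation. */
    _Float16 f;
    memcpy(&f, &h, sizeof(f));
    return (float)f;
#else
    /* Portable fallback: rebuild the binary32 bit pattern by hand. */
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    float out;

    if (exp == 0x1Fu) {              /* Inf or NaN */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {           /* normal: rebias exponent 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant != 0) {          /* subnormal: normalize the mantissa */
        uint32_t shift = 0;
        while (!(mant & 0x400u)) {
            mant <<= 1;
            shift++;
        }
        mant &= 0x3FFu;              /* drop the now-implicit leading 1 */
        bits = sign | ((113u - shift) << 23) | (mant << 13);
    } else {                         /* signed zero */
        bits = sign;
    }
    memcpy(&out, &bits, sizeof(out));
    return out;
#endif
}

/* Hypothetical helper: scale to 16 bit and clip to [0, 65535], as the
 * patch does per component. NaN and negative inputs map to 0. */
static inline int half_to_u16(uint16_t h)
{
    float v = 65535.0f * half_bits_to_float(h);
    if (!(v > 0.0f))
        return 0;
    if (v >= 65535.0f)
        return 65535;
    return (int)(v + 0.5f);
}
```

With something along these lines the loop in rgbaf16ToUV_half_c could read its source as const uint16_t * and call the helper for each component, so the function body no longer needs to sit inside #if HAVE_FLOAT16.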