[FFmpeg-devel] [PATCH 1/2] avfilter/vf_zscale: add slice threading

Paul B Mahol onemda at gmail.com
Fri May 31 23:37:55 EEST 2019


On 5/31/19, Pavel Koshevoy <pkoshevoy at gmail.com> wrote:
> On Fri, May 31, 2019 at 2:17 PM Paul B Mahol <onemda at gmail.com> wrote:
>>
>> On 5/31/19, Pavel Koshevoy <pkoshevoy at gmail.com> wrote:
>> > On Fri, May 31, 2019 at 2:03 PM Paul B Mahol <onemda at gmail.com> wrote:
>> >>
>> >> On 5/31/19, Pavel Koshevoy <pkoshevoy at gmail.com> wrote:
>> >> > On Fri, May 31, 2019 at 1:44 PM Pavel Koshevoy <pkoshevoy at gmail.com>
>> >> > wrote:
>
> <snip>
>
>
>> >> >> I've had to use zscale to convert 10-bit 4k60p video from HLG HDR to
>> >> >> SDR (bt709).  It was ~36x slower than real time.  What I ended up
>> >> >> doing to speed it up was to generate a CLUT image (16-bit yuv444,
>> >> >> 65x65x65 sampling of the input color space), lay it out as a 2D
>> >> >> image (512x537), and run it through zscale to generate the HDR->SDR
>> >> >> transform CLUT.  Then I used the CLUT instead of zscale for every
>> >> >> frame...  that got me to about 3.5x slower than realtime converting
>> >> >> 60fps 10-bit 4k HLG to SDR (and I don't know any assembly, so I
>> >> >> didn't attempt to optimize the CLUT trilinear optimization with
>> >> >> SIMD, so maybe it could be faster still).  I then ported to CUDA and
>> >> >> was able to convert 4k60p HLG->SDR faster than realtime on a Pascal
>> >> >> GPU.
>> >> >>
>> >> >
>> >> > I meant trilinear interpolation
>> >> >
>> >> >
>> >> >> So, I'm not sure that adding slice threading to zscale is the best
>> >> >> optimization for it.  I think capturing the effect of zscale in a
>> >> >> CLUT would be a more significant optimization.
>> >> >>
>> >> >> Just my 2 cents, hope this helps.
>> >>
>> >> Your logic is completely flawed.
>> >> You cannot rescale images with LUTs.
>> >
>> >
>> > I was not resizing the image from 4K to 1080p ... the output was still
>> > 4K.  I was converting from 10-bit in whatever HDR input colorspace
>> > (HLG or HDR10) to an 8-bit SDR output colorspace.  You most definitely
>> > can approximate that transformation with a CLUT.
>> >
>>
>> Seen lut3d filter?
>
> lut3d works with RGB images; my input and output are all YUV (P010,
> actually).  Also, lut3d requires a file parameter, which is not great
> for my use case.  I could generate a CLUT with zscale and dump it to
> disk so I could initialize lut3d with it, but I hope you see how
> inconvenient that is from an API point of view.

lut3d could work with any colorspace; that is just a limitation of the
current implementation.

>
>
>> > Since zscale is capable of both resizing and colorspace conversion,
>> > perhaps this functionality should be split into separate filters so
>> > each can be optimized differently.
>>
>> Your logic is completely flawed yet again.
>> zscale is a wrapper around another library.
>
> I know, zimg, C++11.

I think there is an option in zimg, also available in zscale, to speed up
some extremely slow conversions.

Dunno what version of zscale you used, but see the agamma option. (This is
pure guessing on my part.)
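
For readers following the CLUT discussion quoted above, here is a rough
sketch of the idea in Python: sample a color transform on a 65x65x65 grid,
store the samples in a flat array (which is also what a 512-wide 2D layout
amounts to), and approximate the transform per pixel with trilinear
interpolation. It is not Pavel's code and it does not call zscale or zimg.
The function names, the row-major node ordering, and the [0, 1] value range
are all assumptions made for illustration.

```python
N = 65        # grid points per axis, as quoted in the thread
WIDTH = 512   # width of the 2D layout quoted in the thread
HEIGHT = -(-N ** 3 // WIDTH)  # ceil(65^3 / 512) rows needed to hold all nodes


def node_to_pixel(i, j, k, n=N, width=WIDTH):
    """Map grid node (i, j, k) to (x, y) in the 2D layout.

    Assumes plain row-major packing; the thread does not say which
    ordering was actually used.
    """
    idx = (k * n + j) * n + i
    return idx % width, idx // width


def build_clut(transform, n=N):
    """Sample `transform` (three floats in [0, 1] -> 3-tuple) at every node."""
    step = 1.0 / (n - 1)
    return [transform(i * step, j * step, k * step)
            for k in range(n) for j in range(n) for i in range(n)]


def lookup(clut, a, b, c, n=N):
    """Approximate transform(a, b, c) by trilinear interpolation in the CLUT."""
    def split(v):
        # Clamp to [0, 1], then split into lower node index and fraction.
        p = min(max(v, 0.0), 1.0) * (n - 1)
        i0 = min(int(p), n - 2)
        return i0, p - i0

    i, fa = split(a)
    j, fb = split(b)
    k, fc = split(c)
    out = [0.0, 0.0, 0.0]
    # Blend the 8 surrounding nodes, weighted by distance along each axis.
    for dk in (0, 1):
        for dj in (0, 1):
            for di in (0, 1):
                w = ((fa if di else 1.0 - fa)
                     * (fb if dj else 1.0 - fb)
                     * (fc if dk else 1.0 - fc))
                node = clut[((k + dk) * n + (j + dj)) * n + (i + di)]
                for ch in range(3):
                    out[ch] += w * node[ch]
    return tuple(out)
```

In the workflow described in the thread, the samples would come from running
the grid image through zscale once, rather than from a Python callable, and
the per-frame cost then becomes eight table reads and a weighted sum per
pixel. The approximation error depends on how nonlinear the transform is
between grid nodes.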
