[FFmpeg-devel] [PATCH 08/11] avcodec/v210enc: add AVX-512 10-bit line pack function

James Darnley jdarnley at obe.tv
Fri Nov 10 15:32:26 EET 2017

On 2017-11-09 20:42, Martin Vignali wrote:
> I don't want to block this patch, but
> since, as you said about your previous version, this version is not faster,
> I'm not sure it is worth applying.
> You have already written "real" AVX-512 versions of other functions in order
> to check the rest of your patches.

I will not apply any of the new AVX-512/ZMM function patches because
they need proper testing in a real-world situation.  Sorry, but I don't
have time to see whether these few naive length extensions are better.
I have my own work under way to see whether AVX-512/ZMM provides a
speed-up.  If that pans out then FFmpeg will benefit because some of the
work will trickle back to it.

I mentioned previously that using ZMM registers will cause the CPU to
reduce its frequency.

Gramner said on IRC that a program should spend at least 20-30% of its
overall runtime in AVX-512/ZMM code for it to be a net gain in speed.
From ffmpeg-devel IRC on 2017-10-26:
> https://lists.ffmpeg.org/pipermail/ffmpeg-devel-irc/2017-October/004622.html
> [18:49:26 CEST] <Gramner> J_Darnley: be aware that using zmm registers induces significant frequency drops which reduces performance of everything else, so if you want to use 512-bit vectors you better go all in on it to make up for it. you probably want to spend at least 20-30% of overall runtime in avx-512 code
> [18:50:00 CEST] <Gramner> the alternative is to stay in 256-bit mode and just leverage new instructions and opmasks

This means that any cycles you save by using longer registers, fewer
instructions, or better instructions can be lost again, because the
lower frequency makes all the other code take longer to execute.
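
The trade-off can be sketched with a simple model.  This is an
illustration only, not a measurement: it assumes the whole program runs
at the reduced frequency once ZMM code is in use, and the function
names, the frequency ratio of 0.9 and the speed-up of 2.0 are made-up
values, not figures from this thread.

```python
def relative_time(fraction, freq_ratio, speedup):
    """Runtime with ZMM code relative to the baseline (baseline = 1.0).

    fraction   -- share of baseline runtime spent in the code being vectorised
    freq_ratio -- reduced clock / normal clock (assumed, e.g. 0.9)
    speedup    -- cycle-count speed-up of the vectorised code (assumed, e.g. 2.0)
    """
    # Cycles left after the vectorised part gets faster.
    cycles = (1.0 - fraction) + fraction / speedup
    # Every remaining cycle takes longer at the lower clock.
    return cycles / freq_ratio


def break_even_fraction(freq_ratio, speedup):
    """Smallest fraction for which relative_time() drops below 1.0."""
    if not 0.0 < freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be in (0, 1]")
    if speedup <= 1.0:
        raise ValueError("speedup must be greater than 1")
    # Solve ((1 - p) + p / s) / f = 1 for p.
    return (1.0 - freq_ratio) / (1.0 - 1.0 / speedup)


if __name__ == "__main__":
    # Hypothetical inputs: 10% frequency drop, vectorised code twice as fast.
    print(break_even_fraction(0.9, 2.0))
    print(relative_time(0.1, 0.9, 2.0))
    print(relative_time(0.5, 0.9, 2.0))
```

With these assumed inputs, a program that spends only a small share of
its time in the vectorised code ends up slower overall, while one that
spends half its time there comes out ahead.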

I don't have time to perform that sort of in-depth testing.

I will post the checkasm benchmark results for the 3 patches though.

> $ ./tests/checkasm/checkasm --bench --test=v210enc
> benchmarking with native FFmpeg timers
> nop: 26.0
> checkasm: using random seed 3018512312
> SSSE3:
>  - v210enc.planar_pack [OK]
> AVX:
>  - v210enc.planar_pack [OK]
> AVX2:
>  - v210enc.planar_pack [OK]
> AVX-512:
>  - v210enc.planar_pack [OK]
> checkasm: all 6 tests passed
> v210_planar_pack_8_c: 1726.5
> v210_planar_pack_8_ssse3: 308.5
> v210_planar_pack_8_avx: 313.5
> v210_planar_pack_8_avx2: 213.5
> v210_planar_pack_10_c: 1424.0
> v210_planar_pack_10_ssse3: 301.0
> v210_planar_pack_10_avx2: 227.5
> v210_planar_pack_10_avx512: 229.5
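
For a quick comparison, the 10-bit timings above can be turned into
ratios.  The sketch below only does arithmetic on the numbers from the
log; the dictionary and function names are made up for illustration.

```python
# 10-bit timings copied from the checkasm output above (timer units).
timings_10bit = {
    "c": 1424.0,
    "ssse3": 301.0,
    "avx2": 227.5,
    "avx512": 229.5,
}


def speedup_over(baseline, timings):
    """Return how many times faster each version is than `baseline`."""
    base = timings[baseline]
    return {name: base / value for name, value in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedup_over("c", timings_10bit).items():
        print(f"{name}: {ratio:.2f}x over C")
    # AVX-512 time relative to AVX2; a value above 1.0 means slower.
    print(timings_10bit["avx512"] / timings_10bit["avx2"])
```

In this benchmark the AVX-512 version is within about 1% of the AVX2
version, so the longer registers give no measurable gain here.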
