[FFmpeg-devel] [PATCH v2 2/3] avfilter/x86/vf_exposure: add ff_exposure_avx2
Wu Jianhua
toqsxw at outlook.com
Sat Nov 20 22:42:27 EET 2021
James Almer<mailto:jamrial at gmail.com>:
On 11/4/2021 1:18 AM, Wu Jianhua wrote:
>> Performance data(Less is better):
>> exposure_sse: 500491
>You reported a better result in the first patch.
For they are tested on different baseline, I think it might be better to only compare these two values.
>> exposure_avx2: 449122
> This looks like a really low speed up for a function that processes
> twice the amount of floats per loop.
> >
> > Signed-off-by: Wu Jianhua <jianhua.wu at intel.com>
> > ---
> > libavfilter/x86/vf_exposure.asm | 15 +++++++++++++++
> > libavfilter/x86/vf_exposure_init.c | 6 ++++++
> > 2 files changed, 21 insertions(+)
> >
> > diff --git a/libavfilter/x86/vf_exposure.asm b/libavfilter/x86/vf_exposure.asm
> > index 3351c6fb3b..f271167805 100644
> > --- a/libavfilter/x86/vf_exposure.asm
> > +++ b/libavfilter/x86/vf_exposure.asm
> > @@ -36,11 +36,21 @@ cglobal exposure, 2, 2, 4, ptr, length, black, scale
> > VBROADCASTSS m1, xmm1
> > %endif
> >
> > +%if cpuflag(fma3) || cpuflag(fma4)
> Remove the fma4 check if you're not using it.
No problem. Avx2 flag is only initialized with fma3, so the fma4 is redundant indeed.
> > + mulps m0, m0, m1 ; black * scale
> > +%endif
> > +
> > .loop:
> > +%if cpuflag(fma3) || cpuflag(fma4)
> > + mova m2, m0
> > + vfmsub231ps m2, m1, [ptrq]
> > + movu [ptrq], m2
> Have you tried to not use FMA for this and just kept the sub + mul even
> for AVX2 and see how it performs?
Yeah. Definitely. I have had sufficient tests before. The first version is kept sub + mul
for AVX2. After that, I keep trying to find a way out to speed up it further. Using FMA
here would be faster than sub + mul indeed, precisely, improving by 4%-10% approximately.
Not that much better, but still an optimal way I found at the present.
>> +%else
>> movu m2, [ptrq]
>> subps m2, m2, m0
>> mulps m2, m2, m1
>> movu [ptrq], m2
>> +%endif
>> add ptrq, mmsize
>> sub lengthq, mmsize/4
>>
>> @@ -52,4 +62,9 @@ cglobal exposure, 2, 2, 4, ptr, length, black, scale
>> %if ARCH_X86_64
>> INIT_XMM sse
>> EXPOSURE
>> +
>> +%if HAVE_AVX2_EXTERNAL
>> +INIT_YMM avx2
>> +EXPOSURE
>> +%endif
>> %endif
>> diff --git a/libavfilter/x86/vf_exposure_init.c b/libavfilter/x86/vf_exposure_init.c
>> index de1b360f6c..80dae6164e 100644
>> --- a/libavfilter/x86/vf_exposure_init.c
>> +++ b/libavfilter/x86/vf_exposure_init.c
>> @@ -24,6 +24,7 @@
>> #include "libavfilter/exposure.h"
>>
>> void ff_exposure_sse(float *ptr, int length, float black, float scale);
>> +void ff_exposure_avx2(float *ptr, int length, float black, float scale);
>>
>> av_cold void ff_exposure_init_x86(ExposureContext *s)
> > {
>> @@ -32,5 +33,10 @@ av_cold void ff_exposure_init_x86(ExposureContext *s)
>> #if ARCH_X86_64
>> if (EXTERNAL_SSE(cpu_flags))
>> s->exposure_func = ff_exposure_sse;
>> +
>> +#if HAVE_AVX2_EXTERNAL
> No need for this preprocessor check.
Got it. I’ll update it.
Thanks for your review.
Jianhua
More information about the ffmpeg-devel
mailing list