[FFmpeg-devel] [PATCH 2/2] vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.

Henrik Gramner henrik at gramner.com
Wed Sep 30 20:04:40 CEST 2015


On Wed, Sep 30, 2015 at 7:26 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> On Wed, Sep 30, 2015 at 11:54 AM, Henrik Gramner <henrik at gramner.com> wrote:
>> > +%macro FILTER_STEP 6-10 "", "", "", 0 ; tmp, reg, mask, shift, dst, \
>> > +                                      ; src/sub1, sub2, add1, add2, dont_store
>> > +    psrlw               %1, %2, %4
>> > +%ifnidn %7, ""
>> > +    psubw               %2, %6
>> > +%endif
>> > +    psubw               %1, %6                      ; abs->delta
>> > +%ifnidn %7, ""
>> > +    psubw               %2, %7
>> > +%endif
>> > +    pand                %1, reg_%3                  ; apply mask
>> > +%ifnidn %7, ""
>> > +    paddw               %2, %8
>> > +%endif
>> > +%if %10 == 1
>> > +    paddw               %6, %1                      ; delta->abs
>> > +%else
>> > +    paddw               %1, %6                      ; delta->abs
>> > +%endif
>> > +%ifnidn %7, ""
>> > +    paddw               %2, %9
>> > +%endif
>> > +%if %10 != 1
>> > +    mova              [%5], %1
>> > +%endif
>> > +%endmacro
>>
>> Is there a reason for not merging most of those %ifs to make it more
>> readable?
>>
>
> Pairing. I can remove it if you don't like it.

Out-of-order execution (OOE) on x86 pretty much always handles
reordering of instructions in cases like this just fine, so code
readability is, in my opinion, preferable to trying to order
instructions optimally by hand. OOE is actually ridiculously efficient
on modern x86 CPUs and will reorder instructions all over the place
anyway.
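
For illustration, merging the conditionals could look roughly like the
sketch below. It is not the code from the patch. It assumes that sub2,
add1 and add2 (%7-%9) never alias the tmp register (%1) or the src
register (%6), and that dont_store (%10) is always numeric. The update
of %2 is kept after the shift that reads it and before the add that may
overwrite %6.

```nasm
; Sketch only: same instructions as the quoted macro, grouped by purpose.
; Assumption: %7-%9 do not alias %1 or %6, so moving them is safe.
%macro FILTER_STEP 6-10 "", "", "", 0 ; tmp, reg, mask, shift, dst, \
                                      ; src/sub1, sub2, add1, add2, dont_store
    psrlw               %1, %2, %4
    psubw               %1, %6                      ; abs->delta
    pand                %1, reg_%3                  ; apply mask
%ifnidn %7, ""
    ; running-sum update; must read %6 before it can be overwritten below
    psubw               %2, %6
    psubw               %2, %7
    paddw               %2, %8
    paddw               %2, %9
%endif
%if %10 == 1
    paddw               %6, %1                      ; delta->abs, kept in src
%else
    paddw               %1, %6                      ; delta->abs
    mova              [%5], %1
%endif
%endmacro
```

This leaves two conditional blocks instead of six, and the CPU's
scheduler is left to interleave the two dependency chains.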

