[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Ronald S. Bultje rsbultje
Sat Sep 25 16:57:26 CEST 2010


Hi,

On Sep 24, 2010, at 9:40 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Fri, Sep 24, 2010 at 07:33:11PM -0400, Ronald S. Bultje wrote:
>> 
>> On Sep 24, 2010, at 5:33 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> On Fri, Sep 24, 2010 at 05:10:35PM -0400, Ronald S. Bultje wrote:
>>> [...]
>>>> -                        "psubw          (%2), %%mm1 \n"
>>>> -                        "psubw         8(%2), %%mm2 \n"
>>>> -                        "psubw       160(%2), %%mm3 \n"
>>>> -                        "psubw       168(%2), %%mm4 \n"
>>>> +                        "psubw          (%3), %%mm1 \n"
>>>> +                        "psubw         8(%3), %%mm2 \n"
>>>> +                        "psubw       160(%3), %%mm3 \n"
>>>> +                        "psubw       168(%3), %%mm4 \n"
>>>>                        "packsswb      %%mm2, %%mm1 \n"
>>>>                        "packsswb      %%mm4, %%mm3 \n"
>>>>                        "paddb         %%mm6, %%mm1 \n"
>>>> @@ -111,26 +111,28 @@
>>>>                        "por           %%mm1, %%mm0 \n"
>>>>                        "pshufw $0x4E, %%mm0, %%mm1 \n"
>>>>                        "pminub        %%mm1, %%mm0 \n"
>>>> -                        ::"r"(d_idx),
>>>> -                          "r"(ref[0]+b_idx),
>>>> -                          "r"(mv[0]+b_idx)
>>>> +                        ::"r"(ref[0]+b_idx),
>>>> +                          "r"(ref[0]+b_idx+d_idx),
>>>> +                          "r"(mv[0]+b_idx),
>>>> +                          "r"(mv[0]+b_idx+d_idx)
>>> 
>>> This doesn't look correct.
>>> 
>>> Also, patches should ideally be tested before submitting. I tend to review
>>> on the assumption that the code has been tested (for example, I don't check
>>> that operands and constraints match, because the code wouldn't work if they
>>> didn't).
>> 
>> Yeah, I over-enthusiastically screwed up here, sorry. The first patch should still be OK. I'll ask on gcc-list how to write a constant without the $. Without that, it'll be hard to get the last 10 cycles off, I'm afraid...
> 
> Try %a0 and %c0 with an "i" constraint; they produce a constant without the $.
> %n0 will produce a negated one.

I will try this on Monday, thanks for looking it up. (Once I have a patch, we can test it on clang, icc, etc.)
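
For reference, a minimal sketch of how I understand the suggestion, not the actual patch: with an "i" constraint, %c1 should print the bare constant so it can serve as a displacement, and %n1 its negation. The names D_OFFSET, load_at_offset and step_back are made up for illustration, and the plain C fallback is only there so the sketch builds on non-x86 targets.

```c
#include <stdint.h>

/* Hypothetical compile-time byte displacement (two int32_t elements). */
enum { D_OFFSET = 8 };

/* Load the 32-bit value at base + D_OFFSET bytes. */
static int32_t load_at_offset(const int32_t *base)
{
    int32_t out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* %c1 prints the immediate without '$', so this should assemble as
     * "movl 8(reg), reg". The "memory" clobber tells the compiler that
     * the asm reads memory it cannot see through the operands. */
    __asm__ ("movl %c1(%2), %0"
             : "=r"(out)
             : "i"(D_OFFSET), "r"(base)
             : "memory");
#else
    /* Portable fallback with the same result. */
    out = base[D_OFFSET / sizeof(int32_t)];
#endif
    return out;
}

/* Return the pointer p - D_OFFSET bytes. */
static const int32_t *step_back(const int32_t *p)
{
    const int32_t *out;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* %n1 prints the negated constant, giving "lea -8(reg), reg". */
    __asm__ ("lea %n1(%2), %0"
             : "=r"(out)
             : "i"(D_OFFSET), "r"(p));
#else
    out = p - D_OFFSET / sizeof(int32_t);
#endif
    return out;
}
```

With that, a single base register plus a constant displacement would do, instead of passing a second pointer operand such as ref[0]+b_idx+d_idx, provided d_idx is a compile-time constant at that point.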

Ronald


