[FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD

Paul B Mahol onemda at gmail.com
Sun Dec 3 22:13:12 EET 2017


On 12/3/17, Paul B Mahol <onemda at gmail.com> wrote:
> On 12/3/17, Martin Vignali <martin.vignali at gmail.com> wrote:
>> Maybe the problem comes from the skip part:
>>
>>> +INIT_XMM ssse3
>>> +cglobal hflip_byte, 3, 5, 3, src, dst, w, x, v
>>> +    mova    m0, [pb_flip_byte]
>>> +    mov     xq, 0
>>> +    mov     wd, dword wm
>>> +    sub     wq, 2 * mmsize
>>> +    cmp     wq, mmsize
>>> +    jl .skip
>>> +
>>> +    .loop0:
>>> +        neg     xq
>>> +        movu    m1, [srcq + xq -     mmsize + 1]
>>> +        movu    m2, [srcq + xq - 2 * mmsize + 1]
>>> +        pshufb  m1, m0
>>> +        pshufb  m2, m0
>>> +        neg     xq
>>> +        movu    [dstq + xq         ], m1
>>> +        movu    [dstq + xq + mmsize], m2
>>> +        add     xq, mmsize * 2
>>> +        cmp     xq, wq
>>> +        jl .loop0
>>> +
>>> +.skip:
>>> +    add     wq, 2 * mmsize
>>>
>>
>> ==> use xq instead of wq ?
>
> Nope.
>
>>
>>
>>> +    .loop1:
>>> +        neg    xq
>>> +        mov    vb, [srcq + xq]
>>> +        neg    xq
>>> +        mov    [dstq + xq], vb
>>> +        add    xq, 1
>>> +        cmp    xq, wq
>>> +        jl .loop1
>>> +RET
>>> +
>>> +cglobal hflip_short, 3, 5, 3, src, dst, w, x, v
>>> +    mova    m0, [pb_flip_short]
>>> +    mov     xq, 0
>>> +    mov     wd, dword wm
>>> +    add     wq, wq
>>> +    sub     wq, 2 * mmsize
>>> +    cmp     wq, mmsize
>>> +    jl .skip
>>> +
>>> +    .loop0:
>>> +        neg     xq
>>> +        movu    m1, [srcq + xq -     mmsize + 2]
>>> +        movu    m2, [srcq + xq - 2 * mmsize + 2]
>>> +        pshufb  m1, m0
>>> +        pshufb  m2, m0
>>> +        neg     xq
>>> +        movu    [dstq + xq         ], m1
>>> +        movu    [dstq + xq + mmsize], m2
>>> +        add     xq, mmsize
>>> +        cmp     xq, wq
>>> +        jl .loop0
>>> +
>>> +.skip:
>>> +    add     wq, 2 * mmsize
>>>
>>
>>
>> ==> same here ?
>
> Nope. This is for the case when the width is not a multiple of mmsize.
>
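
To illustrate the point about the tail, here is a rough C sketch of the
control flow in hflip_byte. It is not part of the patch: BLOCK stands in
for mmsize (16 with XMM registers), and flip_block() and
hflip_byte_sketch() are made-up names where flip_block() is a scalar
stand-in for the movu + pshufb pair. As in the asm, src is assumed to
point at the last byte of the input row.

```c
#include <stdint.h>

#define BLOCK 16  /* assumption: stands in for mmsize with XMM registers */

/* Reverse one BLOCK-byte chunk; scalar stand-in for movu + pshufb. */
static void flip_block(uint8_t *dst, const uint8_t *src_first)
{
    for (int i = 0; i < BLOCK; i++)
        dst[i] = src_first[BLOCK - 1 - i];
}

/* src points at the LAST byte of the row, dst at the first output byte. */
static void hflip_byte_sketch(const uint8_t *src, uint8_t *dst, int w)
{
    int x = 0;
    int limit = w - 2 * BLOCK;          /* sub wq, 2 * mmsize */

    if (limit >= BLOCK) {               /* cmp wq, mmsize / jl .skip */
        do {                            /* .loop0: two blocks per pass */
            flip_block(dst + x,         src - x -     BLOCK + 1);
            flip_block(dst + x + BLOCK, src - x - 2 * BLOCK + 1);
            x += 2 * BLOCK;
        } while (x < limit);
    }

    /* .skip: "add wq, 2 * mmsize" restores the full width, so the
     * byte-wise loop finishes whatever the block loop left over. */
    for (; x < w; x++)                  /* .loop1 */
        dst[x] = src[-x];
}
```

The bound has to be wq (the restored width) because xq only records how
far the block loop got; the remaining w - x bytes are what .loop1 copies.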

Can I get a final verdict? I would like to move on to other things.

