[FFmpeg-devel] [PATCH] avfilter: add hflip x86 SIMD
Paul B Mahol
onemda at gmail.com
Sun Dec 3 21:36:01 EET 2017
On 12/3/17, Martin Vignali <martin.vignali at gmail.com> wrote:
>>
>> In any case, if clang or gcc can generate better code, then the hand
>> written version needs to be optimized to be as fast or faster.
>>
>>
>>
> Quick test : pass checkasm (but probably only because width = 256)
> hflip_byte_c: 26.4
> hflip_byte_ssse3: 20.4
>
>
> INIT_XMM ssse3
> cglobal hflip_byte, 3, 5, 2, src, dst, w, x, v, src2
> mova m0, [pb_flip_byte]
> xor xq, xq ; <======
> mov wd, dword wm
> sub wq, mmsize * 2
> ;remove the cmp here <======
> jl .skip
>
> .loop0: ; process two xmm in the loop
> neg xq
> movu m1, [srcq + xq - mmsize + 1]
> movu m2, [srcq + xq - mmsize * 2 + 1] ; <======
> pshufb m1, m0
> pshufb m2, m0 ; <======
> neg xq
> movu [dstq + xq], m1
> movu [dstq + xq + mmsize], m2 ; <======
> add xq, mmsize * 2 ; <======
> cmp xq, wq
> jl .loop0
> RET ; add RET here
>
> ; MISSING: process one more xmm here if needed
>
> .skip:
> add wq, mmsize
> .loop1:
> neg xq
> mov vb, [srcq + xq]
> neg xq
> mov [dstq + xq], vb
> add xq, 1
> cmp xq, wq
> jl .loop1
> RET
So what is wrong now?
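
To make the question concrete, here is a rough C model of the control flow in the quoted listing. It is only a sketch of how I read that draft, not the patch itself: flip_block(), hflip_byte_draft_model() and draft_matches_ref() are hypothetical names, the movu/pshufb pairs are replaced by plain byte loops, and src is assumed to point at the last byte of the row (which is what the neg xq addressing implies).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MMSIZE 16 /* bytes per xmm register */

/* Scalar reference: src points at the LAST byte of the row. */
static void hflip_byte_ref(const uint8_t *src, uint8_t *dst, int w)
{
    for (int x = 0; x < w; x++)
        dst[x] = src[-x];
}

/* Stand-in for one movu + pshufb + movu: copy n bytes reversed. */
static void flip_block(const uint8_t *src, uint8_t *dst, ptrdiff_t x, int n)
{
    for (int i = 0; i < n; i++)
        dst[x + i] = src[-x - i];
}

/* Mirrors the quoted branches; returns the number of bytes stored.
 * Like the listing, it assumes w >= 1. */
static int hflip_byte_draft_model(const uint8_t *src, uint8_t *dst, int w)
{
    ptrdiff_t x  = 0;                          /* xor xq, xq           */
    ptrdiff_t wq = (ptrdiff_t)w - MMSIZE * 2;  /* sub wq, mmsize * 2   */

    if (wq >= 0) {                             /* jl .skip not taken   */
        do {                                   /* .loop0               */
            flip_block(src, dst, x, MMSIZE);
            flip_block(src, dst, x + MMSIZE, MMSIZE);
            x += MMSIZE * 2;                   /* add xq, mmsize * 2   */
        } while (x < wq);                      /* cmp xq, wq / jl      */
        return (int)x;                         /* RET                  */
    }

    wq += MMSIZE;                              /* .skip: add wq, mmsize */
    do {                                       /* .loop1               */
        dst[x] = src[-x];
        x++;                                   /* add xq, 1            */
    } while (x < wq);                          /* cmp xq, wq / jl      */
    return (int)x;                             /* RET                  */
}

/* Returns 1 if the model reproduces the reference for width w (1..256). */
static int draft_matches_ref(int w)
{
    uint8_t row[256], ref[256], out[256];

    for (int i = 0; i < 256; i++)
        row[i] = (uint8_t)(i * 7 + 3);
    memset(ref, 0xAA, sizeof(ref));            /* sentinel for untouched bytes */
    memset(out, 0xAA, sizeof(out));

    hflip_byte_ref(row + w - 1, ref, w);
    hflip_byte_draft_model(row + w - 1, out, w);
    return memcmp(ref, out, sizeof(ref)) == 0;
}
```

If this reading is right, draft_matches_ref(32) succeeds, but a 64-byte row only gets 32 bytes stored and a 256-byte row only 224, because .loop0 exits as soon as xq reaches w - mmsize * 2 and then returns. The .skip path has the same kind of shortfall: for w = 20 the byte loop stops at w - mmsize = 4. That suggests the loop bound and the tail handling are worth a second look, independent of the missing single-xmm step.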