[FFmpeg-devel] [PATCH 1/6] x86: huffyuvdsp: port mmx add_bytes to yasm

James Almer jamrial at gmail.com
Thu May 29 21:16:45 CEST 2014


On 29/05/14 2:37 PM, Christophe Gisquet wrote:
> +.1:
> +    mova    m0, [dstq + sizeq]
> +    mova    m1, [srcq + sizeq]
> +    mova    m2, [dstq + sizeq + mmsize]
> +    mova    m3, [srcq + sizeq + mmsize]
> +    paddb   m1, m0
> +    paddb   m3, m2
> +    mova   [dstq + sizeq], m1
> +    mova   [dstq + sizeq + mmsize], m3
> +    add  sizeq, 2*mmsize
> +    jl .1

Why not something like this instead?

    mova    m0, [dstq + sizeq]
    mova    m1, [dstq + sizeq + mmsize]
    paddb   m0, [srcq + sizeq]
    paddb   m1, [srcq + sizeq + mmsize]
    mova   [dstq + sizeq], m0
    mova   [dstq + sizeq + mmsize], m1

I didn't benchmark it, but I assume it should be faster: it needs two
fewer instructions and two fewer registers per iteration. Similar code
already exists in lavu's float_dsp.asm.
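
For checking that either loop variant produces the same bytes, a scalar model may help. This is only a sketch: the name add_bytes_ref and its signature are hypothetical, not taken from the patch, and it assumes the asm sets up dst/src to point at the end of the buffers with size counting up from -w to zero, as the "add sizeq / jl .1" pair suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the byte-wise add: dst[i] += src[i],
 * wrapping modulo 256 in each byte, as paddb does. */
static void add_bytes_ref(uint8_t *dst, const uint8_t *src, ptrdiff_t w)
{
    /* Advance both pointers to the end and run a negative index up to
     * zero, mirroring the loop shape assumed for the asm. */
    dst += w;
    src += w;
    for (ptrdiff_t i = -w; i < 0; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}
```

Running the same input through this and through the SIMD version, then comparing the buffers, should show whether folding the loads into paddb changed anything.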

