[FFmpeg-devel] [PATCH v3 1/2][GSoC 2024] libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC
Stone Chen
chen.stonechen at gmail.com
Sun May 19 17:24:03 EEST 2024
On Sat, May 18, 2024 at 11:33 AM Ronald S. Bultje <rsbultje at gmail.com>
wrote:
> Hi,
>
> On Tue, May 14, 2024 at 4:40 PM Stone Chen <chen.stonechen at gmail.com>
> wrote:
>
>> + vvc_sad_8:
>> + .loop_height:
>> + movu xm0, [src1q]
>> + movu xm1, [src2q]
>> + MIN_MAX_SAD xm2, xm0, xm1
>> + vpmovzxwd m1, xm1
>> + vpaddd m3, m1
>>
> [..]
>
>> + vvc_sad_16_128:
>> + .loop_height:
>>
> [..]
>
>> + .loop_width:
>> + movu xm0, [src1q]
>> + movu xm1, [src2q]
>> + MIN_MAX_SAD xm2, xm0, xm1
>> + vpmovzxwd m1, xm1
>> + vpaddd m3, m1
>>
>
> Wouldn't it be more efficient if the main loops did a full register's worth
> at a time?
>
> vpbroadcastd m4, [pw_1]
> loop:
> movu m0, [src1q]
> movu m1, [src2q]
> MIN_MAX_SAD m2, m0, m1
> pmaddwd m1, m4
> paddd m3, m1
>
> (And then for w8, load 2 rows per iteration using movu xmN, [row0] and
> vinserti128 mN, [row1], 1.)
>
> Ronald
>
Hi Ronald,
Thank you, I didn't know about the pmaddwd instruction. Using it is
definitely more efficient!
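For anyone following along, here is a rough scalar C model of why the pmaddwd suggestion works. It is only a sketch, not the patch's code. It assumes MIN_MAX_SAD leaves the per-lane absolute differences (max minus min) in its last operand, as the instruction following it suggests, and that the samples are non-negative values that fit in a signed 16-bit word. The function names (pmaddwd_model, sad16_accumulate, sad8x2_accumulate, hsum8) are hypothetical and exist only for this illustration. Multiplying every word by 1 and adding adjacent pairs widens 16 words into 8 dwords in a single step, so a full ymm register of samples is consumed per iteration instead of half of one.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of pmaddwd on a 256-bit register: 16 signed words in,
 * 8 signed dwords out, out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1]. */
void pmaddwd_model(const int16_t a[16], const int16_t b[16], int32_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i] +
                 (int32_t)a[2 * i + 1] * b[2 * i + 1];
}

/* Hypothetical helper: adds the absolute differences of 16 samples into
 * 8 dword accumulators (the role m3 plays in the quoted loop). */
void sad16_accumulate(const int16_t *src1, const int16_t *src2, int32_t acc[8])
{
    int16_t diff[16], ones[16];
    int32_t widened[8];

    for (int i = 0; i < 16; i++) {
        /* Assumed behaviour of MIN_MAX_SAD: max(a, b) - min(a, b) per lane. */
        int16_t lo = src1[i] < src2[i] ? src1[i] : src2[i];
        int16_t hi = src1[i] < src2[i] ? src2[i] : src1[i];
        diff[i] = (int16_t)(hi - lo);
        ones[i] = 1; /* models the register filled from pw_1 */
    }
    pmaddwd_model(diff, ones, widened); /* pmaddwd m1, m4 */
    for (int i = 0; i < 8; i++)
        acc[i] += widened[i];           /* paddd m3, m1 */
}

/* w8 case: two rows of 8 samples are packed into one 16-lane register,
 * modelling movu xmN, [row0] followed by vinserti128 mN, [row1], 1. */
void sad8x2_accumulate(const int16_t *s1_row0, const int16_t *s1_row1,
                       const int16_t *s2_row0, const int16_t *s2_row1,
                       int32_t acc[8])
{
    int16_t a[16], b[16];

    memcpy(a,     s1_row0, 8 * sizeof(int16_t));
    memcpy(a + 8, s1_row1, 8 * sizeof(int16_t));
    memcpy(b,     s2_row0, 8 * sizeof(int16_t));
    memcpy(b + 8, s2_row1, 8 * sizeof(int16_t));
    sad16_accumulate(a, b, acc);
}

/* Final horizontal sum of the 8 dword accumulators. */
int32_t hsum8(const int32_t acc[8])
{
    int32_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum += acc[i];
    return sum;
}
```

Calling sad16_accumulate once per 16 samples (or sad8x2_accumulate once per pair of 8-wide rows) and then hsum8 at the end mirrors the structure of the suggested loop: the widening and the pairwise add happen in the same operation, so no separate zero-extension step is needed.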
Stone