[FFmpeg-devel] [PATCH] SSE2 and SSSE3 versions of h264 biweight prediction code (biweight_h264_pixels_tab)

Ronald S. Bultje rsbultje
Tue Aug 3 20:56:22 CEST 2010


Hi,

On Tue, Aug 3, 2010 at 2:27 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Fri, Jul 30, 2010 at 7:47 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
>> On Thu, Jul 29, 2010 at 11:15 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>> On Thu, Jul 29, 2010 at 9:23 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>> On Thu, Jul 29, 2010 at 12:32 AM, Eli Friedman <eli.friedman at gmail.com> wrote:
>>>>> Patch attached.  Loosely based off of the MMX2 version.  Around 1%
>>>>> faster overall on a test file on my Mobile Core i5.
>>>> [..]
>>>>> +cglobal h264_biweight_8x8_ssse3, 7, 7, 8
>>>>> +    BIWEIGHT_SSSE3_SETUP
>>>>> +    mov        r3, 4
>>>>> +
>>>>> +.nextrow
>>>>> +    BIWEIGHT_SSSE3_OP r2
>>>>> +    movh       [r0], m0
>>>>> +    movhps     [r0+r2], m0
>>>>> +    lea        r0, [r0+r2*2]
>>>>> +    lea        r1, [r1+r2*2]
>>>>> +    dec        r3
>>>>> +    jnz .nextrow
>>>>> +    REP_RET
>>>>
>>>> You have several unused r%d regs here, maybe you want to use lea r4,
>>>> [r2*2] and then use add r0/r1, r4 instead of lea, that should result
>>>> in slightly smaller code. Same for h264_biweight_8x8_sse2.
>>>
>>> Will do.
>>
>> Done in attached.
>>
>>>>> +%macro BIWEIGHT_SSSE3_OP 1
>>>>> +    movh       m0, [r0]
>>>>> +    movh       m1, [r1]
>>>>> +    movh       m2, [r0+%1]
>>>>> +    movh       m3, [r1+%1]
>>>>> +    punpcklbw  m0, m1
>>>>> +    punpcklbw  m2, m3
>>>>
>>>> If you don't use m1/m3 afterwards, you can IIRC just punpcklbw m0,
>>>> [r0+%1] and same for the line below.
>>>
>>> I don't have appropriate alignment for the 8x8 case, but I suppose I
>>> can do it in the 16x16 case.
>>
>> Done in attached; saves one instruction per 16 pixels in the 16x16
>> case.  (I could fiddle with it to remove another load, but I doubt the
>> speed would be significantly different.)
>
> Ping.

The latest version looked good to me; I had no further comments. I'd say
give it a day or so for Jason/Michael/Loren/anyone else to comment, and
apply if there are no more comments (since it's clearly faster, and
therefore better, than the current code).

Thanks!

Ronald


