[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter

David Conrad lessen42 at gmail.com
Mon Jul 5 23:25:45 CEST 2010


On Jul 5, 2010, at 5:10 PM, Jason Garrett-Glaser wrote:

> On Mon, Jul 5, 2010 at 2:04 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>> Hi,
>> 
>> On Mon, Jul 5, 2010 at 5:02 PM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>>> On Mon, Jul 5, 2010 at 1:30 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>> On Mon, Jul 5, 2010 at 1:44 AM, David Conrad <lessen42 at gmail.com> wrote:
>>>>> +%macro STORE_4_WORDS_SSE4 6
>>>>> +    pextrw %1, %5, %6+0
>>>>> +    pextrw %2, %5, %6+1
>>>>> +    pextrw %3, %5, %6+2
>>>>> +    pextrw %4, %5, %6+3
>>>>> +%endmacro
>>>> [..]
>>> 
>>> I don't recall pextrw being SSE4...
>> 
>> Awesome, I should be able to use that for VP8 simple H loopfilter as
>> well (it's mmxext/sse2).
> 
> Make sure it's actually faster; do benches on a few CPUs.  pextr/pinsr
> are not always as fast as one wants.

Indeed, IIRC pextrw to a register (i.e. pre-SSE4, where the memory-destination form isn't available) was slower than doing lots of shift+mov.
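
For anyone who wants to bench the two approaches, here is a rough sketch in C with SSE2 intrinsics rather than yasm. The function names are made up for illustration and are not from the patch. It only handles the low four words (the macro's %6 offset is assumed to be 0), and the compiler is free to pick different instructions than the ones named in the comments, so check the disassembly before trusting any timings.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Variant A (hypothetical name): one pextrw to a GPR per word,
 * followed by a 16-bit store from that register. */
static void store_4_words_pextrw(uint16_t *d0, uint16_t *d1,
                                 uint16_t *d2, uint16_t *d3, __m128i v)
{
    *d0 = (uint16_t)_mm_extract_epi16(v, 0);
    *d1 = (uint16_t)_mm_extract_epi16(v, 1);
    *d2 = (uint16_t)_mm_extract_epi16(v, 2);
    *d3 = (uint16_t)_mm_extract_epi16(v, 3);
}

/* Variant B (hypothetical name): movd a whole dword to a GPR, store its
 * low word, shift right by 16 and store again; a byte shift of the
 * vector (psrldq) brings the next dword down. */
static void store_4_words_shift(uint16_t *d0, uint16_t *d1,
                                uint16_t *d2, uint16_t *d3, __m128i v)
{
    uint32_t lo = (uint32_t)_mm_cvtsi128_si32(v);
    uint32_t hi = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 4));

    *d0 = (uint16_t)lo;
    *d1 = (uint16_t)(lo >> 16);
    *d2 = (uint16_t)hi;
    *d3 = (uint16_t)(hi >> 16);
}
```

Both variants should write identical data; which one is faster has to be measured per CPU, as Jason says.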


