[FFmpeg-devel] [PATCH] VP8 MMX optimizations (MC and IDCT dc_add)

Frank Barchard fbarchard
Fri Jun 25 12:25:12 CEST 2010


On Thu, Jun 24, 2010 at 6:33 PM, Jason Garrett-Glaser <darkshikari at gmail.com> wrote:

> > Now with 8x8 intra pred modes and non-broken line endings.  Did I
> > mention this makes h264 faster too?
> >
> > Dark Shikari
> >
>
> And one more SSSE3 function, because pshufb is amazing.
>

Yes, but pshufb takes 4 cycles on Atom.

On Tue, Jun 22, 2010 at 3:29 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> +    punpcklbw mm2, mm6
>> +
>> +.nextrow
>> +    ; first tap
>> +    pshufw    mm3, mm7, 0x0                ; splat first coeff

> are you sure all these pshufw are faster than reading them from a table?

Ronald's pshufw instructions come after the unpacks, so tables replacing
them would be twice as big. On Atom, cache and memory are dirt slow, so
keeping the tables small makes sense.
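
To make the two options concrete, here is a rough Python model of what
"pshufw mm3, mm7, 0x0" does (copy word 0 of the source into all four 16-bit
lanes) next to the alternative of reading an already-splatted row from a
table. This is only a sketch of the instruction semantics, not the code from
the patch: the tap values and the helper names (TAPS, SPLAT_TABLE,
splat_in_register, splat_from_table) are made up for illustration, and it
says nothing about cycle counts or the table-size estimate above.

```python
def pshufw(src, imm8):
    """Model of MMX pshufw on four 16-bit words (index 0 = lowest word).

    Each 2-bit field of imm8 selects which source word lands in the
    corresponding destination word.
    """
    if len(src) != 4:
        raise ValueError("pshufw operates on exactly four 16-bit words")
    return [src[(imm8 >> (2 * lane)) & 0x3] for lane in range(4)]


# Hypothetical filter taps, assumed to be already widened to 16-bit words
# (i.e. after the unpack step). These are NOT the real VP8 coefficients.
TAPS = [2, 11, 108, 36]

# The table alternative: one pre-splatted row per tap, stored in memory.
SPLAT_TABLE = [[tap] * 4 for tap in TAPS]


def splat_in_register(taps, index):
    """Splat taps[index] with a shuffle, no extra memory read."""
    imm8 = index * 0x55  # repeat the 2-bit index in all four selector fields
    return pshufw(taps, imm8)


def splat_from_table(table, index):
    """Same result, but obtained by loading a precomputed row."""
    return list(table[index])
```

Both helpers return the same lanes for every tap; for instance
splat_in_register(TAPS, 0) gives [2, 2, 2, 2]. The difference being argued
about is only where the cost lands: a shuffle per tap, or a memory read per
tap plus the cache footprint of the table.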


