[FFmpeg-devel] vc1dsp: introduce cases for 8x8 and 16x16

Ronald S. Bultje rsbultje at gmail.com
Sun Apr 20 19:21:09 CEST 2014


Hi,

On Sun, Apr 20, 2014 at 12:16 PM, Christophe Gisquet <
christophe.gisquet at gmail.com> wrote:

> I noticed the 16x16 partitions were actually using 4 calls to the 8x8 MC
> code.
>

Patch OK.


> Note: I tried to at least unroll vertically the MMX code in the 16x16
> case, but that somehow slowed the decoder to its original speed. I
> didn't bother further because of the aforementioned reason.


Vertically? Maybe the code size just gets too big (cache issue); 16 lines
is quite a lot to unroll. (I'm assuming you tried to unroll all 16 rows; you
didn't specify, so I may be wrong.) AFAIR even the H.264 code doesn't unroll
much vertically, just horizontally. I wouldn't expect much speed gain beyond
an unroll by 2 for instruction pairing anyway, so I guess no vertical unroll
should be fine for w=16.
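
To make the shapes being compared concrete, here is a rough C sketch, not the
actual vc1dsp code. A plain row copy stands in for the real MC filter, and all
function names (put_block, put_16x16_as_four_8x8, put_16x16_unroll2) are
hypothetical. It shows a 16x16 block handled as four 8x8 calls next to a
dedicated 16-wide loop that is unrolled by only 2 rows:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an MC kernel: copies a w x h block row by row.
 * A real kernel would interpolate; the loop structure is what matters here. */
static void put_block(uint8_t *dst, const uint8_t *src,
                      ptrdiff_t stride, int w, int h)
{
    for (int y = 0; y < h; y++) {
        memcpy(dst, src, (size_t)w);
        dst += stride;
        src += stride;
    }
}

/* 16x16 handled as four separate 8x8 calls, one per quadrant. */
static void put_16x16_as_four_8x8(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride)
{
    put_block(dst,                  src,                  stride, 8, 8);
    put_block(dst + 8,              src + 8,              stride, 8, 8);
    put_block(dst + 8 * stride,     src + 8 * stride,     stride, 8, 8);
    put_block(dst + 8 * stride + 8, src + 8 * stride + 8, stride, 8, 8);
}

/* Dedicated 16-wide path: each row is processed at full width, and the
 * vertical loop is unrolled by 2 rather than by all 16 rows. */
static void put_16x16_unroll2(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t stride)
{
    for (int y = 0; y < 16; y += 2) {
        memcpy(dst,          src,          16);
        memcpy(dst + stride, src + stride, 16);
        dst += 2 * stride;
        src += 2 * stride;
    }
}
```

Both variants write the same 256 bytes; the difference is only in call count
and loop body size, which is the trade-off discussed above.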

Ronald

