[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Eli Friedman eli.friedman
Sun Jul 11 21:52:22 CEST 2010


On Sun, Jul 11, 2010 at 8:53 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> You'll notice that the sse2 is significantly slower here, my rough
> guess is that this is because of my shitty CPU which pretty much
> emulates xmm-ops through mmx-ops, so it doesn't add a lot of benefit
> other than not having to set up the loop for doing the second 8 pixels,
> combined with the added complexity of an 8x16 transpose before the
> actual filter. I'm betting that on an actual sse2-supporting CPU
> (Jason?), this would still be faster, but we might want to put this
> under a FF_MM_SSE2_NOT_SHITTY flag or something along those lines. If
> you think my code is shitty, comments are welcome also. ;-).

On my Mobile Core i5, the SSE2 version shows the expected performance
gain over the mmxext version: it takes 55% of the mmxext time for the
vertical filter and 65% for the horizontal filter.
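
For anyone who prefers speedup factors, here is a rough sketch of the
conversion. It assumes "X% of the time" means SSE2 runtime divided by
mmxext runtime; the helper name is hypothetical and not part of the patch.

```c
/*
 * Convert a relative runtime (new_time / old_time) into a speedup factor.
 * Assumption: the percentages above are SSE2 time as a fraction of
 * mmxext time, so 0.55 means the SSE2 code needs 55% of the cycles.
 * The caller must pass a ratio greater than zero.
 */
double speedup_from_time_ratio(double time_ratio)
{
    return 1.0 / time_ratio;
}
```

Under that assumption, a ratio of 0.55 works out to roughly 1.82x for the
vertical filter and 0.65 to roughly 1.54x for the horizontal filter.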

-Eli


