[FFmpeg-devel] [PATCH] VP8 luma(16) inner-MB H/V loopfilter MMX/SSE2

Michael Niedermayer michaelni
Mon Jul 19 20:19:29 CEST 2010


On Sun, Jul 18, 2010 at 02:21:12PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Sun, Jul 11, 2010 at 2:47 PM, Loren Merritt <lorenm at u.washington.edu> wrote:
> > On Sun, 11 Jul 2010, Michael Niedermayer wrote:
> >> On Sun, Jul 11, 2010 at 04:52:04PM +0000, Loren Merritt wrote:
> >>> On Sun, 11 Jul 2010, Ronald S. Bultje wrote:
> >>>> You'll notice that the sse2 is significantly slower here, my rough
> >>>> guess is that this is because of my shitty CPU which pretty much
> >>>> emulates xmm-ops through mmx-ops, so it doesn't add a lot of benefit
> >>>> other than not having to set up the loop for doing the second 8 pixels,
> >>>> combined with the added complexity of an 8x16 transpose before the
> >>>> actual filter. I'm betting that on an actual sse2-supporting CPU
> >>>> (Jason?), this would still be faster, but we might want to put this
> >>>> under a FF_MM_SSE2_NOT_SHITTY flag or something along those lines. If
> >>>> you think my code is shitty, comments are welcome also. ;-).
> >>>
> >>> Rather than special-casing most of the functions, we at x264 declared
> >>> that Core1 doesn't have sse2, and changed the cpuid parser accordingly.
> >>> If you want to support the few cases where sse2 is slightly faster than
> >>> mmx, I recommend picking a different flag for that and applying it only
> >>> when you've tested on Core1, so that FF_MM_SSE2 can be trusted to dwim in
> >>> the usual case.
> >>>
> >>> --Loren Merritt
> >>
> >>>  cpuid.c |   14 +++++++++++++-
> >>>  1 file changed, 13 insertions(+), 1 deletion(-)
> >>> 7ba0916766645e2de9330e9ba8f30d815da14c91  cpuid.diff
> >>
> >> Do we have any float SSE2 code that this could affect negatively?
> >> If not, I am ok with this patch.
> >
> > ff_lpc_compute_autocorr_sse2
> 
> The attached patch implements FF_MM_SSE2SLOW/FF_MM_SSE3SLOW for this purpose.
[...]
> @@ -108,13 +112,25 @@
>              rval |= FF_MM_MMX2;
>      }
>  
> +    if (!strncmp(vendor.c, "GenuineIntel", 12) &&
> +        family == 6 && (model == 9 || model == 13 || model == 14)) {
> +        /* 6/9 (pentium-m "banias"), 6/13 (pentium-m "dothan"), and 6/14 (core1 "yonah")
> +         * theoretically support sse2, but it's usually slower than mmx,
> +         * so let's just pretend they don't. */

> +        if (rval & FF_MM_SSE2) rval |= FF_MM_SSE2SLOW;
> +        if (rval & FF_MM_SSE3) rval |= FF_MM_SSE3SLOW;
> +        rval &= ~(FF_MM_SSE2|FF_MM_SSE3);

if (rval & FF_MM_SSE2) rval ^= FF_MM_SSE2SLOW | FF_MM_SSE2;
...

ok otherwise

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Many that live deserve death. And some that die deserve life. Can you give
it to them? Then do not be too eager to deal out death in judgement. For
even the very wise cannot see all ends. -- Gandalf