[Ffmpeg-devel] a little optim for a SSE version of H263_LOOP_FILTER

Michael Niedermayer michaelni
Sun Nov 5 17:33:16 CET 2006


Hi

On Sun, Nov 05, 2006 at 04:50:10PM +0100, Guillaume POIRIER wrote:
> Hi,
> 
> On 11/4/06, skal <skal65535 at orange.fr> wrote:
> >
> >  Hi everybody,
> >
> > in case, it seems to me a SSE version of
> > H263_LOOP_FILTER is possible by replacing
> >       "psubusb %%mm4, %%mm2           \n\t"\
> >       "movq %%mm2, %%mm3              \n\t"\
> >       "psubusb %%mm4, %%mm3           \n\t"\
> >       "psubb %%mm3, %%mm2             \n\t"\
> > at dsputil_mmx.c:587 (fresh cvs), by:
> >       "psubusb %%mm4, %%mm2           \n\t"\
> >       "pminub %%mm4, %%mm2           \n\t"\
> >
> > +maybe a little re-org of the loop (mm3 is gone).
> 
> Please send patch, I'll try to benchmark the speed change.
> 
> Note that movq is very slow on P4, so any code that removes
> mov(q|dqu|..) provides an interesting speed-up.
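
For reference, the replacement skal proposes rests on a per-byte identity: with x = sat(a - b), the value x - sat(x - b) is the same as min(x, b), which is what pminub computes. The sketch below is a scalar C model of one byte lane, not code from dsputil_mmx.c; the function names are hypothetical, and it assumes mm2 holds a and mm4 holds b on entry.

```c
#include <stdint.h>

/* Scalar model of psubusb on one byte lane: unsigned saturating a - b. */
uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Scalar model of pminub on one byte lane: unsigned minimum. */
uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Original four-instruction sequence (a plays mm2, b plays mm4). */
uint8_t seq_old(uint8_t a, uint8_t b)
{
    uint8_t x = sub_sat_u8(a, b);   /* psubusb mm4, mm2 */
    uint8_t t = x;                  /* movq    mm2, mm3 */
    t = sub_sat_u8(t, b);           /* psubusb mm4, mm3 */
    return (uint8_t)(x - t);        /* psubb   mm3, mm2 */
}

/* Proposed two-instruction sequence. */
uint8_t seq_new(uint8_t a, uint8_t b)
{
    uint8_t x = sub_sat_u8(a, b);   /* psubusb mm4, mm2 */
    return min_u8(x, b);            /* pminub  mm4, mm2 */
}

/* Counts the (a, b) byte pairs for which the two sequences disagree. */
int count_mismatches(void)
{
    int bad = 0;
    for (int a = 0; a < 256; a++)
        for (int b = 0; b < 256; b++)
            if (seq_old((uint8_t)a, (uint8_t)b) != seq_new((uint8_t)a, (uint8_t)b))
                bad++;
    return bad;
}
```

Since a byte lane has only 65536 input pairs, the exhaustive loop covers every case; it says nothing about speed, which still needs a benchmark.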

Why don't you try replacing all reg, reg movq with pshufw? If there's a
speedup, then we could make movq a macro which expands, depending on
CPU type, to movq or pshufw $11100100b, ...
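
A rough sketch of that idea, under stated assumptions: pshufw picks each destination word from the source using two bits of the immediate, so 11100100b (0xE4) selects words 3,2,1,0 in place and acts as a plain register copy. The C model below only illustrates the selector decoding. The MOVQ_REG macro and the USE_PSHUFW_COPY switch are hypothetical names, not anything in dsputil_mmx.c, and a compile-time switch is just one possible way to choose per CPU type.

```c
#include <stdint.h>

/* Scalar model of pshufw: dst word i = src word ((imm8 >> 2*i) & 3). */
void pshufw_model(uint16_t dst[4], const uint16_t src[4], unsigned imm8)
{
    uint16_t tmp[4];
    for (int i = 0; i < 4; i++)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];
    for (int i = 0; i < 4; i++)
        dst[i] = tmp[i];
}

/* Returns 1 if the given immediate leaves all four words in place. */
int is_identity_shuffle(unsigned imm8)
{
    const uint16_t src[4] = { 0x1111, 0x2222, 0x3333, 0x4444 };
    uint16_t dst[4];
    pshufw_model(dst, src, imm8);
    for (int i = 0; i < 4; i++)
        if (dst[i] != src[i])
            return 0;
    return 1;
}

/* Hypothetical macro: expands to one line of an inline-asm template,
 * choosing the copy instruction at compile time. */
#ifdef USE_PSHUFW_COPY
#define MOVQ_REG(src, dst) "pshufw $0xE4, " src ", " dst " \n\t"
#else
#define MOVQ_REG(src, dst) "movq " src ", " dst " \n\t"
#endif

/* Example expansion, e.g. for the copy from mm2 to mm3. */
const char *movq_reg_example = MOVQ_REG("%%mm2", "%%mm3");
```

Whether the pshufw form is actually faster on a given CPU is exactly what the benchmark would have to show.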


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
