[FFmpeg-devel] [PATCH 5/5] pp: add SSE2 deInterlaceInterpolateCubic().

Michael Niedermayer michaelni at gmx.at
Sun Nov 18 01:14:34 CET 2012


On Sat, Nov 17, 2012 at 11:14:11PM +0100, Clément Bœsch wrote:
> On Sat, Nov 17, 2012 at 03:59:17PM +0100, Michael Niedermayer wrote:
> > On Sat, Nov 17, 2012 at 01:07:13PM +0100, Clément Bœsch wrote:
> > > 2124 decicycles in deInterlaceInterpolateCubic_C, 67100774 runs, 8090 skips
> > > 458 decicycles in deInterlaceInterpolateCubic_MMX2, 67107146 runs, 1718 skips
> > > 382 decicycles in deInterlaceInterpolateCubic_SSE2, 67107086 runs, 1778 skips
> > > ---
> > >  libpostproc/postprocess_template.c | 25 ++++++++++++++++++++++---
> > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> > > index dc63032..0729e8f 100644
> > > --- a/libpostproc/postprocess_template.c
> > > +++ b/libpostproc/postprocess_template.c
> > > @@ -1497,13 +1497,30 @@ static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int strid
> > >   */
> > >  static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
> > >  {
> > > -#if TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > > +#if TEMPLATE_PP_SSE2 || TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > >      src+= stride*3;
> > >      __asm__ volatile(
> > >          "lea (%0, %1), %%"REG_a"                \n\t"
> > >          "lea (%%"REG_a", %1, 4), %%"REG_d"      \n\t"
> > >          "lea (%%"REG_d", %1, 4), %%"REG_c"      \n\t"
> > >          "add %1, %%"REG_c"                      \n\t"
> > > +#if TEMPLATE_PP_SSE2
> > > +        "pxor %%xmm7, %%xmm7                    \n\t"
> > > +#define REAL_DEINT_CUBIC(a,b,c,d,e)\
> > > +        "movq " #a ", %%xmm0                    \n\t"\
> > > +        "movq " #b ", %%xmm1                    \n\t"\
> > > +        "movq " #d ", %%xmm2                    \n\t"\
> > > +        "movq " #e ", %%xmm3                    \n\t"\
> > > +        "pavgb %%xmm2, %%xmm1                   \n\t"\
> > > +        "pavgb %%xmm3, %%xmm0                   \n\t"\
> > > +        "punpcklbw %%xmm7, %%xmm0               \n\t"\
> > > +        "punpcklbw %%xmm7, %%xmm1               \n\t"\
> > > +        "psubw %%xmm1, %%xmm0                   \n\t"\
> > > +        "psraw $3, %%xmm0                       \n\t"\
> > > +        "psubw %%xmm0, %%xmm1                   \n\t"\
> > > +        "packuswb %%xmm1, %%xmm1                \n\t"\
> > > +        "movlps %%xmm1, " #c "                  \n\t"
> > > +#else //TEMPLATE_PP_SSE2
> > 
> > The code should be restructured to run these filters on larger blocks,
> > that is, at least 16 pixels or the whole width.
> > 
> 
> I don't feel like doing such a thing soon, so feel free to do it :)
> 
> > Until then this should be ok, but the SSE registers should be added
> > to the clobber list.
> > 
> 
> Added, new patch attached.

should be ok
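
For reference, the clobber-list point can be illustrated with a small
standalone sketch. It is not taken from the patch: avg8() is a hypothetical
helper, the plain GCC extended-asm syntax is used rather than any FFmpeg
wrapper macro, and a scalar fallback is included on the assumption that the
code may be built without SSE2. The idea is that every xmm register the asm
block writes is named after the last colon, so the compiler does not keep
live values in them across the block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): rounded-up average of
 * 8 bytes, i.e. what pavgb does on the low half of an xmm register. */
static void avg8(uint8_t *dst, const uint8_t *a, const uint8_t *b)
{
    assert(dst && a && b);
#if defined(__GNUC__) && defined(__SSE2__)
    __asm__ volatile(
        "movq   (%1), %%xmm0     \n\t"  /* load 8 bytes from a        */
        "movq   (%2), %%xmm1     \n\t"  /* load 8 bytes from b        */
        "pavgb  %%xmm1, %%xmm0   \n\t"  /* xmm0 = (a + b + 1) >> 1    */
        "movq   %%xmm0, (%0)     \n\t"  /* store 8 bytes to dst       */
        : /* no register outputs; dst is written through memory */
        : "r"(dst), "r"(a), "r"(b)
        : "memory", "xmm0", "xmm1"      /* registers the asm overwrites */
    );
#else
    /* Scalar fallback with the same rounding as pavgb. */
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)((a[i] + b[i] + 1) >> 1);
#endif
}
```

Leaving "xmm0" and "xmm1" out of the list would still assemble, but the
compiler would then be free to assume both registers are unchanged.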

[...]
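
As a reading aid for the quoted SSE2 macro, here is a scalar C model of what
its instruction sequence appears to compute per pixel: average the two inner
rows (b, d) and the two outer rows (a, e) with pavgb rounding, subtract one
eighth of (outer - inner) from the inner average, and saturate to 0..255.
This is a sketch derived only from the instructions shown in the diff; the
function names and the width parameter are illustrative assumptions and do
not exist in libpostproc.

```c
#include <assert.h>
#include <stdint.h>

/* pavgb: unsigned byte average, rounded up. */
static uint8_t avg_round_up(uint8_t x, uint8_t y)
{
    return (uint8_t)((x + y + 1) >> 1);
}

/* psraw $3: arithmetic shift, i.e. division by 8 rounding toward -inf.
 * Written out explicitly so it does not rely on shifting negative ints. */
static int floor_div8(int v)
{
    return v >= 0 ? v >> 3 : -((-v + 7) >> 3);
}

/* packuswb: saturate a signed 16-bit value to an unsigned byte. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* One output pixel of row c from the rows a, b above and d, e below. */
static uint8_t deint_cubic_pixel(uint8_t a, uint8_t b, uint8_t d, uint8_t e)
{
    int inner = avg_round_up(b, d);   /* pavgb xmm2, xmm1 */
    int outer = avg_round_up(a, e);   /* pavgb xmm3, xmm0 */
    return clip_u8(inner - floor_div8(outer - inner));
}

/* Hypothetical row wrapper: the macro handles 8 pixels per invocation,
 * a wider loop like this is what "larger blocks" would look like in C. */
static void deint_cubic_row(uint8_t *c, const uint8_t *a, const uint8_t *b,
                            const uint8_t *d, const uint8_t *e, int width)
{
    assert(width > 0);
    for (int x = 0; x < width; x++)
        c[x] = deint_cubic_pixel(a[x], b[x], d[x], e[x]);
}
```

With a = e = 0 and b = d = 80 the model gives 90, which matches the exact
cubic weighting (9*(b+d) - (a+e))/16 for that input; other inputs can differ
slightly because of the intermediate rounding.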

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Freedom in capitalist society always remains about the same as it was in
ancient Greek republics: Freedom for slave owners. -- Vladimir Lenin