[FFmpeg-devel] [PATCH 5/5] pp: add SSE2 deInterlaceInterpolateCubic().

Clément Bœsch ubitux at gmail.com
Sun Nov 18 16:47:25 CET 2012


On Sun, Nov 18, 2012 at 01:14:34AM +0100, Michael Niedermayer wrote:
> On Sat, Nov 17, 2012 at 11:14:11PM +0100, Clément Bœsch wrote:
> > On Sat, Nov 17, 2012 at 03:59:17PM +0100, Michael Niedermayer wrote:
> > > On Sat, Nov 17, 2012 at 01:07:13PM +0100, Clément Bœsch wrote:
> > > > 2124 decicycles in deInterlaceInterpolateCubic_C, 67100774 runs, 8090 skips
> > > > 458 decicycles in deInterlaceInterpolateCubic_MMX2, 67107146 runs, 1718 skips
> > > > 382 decicycles in deInterlaceInterpolateCubic_SSE2, 67107086 runs, 1778 skips
> > > > ---
> > > >  libpostproc/postprocess_template.c | 25 ++++++++++++++++++++++---
> > > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
> > > > index dc63032..0729e8f 100644
> > > > --- a/libpostproc/postprocess_template.c
> > > > +++ b/libpostproc/postprocess_template.c
> > > > @@ -1497,13 +1497,30 @@ static inline void RENAME(deInterlaceInterpolateLinear)(uint8_t src[], int strid
> > > >   */
> > > >  static inline void RENAME(deInterlaceInterpolateCubic)(uint8_t src[], int stride)
> > > >  {
> > > > -#if TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > > > +#if TEMPLATE_PP_SSE2 || TEMPLATE_PP_MMXEXT || TEMPLATE_PP_3DNOW
> > > >      src+= stride*3;
> > > >      __asm__ volatile(
> > > >          "lea (%0, %1), %%"REG_a"                \n\t"
> > > >          "lea (%%"REG_a", %1, 4), %%"REG_d"      \n\t"
> > > >          "lea (%%"REG_d", %1, 4), %%"REG_c"      \n\t"
> > > >          "add %1, %%"REG_c"                      \n\t"
> > > > +#if TEMPLATE_PP_SSE2
> > > > +        "pxor %%xmm7, %%xmm7                    \n\t"
> > > > +#define REAL_DEINT_CUBIC(a,b,c,d,e)\
> > > > +        "movq " #a ", %%xmm0                    \n\t"\
> > > > +        "movq " #b ", %%xmm1                    \n\t"\
> > > > +        "movq " #d ", %%xmm2                    \n\t"\
> > > > +        "movq " #e ", %%xmm3                    \n\t"\
> > > > +        "pavgb %%xmm2, %%xmm1                   \n\t"\
> > > > +        "pavgb %%xmm3, %%xmm0                   \n\t"\
> > > > +        "punpcklbw %%xmm7, %%xmm0               \n\t"\
> > > > +        "punpcklbw %%xmm7, %%xmm1               \n\t"\
> > > > +        "psubw %%xmm1, %%xmm0                   \n\t"\
> > > > +        "psraw $3, %%xmm0                       \n\t"\
> > > > +        "psubw %%xmm0, %%xmm1                   \n\t"\
> > > > +        "packuswb %%xmm1, %%xmm1                \n\t"\
> > > > +        "movlps %%xmm1, " #c "                  \n\t"
> > > > +#else //TEMPLATE_PP_SSE2
> > > 
> > > the code should be restructured to run these filters on larger blocks,
> > > that is, at least 16 pixels or the whole width
> > > 
> > 
> > I don't feel like doing such a thing soon, so feel free to do it :)
> > 
> > > but until then this should be ok, though the SSE registers should be added
> > > to the clobber list
> > > 
> > 
> > Added, new patch attached.
> 
> should be ok
> 

Applied.

-- 
Clément B.