[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Jason Garrett-Glaser darkshikari
Fri Jan 2 21:37:11 CET 2009


> a random idea: (untested and ignore if slower)
>
> movd      "block[ 0]", %%mm0    //  0 0 X D
> punpcklwd "block[16]", %%mm0    //  x X d D
> paddsw           "32", %%mm0
> psraw              $6, %%mm0
> punpcklwd       %%mm0, %%mm0    //  d d D D
> pxor            %%mm1, %%mm1    //  0 0 0 0
> psubw           %%mm0, %%mm1    // -d-d-D-D
> packuswb        %%mm1, %%mm0    // -d-d-D-D d d D D
> pshufw   $0xFA, %%mm0, %%mm1    // -d-d-d-d-D-D-D-D
> punpcklwd       %%mm0, %%mm0    //  d d d d D D D D
>
>
> except that, patch ok

That's 1.5 clocks faster in the i16x16 iDCT: barely worth it, but still
better, so I'll keep it.
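
As a rough scalar model of what the quoted sequence sets up (a sketch only, not code from the patch): each DC is rounded with (dc + 32) >> 6, then packed into an unsigned-saturated positive byte and an unsigned-saturated negated byte. I am assuming those bytes then feed paddusb/psubusb on the destination pixels, which is not shown in the quoted snippet. The names dc_add_4x4, shift6_floor, sat_u8 and dc_add_self_check are hypothetical, and the model uses plain int arithmetic, so it ignores the paddsw saturation near INT16_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 64, modelling psraw $6 without relying on
 * implementation-defined right shifts of negative ints. */
static int shift6_floor(int v)
{
    return v >= 0 ? v >> 6 : -((-v + 63) >> 6);
}

/* Unsigned byte saturation, as packuswb applies to each word. */
static uint8_t sat_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical helper: add one rounded DC to a 4x4 block of pixels. */
static void dc_add_4x4(uint8_t *dst, int stride, int16_t dc_coef)
{
    int dc = shift6_floor(dc_coef + 32);
    uint8_t pos = sat_u8(dc);   /* the "d"/"D" bytes: dc if positive, else 0  */
    uint8_t neg = sat_u8(-dc);  /* the "-d"/"-D" bytes: -dc if positive, else 0 */

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            int p = dst[y * stride + x];
            p = p + pos > 255 ? 255 : p + pos;  /* assumed paddusb step */
            p = p - neg < 0 ? 0 : p - neg;      /* assumed psubusb step */
            dst[y * stride + x] = (uint8_t)p;
        }
    }
}

/* Self-check: the two saturating steps should match clip(p + dc). */
static int dc_add_self_check(void)
{
    for (int coef = -20000; coef <= 20000; coef += 37) {
        for (int p = 0; p < 256; p += 5) {
            uint8_t px[16];
            for (int i = 0; i < 16; i++)
                px[i] = (uint8_t)p;
            dc_add_4x4(px, 4, (int16_t)coef);
            uint8_t want = sat_u8(p + shift6_floor(coef + 32));
            for (int i = 0; i < 16; i++)
                if (px[i] != want)
                    return 0;
        }
    }
    return 1;
}
```

Since one of pos and neg is always zero, the add-then-subtract pair never needs a signed intermediate, which is why keeping both byte vectors around is enough.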

Patch attached.

Dark Shikari
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264_idct.diff
Type: text/x-diff
Size: 13381 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090102/fa825555/attachment.diff>


