[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Michael Niedermayer michaelni
Fri Jan 2 20:51:20 CET 2009


On Fri, Jan 02, 2009 at 02:03:48PM -0500, Jason Garrett-Glaser wrote:
> $subject
> 
> Benchmarks:
> 
> Cathedral:
> idct_add16: 293 -> 282 clocks
> idct_add16intra: 343 -> 257 clocks
> 
> "300" sample (contains almost no i16x16 blocks so I didn't test add16intra):
> idct_add16: 518 -> 433
> 
> Higher benefit is due to higher bitrate, most likely.
> 
> idct_DC was omitted from idct_add16 because the extra branching logic
> turned out to make it significantly slower (the branching becomes much
> more complicated and less likely as *both* 4x4 DCT blocks have to be
> DC-only for it to work).
> 
> x264 iDCT code was modified to add a stride parameter, required for ffh264.
> 
> x86util.asm was included from x264 in full for simplicity's sake and
> ease of use for adding future x264 assembly that uses it.

[...]

> Index: libavcodec/x86/h264dsp_mmx.c
> ===================================================================
> --- libavcodec/x86/h264dsp_mmx.c	(revision 16408)
> +++ libavcodec/x86/h264dsp_mmx.c	(working copy)

> @@ -472,6 +472,79 @@
>      }
>  }
>  
> +#ifdef HAVE_YASM
> +static void ff_h264_idct_dc_add8_mmx2(uint8_t *dst, int16_t *block, int stride)
> +{

> +    int dc0 = (block[ 0] + 32) >> 6;
> +    int dc1 = (block[16] + 32) >> 6;
> +    __asm__ volatile(
> +        "movd          %0, %%mm0 \n\t"
> +        "movd          %1, %%mm2 \n\t"
> +        "pshufw $0, %%mm0, %%mm0 \n\t"
> +        "pshufw $0, %%mm2, %%mm2 \n\t"
> +        "pxor       %%mm1, %%mm1 \n\t"
> +        "pxor       %%mm3, %%mm3 \n\t"
> +        "psubw      %%mm0, %%mm1 \n\t"
> +        "psubw      %%mm2, %%mm3 \n\t"
> +        "packuswb   %%mm2, %%mm0 \n\t"
> +        "packuswb   %%mm3, %%mm1 \n\t"
> +        ::"r"(dc0),
> +          "r"(dc1)
> +    );

A random idea (untested; ignore it if it turns out slower):

movd      "block[ 0]", %%mm0    //  0 0 X D
punpcklwd "block[16]", %%mm0    //  x X d D
paddsw           "32", %%mm0
psraw              $6, %%mm0
punpcklwd       %%mm0, %%mm0    //  d d D D 
pxor            %%mm1, %%mm1    //  0 0 0 0
psubw           %%mm0, %%mm1    // -d-d-D-D
packuswb        %%mm1, %%mm0    // -d-d-D-D d d D D
pshufw   $0xFA, %%mm0, %%mm1    // -d-d-d-d-D-D-D-D
punpcklwd       %%mm0, %%mm0    //  d d d d D D D D
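
For reference, the byte layout that both sequences aim for can be modelled in
scalar C. This is only a sketch under stated assumptions: the helper names
(sat_u8, dc_pair_to_bytes, apply_dc) are made up for illustration and are not
in the patch, and apply_dc assumes that the part of the function not quoted
here combines the two registers with paddusb/psubusb. It uses plain int
arithmetic, so it does not model the paddsw saturation for inputs close to
INT16_MAX, and it relies on >> of a negative int being an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of packuswb: clamp a signed 16-bit word to the range 0..255. */
static uint8_t sat_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model (not part of the patch).
 * pos[] models the final mm0 (d d d d D D D D), neg[] the final mm1.
 * Index 0 is the lowest byte, so D (from block[0]) fills pos[0..3].
 */
static void dc_pair_to_bytes(const int16_t *block, uint8_t pos[8], uint8_t neg[8])
{
    /* Assumes >> on a negative int is an arithmetic shift, like psraw. */
    int dc0 = (block[0]  + 32) >> 6;
    int dc1 = (block[16] + 32) >> 6;

    for (int i = 0; i < 4; i++) {
        pos[i]     = sat_u8(dc0);
        pos[i + 4] = sat_u8(dc1);
        neg[i]     = sat_u8(-dc0);
        neg[i + 4] = sat_u8(-dc1);
    }
}

/* Assumed use of the two vectors: paddusb with pos, then psubusb with neg. */
static uint8_t apply_dc(uint8_t pixel, uint8_t pos, uint8_t neg)
{
    assert(pos == 0 || neg == 0);   /* a DC value cannot be both > 0 and < 0 */
    int v = pixel + pos;
    if (v > 255) v = 255;           /* paddusb saturates at 255 */
    v -= neg;
    if (v < 0) v = 0;               /* psubusb saturates at 0 */
    return (uint8_t)v;
}
```

Splitting each DC value into a positive and a negative unsigned byte is what
lets the pixel update stay in unsigned saturating byte arithmetic; a model
like this could serve as a reference when checking either asm variant.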


Apart from that, the patch is OK.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is
On a Turing machine, in general impossible (Turing's halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.