[FFmpeg-devel] [PATCH] move H264 IDCT to yasm

Alexander Strange astrange
Tue Sep 7 05:31:35 CEST 2010


On Sep 6, 2010, at 5:00 PM, Ronald S. Bultje wrote:

> Hi,
> 
> this patch moves H264 IDCT (the LGPL part) to yasm. Performance for
> most loopy parts is improved quite a bit because gcc is completely
> retarded when it comes to setting up loops (I'm not joking here), some
> up to 50%. Performance for one particular function (intra16_mmx2) is
> mildly worse (a few cycles) and I don't quite understand why, the code
> is identical. This might be related to alignment (gcc aligns the parts
> that it jmps to using nops, I don't yet know how to do that in yasm),
> otherwise I don't really know. Let me know if you want detailed
> performance statistics for each function.
> 
> Ronald
> <yamsify-h264_idct.patch>

> +cglobal h264_idct_add16intra_mmx2, 5, 7, 0
> +    xor          r5, r5
> +.nextblock
> +%ifdef PIC;f660-f7f9=199=256+144+9=409 (mine), theirs=1e70-2034=

What's with the comment?


