[FFmpeg-devel] Patch: Inline asm fixes for Intel compiler on Windows

Matt Oliver protogonoi at gmail.com
Sat Mar 29 05:18:44 CET 2014


OK here is a slightly different approach to fixing the missing CLTD
instruction support for icl inline asm. After looking into the situation
further I noticed that CLTD is only used in 2 places and in both places it
is for generating a sign mask.

On most modern processors CLTD (technically CDQ) is a 2 clock cycle
instruction, so a sign mask can alternatively be created using an arithmetic
right shift (which is always a 1 clock cycle instruction). Although this
often requires an extra mov instruction to back up the register contents,
this extra mov is again 1 clock cycle, so on most current processors
replacing cdq has zero net difference (in fact the uops for cdq are often a
mov/sar anyway). It's only different on older processors such as AMD's
K8/K9/Jaguar and Pentium M etc. that actually have a 1 clock cycle cdq. To
try and rectify this I renamed some registers to remove pipeline stalls, so
that the mov/sar can be dual issued on any processor that supports it. This
should reduce times by 1 clock cycle.
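
To make the equivalence concrete, here is a minimal sketch in plain C (not
taken from the patch; the function names are made up for illustration). It
models what cdq leaves in edx and compares it with the mov + sar $31
sequence. It assumes that >> on a negative signed value is an arithmetic
shift, which is implementation-defined in C but holds on the usual x86
compilers.

```c
#include <stdint.h>

/* Model of cdq: with x in eax, edx becomes all ones if x is negative,
 * otherwise zero. (Hypothetical helper, for illustration only.) */
static int32_t sign_mask_cdq_model(int32_t x)
{
    return x < 0 ? -1 : 0;
}

/* The replacement sequence: copy the value, then shift right by 31.
 * Assumes >> on a negative int32_t is an arithmetic shift. */
static int32_t sign_mask_sar(int32_t x)
{
    int32_t mask = x;   /* mov: keep the original value intact */
    mask >>= 31;        /* sar $31: replicate the sign bit into every bit */
    return mask;
}
```

Both functions should return the same mask for every 32 bit input; only the
instructions used to get there differ.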

So the net performance difference depends on the processor, but it will
either be zero or potentially 1 clock cycle faster (changing it in MASK_ABS
removes the fixed eax/edx register requirements and allows the compiler to
arrange things as it wants, which lets it remove a mov or 2 as well).
Since we are talking about 1 clock cycle here, benchmarking shows no real
world difference between the 2 approaches. However, this second approach
does work on icl whereas the previous one does not. So I won't technically
call this an optimization as it has no real world performance diff, but it
does fix icl inline asm support.
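
As a rough illustration of the register point: cdq implicitly reads eax and
writes edx, so a cdq based version has to pin its operands with the "a" and
"d" constraints, while a mov/sar version can use plain "r" constraints. The
sketch below is not the contents of the attached patch; mask_abs is a
hypothetical helper, and the asm path assumes a GCC compatible compiler
targeting x86 with AT&T syntax (other compilers take the plain C fallback).

```c
#include <stdint.h>

/* Hypothetical helper: returns |level| and stores the sign mask
 * (-1 for negative input, 0 otherwise) in *mask_out.
 * INT32_MIN has no positive counterpart and is not handled here. */
static int32_t mask_abs(int32_t *mask_out, int32_t level)
{
    int32_t mask;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ ("movl %1, %0    \n\t"   /* mask = level                    */
             "sarl $31, %0   \n\t"   /* mask = level < 0 ? -1 : 0       */
             "xorl %0, %1    \n\t"   /* level ^= mask                   */
             "subl %0, %1    \n\t"   /* level -= mask, giving |level|   */
             : "=&r"(mask), "+r"(level));   /* any registers will do    */
#else
    /* Portable fallback; assumes >> on a negative value is arithmetic. */
    mask  = level >> 31;
    level = (level ^ mask) - mask;
#endif
    *mask_out = mask;
    return level;
}
```

Because neither operand is tied to eax or edx, the compiler is free to keep
whatever it already has in those registers.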
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Optimize-inline-asm-to-avoid-cdq-instruction.patch
Type: application/octet-stream
Size: 3009 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140329/80870cd8/attachment.obj>

