[FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

James Darnley jdarnley at obe.tv
Wed Dec 7 15:07:03 EET 2016


On 2016-12-07 11:07, Carl Eugen Hoyos wrote:
> 2016-12-05 19:32 GMT+01:00 James Darnley <jdarnley at obe.tv>:
> 
>>  - sse2: 2.47x (170 vs.  69 cycles)
>>  - avx:  2.47x (170 vs.  69 cycles)
> 
> Please elaborate on why this was committed.

Because writing it cost almost zero time.  All it needed was the dsp
function pointer assignment.  Preventing the function from being built
(with more %ifs) would have required sending another patch set through
review.
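To illustrate why the extra variant is so cheap, here is a minimal sketch of the usual cascading pointer assignment, assuming an init function that checks CPU flags from oldest to newest so the last match wins.  Every name below (DSPContext, dsp_init, the h_chroma_filter_* stubs, the flag strings) is hypothetical and is not FFmpeg's actual API:

```python
# Hypothetical sketch: names and flag strings are illustrative, not FFmpeg's API.
# The stubs just return their own name so the selected variant is visible.

def h_chroma_filter_c(pix, stride):
    return "c"

def h_chroma_filter_mmxext(pix, stride):
    return "mmxext"

def h_chroma_filter_sse2(pix, stride):
    return "sse2"

def h_chroma_filter_avx(pix, stride):
    return "avx"


class DSPContext:
    def __init__(self):
        # Portable fallback, used when no SIMD flag matches.
        self.h_loop_filter_chroma = h_chroma_filter_c


def dsp_init(ctx, cpu_flags):
    """Assign the best available implementation for the given CPU flags.

    Each check overwrites the previous one, so the last matching
    (newest) instruction set wins.
    """
    if "mmxext" in cpu_flags:
        ctx.h_loop_filter_chroma = h_chroma_filter_mmxext
    if "sse2" in cpu_flags:
        ctx.h_loop_filter_chroma = h_chroma_filter_sse2
    if "avx" in cpu_flags:
        # Adding a new variant costs just this one assignment.
        ctx.h_loop_filter_chroma = h_chroma_filter_avx
    return ctx


ctx = dsp_init(DSPContext(), {"mmxext", "sse2", "avx"})
print(ctx.h_loop_filter_chroma(None, 0))  # prints "avx"
```

Because each check is independent, a flag set containing only "avx" still selects the AVX variant in this sketch, which is the situation where the older instruction sets have been disabled.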

Because a few instructions using the 3-operand (VEX) form should be
quicker, since they avoid the extra register-to-register moves the
2-operand form needs.  That this doesn't show is no doubt down to
out-of-order execution managing to do the moves earlier than written.

Because it is future-proof.  Someone may write a better AVX version of
the macros used, or one for a newer instruction set.  A CPU may appear
which deprecates all SIMD without the VEX prefix.  FFmpeg may allow
disabling old instruction sets without disabling new ones.

These last three reasons are each less likely than the one before.

And now for some more detailed stats, collected over 50 runs, each with
500k calls to the function in question:
> sse2: min: 687, max: 774, mean: 690.041, stddev: 12.155
> avx:  min: 681, max: 721, mean: 685.469, stddev: 9.083
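One quick way to gauge how large the gap between those two means is relative to the run-to-run noise is Welch's t statistic computed from the summary figures.  This is a minimal sketch assuming that "stddev" is the sample standard deviation over the 50 per-run values and that the runs are independent; neither assumption is stated above:

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference of two means, from summary stats."""
    # Standard error of the difference, without assuming equal variances.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


# (mean, stddev, number of runs) as quoted above.
SSE2 = (690.041, 12.155, 50)
AVX = (685.469, 9.083, 50)

diff = SSE2[0] - AVX[0]
rel = diff / SSE2[0]
t = welch_t(*SSE2, *AVX)
print(f"difference: {diff:.3f} ({rel:.2%}), Welch t = {t:.2f}")
```

Under those assumptions the AVX mean is about 4.6 lower (roughly 0.66%), and t comes out at about 2.1, i.e. the gap is only around two standard errors wide.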