[FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

James Darnley jdarnley at obe.tv
Thu Apr 6 18:34:25 EEST 2017

On 2017-04-05 05:44, James Almer wrote:
> On 4/4/2017 10:53 PM, James Darnley wrote:
>> Haswell:
>>  - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext
>> Skylake-U:
>>  - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext
> Again, you should add an SSE2 version first, then an AVX one if it's
> measurably faster than the SSE2 one.

On a Yorkfield, sse2 is barely faster: 1.02x faster (728±2.1 vs. 710±3.9
decicycles).  That is only 1 or 2 cycles.

On a Skylake-U, sse2 provides most of the speedup: 1.15x faster (661±2.2
vs. 573±1.9 decicycles).  avx then gains a mere 3 cycles: 547±0.5 decicycles.

On a Haswell, sse2 provides only half the speedup:
 - sse2: 1.06x faster (525±2.5 vs 497±1.0 decicycles)
 - avx:  1.06x faster (497±1.0 vs 468±1.2 decicycles)

(All on 64-bit Linux)

On a Nehalem running 64-bit Windows, sse2 is actually slower: 0.92x the
speed of mmxext (597±3.0 vs. 650±9.3 decicycles).

And on that note I should probably recheck the deblock patches I pushed
a little while ago.

So...  SSE2 for this function, yay or nay?
