[FFmpeg-devel] swscale/rgb2rgb: add x86_64 SIMD (SSSE3 and AVX2) for shuffle_bytes func

Carl Eugen Hoyos ceffmpeg at gmail.com
Sun Mar 18 20:07:54 EET 2018


2018-03-18 19:04 GMT+01:00, Paul B Mahol <onemda at gmail.com>:
> On 3/18/18, Carl Eugen Hoyos <ceffmpeg at gmail.com> wrote:
>> 2018-03-18 18:20 GMT+01:00, Paul B Mahol <onemda at gmail.com>:
>>> On 3/18/18, Carl Eugen Hoyos <ceffmpeg at gmail.com> wrote:
>>>> 2018-03-18 17:46 GMT+01:00, Martin Vignali <martin.vignali at gmail.com>:
>>>>> 2018-03-18 17:37 GMT+01:00 Paul B Mahol <onemda at gmail.com>:
>>>>>
>>>>>> On 3/18/18, Nicolas George <george at nsup.org> wrote:
>>>>>> > Martin Vignali (2018-03-18):
>>>>>> >> I ran the test again with a bigger width (512 instead of 128).
>>>>>> >> These are my results:
>>>>>> >> shuffle_bytes_0321_c: 128.6
>>>>>> >> shuffle_bytes_0321_ssse3: 41.6
>>>>>> >> shuffle_bytes_0321_avx2: 23.4
>>>>>> >
>>>>>> > IIUC, these benchmarks are expressed in CPU cycles. But what James
>>>>>> > says is that it can cause the CPU frequency to be throttled: if that
>>>>>> > happens, fewer cycles can take more time and, even worse, cause other
>>>>>> > unrelated code to take more time. A benchmark in actual time with a
>>>>>> > typical use case would be needed to decide.
>>>>>>
>>>>>> Yes, always also test overall performance with a typical use case.
>>>>
>>>> +1
>>>>
>>>>> I tested it using a "benchmark" command line, which tests two shuffle
>>>>> functions:
>>>>> ./ffmpeg -benchmark -f lavfi -i rgbtestsrc=size=3840x2160:duration=10 -vf format=argb,format=rgba -f null -
>>>>>
>>>>> With the patch:
>>>>> bench: utime=3.611s
>>>>> With only SSSE3 (AVX2 part disabled), I get a similar result.
>>>>
>>>> Doesn't this indicate that James' original comment, that the AVX2
>>>> optimization makes no sense, is correct?
>>>
>>> You are almost always wrong.
>>
>> I tend to agree, but I wonder how you know that I am wrong here:
>> What in the above mail indicates that AVX2 has an advantage over
>> SSSE3?
>
> It might work with new CPUs much better.

"might" indicates that I am wrong?
Your reasoning is not much better than mine...
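To make the cycles-versus-wall-clock distinction concrete, here is a minimal C sketch that times a plain scalar byte shuffle with a monotonic clock instead of counting cycles. It assumes that the digits in a name like shuffle_bytes_0321 give, for each destination byte of a 4-byte pixel, the index of the source byte to copy. The names shuffle_bytes_ref, now_seconds and time_shuffle_ms are hypothetical, not FFmpeg API, and the scalar loop only stands in for the SSSE3/AVX2 code under discussion.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical scalar reference: for each 4-byte pixel, destination byte k is
 * copied from source byte mask[k]; mask {0, 3, 2, 1} models "0321". */
static void shuffle_bytes_ref(const uint8_t *src, uint8_t *dst, size_t size,
                              const int mask[4])
{
    assert(size % 4 == 0); /* whole pixels only */
    for (size_t i = 0; i < size; i += 4) {
        dst[i + 0] = src[i + mask[0]];
        dst[i + 1] = src[i + mask[1]];
        dst[i + 2] = src[i + mask[2]];
        dst[i + 3] = src[i + mask[3]];
    }
}

/* Wall-clock seconds from a monotonic clock. Unlike a cycle count, this
 * also captures a slowdown caused by the CPU lowering its clock frequency. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Shuffles a width x height frame of 4-byte pixels `frames` times and returns
 * the mean wall-clock time per frame in milliseconds (-1.0 if allocation
 * fails). The first output pixel is copied to first_px for checking. */
static double time_shuffle_ms(int width, int height, int frames,
                              const int mask[4], uint8_t first_px[4])
{
    assert(width > 0 && height > 0 && frames > 0);
    size_t size = (size_t)width * (size_t)height * 4;
    uint8_t *src = malloc(size);
    uint8_t *dst = malloc(size);
    double ms = -1.0;

    if (src && dst) {
        for (size_t i = 0; i < size; i++)
            src[i] = (uint8_t)i; /* pixel 0 holds the bytes 0, 1, 2, 3 */

        double start = now_seconds();
        for (int f = 0; f < frames; f++)
            shuffle_bytes_ref(src, dst, size, mask);
        ms = (now_seconds() - start) * 1e3 / frames;

        for (int k = 0; k < 4; k++)
            first_px[k] = dst[k];
    }
    free(src); /* free(NULL) is a no-op */
    free(dst);
    return ms;
}
```

For example, time_shuffle_ms(3840, 2160, 10, mask, px) with mask = {1, 2, 3, 0} (the assumed ARGB-to-RGBA reordering) reports milliseconds per 4K frame. Substituting a SIMD kernel for shuffle_bytes_ref would compare versions in time rather than cycles, although a whole-pipeline run like the -benchmark command above still says more about real use.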

Carl Eugen

