[FFmpeg-devel] [PATCH] x86/intmath: allow the source operand for icc/icl ff_ctzll to be a memory location

James Almer jamrial at gmail.com
Sun Oct 25 18:46:00 CET 2015


On 10/25/2015 1:39 PM, Matt Oliver wrote:
> On 26 October 2015 at 00:43, Hendrik Leppkes <h.leppkes at gmail.com> wrote:
> 
>> On Sun, Oct 25, 2015 at 12:30 PM, Matt Oliver <protogonoi at gmail.com>
>> wrote:
>>> On 25 October 2015 at 22:16, Hendrik Leppkes <h.leppkes at gmail.com>
>> wrote:
>>>
>>>> On Sun, Oct 25, 2015 at 11:26 AM, Matt Oliver <protogonoi at gmail.com>
>>>> wrote:
>>>>> On 25 October 2015 at 12:22, Ganesh Ajjanagadde <gajjanag at mit.edu>
>>>> wrote:
>>>>>
>>>>>> On Sat, Oct 24, 2015 at 7:03 PM, James Almer <jamrial at gmail.com>
>> wrote:
>>>>>>> On 10/24/2015 7:48 PM, Ganesh Ajjanagadde wrote:
>>>>>>>> On Sat, Oct 24, 2015 at 6:08 PM, James Almer <jamrial at gmail.com>
>>>> wrote:
>>>>>>>>> This gives the compiler some flexibility
>>>>>>>>>
>>>>>>>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>>>>>>> ---
>>>>>>>>>  libavutil/x86/intmath.h | 2 +-
>>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h
>>>>>>>>> index 7881e3c..10d3e7f 100644
>>>>>>>>> --- a/libavutil/x86/intmath.h
>>>>>>>>> +++ b/libavutil/x86/intmath.h
>>>>>>>>> @@ -36,7 +36,7 @@ static av_always_inline av_const int
>>>>>> ff_ctzll_x86(long long v)
>>>>>>>>>  {
>>>>>>>>>  #   if ARCH_X86_64
>>>>>>>>>      uint64_t c;
>>>>>>>>> -    __asm__("bsfq %1,%0" : "=r" (c) : "r" (v));
>>>>>>>>> +    __asm__ ("bsfq %1,%0" : "=r" (c) : "rm" (v));
>>>>>>>>>      return c;
>>>>>>>>>  #   else
>>>>>>>>>      return ((uint32_t)v == 0) ? _bit_scan_forward((uint32_t)(v
>>>>
>>>>>> 32)) + 32 : _bit_scan_forward((uint32_t)v);
>>>>>>>>> --
>>>>>>>>> 2.6.2
>>>>>>>>
>>>>>>>> This question may be silly, but can you clarify when this asm
>> code is
>>>>>>>> used instead of __builtin_ctzll? For instance, I think on
>> GNU/Linux,
>>>>>>>> x86-64, sufficiently modern GCC (even with asm enabled), we should
>>>>>>>> always use the builtin: the compiler knows best what to do with
>> its
>>>>>>>> builtin.
>>>>>>>
>>>>>>> This is ICC/ICL, not GCC.
>>>>>>
>>>>>> Ah, now I recall that _bit_scan_forward64 was not always available on
>>>>>> ICC. Anyway, this is just a heads up to whoever uses ICC/ICL and the
>>>>>> like: you may want to find out to which versions this built in
>>>>>> applies.
>>>>>
>>>>>
>>>>> bit_scan_forward64 isnt available on ICL at all, hence the use of asm.
>>>>>
>>>>
>>>> Intels intrinsic guide lists _BitScanForward64 as the intrinsic to use,
>>>> however.
>>>>
>>>>
>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=Other&expand=373
>>>
>>>
>>>  _BitScanForward64 is a msvc intrinsic only available on windows
>> platforms
>>> so not usable with ICC. The native asm implementation also performs
>> better
>>> with ICL hence its use.
>>
>> The "Intel(R) C++ Compiler User and Reference Guide" would care to
>> disagree with you.
>>
> 
> Well ill be... I dont have ICC on hand to check it but ill take Intels word
> for it. Its interesting that Intel decided to support that intrinsic as
> opposed to just extending there own version to 64b (i.e.
> _bit_scan_forward64). _BitScanForward64 as far as I know was introduced by
> msvc and made available in "winnt.h" hence the different naming convention
> to other intrinsics. That said the inline asm version is still preferable
> with Intel as the intrinsic passes the result via pointer which when tested
> with msvc11 and 12 crt's resulted in horrible performance hits as the
> compiler didnt optimise out the memory access in order to keep the result
> in register (works fine with newer ICL 16 and msvcrt14 though).

Tried ICC 13 x86_64 from https://gcc.godbolt.org/ and it doesn't recognize
_BitScanForward64, or _BitScanForward for that matter.

> 
> Using tzcnt would be the more interesting option. Changing the asm to tzcnt
> works with ICL. But also the use of the tzcnt intrinsic (_tzcnt_u64) would
> be possible with both intel and msvc. I dont have anything older than
> Haswell or Piledriver to double check but I have seen its use in several
> other projects without issue so in theory it shouldnt be a problem.
> 
> If tzcnt is preferable I can make up a patch to convert the intel and msvc
> versions to use the required intrinsic.



More information about the ffmpeg-devel mailing list