[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount
jamrial at gmail.com
Wed Feb 25 18:25:50 CET 2015
On 25/02/15 9:41 AM, Ronald S. Bultje wrote:
> On Tue, Feb 24, 2015 at 8:05 PM, James Almer <jamrial at gmail.com> wrote:
>> +#if HAVE_FAST_POPCNT
>> +#if AV_GCC_VERSION_AT_LEAST(4,5)
>> +#ifndef av_popcount
>> + #define av_popcount __builtin_popcount
>> +#endif /* av_popcount */
>> +#if HAVE_FAST_64BIT
>> +#ifndef av_popcount64
>> + #define av_popcount64 __builtin_popcountll
>> +#endif /* av_popcount64 */
>> +#endif /* HAVE_FAST_64BIT */
>> +#endif /* AV_GCC_VERSION_AT_LEAST(4,5) */
>> +#endif /* HAVE_FAST_POPCNT */
> Is this just to get the sse4 popcnt instruction if we compile with
> -mcpu=sse4? The slightly odd thing is that we're using a built-in, yet
> configure still does an arch/cpu check. I'd expect the built-in/compiler to
> do that for us based on -mcpu, and we could always unconditionally use this
> (as long as gcc >= 4.5); alternatively, you could use inline asm and then
> have the configure check (HAVE_FAST_POPCNT). But doing both seems a little
> odd. I have no objection to it, patch is still fine, just odd.
I purposely added both the GCC 4.5 check and the configure check for CPUs that support popcnt,
because otherwise __builtin_popcount (at least GCC's) is slower than our generic
av_popcount_c function from lavu/common.h.
When the CPU supports popcnt the builtin becomes a single inlined instruction.
I tried the __asm__ approach, but the code generated by the builtin seemed better.
And I agree it looks odd and maybe too specific, which is why I said I can move
this to a new header in the x86/ folder instead.
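For comparison, here is a minimal C sketch of the two paths discussed here. The first is a branch-free generic popcount in the bit-twiddling (SWAR) style that a C fallback like av_popcount_c typically uses. It illustrates the technique and is not a copy of lavu/common.h. The second is a selector that maps to __builtin_popcount only when the compiler is targeting a CPU with popcnt. The names popcount_generic and popcount_fast are hypothetical. Keying on the __POPCNT__ predefined macro is an assumption that stands in for the patch's HAVE_FAST_POPCNT configure check and GCC 4.5 version test.

```c
#include <stdint.h>

/* Generic fallback, SWAR style: sum bits in 2-bit fields, then 4-bit,
 * then 8-bit fields, then fold the four byte counts together.
 * No branches and no lookup table. */
static inline int popcount_generic(uint32_t x)
{
    x -= (x >> 1) & 0x55555555u;                       /* 2-bit field sums */
    x  = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit field sums */
    x  = (x + (x >> 4)) & 0x0F0F0F0Fu;                 /* 8-bit field sums */
    x += x >> 8;                                       /* fold adjacent bytes */
    return (x + (x >> 16)) & 0x3F;                     /* result is at most 32 */
}

/* Hypothetical selector (assumption: __POPCNT__ replaces the configure check).
 * GCC and Clang (Clang also defines __GNUC__) predefine __POPCNT__ when the
 * target has POPCNT, e.g. with -mpopcnt or an -march that includes it.
 * Then the builtin compiles to one instruction; otherwise the builtin is
 * expanded in software, so keep the generic C code instead. */
#if defined(__GNUC__) && defined(__POPCNT__)
#define popcount_fast(x) __builtin_popcount(x)
#else
#define popcount_fast(x) popcount_generic(x)
#endif
```

One way to check the "single inlined instruction" claim is to compile with something like `gcc -O2 -mpopcnt -S` and look for a lone popcnt in the output. Without -mpopcnt, this sketch falls back to the generic code.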
Patch attached. I don't have clang, so I can't test it, nor do I know how to check for a
version that supports the builtin.
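On the clang question, one option is to skip version numbers and use the __has_builtin feature-test operator, which Clang provides and GCC supports from version 10. This is an assumption and has not been tried against FFmpeg's configure. The sketch below uses a nested check so that older compilers never evaluate the operator. It also still requires __POPCNT__, because the builtin being available does not mean it is fast. The macro BUILTIN_POPCOUNT_AVAILABLE and the functions popcount_loop and popcount_checked are hypothetical. GCC before 10 lacks __has_builtin, so on those compilers this sketch always takes the fallback. A version check such as AV_GCC_VERSION_AT_LEAST would still be needed there.

```c
#include <stdint.h>

/* Feature test instead of a version test. On compilers without
 * __has_builtin, the inner #if sits in a skipped group and its
 * expression is never evaluated, so this nesting is safe. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_popcount)
#    define BUILTIN_POPCOUNT_AVAILABLE 1
#  endif
#endif
#ifndef BUILTIN_POPCOUNT_AVAILABLE
#  define BUILTIN_POPCOUNT_AVAILABLE 0
#endif

/* Portable fallback: clear the lowest set bit until none remain,
 * counting one per iteration (Kernighan's method). */
static inline int popcount_loop(uint32_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Availability alone says nothing about speed, so also require the
 * target to have POPCNT before mapping to the builtin. */
#if BUILTIN_POPCOUNT_AVAILABLE && defined(__POPCNT__)
#  define popcount_checked(x) __builtin_popcount(x)
#else
#  define popcount_checked(x) popcount_loop(x)
#endif
```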