[FFmpeg-devel] [PATCH] libavutil: add x86 optimized av_popcount

Clément Bœsch u at pkh.me
Wed Feb 25 16:43:51 CET 2015


On Tue, Feb 24, 2015 at 10:05:24PM -0300, James Almer wrote:
> Signed-off-by: James Almer <jamrial at gmail.com>
> ---
> I decided to go the configure route since other features (cmov, clz) also do
> it, but if preferred this could instead be done with a new intmath.h header
> in the x86/ folder containing something like
> 
> #if defined(__GNUC__) && defined(__POPCNT__)
>     #define av_popcount   __builtin_popcount
> #if ARCH_X86_64
>     #define av_popcount64 __builtin_popcountll
> #endif
> #endif
> 
> For a cleaner compile time check.
> 
>  configure           | 12 ++++++++++--
>  libavutil/intmath.h | 13 +++++++++++++
>  2 files changed, 23 insertions(+), 2 deletions(-)
> 

For the record, this is what our C reference and the builtin compile to here
(gcc and clang, apparently without POPCNT enabled):

0000000000000000 <av_popcount_c>:
   0:   89 f8                   mov    %edi,%eax
   2:   d1 e8                   shr    %eax
   4:   25 55 55 55 55          and    $0x55555555,%eax
   9:   29 c7                   sub    %eax,%edi
   b:   89 fa                   mov    %edi,%edx
   d:   c1 ef 02                shr    $0x2,%edi
  10:   81 e2 33 33 33 33       and    $0x33333333,%edx
  16:   81 e7 33 33 33 33       and    $0x33333333,%edi
  1c:   8d 04 17                lea    (%rdi,%rdx,1),%eax
  1f:   89 c2                   mov    %eax,%edx
  21:   c1 ea 04                shr    $0x4,%edx
  24:   01 d0                   add    %edx,%eax
  26:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
  2b:   89 c2                   mov    %eax,%edx
  2d:   c1 ea 08                shr    $0x8,%edx
  30:   01 d0                   add    %edx,%eax
  32:   89 c2                   mov    %eax,%edx
  34:   c1 ea 10                shr    $0x10,%edx
  37:   01 d0                   add    %edx,%eax
  39:   83 e0 3f                and    $0x3f,%eax
  3c:   c3                      retq   
  3d:   0f 1f 00                nopl   (%rax)

0000000000000040 <popcount_gcc>:
  40:   48 83 ec 08             sub    $0x8,%rsp
  44:   89 ff                   mov    %edi,%edi
  46:   e8 00 00 00 00          callq  4b <popcount_gcc+0xb>
  4b:   48 83 c4 08             add    $0x8,%rsp
  4f:   c3                      retq   

0000000000000040 <popcount_clang>:
  40:   89 f8                   mov    %edi,%eax
  42:   d1 e8                   shr    %eax
  44:   25 55 55 55 55          and    $0x55555555,%eax
  49:   29 c7                   sub    %eax,%edi
  4b:   89 f8                   mov    %edi,%eax
  4d:   25 33 33 33 33          and    $0x33333333,%eax
  52:   c1 ef 02                shr    $0x2,%edi
  55:   81 e7 33 33 33 33       and    $0x33333333,%edi
  5b:   01 c7                   add    %eax,%edi
  5d:   89 f8                   mov    %edi,%eax
  5f:   c1 e8 04                shr    $0x4,%eax
  62:   01 f8                   add    %edi,%eax
  64:   25 0f 0f 0f 0f          and    $0xf0f0f0f,%eax
  69:   69 c0 01 01 01 01       imul   $0x1010101,%eax,%eax
  6f:   c1 e8 18                shr    $0x18,%eax
  72:   c3                      retq   

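The clang output may contain "optimizations" worth porting to our reference
code: it replaces the last two shift/add steps and the 0x3f mask with a single
imul by 0x1010101 followed by a right shift by 24.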
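
A minimal C sketch of the two reductions, for comparison. popcount_shift is
meant to mirror the av_popcount_c listing and popcount_mul the clang one; both
function names are made up for this example and are not part of libavutil.
Whether the multiply is actually faster depends on the target CPU, so treat
this as an illustration rather than a benchmark.

```c
#include <stdint.h>

/* Shift-and-add reduction, following the av_popcount_c listing. */
unsigned popcount_shift(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;                      /* 2-bit partial sums */
    x  = (x & 0x33333333) + ((x >> 2) & 0x33333333); /* 4-bit partial sums */
    x  = (x + (x >> 4)) & 0x0F0F0F0F;                /* one count per byte */
    x += x >> 8;                                     /* fold bytes together */
    return (x + (x >> 16)) & 0x3F;                   /* total is at most 32 */
}

/* Multiply reduction, following the clang listing (hypothetical name). */
unsigned popcount_mul(uint32_t x)
{
    x -= (x >> 1) & 0x55555555;
    x  = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    x  = (x + (x >> 4)) & 0x0F0F0F0F;
    /* Multiplying by 0x01010101 accumulates the four byte counts into the
     * top byte; the sum is at most 32, so it cannot overflow that byte. */
    return (x * 0x01010101u) >> 24;
}
```

Both functions share the first three steps and differ only in how the four
per-byte counts are summed.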

[...]

-- 
Clément B.