[FFmpeg-devel] [PATCH] Optimisations for av_log2 and integer clip functions
Wed Jan 13 23:21:42 CET 2010
2010/1/13 Måns Rullgård <mans at mansr.com>:
> Jason Garrett-Glaser <darkshikari at gmail.com> writes:
>>> +#define av_log2 av_log2
>>> +static inline av_const int av_log2(unsigned int v)
>>> +    return v ? 31 - __builtin_clz(v) : 0;
>>> +#ifndef av_log2_16bit
>>> +#define av_log2_16bit av_log2
>> Won't ^31 be faster?  "31 - X" requires an extra mov on x86.
> Maybe.  The subtraction might play nicer with the way it's used in
> e.g. golomb.h.  I'd be surprised if gcc could figure out such bit
> magic by itself.
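
For reference, a minimal sketch of the two forms being compared, assuming
GCC or Clang builtins and a 32-bit unsigned int. The names log2_sub and
log2_xor are made up for illustration and are not in the patch. Since
__builtin_clz returns a value in 0..31 for non-zero input, and 31 has all
five low bits set, "31 - x" and "x ^ 31" give the same result:

```c
/* Illustrative only: log2_sub and log2_xor are hypothetical names.
 * Assumes GCC/Clang (__builtin_clz) and a 32-bit unsigned int. */

/* Form used in the patch: subtract the leading-zero count from 31. */
static inline int log2_sub(unsigned int v)
{
    return v ? 31 - __builtin_clz(v) : 0;
}

/* Suggested alternative: XOR with 31. For x in 0..31 this equals 31 - x,
 * because subtracting from an all-ones 5-bit value never borrows. */
static inline int log2_xor(unsigned int v)
{
    return v ? __builtin_clz(v) ^ 31 : 0;
}
```

Which one the compiler turns into better code depends on the surrounding
expression, so this only shows that the two are interchangeable in value.
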
>> Also, __builtin_clz/ctz maps to bsr/bsf, which are extraordinarily
>> slow on Athlons.
> Fabulous.  So what shall we do?  List the CPUs with good clz support
> in configure like we do with cmov et al?
For BSR (BSF is similar but not always identical):
PPro/P2/P3/PM: 2 uops
Core 2: 2/1 (latency/recip throughput)
Core i7: 3/1
Pentium 4: 4/2
Pentium 4E: 16/4 (WHAT THE FUCKETY FUCK?!)
Atom: 16/? (DIE INTEL DIE)
Via Nano: 3/2
Athlon K7: 9/9
Athlon K8 (A64): 10/10
Athlon K10 (Phenom): 4/3 (note: has LZCNT, introduced alongside SSE4a,
which is very similar and is 2/1)
Isn't it awesome?
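
One way the "list the CPUs in configure" idea could look, as a rough
sketch only. HAVE_FAST_CLZ is a hypothetical macro that a configure
script might define for CPUs where bsr/clz is cheap; log2_generic and
log2_select are likewise made-up names, and a 32-bit unsigned int is
assumed:

```c
/* HAVE_FAST_CLZ is a hypothetical configure-time switch, not something
 * taken from the patch. It defaults to 0 (use the portable path). */
#ifndef HAVE_FAST_CLZ
#define HAVE_FAST_CLZ 0
#endif

/* Portable fallback: index of the highest set bit, 0 for v == 0.
 * Narrows the search by halves: 16, 8, 4, 2, then 1 bit. */
static inline int log2_generic(unsigned int v)
{
    int n = 0;
    if (v & 0xffff0000u) { v >>= 16; n += 16; }
    if (v & 0xff00u)     { v >>= 8;  n += 8;  }
    if (v & 0xf0u)       { v >>= 4;  n += 4;  }
    if (v & 0xcu)        { v >>= 2;  n += 2;  }
    if (v & 0x2u)        {           n += 1;  }
    return n;
}

/* Use the builtin only where the build says it is fast. */
static inline int log2_select(unsigned int v)
{
#if HAVE_FAST_CLZ && defined(__GNUC__)
    return v ? 31 - __builtin_clz(v) : 0;
#else
    return log2_generic(v);
#endif
}
```

With the numbers above, such a switch would presumably be off for
K7/K8, Pentium 4E and Atom, and on for most of the rest.
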