[FFmpeg-devel] Pipeline: H.264 speed improvements

Guillaume POIRIER poirierg
Wed Dec 24 01:53:17 CET 2008


On Wed, Dec 24, 2008 at 1:46 AM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> On Tue, Dec 23, 2008 at 7:40 PM, Guillaume POIRIER <poirierg at gmail.com> wrote:
>> Hello,
>> On Wed, Dec 24, 2008 at 12:02 AM, Jason Garrett-Glaser
>> <darkshikari at gmail.com> wrote:
>>> For ARM this can be special-cased.  Intel CPUs have a 1-3 cycle CLZ
>>> (depends on the CPU) but on AMD chips this can cost >10 cycles, so a
>>> table is generally preferred on x86.
>> The PPC970 (aka G5) has a 2 cycle latency for cntlzw and can do 2 of
>> these per cycle.
>> The PPC7450 (aka G4) has a 1 cycle latency.
>> Note that to the best of my knowledge, there's no PPC inline assembly
>> in FFmpeg, so this information is quite theoretical, all the more so
>> since I have never written a single PPC function in assembly.
> Couldn't the gcc intrinsic __builtin_clz() be used?  AFAIK it's
> supported by GCCs quite far back (I know 3.4 supports it).

I guess it could; it's indeed documented as expanding to cntlzw on PPC.
Thanks for pointing this out to me. I didn't know that such an intrinsic
existed.

Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.

P. J. O'Rourke  - "Cleanliness becomes more important when godliness
is unlikely."
