[FFmpeg-devel] [PATCH] Optimization of original IFF codec

Michael Niedermayer michaelni
Sun Apr 25 23:48:53 CEST 2010


On Sun, Apr 25, 2010 at 01:49:54PM +0200, Sebastian Vater wrote:
> Hey to all!
> 
> I have a new (and my first git patch ;-)) ready for optimizing the stuff.
> I also did move some if's out of a critical loop (checking whether we
> have 8 bit or 32 bit output, as well as interleaved).
> 

> This eliminates most of the inner-loop branches and thus reduces stalls
> caused by branch misprediction, which is quite expensive.

the code checks pix_fmt and codec_tag once per row of bits or less often.
that check is not inside the innermost loop, and its outcome does not change,
so it is not mispredicted, and it's not expensive either.


> 
> Michael Niedermayer wrote:
> > among all these optimizations, i am wondering how much faster things become
> > does that inline speed the code up?
> > does the change to unsigned?
> > you can test easily by using the START/STOP_TIMER macros
> >   
> I was relooking at that piece of code again and just found that the
> division is not required at all.
> 

> I changed to unsigned because it lets the compiler assume that it can
> replace * 8 with << 3.

no, *8 and <<3 are identical operations for signed numbers as well.
the difference is with divisions, where signed /8 is not a plain >>3
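
To illustrate the point about divisions, here is a minimal sketch in plain C
(not code from the patch; all function names are made up for this example).
It assumes C99 or later, where signed division truncates toward zero. A bare
arithmetic right shift rounds toward negative infinity instead, so for
negative inputs the compiler has to emit extra fix-up code for a signed /8,
while an unsigned /8 is exactly >>3.

```c
#include <assert.h>
#include <limits.h>

/* Multiply by 8. The result is the same bit pattern a left shift by 3
 * would produce, for signed and unsigned operands alike. */
int mul8(int x)
{
    assert(x >= INT_MIN / 8 && x <= INT_MAX / 8); /* avoid signed overflow */
    return x * 8;
}

/* Signed division by 8: truncates toward zero (C99 and later). */
int div8(int x)
{
    return x / 8;
}

/* Floor division by 8: the value a bare arithmetic right shift by 3
 * would yield. Differs from div8() when x is negative and not a
 * multiple of 8. */
int floor_div8(int x)
{
    return x / 8 - (x % 8 < 0);
}

/* Unsigned division by 8: always identical to x >> 3. */
unsigned udiv8(unsigned x)
{
    return x / 8;
}
```

For example, div8(-9) and floor_div8(-9) disagree, while udiv8(x) and x >> 3
agree for every unsigned x. That is why signedness matters for the division
and not for the multiplication.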


> > (buf_size * 8 + bps - 1) / bps
> > could be done outside the loop
> >   
> Fixed.
> > and the 2 loops look like they could be done as one loop
> >   
> Fixed.
> > that loop can then be unrolled by a factor of 4, and its body for the
> > uint8_t type case can be implemented like:
> >     v= lut[get_bits(&gb, 4)];
> >     AV_WN32A(dst+b, AV_RN32A(dst+b) | v);
> >   
> The thing is that type can be either uint8_t or uint32_t. It's a #define
> macro that gets the type (uint8_t or uint32_t) passed in.
> 
> So this is not fixed yet, because I'm unsure whether those two lines also
> work when dst is uint32_t.

they can, and it will speed the uint8 case up significantly
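
A rough, self-contained sketch of the table idea for the uint8_t case follows.
It is not the code from the patch. The table layout, the function names and
the per-plane shift are assumptions made for illustration, and memcpy stands
in for the AV_RN32A/AV_WN32A accessors so that the example builds without
FFmpeg headers. Each table entry expands 4 bitplane bits into 4 bytes, so one
32-bit OR updates 4 pixels at once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: entry i holds four bytes in memory order;
 * byte k is 1 if bit (3 - k) of i is set, else 0. */
static uint32_t plane_lut[16];

void init_plane_lut(void)
{
    for (int i = 0; i < 16; i++) {
        uint8_t bytes[4];
        for (int k = 0; k < 4; k++)
            bytes[k] = (i >> (3 - k)) & 1;
        /* Filling through memcpy keeps the byte order endian-independent. */
        memcpy(&plane_lut[i], bytes, 4);
    }
}

/* OR one row of bitplane `plane` into 8-bit chunky pixels,
 * handling 4 pixels per 32-bit operation. */
void decode_plane8(uint8_t *dst, const uint8_t *src, int width, int plane)
{
    assert(width % 8 == 0 && plane >= 0 && plane < 8);
    for (int x = 0; x < width; x += 8) {
        uint8_t bits = src[x >> 3];
        uint32_t hi, lo;
        memcpy(&hi, dst + x, 4);
        memcpy(&lo, dst + x + 4, 4);
        /* Each byte of a table entry is 0 or 1, so shifting by at most 7
         * never carries into the neighbouring byte. */
        hi |= plane_lut[bits >> 4] << plane;
        lo |= plane_lut[bits & 15] << plane;
        memcpy(dst + x, &hi, 4);
        memcpy(dst + x + 4, &lo, 4);
    }
}
```

Calling init_plane_lut() once and then decode_plane8() once per plane builds
up the pixel values without any per-pixel branch. Whether the real code
should shift a single table or keep one table per plane is a design choice
this sketch does not settle.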

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Avoid a single point of failure, be that a person or equipment.