[FFmpeg-cvslog] r9504 - trunk/libavcodec/bitstream.c

Aurelien Jacobs aurel
Fri Jul 6 17:27:54 CEST 2007


On Fri, 6 Jul 2007 17:06:22 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:

> Hi
> 
> On Fri, Jul 06, 2007 at 04:30:50PM +0200, Aurelien Jacobs wrote:
> > On Fri,  6 Jul 2007 16:14:41 +0200 (CEST)
> > aurel <subversion at mplayerhq.hu> wrote:
> > 
> > > Author: aurel
> > > Date: Fri Jul  6 16:14:41 2007
> > > New Revision: 9504
> > > 
> > > Log:
> > > simplify ff_copy_bits: merge 2 test branches
> > 
> > It seems ff_copy_bits was written this way for speed reasons.
> > It would be easy to simplify it, but this could hurt speed.
> > 
> > I placed START_TIMER/STOP_TIMER at the beginning/end of ff_copy_bits.
> > Here are the numbers I get:
> > 
> > 15600 dezicycles in ff_copy_bits, 1 runs, 0 skips
> > 11815 dezicycles in ff_copy_bits, 2 runs, 0 skips
> > 9695 dezicycles in ff_copy_bits, 4 runs, 0 skips
> > 9357 dezicycles in ff_copy_bits, 8 runs, 0 skips
> > 9465 dezicycles in ff_copy_bits, 16 runs, 0 skips
> > 8726 dezicycles in ff_copy_bits, 32 runs, 0 skips
> > 10435 dezicycles in ff_copy_bits, 61 runs, 3 skips
> > 10420 dezicycles in ff_copy_bits, 124 runs, 4 skips
> > 10612 dezicycles in ff_copy_bits, 249 runs, 7 skips
> > 10358 dezicycles in ff_copy_bits, 502 runs, 10 skips
> > 9204 dezicycles in ff_copy_bits, 1011 runs, 13 skips
> > 10244 dezicycles in ff_copy_bits, 2034 runs, 14 skips
> > 9484 dezicycles in ff_copy_bits, 4081 runs, 15 skips
> > 8250 dezicycles in ff_copy_bits, 8175 runs, 17 skips
> > 6108 dezicycles in ff_copy_bits, 16367 runs, 17 skips
> > 
> > Now if I simplify the function further using the attached patch:
> > 
> > 15750 dezicycles in ff_copy_bits, 1 runs, 0 skips
> > 12415 dezicycles in ff_copy_bits, 2 runs, 0 skips
> > 10635 dezicycles in ff_copy_bits, 4 runs, 0 skips
> > 10417 dezicycles in ff_copy_bits, 8 runs, 0 skips
> > 9914 dezicycles in ff_copy_bits, 16 runs, 0 skips
> > 9368 dezicycles in ff_copy_bits, 32 runs, 0 skips
> > 9491 dezicycles in ff_copy_bits, 63 runs, 1 skips
> > 12255 dezicycles in ff_copy_bits, 126 runs, 2 skips
> > 12125 dezicycles in ff_copy_bits, 251 runs, 5 skips
> > 11608 dezicycles in ff_copy_bits, 504 runs, 8 skips
> > 13245 dezicycles in ff_copy_bits, 1014 runs, 10 skips
> > 12574 dezicycles in ff_copy_bits, 2038 runs, 10 skips
> > 11837 dezicycles in ff_copy_bits, 4085 runs, 11 skips
> > 9908 dezicycles in ff_copy_bits, 8178 runs, 14 skips
> > 7013 dezicycles in ff_copy_bits, 16370 runs, 14 skips
> > 
> > The difference doesn't seem very significant, but it's slightly slower.
> > 
> > Now if I try to write bytes instead of words to simplify a bit more:
> > 
> > 26470 dezicycles in ff_copy_bits, 1 runs, 0 skips
> > 21670 dezicycles in ff_copy_bits, 2 runs, 0 skips
> > 19535 dezicycles in ff_copy_bits, 4 runs, 0 skips
> > 19517 dezicycles in ff_copy_bits, 8 runs, 0 skips
> > 18604 dezicycles in ff_copy_bits, 16 runs, 0 skips
> > 17125 dezicycles in ff_copy_bits, 32 runs, 0 skips
> > 19126 dezicycles in ff_copy_bits, 63 runs, 1 skips
> > 20532 dezicycles in ff_copy_bits, 126 runs, 2 skips
> > 20529 dezicycles in ff_copy_bits, 252 runs, 4 skips
> > 21121 dezicycles in ff_copy_bits, 506 runs, 6 skips
> > 20705 dezicycles in ff_copy_bits, 1015 runs, 9 skips
> > 20402 dezicycles in ff_copy_bits, 2039 runs, 9 skips
> > 18575 dezicycles in ff_copy_bits, 4085 runs, 11 skips
> > 15700 dezicycles in ff_copy_bits, 8180 runs, 12 skips
> > 10729 dezicycles in ff_copy_bits, 16372 runs, 12 skips
> > 
> > The difference seems too big for such a small simplification.
> > 
> > So I would personally use the simplification in the proposed patch.
> > What do you think about it?
> > 
> > PS: note that in my tests, the length parameter varied between 6
> > and 700. The speed difference would probably be larger if
> > ff_copy_bits() were called with bigger lengths.
> 
> use higher bitrate, big slices, data partitioning and 2 threads
> or some combination of that ...
> 
> [...]
> 
> > Index: libavcodec/bitstream.c
> > ===================================================================
> > --- libavcodec/bitstream.c	(revision 9504)
> > +++ libavcodec/bitstream.c	(working copy)
> > @@ -69,16 +69,7 @@
> >  
> >      if(length==0) return;
> >  
> > -    if(words < 16 || put_bits_count(pb)&7){
> >          for(i=0; i<words; i++) put_bits(pb, 16, be2me_16(srcw[i]));
> > -    }else{
> > -        for(i=0; put_bits_count(pb)&31; i++)
> > -            put_bits(pb, 8, src[i]);
> > -        flush_put_bits(pb);
> > -        memcpy(pbBufPtr(pb), src+i, 2*words-i);
> > -        skip_put_bytes(pb, 2*words-i);
> > -    }
> > -
> >      put_bits(pb, bits, be2me_16(srcw[words])>>(16-bits));
> 
> hmm, what about placing the simplification under #ifdef CONFIG_SMALL
> or putting something like || ENABLE_SMALL in the if() ?

I like this idea a lot :-)
I intend to apply the attached patch, but I first need to get ENABLE_SMALL
defined by configure.
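
As a rough illustration of the "|| ENABLE_SMALL" idea, here is a standalone
sketch. It is not the attached patch: BitWriter, bw_put and copy_bits are
made-up stand-ins for PutBitContext, put_bits and ff_copy_bits, and
ENABLE_SMALL is assumed to be a 0/1 macro that configure would provide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: configure would define this to 0 or 1; default to 0 here. */
#ifndef ENABLE_SMALL
#define ENABLE_SMALL 0
#endif

/* Toy MSB-first bit writer (hypothetical, not the real PutBitContext). */
typedef struct {
    uint8_t *buf;      /* output buffer, must be large enough */
    size_t   bit_pos;  /* number of bits written so far */
} BitWriter;

/* Write the low n bits of value, most significant bit first. */
static void bw_put(BitWriter *bw, int n, unsigned value)
{
    for (int i = n - 1; i >= 0; i--) {
        size_t  byte = bw->bit_pos >> 3;
        uint8_t mask = (uint8_t)(0x80u >> (bw->bit_pos & 7));
        if ((value >> i) & 1)
            bw->buf[byte] |= mask;
        else
            bw->buf[byte] &= (uint8_t)~mask;
        bw->bit_pos++;
    }
}

/* Copy length bits from src. The generic 16-bit loop is used for short
 * copies, for unaligned output, or always when ENABLE_SMALL is 1. */
static void copy_bits(BitWriter *bw, const uint8_t *src, int length)
{
    assert(length >= 0);
    int words = length >> 4;   /* whole 16-bit words */
    int bits  = length & 15;   /* leftover bits */

    if (length == 0)
        return;

    if (words < 16 || (bw->bit_pos & 7) || ENABLE_SMALL) {
        for (int i = 0; i < words; i++)
            bw_put(bw, 16, ((unsigned)src[2 * i] << 8) | src[2 * i + 1]);
    } else {
        /* Fast path: output is byte aligned, so copy whole bytes.
         * This toy writer has no internal bit buffer, so no flush
         * or 32-bit alignment step is needed here. */
        memcpy(bw->buf + (bw->bit_pos >> 3), src, 2 * (size_t)words);
        bw->bit_pos += 16 * (size_t)words;
    }

    if (bits) {
        unsigned tail = (unsigned)src[2 * words] << 8;
        if (bits > 8)              /* only touch the second byte if needed */
            tail |= src[2 * words + 1];
        bw_put(bw, bits, tail >> (16 - bits));
    }
}
```

Because ENABLE_SMALL is a compile-time constant, building with
-DENABLE_SMALL=1 makes the condition always true, so the compiler can
drop the memcpy branch entirely; with 0 the code behaves as before.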

Aurel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: simplify-ff-copy-bits-2.diff
Type: text/x-diff
Size: 470 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-cvslog/attachments/20070706/34575c24/attachment.diff>


