[FFmpeg-devel] [PATCH/RFC] ALT_BITSTREAMREADER?optimizations?for the platforms with strict alignment

Michael Niedermayer michaelni
Wed Dec 10 17:42:33 CET 2008


On Wed, Dec 10, 2008 at 01:54:37AM +0200, Siarhei Siamashka wrote:
> On Tuesday 09 December 2008, Michael Niedermayer wrote:
> > On Tue, Dec 09, 2008 at 12:17:03AM +0200, Siarhei Siamashka wrote:
> > > On Monday 08 December 2008, Michael Niedermayer wrote:
> > > > On Mon, Dec 08, 2008 at 12:12:41PM +0200, Siarhei Siamashka wrote:
> >
> > [...]
> >
> > > > > Index: libavcodec/svq1dec.c
> > > > > ===================================================================
> > > > > --- libavcodec/svq1dec.c	(revision 16028)
> > > > > +++ libavcodec/svq1dec.c	(working copy)
> > > > > @@ -195,7 +195,7 @@
> > > > >
> > > > >  #define SVQ1_CALC_CODEBOOK_ENTRIES(cbook)\
> > > > >        codebook = (const uint32_t *) cbook[level];\
> > > > > -      bit_cache = get_bits (bitbuf, 4*stages);\
> > > > > +      bit_cache = get_bits_long (bitbuf, 4*stages);\
> > > > >        /* calculate codebook entries for this vector */\
> > > > >        for (j=0; j < stages; j++) {\
> > > > >          entries[j] = (((bit_cache >> (4*(stages - j - 1))) & 0xF) +
> > > > > 16*j) << (level + 1);\
> > > >
> > > > this code (not only the patch) looks like it needs some work, and such
> > > > work (cleanup) would likely make the 17bit problem go away as a side
> > > > effect.
> > >
> > > Is anybody going to do this cleanup in the near future? I just wke onder
> > > whether you are suggesting me to do this work or whatever.
> >
> > at the risk of a flamewar with you, yes i suggest you do clean these 4
> > lines up instead of making the code alot slower.
> 
> ok, thanks, that's the first thing I wanted to clarify.
> 
> What do you think would be a proper cleanup in this case? I'm sorry, I still
> can't always understand you correctly with just vague hints and this is what
> may become a source of a flamewar.
> 
> This code can be changed either for better readability or for
> better performance.
> 

> In the former case, just a simple loop doing calls to 'get_bits(bitbuf, 4)'
> would be a solution.

this was my original idea


> 
> In the latter case, we want to have as little UPDATE_CACHE operations as
> possible. As the maximum value for 'stages' is 6, we don't need to read more
> than 24 bits and a single UPDATE_CACHE operation is enough for the case when
> MIN_CACHE_BITS is 25 or 32. Original code apparently aimed at minimizing
> bitstream reader related overhead and tried to get '4*stages' bits at once
> (and I can't say that I have something against this solution). Then the code
> proceeds with processing this data in the icky looking loop (this is what
> could be simplified for sure). The only real problem with this code is that
> it does not take into account bitstream readers with MIN_CACHE_BITS set to 17.

it could be done like:

if(MIN_CACHE_BITS < 24 && stages>4){
    for (j=0; j < 4; j++) {
        *entries++ = ((bit_cache >> 28) + 16*j) << (level + 1);\
        bit_cache << 4;
    }
    stages-=4;
    UPDATE_CACHE /skip /getbits whatever
}
for (j=0; j < stages; j++) {
    *entries++ = ((bit_cache >> 28) + 16*j) << (level + 1);\
    bit_cache << 4;
}


> 
> Currently bitstream reader provides functions:
> 'get_bit1' ensures 1 bit of data
> 'get_bits_long' ensures up to 32 bits of data
> 'get_bits' is specified to guarantee only up to 17 bits, but in reality is 
> implemented to provide 25 on all the platforms with unmodified ffmpeg
> sources.
> 
> I suggest introducing a complete selection of functions: read 1 bit, read up
> to 9 bits, read up to 17 bits, read up to 25 bits, read up to 32 bits. And
> codec implementors could use appropriate ones according to their requirements.
> Bitstream reader implementation can take advantage of having less bits to
> provide in some cases. For 32-bit x86 platform with ALT_BITSTREAM_READER,
> these special fast cases are 1 and 25 bits (implemented with reading int8_t or
> int32_t). Little endian platforms with strict alignment would benefit from all
> of these functions (the versions with less bits guaranteed result in faster
> and more compact the code). 64-bit x86 systems could probably get a fast 32
> bit variant as well.

my oppinion about this, is that i like to first see a benchmark proofing
an actual speed gain for some codec on some cpu.


> 
> As for this get_bits call in SVQ1_CALC_CODEBOOK_ENTRIES macro, the implementor
> probably really wants to use the variant, which provides up to 25 bits.

well, IMHO the if() as shown above should be faster than a always slower
get_bits25

> 
> > You dont have to of course but iam sure you will unerstand that i wont
> > accept a workaround instead of a proper fix.
> > And i doubt you would if you where making the decission and someone
> > else did propose it.
> 
> Sorry, I have probably written too much. Anyway, to make a short summary:
> please tell me what kind of modifications to this code you would like to
> see and I'll do it.

the one that is cleanest & fastest, i did not benchmark things so i do not
know which that is ...

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20081210/b0112f5c/attachment.pgp>



More information about the ffmpeg-devel mailing list