[FFmpeg-devel] [PATCH] Further optimization of base64 decode using AV_WB32.

Michael Niedermayer michaelni at gmx.at
Sat Jan 21 22:14:07 CET 2012


On Sat, Jan 21, 2012 at 09:29:02PM +0100, Reimar Döffinger wrote:
> On Sat, Jan 21, 2012 at 08:47:59PM +0100, Michael Niedermayer wrote:
> > On Sat, Jan 21, 2012 at 06:53:53PM +0100, Reimar Döffinger wrote:
> > > On Sat, Jan 21, 2012 at 06:39:29PM +0100, Reimar Döffinger wrote:
> > > > On Sat, Jan 21, 2012 at 06:30:48PM +0100, Reimar Döffinger wrote:
> > > > > On Sat, Jan 21, 2012 at 06:13:19PM +0100, Reimar Döffinger wrote:
> > > > > > On Sat, Jan 21, 2012 at 05:56:32PM +0100, Michael Niedermayer wrote:
> > > > > > > On Sat, Jan 21, 2012 at 05:52:27PM +0100, Reimar Döffinger wrote:
> > > > > > > > This is somewhat questionable.
> > > > > > > > The biggest issue is that av_bswap32 is not replaced
> > > > > > > > with our asm version on gcc 4.5 or newer.
> > > > > > > > This causes gcc to generate horrible code that is slower
> > > > > > > > than the unoptimized variant.
> > > > > > > > Old:                                  248852 decicycles
> > > > > > > > New with gcc's attempt at av_bswap32: 256576 decicycles
> > > > > > > > New with our bswap32:                 200260 decicycles
> > > > > > > [...]
> > > > > > > > diff --git a/libavutil/x86/bswap.h b/libavutil/x86/bswap.h
> > > > > > > > index 52ffb4d..aa39d97 100644
> > > > > > > > --- a/libavutil/x86/bswap.h
> > > > > > > > +++ b/libavutil/x86/bswap.h
> > > > > > > > @@ -37,7 +37,7 @@ static av_always_inline av_const unsigned av_bswap16(unsigned x)
> > > > > > > >  }
> > > > > > > >  #endif /* !AV_GCC_VERSION_AT_LEAST(4,1) */
> > > > > > > >  
> > > > > > > > -#if !AV_GCC_VERSION_AT_LEAST(4,5)
> > > > > > > > +#if 1 || !AV_GCC_VERSION_AT_LEAST(4,5)
> > > > > > > >  #define av_bswap32 av_bswap32
> > > > > > > >  static av_always_inline av_const uint32_t av_bswap32(uint32_t x)
> > > > > > > >  {
> > > > > > > 
> > > > > > > also make sure -cpu/arch/tune is set so gcc is allowed to use bswap
> > > > > > > (its 486+) so not possible for gcc to use on strict x86
> > > > > > 
> > > > > > It is a x86_64 build, so I'd hope that gcc will not try to "optimize"
> > > > > > of 486 on that...
> > > > > 
> > > > > gcc version is actually 4.6.2 and it fails to use the bswap instruction
> > > > > regardless whether I use no extra options, -march=native, -m32, -m32
> > > > > -march=native.
> > > > > In all cases the code without our inline bswap is significantly slower
> > > > > (ca. 20%).
> > > > > I have no idea where the claim that gcc would recognize the bswap comes
> > > > > from (hm, I haven't tested if the << 8 confuses it though, will now).
> > > > 
> > > > Yes, only completely removing the shift fixes it.
> > > > One option would be to make the table 16 bit to avoid that shift.
> > > > However my tests show that even though this saves the shift instruction
> > > > the code does not become any faster in 64 bit mode and only maybe 2% in
> > > > 32 bit mode (except of course for unbreaking the compiler), so it
> > > > seems quite wasteful.
> > > 
> > > This works, too, it actually seems about 2% faster than with out bswap
> > > asm (I assume better scheduling):
> > 
> > what effect does this have on ARM and or (lets randomly pick) mpeg2
> > decoding speed ?
> 
> On my beagleboard it compiles to the rev instruction, just like our
> inline asm for example.
> For decoding speed, a quick test using "time" and the fate MPEG2 sample would
> indicate no change, but the accuracy of that test is quite useless.
> Why, what are the options to decide between?

well, current code, builtin and asm for various functions and compiler
versions and architectures.

iam happy with any change that doesnt slow any real world relevant
case down

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I do not agree with what you have to say, but I'll defend to the death your
right to say it. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120121/357b0fd7/attachment.asc>


More information about the ffmpeg-devel mailing list