[Ffmpeg-devel] Re: [MPlayer-dev-eng] [PATCH] mp3 decoding performance on ARM (mpegaudiodec.c)

Siarhei Siamashka siarhei.siamashka
Wed Aug 23 21:07:00 CEST 2006


On Sunday 20 August 2006 22:27, Michael Niedermayer wrote:

First, sorry for the late reply; I was quite busy for the last few days.

> > Also there seems to be some minor portability problem in this decoder
> > as arm and x86  builds produce different results when decoding mp3
> > using ffmp3 (and this problem is not related to my patch). But both wav
> > files play ok if you listen to them.
>
> how different are they? +-1 differences or something significant?

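Not much different in general: in the excerpt below every differing byte is
off by exactly one. Here is part of the 'cmp -b -l arm.wav x86.wav' output
(offsets are decimal, byte values are octal):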
...
  563685 211 M-^I 210 M-^H
  563693 302 M-B  303 M-C
  563705 247 M-'  250 M-(
  563711 133 [    132 Z
  563715 200 M-^@ 177 ^?
  563721  27 ^W    30 ^X
  563729 305 M-E  304 M-D
  563731 160 p    161 q
  563757  33 ^[    32 ^Z
  563771  71 9     70 8
  563779  45 %     44 $
  563781 323 M-S  324 M-T
  563795 334 M-\  333 M-[
  563799  67 7     66 6
...
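
To answer the "+-1 or significant" question at the sample level rather than
the byte level, something like the following could be used. This is only a
rough sketch: it assumes 16-bit little-endian PCM with the WAV header already
skipped by the caller, and the function names are made up for illustration
(they are not part of lavc or of any patch).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Read one signed 16-bit little-endian sample from a byte buffer. */
static int read_s16le(const uint8_t *p)
{
    int v = p[0] | (p[1] << 8);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/*
 * Compare two PCM buffers of nbytes each (hypothetical helper).
 * Returns the largest absolute per-sample difference and stores the
 * number of differing samples in *ndiff.
 */
static int max_sample_diff(const uint8_t *a, const uint8_t *b,
                           size_t nbytes, size_t *ndiff)
{
    int max = 0;
    size_t count = 0;

    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        int d = abs(read_s16le(a + i) - read_s16le(b + i));
        if (d)
            count++;
        if (d > max)
            max = d;
    }
    *ndiff = count;
    return max;
}
```

All offsets in the excerpt are odd (cmp counts from 1), so under the usual
44-byte header assumption they would all be low bytes of samples, and a
result of 1 from this function would mean plain +-1 LSB differences.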

Maybe this happens because of differences in floating-point precision when
some constants are calculated at initialization? I'll try dumping the values
of all constants on both platforms to check whether this is true.

> > The next improvement can be inline asm for MACS (it does not seem to
> > be used though) and MULS macros. I have a patch for them too and it
> > reduces decoding time for another 5-8 seconds, but it requires the
> > availablity of armv5 edsp instructions (and mplayer currently requires
> > only armv4 architecture + can be configured to use iwmmx on intel
> > xscale).
> >
> > For more noticeable performance optimizations I guess it is better to
> > benchmark mp3 decoding with valgrind (callgrind tool), find what takes
> > the most time and focus on optimizing it (and hope that performance
> > bottlenecks are the same for x86 and arm). At least I can verify if the
> > code generated by the compiler is efficient or not in that parts.
>
> well, a few ideas
> * change code so that multiplies need exactly 32bit >> (not easy)
> * look at the *dct functions, compare against other OSS decoders and port
>   the fastest to lavc (easy)
> * do the same with *synth_filter()
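
As I understand the first idea, the point is that a shift by exactly 32 bits
needs no shift instruction at all on a 32-bit CPU: the result is simply the
high word of the 64-bit product. A minimal sketch, assuming 23 fractional
bits; FRAC_BITS and all function names here are only illustrative, not the
actual decoder code:

```c
#include <stdint.h>

#define FRAC_BITS 23  /* assumed number of fractional bits (illustrative) */

/* Generic fixed-point multiply: 64-bit product, then shift by FRAC_BITS.
 * Right-shifting a negative value is implementation-defined in C; the
 * usual compilers do an arithmetic shift, which is assumed here. */
static inline int32_t mul_frac(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> FRAC_BITS);
}

/* Multiply keeping only the high 32 bits of the 64-bit product. */
static inline int32_t mul_hi32(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) >> 32);
}

/* Rescale a coefficient once, at table initialization, so that
 * mul_hi32() gives the same result as mul_frac(). Only valid while the
 * rescaled value still fits in 32 bits, i.e. |coef| < 2^22 here. */
static inline int32_t prescale_coef(int32_t coef)
{
    return coef * (1 << (32 - FRAC_BITS));
}
```

The "not easy" part is presumably that every table and intermediate value
has to be rescaled consistently without overflowing 32 bits.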

Thanks for the ideas, I'll try them a bit later. Anyway, it seems the
discussion has already moved to the ffmpeg mailing list, so it is better to
continue there :)
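
For reference, the MULS/MACS idea mentioned above could look roughly like the
sketch below. It is not the actual patch: the function names are invented,
the operands are assumed to fit in 16 bits, and the asm path is only compiled
when the compiler reports the ARM DSP extension (the __ARM_FEATURE_DSP check
is my assumption about how to detect it); everything else uses plain C.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_DSP)
/* ARMv5TE "enhanced DSP" path: SMULBB multiplies the bottom halfwords of
 * two registers, SMLABB does the same and adds a 32-bit accumulator. */
static inline int32_t muls16(int32_t a, int32_t b)
{
    int32_t r;
    __asm__("smulbb %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

static inline int32_t macs16(int32_t acc, int32_t a, int32_t b)
{
    int32_t r;
    __asm__("smlabb %0, %1, %2, %3" : "=r"(r) : "r"(a), "r"(b), "r"(acc));
    return r;
}
#else
/* Portable fallback: signed 16x16 -> 32 multiply and multiply-accumulate.
 * The accumulation is assumed not to overflow 32 bits. */
static inline int32_t muls16(int32_t a, int32_t b)
{
    return (int32_t)(int16_t)a * (int16_t)b;
}

static inline int32_t macs16(int32_t acc, int32_t a, int32_t b)
{
    return acc + (int32_t)(int16_t)a * (int16_t)b;
}
#endif
```

Keeping the fallback next to the asm version means ARMv4 builds keep working
and both paths can be checked against each other.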



