[FFmpeg-devel] [PATCH] Optimization of AMR NB and WB decoders for MIPS
vitor1001 at gmail.com
Wed May 23 20:34:18 CEST 2012
On 05/23/2012 06:54 PM, Babic, Nedeljko wrote:
> Hi Vitor,
>> av_always_inline in a .c file? I don't think it can work...
> you are right. I'll change this.
>> Here and in several other places: in general, it is a good idea to do
>> your loops also in ASM. That way, you can be sure that the compiler
>> cannot slow-down a otherwise good code.
> Good point. However, compilers are generally doing good job optimizing this
> kind of code for MIPS. Anyhow, loops are left in C only in places where
> compilers can’t do a lot of damage, or where it would be too costly to do the
> loops in ASM (for example because of constraint on number of named registers
> that can be used in inline assembly)
Would you mind compiling these files with -S and send the ASM output?
One useless instruction in a inner loop can be rather costly.
>> If you do such unrolling, please say in the documentation of the
>> function (in acelp_filter.h) that n should be a multiple of eight or
>> any other thing that it requires. Also, please note that this code is
>> not only used for AMR, but also for other vocoders. Can you check if
>> the other uses respect the multiple-of-eight constraint?
> We checked if unrolling can be done (regarding multiple-of-eight constraint) in all
> the places where this function is called. The same goes for all the function where
> we done loop unrolling.
> Comment will be put in documentation of the function. Should I specify that the
> comment is just for MIPS optimizations?
>> Hmm, am I missing something or you are doing all this to save a single
>> pre-calculation of (1 + lsp[lp_order - 1])?
> You are correct. We decided to optimize this function (in respect to our profiling
> results)and this was the only meaningful thing to do with our instruction set. We got
> some speed up on this function and decided to leave it in the code taking in account
> document optimization.txt (section: When is an optimization justified? ).
Can you attach the ASM compiler output of the C version and yours? I
don't really understand why your is faster. Maybe the compiler is to
dump to see that lsp and qa does not alias...
More information about the ffmpeg-devel