[FFmpeg-devel] [PATCH] Optimization of AMR NB and WB decoders for MIPS

Thu May 24 18:30:37 CEST 2012

>> Hi Vitor,
>>
>>> av_always_inline in a .c file? I don't think it can work...
>> you are right. I'll change this.
>>
>>> Here and in several other places: in general, it is a good idea to do
>>> your loops also in ASM. That way, you can be sure that the compiler
>>> cannot slow-down a otherwise good code.
>> Good point. However, compilers are generally doing good job optimizing this
>> kind of code for MIPS. Anyhow, loops are left in C only in places where
>> compilers can’t do a lot of damage, or where it would be too costly to do the
>> loops in ASM (for example because of constraint on number of named registers
>> that can be used in inline assembly)
>
>Would you mind compiling these files with -S and send the ASM output? 
>One useless instruction in a inner loop can be rather costly.
In attached file acelp_filters_mips.S you can see what compiler generates for loop
in question.
Compiler adds only counter increment in the loop: "addiu	$11,$11,1". 
Taking in consideration that architecture for which this was optimized has dual 
out-of-order instruction issue and that FPU has separate in-order dual-issue pipeline 
decoupled from integer pipeline this doesn't bring cost in performance. 
On the other hand I have to admit that for older architectures it would be better to 
write entire loop in ASM and counter incrementing would be eliminated.
So my question is do you want me to rewrite loops in ASM for cases like this?

>
>>> If you do such unrolling, please say in the documentation of the
>>> function (in acelp_filter.h) that n should be a multiple of eight or
>>> any other thing that it requires. Also, please note that this code is
>>> not only used for AMR, but also for other vocoders. Can you check if
>>> the other uses respect the multiple-of-eight constraint?
>> We checked if unrolling can be done (regarding multiple-of-eight constraint) in all
>> the places where this function is called. The same goes for all the function where
>> we done loop unrolling.
>> Comment will be put in documentation of the function. Should I specify that the
>> comment is just for MIPS optimizations?
>>
>>> Hmm, am I missing something or you are doing all this to save a single
>>> pre-calculation of (1 + lsp[lp_order - 1])?
>> You are correct. We decided to optimize this function (in respect to our profiling
>> results)and this was the only meaningful thing to do with our instruction set. We got
>> some speed up on this function and decided to leave it in the code taking in account
>> document optimization.txt (section: When is an optimization justified? ).
>
>Can you attach the ASM compiler output of the C version and yours? I 
>don't really understand why your is faster. Maybe the compiler is to 
>dump to see that lsp and qa does not alias...
In attached files lsp_c.S and lsp_mips.S are ASM compiler outputs for
C version and our version respectively.
I didn't explained main reason for this optimization correctly. As you can
see in ASM outputs, compiler doesn't use FP MADD instructions although it can
benefit from them in this case. That is the main reason for this optimization.

Toolchain used is Sourcery G++ 4.5.1.

-Nedeljko
________________________________________
From: Vitor Sessak [vitor1001 at gmail.com]
Sent: Wednesday, May 23, 2012 20:34
To: Babic, Nedeljko
Cc: FFmpeg development discussions and patches; Lukac, Zeljko
Subject: Re: [FFmpeg-devel] [PATCH] Optimization of AMR NB and WB decoders for MIPS

On 05/23/2012 06:54 PM, Babic, Nedeljko wrote:
> Hi Vitor,
>
>> av_always_inline in a .c file? I don't think it can work...
> you are right. I'll change this.
>
>> Here and in several other places: in general, it is a good idea to do
>> your loops also in ASM. That way, you can be sure that the compiler
>> cannot slow-down a otherwise good code.
> Good point. However, compilers are generally doing good job optimizing this
> kind of code for MIPS. Anyhow, loops are left in C only in places where
> compilers can’t do a lot of damage, or where it would be too costly to do the
> loops in ASM (for example because of constraint on number of named registers
> that can be used in inline assembly)

Would you mind compiling these files with -S and send the ASM output?
One useless instruction in a inner loop can be rather costly.

>> If you do such unrolling, please say in the documentation of the
>> function (in acelp_filter.h) that n should be a multiple of eight or
>> any other thing that it requires. Also, please note that this code is
>> not only used for AMR, but also for other vocoders. Can you check if
>> the other uses respect the multiple-of-eight constraint?
> We checked if unrolling can be done (regarding multiple-of-eight constraint) in all
> the places where this function is called. The same goes for all the function where
> we done loop unrolling.
> Comment will be put in documentation of the function. Should I specify that the
> comment is just for MIPS optimizations?
>
>> Hmm, am I missing something or you are doing all this to save a single
>> pre-calculation of (1 + lsp[lp_order - 1])?
> You are correct. We decided to optimize this function (in respect to our profiling
> results)and this was the only meaningful thing to do with our instruction set. We got
> some speed up on this function and decided to leave it in the code taking in account
> document optimization.txt (section: When is an optimization justified? ).

Can you attach the ASM compiler output of the C version and yours? I
don't really understand why your is faster. Maybe the compiler is to
dump to see that lsp and qa does not alias...

-Vitor
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: acelp_filters_mips.S
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120524/b2751b10/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lsp_mips.S
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120524/b2751b10/attachment-0001.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lsp_c.S
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20120524/b2751b10/attachment-0002.ksh>