[FFmpeg-devel] [PATCH/RFC] Add some dsputil functions useful for AAC decoder

Robert Swain robert.swain
Sun Sep 20 15:18:20 CEST 2009


Hello,

2009/9/20 Måns Rullgård <mans at mansr.com>:
> Michael Niedermayer <michaelni at gmx.at> writes:
>> On Fri, Sep 18, 2009 at 11:11:55PM +0100, Mans Rullgard wrote:
>>> This patch adds a few dsputil functions that can be used in the AAC
>>> decoder.
>>>
>>> With trivial NEON versions of these functions, the AAC decoder gets
>>> ~1.6x faster on Cortex-A8, and better NEON code will push that even
>>> further.
>>>
>>> I will readily admit that some of the names in this patch are rubbish,
>>> so please suggest something better.  Other enhancements are obviously
>>> welcome too.
>> [...]
>>
>>> diff --git a/libavcodec/dsputil.h b/libavcodec/dsputil.h
>>> index d9d7d16..61252f5 100644
>>> --- a/libavcodec/dsputil.h
>>> +++ b/libavcodec/dsputil.h
>>> @@ -397,6 +397,14 @@ typedef struct DSPContext {
>>>      /* assume len is a multiple of 8, and arrays are 16-byte aligned */
>>>      void (*int32_to_float_fmul_scalar)(float *dst, const int *src, float mul, int len);
>>>      void (*vector_clipf)(float *dst /* align 16 */, const float *src /* align 16 */, float min, float max, int len /* align 16 */);
>>> +    void (*vector_fmul_scalar)(float *dst, const float *src, float mul,
>>> +                               int len);
>>> +    void (*vector_fmul_scalar_vp[2])(float *dst, const float *src,
>>> +                                     const float **vp, float mul, int len);
>>> +    void (*vp_fmul_scalar[2])(float *dst, const float **vp,
>>> +                              float mul, int len);

vp means vector pair? How common are these operations?
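
For the plain scalar case the intent seems clear from the signature
alone. A minimal C sketch, assuming dst[i] = src[i] * mul; the name
vector_fmul_scalar_ref is hypothetical, and the semantics of the two vp
variants are not shown in the quoted hunk, so they are left out here:

```c
/* Hypothetical plain-C reference, not taken from the patch:
 * multiply every element of src by mul and store it in dst.
 * Assumption: dst[i] = src[i] * mul for i in [0, len). */
static void vector_fmul_scalar_ref(float *dst, const float *src,
                                   float mul, int len)
{
    int i;
    for (i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}
```

A SIMD version would presumably rely on the alignment and length
constraints stated in the comment above these declarations.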

>>> +    float (*scalarproduct_float)(const float *v1, const float *v2, int len);
>>> +    void (*butterflies_float)(float *v1, float *v2, int len);
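
For reference, a minimal plain-C sketch of what these two hooks
presumably compute. The signatures come from the patch; the _ref names
and the butterfly convention (sum stored in v1, difference in v2) are
my assumptions, not something shown in the quoted hunk:

```c
/* Hypothetical reference: dot product of two float vectors. */
static float scalarproduct_float_ref(const float *v1, const float *v2,
                                     int len)
{
    float sum = 0.0f;
    int i;
    for (i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

/* Hypothetical reference: in-place butterfly.
 * Assumption: v1[i] receives the sum, v2[i] the difference. */
static void butterflies_float_ref(float *v1, float *v2, int len)
{
    int i;
    for (i = 0; i < len; i++) {
        float t = v1[i] - v2[i];
        v1[i] += v2[i];
        v2[i]  = t;
    }
}
```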

[...]

>> also, without seeing how these all are used i do have the feeling that
>> they maybe are too small primitives and that bigger chunks of aac code
>> should be optimized to increase flexibility and reduce call overhead ...

Why would optimising a larger chunk of code increase flexibility?

> See attached patch.

len can be calculated just inside the for () loop over i.

>> and i would suggest to only optimize code when it matters speedwise and
>> not when the code just makes up <1% of the cpu time, alex reply made
>> me think that this may apply to some code in there ...
>
> 1.6x speedup matters to me.

+1. But how much of that speedup does each function (or function
type) account for?

Regards,
Rob


