[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Tue Apr 21 05:32:03 CEST 2009

On Tue, Apr 21, 2009 at 12:29 AM, Jason Garrett-Glaser
<darkshikari at gmail.com> wrote:
> 2009/4/20 Ramiro Polla <ramiro.polla at gmail.com>:
>> On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>> On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
>>>> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
>>>> >> Attached file moves MLP's dot product to DSPContext. The filter order
>>>> >> is a maximum of 8, and in the rematrix stage it's a maximum of 5+2
>>>> >> channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
>>>> >> (hopefully) optimized functions.
>>>> >
>>>> > the functions are too small, the call overhead is too much
>>>> > 1-8 multiplications and 1-8 additions is not enough ...
>>>>
>>>> I thought that would happen too, but strangely there was a speedup.
>>>
>>> you wrote the whole function in asm() and that was slower?
>>
>> Attached are three asm variants: sse2, sse4, and altivec.
>>
>> Here are the benchmarks:

[...]

>> - on x86_64 (can't run sse4)
>> current:  2070ms
>> array of functions in dspcontext:
>> c      :  2600ms (badly vectorized)
>> c      :  1920ms (not vectorized)
>> sse2   :  2450ms
>> inlined in mlpdec.c:
>> c      :  2800ms (badly vectorized)
>> c      :  1980ms (not vectorized)
>> sse2   :  2450ms
>
> Have you tried benching it on a 64-bit system with SSE4?

No. I don't have access to any.

Ramiro Polla
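
For readers following the thread, the "array of functions in dspcontext"
layout benchmarked above can be sketched roughly as below. This is only an
illustration under stated assumptions, not the attached patch: the names
dot_fn, dot_table and dot_product, the int32_t/int64_t types, and the
macro-generated bodies are all hypothetical. The only detail taken from the
thread is that there is one function per filter order, up to a maximum of 8.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ORDER 8  /* maximum filter order quoted in the thread */

/* Hypothetical signature; the real DSPContext entry may differ. */
typedef int64_t (*dot_fn)(const int32_t *a, const int32_t *b);

/* Generate one fixed-length dot product per order, so the compiler
 * sees a constant trip count and can fully unroll each body. */
#define DEFINE_DOT(n)                                           \
    static int64_t dot_##n(const int32_t *a, const int32_t *b)  \
    {                                                           \
        int64_t sum = 0;                                        \
        for (int i = 0; i < (n); i++)                           \
            sum += (int64_t)a[i] * b[i];                        \
        return sum;                                             \
    }

DEFINE_DOT(1) DEFINE_DOT(2) DEFINE_DOT(3) DEFINE_DOT(4)
DEFINE_DOT(5) DEFINE_DOT(6) DEFINE_DOT(7) DEFINE_DOT(8)

/* Table indexed by (order - 1); assumed stand-in for the DSPContext array. */
static const dot_fn dot_table[MAX_ORDER] = {
    dot_1, dot_2, dot_3, dot_4, dot_5, dot_6, dot_7, dot_8,
};

/* Dispatch through the table: one indirect call per dot product. */
static int64_t dot_product(const int32_t *a, const int32_t *b, int order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    return dot_table[order - 1](a, b);
}
```

The indirect call in dot_product() is the overhead Michael is concerned
about for such short loops; the "inlined in mlpdec.c" rows correspond to
avoiding that call by placing the loop body directly in the decoder.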