[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Ramiro Polla ramiro.polla
Wed Apr 29 06:15:14 CEST 2009


Hi,

On Tue, Apr 21, 2009 at 12:31 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Apr 20, 2009 at 11:01:10PM -0300, Ramiro Polla wrote:
>> On Mon, Apr 20, 2009 at 9:40 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Mon, Apr 20, 2009 at 02:29:09AM -0300, Ramiro Polla wrote:
>> >> On Mon, Apr 20, 2009 at 12:14 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > On Sun, Apr 19, 2009 at 10:10:05PM -0300, Ramiro Polla wrote:
>> >> >> Attached file move MLP's dot product to DSPContext. The filter order
>> >> >> is a maximum of 8, and in the rematrix stage it's a maximum of 5+2
>> >> >> channels for MLP and 7+0 channels for TrueHD, so it all fits in 8
>> >> >> (hopefully) optimized functions.
>> >> >
>> >> > the functions are too small, the call overhead is too much
>> >> > 1-8 multiplications and 1-8 additions is not enough ...
>> >>
>> >> I thought that would happen too, but strangely there was a speedup.
>> >
>> > you wrote the whole function in asm() and that was slower?
>>
>> Attached are three asm variants: sse2, sse4, and altivec.
>
> 1. i meant non SIMD asm :)
> If one wanted to do this in SIMD, it should do several channels
> at once, or FIR & IIR at once or several blocks at once, then
> SIMD should be faster but as is it's not SIMD friendly
>
> 2. i mean the whole outer function not the 1-8 arithmetic ops one
> this one as said is too small, the call overhead will kill it when
> you try the same code (that is asm not gcc deoptiranomized C)
>
> ahh and note, gcc and 64 operations -> very poor code, naive asm
> will be much faster at least it was that way in the past ...

After some of my latest commits (especially the one that cut down the
buffer sizes), mlp in general got faster, but gcc decided to vectorize
filter_channels() on x86_64, so the current speeds for 7.1.thd are
now:
ppc   : 9750 ms
x86_32: 3585 ms
x86_64: 2546 ms
x86_64: 2142 ms (-fno-tree-vectorize)

After 0001-mlpdec-Move-MLP-s-filter_channel-to-dsputils.patch:

ppc   : 9220 ms
x86_32: 3046 ms
x86_64: 2504 ms
x86_64 with -fno-tree-vectorize was not measured because of a harddisk
failure (and I won't be able to test again for a while), but IIRC it
was something around 2100 ms.
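
To illustrate the call overhead point from the quoted discussion, here is a
minimal sketch of dispatching a whole block filter through a function
pointer, so the indirect call is paid once per block rather than once per
1-8 tap dot product. This is not the code from the attached patches:
SketchDSP, filter_block_c, sketch_dsp_init, the buffer layout and the
shift parameter are all hypothetical names and assumptions made up for the
example.

```c
#include <stdint.h>

/* The thread states that the filter order is at most 8. */
#define SKETCH_MAX_ORDER 8

/* Hypothetical stand-in for a DSPContext-style table of function pointers. */
typedef struct SketchDSP {
    /* One indirect call filters a whole block of n samples. */
    void (*filter_block)(int32_t *buf, const int32_t *coeff,
                         int order, int shift, int n);
} SketchDSP;

/* Plain C version. Assumed layout: buf holds `order` history samples
 * followed by n residuals; each residual is replaced in place by
 * residual + prediction, so later samples are predicted from earlier
 * outputs. */
static void filter_block_c(int32_t *buf, const int32_t *coeff,
                           int order, int shift, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t *cur = buf + order + i;
        int64_t accum = 0;

        /* The 1-8 tap dot product, accumulated in 64 bits. */
        for (int k = 0; k < order; k++)
            accum += (int64_t)cur[-1 - k] * coeff[k];

        *cur += (int32_t)(accum >> shift);
    }
}

static void sketch_dsp_init(SketchDSP *c)
{
    c->filter_block = filter_block_c;
    /* A real init would replace the pointer here when CPU detection
     * reports a faster variant (for example an asm version). */
}
```

With this shape, an optimized variant only has to match the filter_block
signature, and the decoder calls c->filter_block(...) once per channel and
block.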

After 0002-mlpdec-x86_-32-64-optimized-mlp_filter_channel.patch:

x86_32: 2951 ms
x86_64: 1897 ms

After 0003-Check-for-SSE4.patch and
0004-mlpdec-sse4-optimized-mlp_filter_channel.patch:

x86_32: 2985 ms

The sse4 code is around the same speed as the x86_32 asm (and slower than
x86_64). Unless it can be optimized much further, I don't think it's
worth spending more time on it. I also dropped the mmx2 code since it
was always slower, unless someone has a good idea for optimizing it
much more.

rematrix_channels() is slower when moved into dsputils, and more
annoying to optimize in assembly because of the if (matrix_noise_shift)
branch. It showed a great speedup when optimized with altivec, though.
(This part is not done yet.)

Ramiro Polla - looking for a way to bring a harddisk back to life
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-mlpdec-Move-MLP-s-filter_channel-to-dsputils.patch
Type: text/x-diff
Size: 8455 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090429/84c8c5a7/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-mlpdec-x86_-32-64-optimized-mlp_filter_channel.patch
Type: text/x-diff
Size: 12127 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090429/84c8c5a7/attachment-0001.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Check-for-SSE4.patch
Type: text/x-diff
Size: 1875 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090429/84c8c5a7/attachment-0002.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0004-mlpdec-sse4-optimized-mlp_filter_channel.patch
Type: text/x-diff
Size: 6931 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090429/84c8c5a7/attachment-0003.patch>


