[FFmpeg-devel] [PATCH] Move MLP's dot product to DSPContext

Ramiro Polla ramiro.polla
Wed May 13 22:03:03 CEST 2009


Hi,

On Wed, Apr 29, 2009 at 9:58 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Wed, Apr 29, 2009 at 01:15:14AM -0300, Ramiro Polla wrote:
[...]
>> +void ff_mlp_filter_channel_x86_64(int32_t *firbuf, const int32_t *fircoeff, int firorder,
>> +                                  int32_t *iirbuf, const int32_t *iircoeff, int iirorder,
>> +                                  unsigned int filter_shift, int32_t mask, int blocksize,
>> +                                  int32_t *sample_buffer)
>> +{
>> +    void *firjump = ff_mlp_firtable_x86_64[firorder];
>> +    void *iirjump = ff_mlp_iirtable_x86_64[iirorder];
>> +
>> +    blocksize = -blocksize;
>> +
>> +    __asm__ volatile(
>> +        "1:                        \n\t"
>> +        "xor     %%rsi      , %%rsi\n\t"
>> +        "jmp    *%[firjump]        \n\t"
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x1c, ff_mlp_firorder_x86_64_8)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x18, ff_mlp_firorder_x86_64_7)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x14, ff_mlp_firorder_x86_64_6)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x10, ff_mlp_firorder_x86_64_5)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x0c, ff_mlp_firorder_x86_64_4)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x08, ff_mlp_firorder_x86_64_3)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x04, ff_mlp_firorder_x86_64_2)
>> +        MUL64("%[firbuf]", "%[fircoeff]", 0x00, ff_mlp_firorder_x86_64_1)
>> +        MANGLE(ff_mlp_firorder_x86_64_0)":\n\t"
>> +        "jmp    *%[iirjump]        \n\t"
>> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x0c, ff_mlp_iirorder_x86_64_4)
>> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x08, ff_mlp_iirorder_x86_64_3)
>> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x04, ff_mlp_iirorder_x86_64_2)
>> +        MUL64("%[iirbuf]", "%[iircoeff]", 0x00, ff_mlp_iirorder_x86_64_1)
>
> you probably could put some of the coeffs in registers

I moved the first 3 FIR coeffs into registers, until gcc started
complaining that there were no more free registers.
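For anyone reading the quoted asm: `jmp *%[firjump]` enters an unrolled chain of MUL64 steps at the label for the current filter order, so exactly `firorder` multiply-accumulates run with no loop counter. A rough plain-C analogue of that idea uses switch fallthrough. `fir_dot_unrolled()` and `fir_dot_loop()` are made-up names for this sketch, not code from the patch, and the order is assumed to be 0..8 as in the quoted FIR table.

```c
#include <stdint.h>

/* Hypothetical C analogue of the computed jump: execution enters the
 * unrolled multiply-accumulate chain at the case matching the order and
 * falls through to the end, like landing on ff_mlp_firorder_x86_64_N.
 * The order is assumed to be in 0..8. */
static int64_t fir_dot_unrolled(const int32_t *buf, const int32_t *coeff,
                                int order)
{
    int64_t acc = 0;

    switch (order) {
    case 8: acc += (int64_t)buf[7] * coeff[7]; /* fall through */
    case 7: acc += (int64_t)buf[6] * coeff[6]; /* fall through */
    case 6: acc += (int64_t)buf[5] * coeff[5]; /* fall through */
    case 5: acc += (int64_t)buf[4] * coeff[4]; /* fall through */
    case 4: acc += (int64_t)buf[3] * coeff[3]; /* fall through */
    case 3: acc += (int64_t)buf[2] * coeff[2]; /* fall through */
    case 2: acc += (int64_t)buf[1] * coeff[1]; /* fall through */
    case 1: acc += (int64_t)buf[0] * coeff[0]; /* fall through */
    case 0: break;
    }
    return acc;
}

/* Plain loop with the same result, for comparison. */
static int64_t fir_dot_loop(const int32_t *buf, const int32_t *coeff, int order)
{
    int64_t acc = 0;

    for (int i = 0; i < order; i++)
        acc += (int64_t)buf[i] * coeff[i];
    return acc;
}
```

The unrolled form trades code size for the absence of a per-sample loop counter, which is the same trade the jump tables make.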

>> +        MANGLE(ff_mlp_iirorder_x86_64_0)":\n\t"
>
>> +        "mov     %%rsi      , %%rax\n\t"
>
> useless

Removed.

>> +        "shr     %%cl       , %%rax\n\t"
>> +
>> +        "mov     %%rax      , %%rdx\n\t"
>> +        "add    (%[sample]) , %%rax\n\t"
>> +        "and     %[mask]    , %%rax\n\t"
>> +        "sub              $4,  %[firbuf]\n\t"
>> +        "sub              $4,  %[iirbuf]\n\t"
>
> these 2 buffers can apparently be merged simplifying addressing

Merged them, and the coeffs too.
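A sketch of what merging buys: if the FIR and IIR histories live in one array, and the FIR and IIR coeffs in another, both filters are reachable from one base pointer plus a constant offset, which frees a register in the asm. The struct, `joined_dot()`, and the sizes below are hypothetical and only illustrate the idea; they are not necessarily the layout used in join_states_coeffs.diff.

```c
#include <stdint.h>

#define MAX_FIR_ORDER 8   /* hypothetical; matches the 8 FIR steps quoted above */
#define MAX_IIR_ORDER 4   /* hypothetical; matches the 4 IIR steps quoted above */
#define HIST_LEN      64  /* hypothetical history length per filter */

/* Hypothetical joined layout: FIR history followed by IIR history in one
 * array, FIR coeffs followed by IIR coeffs in another. */
typedef struct {
    int32_t state[2 * HIST_LEN];                  /* [0, HIST_LEN): FIR, rest: IIR */
    int32_t coeff[MAX_FIR_ORDER + MAX_IIR_ORDER]; /* FIR coeffs, then IIR coeffs */
} JoinedFilter;

/* Dot product over both filters. pos indexes the newest entry of the FIR
 * history; the matching IIR entry sits exactly HIST_LEN elements later.
 * Assumes pos + firorder <= HIST_LEN and pos + iirorder <= HIST_LEN. */
static int64_t joined_dot(const JoinedFilter *f, int pos,
                          int firorder, int iirorder)
{
    const int32_t *fir = f->state + pos;
    const int32_t *iir = fir + HIST_LEN;           /* same base, fixed offset */
    const int32_t *fc  = f->coeff;
    const int32_t *ic  = f->coeff + MAX_FIR_ORDER; /* same base, fixed offset */
    int64_t acc = 0;

    for (int i = 0; i < firorder; i++)
        acc += (int64_t)fir[i] * fc[i];
    for (int i = 0; i < iirorder; i++)
        acc += (int64_t)iir[i] * ic[i];
    return acc;
}
```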

>> +        "mov     %%eax      , (%[firbuf])\n\t"
>> +        "mov     %%eax      , (%[sample])\n\t"
>
> this looks mildly redundant ...

I tried removing firbuf and instead using *sample directly, but this
led to slower code.
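To show why the result ends up stored twice, here is a plain-C sketch of the per-sample step at the end of the quoted asm: shift the accumulator, add the residual, mask, then store the result once as new filter history (read again by the following samples) and once as decoded output. `filter_block()` and its `stride` parameter are hypothetical, and the IIR history update (result minus the shifted accumulator) is an assumption, since the quoted excerpt ends before that store.

```c
#include <stdint.h>

/* Hypothetical plain-C version of the quoted per-sample step.
 * firbuf/iirbuf must have room for blocksize entries below them,
 * because new history is written with a pre-decrement. */
static void filter_block(int32_t *firbuf, const int32_t *fircoeff, int firorder,
                         int32_t *iirbuf, const int32_t *iircoeff, int iirorder,
                         unsigned int filter_shift, int32_t mask,
                         int blocksize, int32_t *sample, int stride)
{
    for (int n = 0; n < blocksize; n++) {
        int64_t accum = 0;

        for (int i = 0; i < firorder; i++)
            accum += (int64_t)firbuf[i] * fircoeff[i];
        for (int i = 0; i < iirorder; i++)
            accum += (int64_t)iirbuf[i] * iircoeff[i];

        accum >>= filter_shift;                             /* like "shr %%cl, %%rax" */
        int32_t result = (int32_t)((accum + *sample) & mask); /* "add", then "and" */

        *--firbuf = result;                  /* FIR history for later samples */
        *--iirbuf = result - (int32_t)accum; /* assumed IIR history update */
        *sample   = result;                  /* decoded output */
        sample   += stride;                  /* next sample of this channel */
    }
}
```

Folding the two stores into one would mean reading the FIR history straight out of the sample buffer, which is the variant that measured slower.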

I also tried switching sample_buffer from
[MAX_BLOCKSIZE][MAX_CHANNELS] to [MAX_CHANNELS][MAX_BLOCKSIZE] so that
the samples of each channel would be contiguous in memory, but this
also led to slower code overall.
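The two sample_buffer layouts differ only in the stride between consecutive samples of one channel. A small sketch with made-up values for MAX_CHANNELS and MAX_BLOCKSIZE (the real values are not given here); the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS  8    /* hypothetical sizes, for illustration only */
#define MAX_BLOCKSIZE 160

/* Current layout [MAX_BLOCKSIZE][MAX_CHANNELS]: sample i of channel ch is
 * at flat offset i * MAX_CHANNELS + ch, so one channel has stride
 * MAX_CHANNELS. */
static size_t offset_interleaved(size_t i, size_t ch)
{
    return i * MAX_CHANNELS + ch;
}

/* Layout that was tried, [MAX_CHANNELS][MAX_BLOCKSIZE]: one channel's
 * samples are contiguous (stride 1). */
static size_t offset_planar(size_t i, size_t ch)
{
    return ch * MAX_BLOCKSIZE + i;
}

/* Element offset of buf[i][ch] taken from a real C array, to check the
 * interleaved formula against what the compiler does. */
static size_t real_offset_interleaved(int32_t (*buf)[MAX_CHANNELS],
                                      size_t i, size_t ch)
{
    return (size_t)((char *)&buf[i][ch] - (char *)buf) / sizeof(int32_t);
}
```

With the current layout the filter advances its sample pointer by MAX_CHANNELS elements per sample; the planar layout makes that stride 1, although, as measured above, it did not end up faster overall.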

I renamed the MUL macros as per Mans' suggestion and reworked most of
the asm code (the 32-bit version now keeps some pointers in registers
and is much faster). I also dropped the attempt to manually schedule
MUL32, because it led to uglier code and Dark_Shikari suggested it
wouldn't do much good anyway because of out-of-order execution.

Order of patches:
include_mlp_h.diff
join_states_coeffs.diff
x86_filter.diff

speedup:
32-bit: 12.59%
64-bit:  9.98%

I haven't pursued SSE4 any further because the x86_32 code is already
very close in speed, and I have other work to do.

Ramiro Polla
-------------- next part --------------
A non-text attachment was scrubbed...
Name: include_mlp_h.diff
Type: text/x-patch
Size: 674 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090513/c3075103/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: join_states_coeffs.diff
Type: text/x-patch
Size: 4794 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090513/c3075103/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x86_filter.diff
Type: text/x-patch
Size: 11834 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20090513/c3075103/attachment-0002.bin>


