[FFmpeg-devel] [PATCH] MMX2/SSSE3 VC1 loop filter
David Conrad
lessen42
Tue Jul 6 05:34:23 CEST 2010
On Jul 5, 2010, at 4:30 PM, Ronald S. Bultje wrote:
> Hi,
>
> On Mon, Jul 5, 2010 at 1:44 AM, David Conrad <lessen42 at gmail.com> wrote:
>> Updated to patch cleanly, compile, and added mmx/sse2 versions
> [..]
>> +SECTION_RODATA
>> +pw_4: times 8 dw 4
>> +pw_5: times 8 dw 5
>
> cextern pw_4, pw_5 (i.e. use the ones in dsputil_mmx.c) maybe?
Doesn't this cause non-PIC problems on x86-64? I remember something similar causing a problem in libvpx.
Well, if it is problematic, vp8dsp.asm already does the same thing, so changed.
>> +; low, high (src), zero
>> +%macro UNPACK2 4
>> + mova m%2, m%3
>> + punpckh%1 m%3, m%4
>> + punpckl%1 m%2, m%4
>> +%endmacro
>
> duplicate of SBUTTERFLY in x86util.asm, maybe?
Not quite, conceptually: this is intended to expand one vector from 8-bit to 16-bit, whereas SBUTTERFLY is intended to merge two vectors. The SWAP in SBUTTERFLY in particular will send the constant zero vector all over the place.
I could implement it by calling SBUTTERFLY, but IMO it should be a different macro.
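
For readers following along, here is a minimal scalar C sketch of what the quoted UNPACK2 macro computes when instantiated as "bw" on an 8-byte register: interleaving a vector with a zero vector widens its bytes to 16-bit words. This is only a model and not code from the patch; the function names (punpcklbw, punpckhbw, unpack2_bw, word_lane) are illustrative, and little-endian lane order is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of punpcklbw on an 8-byte register:
 * interleave the low four bytes of a and b. */
static void punpcklbw(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    uint8_t tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[2 * i]     = a[i];
        tmp[2 * i + 1] = b[i];
    }
    memcpy(dst, tmp, 8); /* tmp allows dst to alias a */
}

/* Scalar model of punpckhbw: interleave the high four bytes of a and b. */
static void punpckhbw(uint8_t dst[8], const uint8_t a[8], const uint8_t b[8])
{
    uint8_t tmp[8];
    for (int i = 0; i < 4; i++) {
        tmp[2 * i]     = a[4 + i];
        tmp[2 * i + 1] = b[4 + i];
    }
    memcpy(dst, tmp, 8);
}

/* Model of "UNPACK2 bw, low, high, zero".
 * On entry, high holds the source bytes. On exit, low holds bytes 0..3
 * and high holds bytes 4..7, each widened to 16 bits. zero is untouched. */
static void unpack2_bw(uint8_t low[8], uint8_t high[8], const uint8_t zero[8])
{
    memcpy(low, high, 8);        /* mova      m%2, m%3 */
    punpckhbw(high, high, zero); /* punpckhbw m%3, m%4 */
    punpcklbw(low, low, zero);   /* punpcklbw m%2, m%4 */
}

/* Read 16-bit lane i (0..3) from a little-endian register image. */
static uint16_t word_lane(const uint8_t reg[8], int i)
{
    assert(i >= 0 && i < 4);
    return (uint16_t)(reg[2 * i] | (reg[2 * i + 1] << 8));
}
```

The zero register is only ever read in this model, which is the property the reply above is concerned with: a macro that renames its operands afterwards would move the constant to a different register number.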
>> +%macro STORE_4_WORDS_MMX 6
>> + movd %6, %5
>> +%if mmsize==16
>> + psrldq %5, 4
>> +%else
>> + psrlq %5, 32
>> +%endif
>> + mov %1, %6w
>> + shr %6, 16
>> + mov %2, %6w
>> + movd %6, %5
>> + mov %3, %6w
>> + shr %6, 16
>> + mov %4, %6w
>> +%endmacro
>
> For VP8 H loopfilter, I save the neighbouring two rows (p1/q1) and
> write the four out as dwords using movd at once from the mm register,
> have you tried that (I'm not asking you to rewrite it if you didn't),
> and if so, is it faster?
I just tried doing it this way, and it seems to be a little slower. I'll try using STORE_4_WORDS_MMX in vp8's simple loop filter later, but not soon.
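
As an illustration of the store pattern under discussion, here is a small scalar C sketch of the mmsize==8 path of the quoted STORE_4_WORDS_MMX macro: the four 16-bit words in the low 64 bits of the register are written to four separate destinations, two words per movd. It is a model under assumptions, not the patch itself; store_word and store_4_words are hypothetical names, and the register is represented as a uint64_t with little-endian byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Write one 16-bit word to memory in little-endian order. */
static void store_word(uint8_t *dst, uint16_t w)
{
    assert(dst != 0);
    dst[0] = (uint8_t)(w & 0xff);
    dst[1] = (uint8_t)(w >> 8);
}

/* Model of STORE_4_WORDS_MMX for mmsize == 8.
 * d0..d3 are the four destinations (%1..%4), reg is the vector (%5),
 * and t plays the role of the general-purpose temporary (%6). */
static void store_4_words(uint8_t *d0, uint8_t *d1, uint8_t *d2, uint8_t *d3,
                          uint64_t reg)
{
    uint32_t t = (uint32_t)reg;  /* movd  %6, %5  */
    reg >>= 32;                  /* psrlq %5, 32  */
    store_word(d0, (uint16_t)t); /* mov   %1, %6w */
    t >>= 16;                    /* shr   %6, 16  */
    store_word(d1, (uint16_t)t); /* mov   %2, %6w */
    t = (uint32_t)reg;           /* movd  %6, %5  */
    store_word(d2, (uint16_t)t); /* mov   %3, %6w */
    t >>= 16;                    /* shr   %6, 16  */
    store_word(d3, (uint16_t)t); /* mov   %4, %6w */
}
```

In a horizontal filter each destination would typically point at the two pixels straddling the edge in one row, so only two bytes per row are written; the dword variant suggested above writes four bytes per row instead.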
> (I suppose this isn't very practical because of the SSE4 version below...)
>
>> +%macro STORE_4_WORDS_SSE4 6
>> + pextrw %1, %5, %6+0
>> + pextrw %2, %5, %6+1
>> + pextrw %3, %5, %6+2
>> + pextrw %4, %5, %6+3
>> +%endmacro
> [..]
>
>> +%macro VC1_H_LOOP_FILTER 1-2
>> + movq m0, [r0 -4]
>> + movq m1, [r0+ r1-4]
>> + movq m2, [r0+2*r1-4]
>> + movq m3, [r0+ r3-4]
>> +%if %1 > 4
>> + movq m4, [r4 -4]
>> + movq m5, [r4+ r1-4]
>> + movq m6, [r4+2*r1-4]
>> + movq m7, [r4+ r3-4]
>> + punpcklbw m0, m1
>> + punpcklbw m2, m3
>> + punpcklbw m4, m5
>> + punpcklbw m6, m7
>> + SWAP 1, 2
>> + SWAP 2, 4
>> + SWAP 3, 6
>> + SBUTTERFLY wd, 0, 1, 4
>> + SBUTTERFLY wd, 2, 3, 4
>> + SBUTTERFLY dq, 0, 2, 4
>> + SBUTTERFLY dq, 1, 3, 4
>> +%else
>> + SBUTTERFLY bw, 0, 1, 4
>> + SBUTTERFLY bw, 2, 3, 4
>> + SBUTTERFLY wd, 0, 2, 4
>> + SBUTTERFLY wd, 1, 3, 4
>> +%endif
>
> TRANSPOSE4x4W, TRANSPOSE4x4B?
Fixed.
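
For context, the interleave sequences in the quoted hunk amount to a transpose, so that each register ends up holding one column of pixels for the horizontal filter. The following is a scalar C sketch of the general two-stage technique for a 4x4 block of 16-bit words (word interleave, then dword interleave). It models the idea only and is not claimed to match the x86util.asm macros instruction for instruction; vec4w and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit register viewed as four 16-bit lanes. */
typedef struct { uint16_t w[4]; } vec4w;

/* punpcklwd: interleave the low two words of a and b. */
static vec4w unpack_lo_wd(vec4w a, vec4w b)
{
    vec4w r = {{ a.w[0], b.w[0], a.w[1], b.w[1] }};
    return r;
}

/* punpckhwd: interleave the high two words of a and b. */
static vec4w unpack_hi_wd(vec4w a, vec4w b)
{
    vec4w r = {{ a.w[2], b.w[2], a.w[3], b.w[3] }};
    return r;
}

/* punpckldq: low dword of a followed by low dword of b. */
static vec4w unpack_lo_dq(vec4w a, vec4w b)
{
    vec4w r = {{ a.w[0], a.w[1], b.w[0], b.w[1] }};
    return r;
}

/* punpckhdq: high dword of a followed by high dword of b. */
static vec4w unpack_hi_dq(vec4w a, vec4w b)
{
    vec4w r = {{ a.w[2], a.w[3], b.w[2], b.w[3] }};
    return r;
}

/* Transpose four rows of four words in place using two interleave stages. */
static void transpose4x4w(vec4w r[4])
{
    assert(r != 0);
    /* Stage 1: pair up rows 0/1 and rows 2/3 at word granularity. */
    vec4w t0 = unpack_lo_wd(r[0], r[1]);
    vec4w t1 = unpack_hi_wd(r[0], r[1]);
    vec4w t2 = unpack_lo_wd(r[2], r[3]);
    vec4w t3 = unpack_hi_wd(r[2], r[3]);
    /* Stage 2: combine the pairs at dword granularity; r[i] becomes column i. */
    r[0] = unpack_lo_dq(t0, t2);
    r[1] = unpack_hi_dq(t0, t2);
    r[2] = unpack_lo_dq(t1, t3);
    r[3] = unpack_hi_dq(t1, t3);
}
```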
>> +cglobal vc1_h_loop_filter8_sse4, 3,5,8
>
> Should this (and others like it) be under #ifdef X86_64? I got compile
> errors if I tried to use xmm8-15 on x86_32.
What compile errors are you getting? This should never use xmm8 or higher.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: textmate stdin E0V4hx.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100705/49a4ba13/attachment.txt>