[FFmpeg-devel] [PATCH 2/2] x86/vf_w3fdif: simplify w3fdif_simple_high

James Almer jamrial at gmail.com
Sun Oct 11 19:17:55 CEST 2015


On 10/11/2015 4:31 AM, Paul B Mahol wrote:
> On 10/11/15, James Almer <jamrial at gmail.com> wrote:
>> Signed-off-by: James Almer <jamrial at gmail.com>
>> ---
>>  libavfilter/x86/vf_w3fdif.asm | 16 +++++++---------
>>  1 file changed, 7 insertions(+), 9 deletions(-)
>>
>> diff --git a/libavfilter/x86/vf_w3fdif.asm b/libavfilter/x86/vf_w3fdif.asm
>> index f02319b..f2001a4 100644
>> --- a/libavfilter/x86/vf_w3fdif.asm
>> +++ b/libavfilter/x86/vf_w3fdif.asm
>> @@ -103,13 +103,11 @@ REP_RET
>>
>>  %if ARCH_X86_64
>>
>> -cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, in_lines_cur0,
>> in_lines_adj0, coef, linesize
>> +cglobal w3fdif_simple_high, 5, 9, 8, 0, work_line, in_lines_cur0,
>> in_lines_adj0, coef, linesize
>>      movq                  m2, [coefq]
>>      DEFINE_ARGS    work_line, in_lines_cur0, in_lines_adj0, in_lines_cur1,
>> linesize, offset, in_lines_cur2, in_lines_adj1, in_lines_adj2
>> -    SPLATW                m0, m2, 0
>> -    SPLATW                m1, m2, 1
>> +    pshufd                m0, m2, q0000
>>      SPLATW                m2, m2, 2
>> -    SBUTTERFLY            wd, 0, 1, 7
>>      pxor                  m7, m7
>>      mov              offsetq, 0
>>      mov       in_lines_cur2q, [in_lines_cur0q+gprsize*2]
>> @@ -124,23 +122,23 @@ cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line,
>> in_lines_cur0, in_lines_adj0,
>>      movh                                   m4, [in_lines_cur1q+offsetq]
>>      punpcklbw                              m3, m7
>>      punpcklbw                              m4, m7
>> -    SBUTTERFLY                             wd, 3, 4, 8
>> +    SBUTTERFLY                             wd, 3, 4, 1
>>      pmaddwd                                m3, m0
>> -    pmaddwd                                m4, m1
>> +    pmaddwd                                m4, m0
>>      movh                                   m5, [in_lines_adj0q+offsetq]
>>      movh                                   m6, [in_lines_adj1q+offsetq]
>>      punpcklbw                              m5, m7
>>      punpcklbw                              m6, m7
>> -    SBUTTERFLY                             wd, 5, 6, 8
>> +    SBUTTERFLY                             wd, 5, 6, 1
>>      pmaddwd                                m5, m0
>> -    pmaddwd                                m6, m1
>> +    pmaddwd                                m6, m0
>>      paddd                                  m3, m5
>>      paddd                                  m4, m6
>>      movh                                   m5, [in_lines_cur2q+offsetq]
>>      movh                                   m6, [in_lines_adj2q+offsetq]
>>      punpcklbw                              m5, m7
>>      punpcklbw                              m6, m7
>> -    SBUTTERFLY                             wd, 5, 6, 8
>> +    SBUTTERFLY                             wd, 5, 6, 1
>>      pmaddwd                                m5, m2
>>      pmaddwd                                m6, m2
>>      paddd                                  m3, m5
>> --
>> 2.6.0
>>
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel at ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
> 
> Cant this now be used on x32?

Even though i got it down to eight xmm regs, we still have only seven grps
to work with on x86_32. 
The function has seven pointers plus the offset variable used as part of
effective addresses, which means they can't be accessed directly from
stack for this purpose, something that can be done with linesize.

So it will need some changes, like constant movs of grps to and from stack
to get it working.



More information about the ffmpeg-devel mailing list