[FFmpeg-devel] [PATCH 2/2] x86/vf_w3fdif: simplify w3fdif_simple_high

Hendrik Leppkes h.leppkes at gmail.com
Thu Jan 7 03:13:55 CET 2016


On Mon, Oct 12, 2015 at 1:21 AM, James Almer <jamrial at gmail.com> wrote:
> On 10/11/2015 3:11 PM, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Sun, Oct 11, 2015 at 1:17 PM, James Almer <jamrial at gmail.com> wrote:
>>
>>> On 10/11/2015 4:31 AM, Paul B Mahol wrote:
>>>> On 10/11/15, James Almer <jamrial at gmail.com> wrote:
>>>>> Signed-off-by: James Almer <jamrial at gmail.com>
>>>>> ---
>>>>>  libavfilter/x86/vf_w3fdif.asm | 16 +++++++---------
>>>>>  1 file changed, 7 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/libavfilter/x86/vf_w3fdif.asm b/libavfilter/x86/vf_w3fdif.asm
>>>>> index f02319b..f2001a4 100644
>>>>> --- a/libavfilter/x86/vf_w3fdif.asm
>>>>> +++ b/libavfilter/x86/vf_w3fdif.asm
>>>>> @@ -103,13 +103,11 @@ REP_RET
>>>>>
>>>>>  %if ARCH_X86_64
>>>>>
>>>>> -cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, in_lines_cur0, in_lines_adj0, coef, linesize
>>>>> +cglobal w3fdif_simple_high, 5, 9, 8, 0, work_line, in_lines_cur0, in_lines_adj0, coef, linesize
>>>>>      movq                  m2, [coefq]
>>>>>      DEFINE_ARGS    work_line, in_lines_cur0, in_lines_adj0, in_lines_cur1, linesize, offset, in_lines_cur2, in_lines_adj1, in_lines_adj2
>>>>> -    SPLATW                m0, m2, 0
>>>>> -    SPLATW                m1, m2, 1
>>>>> +    pshufd                m0, m2, q0000
>>>>>      SPLATW                m2, m2, 2
>>>>> -    SBUTTERFLY            wd, 0, 1, 7
>>>>>      pxor                  m7, m7
>>>>>      mov              offsetq, 0
>>>>>      mov       in_lines_cur2q, [in_lines_cur0q+gprsize*2]
>>>>> @@ -124,23 +122,23 @@ cglobal w3fdif_simple_high, 5, 9, 9, 0, work_line, in_lines_cur0, in_lines_adj0,
>>>>>      movh                                   m4, [in_lines_cur1q+offsetq]
>>>>>      punpcklbw                              m3, m7
>>>>>      punpcklbw                              m4, m7
>>>>> -    SBUTTERFLY                             wd, 3, 4, 8
>>>>> +    SBUTTERFLY                             wd, 3, 4, 1
>>>>>      pmaddwd                                m3, m0
>>>>> -    pmaddwd                                m4, m1
>>>>> +    pmaddwd                                m4, m0
>>>>>      movh                                   m5, [in_lines_adj0q+offsetq]
>>>>>      movh                                   m6, [in_lines_adj1q+offsetq]
>>>>>      punpcklbw                              m5, m7
>>>>>      punpcklbw                              m6, m7
>>>>> -    SBUTTERFLY                             wd, 5, 6, 8
>>>>> +    SBUTTERFLY                             wd, 5, 6, 1
>>>>>      pmaddwd                                m5, m0
>>>>> -    pmaddwd                                m6, m1
>>>>> +    pmaddwd                                m6, m0
>>>>>      paddd                                  m3, m5
>>>>>      paddd                                  m4, m6
>>>>>      movh                                   m5, [in_lines_cur2q+offsetq]
>>>>>      movh                                   m6, [in_lines_adj2q+offsetq]
>>>>>      punpcklbw                              m5, m7
>>>>>      punpcklbw                              m6, m7
>>>>> -    SBUTTERFLY                             wd, 5, 6, 8
>>>>> +    SBUTTERFLY                             wd, 5, 6, 1
>>>>>      pmaddwd                                m5, m2
>>>>>      pmaddwd                                m6, m2
>>>>>      paddd                                  m3, m5
>>>>> --
>>>>> 2.6.0
>>>>>
>>>>> _______________________________________________
>>>>> ffmpeg-devel mailing list
>>>>> ffmpeg-devel at ffmpeg.org
>>>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>>>>
>>>>
>>>> Can't this now be used on x32?
>>>
>>
>> Add to the data pointers directly (in_lines_cur0q and work_lineq). Then sub
>> all other curXq/adjXq from cur0q (on 32bit only) before the loop and you
>> have two adds (on 32bit) instead of one (on 64bit), but one reg less
>> (offset), making it 7, which means it works.
>>
>> Ronald
>
> Ah, like it's being done in PACK_6CH from swr's audio_convert.asm
> For complex_high some stack ab/use will be needed (see PACK_8CH), but it should
> be doable.
> This way w3fdif will be able to fully dethrone yadif :P

Are you still working on w3fdif_simple_high for 32bit?
I would be interested in that. Otherwise I might try to do it myself,
although it sounds like a lot of #if'ery
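
For reference, the pointer trick Ronald describes above can be sketched in
plain C roughly as follows. This is only an illustration of the idea, not the
asm and not the filter's actual C code: the function names, the six-entry
src[] layout (cur0, adj0, cur1, adj1, cur2, adj2) and the coef[k / 2] pairing
are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical baseline: one shared offset indexes every source line,
 * which costs a dedicated "offset" register in the asm version. */
static void sum_lines_offset(int32_t *work, const uint8_t *const src[6],
                             const int16_t coef[3], int n)
{
    for (int off = 0; off < n; off++) {
        int32_t acc = 0;
        for (int k = 0; k < 6; k++)
            acc += src[k][off] * coef[k / 2];
        work[off] += acc;
    }
}

/* Hypothetical variant: turn the other line pointers into distances from
 * src[0] once, before the loop, then advance only src[0] and work.
 * That is two pointer increments per step instead of one offset increment,
 * but the offset variable disappears. In asm this is plain integer
 * arithmetic; in strict C the subtraction is only defined when all lines
 * live in the same array object. Assumes n >= 0. */
static void sum_lines_relative(int32_t *work, const uint8_t *const src[6],
                               const int16_t coef[3], int n)
{
    const uint8_t *base = src[0];
    ptrdiff_t delta[6];

    for (int k = 0; k < 6; k++)
        delta[k] = src[k] - base;            /* delta[0] is 0 */

    for (const uint8_t *end = base + n; base < end; base++, work++) {
        int32_t acc = 0;
        for (int k = 0; k < 6; k++)
            acc += base[delta[k]] * coef[k / 2];
        *work += acc;
    }
}
```

Both functions should produce identical output for the same input; the second
just trades the extra index for one more add per iteration.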

