[FFmpeg-devel] [PATCH 1/5] avutil: add pixelutils API

James Almer jamrial at gmail.com
Sat Aug 2 23:25:01 CEST 2014


On 02/08/14 6:13 PM, Clément Bœsch wrote:
> On Sat, Aug 02, 2014 at 04:29:39PM -0300, James Almer wrote:
>> On 02/08/14 3:20 PM, Clément Bœsch wrote:
>>> +    psrlq       m0, m6, 32
>>> +    paddw       m6, m0
>>> +    psrlq       m0, m6, 16
>>> +    paddw       m6, m0
>>> +    movd        eax, m6
>>> +    movzx       eax, ax
>>
>> You could use the HADDW macro here.
>>
> 
> error: undefined symbol `pw_1' (first use)
> 
> That sounds rather constraining. I'll keep my version until you benchmark
> to prove to me that HADDW is faster on an old MMX CPU ;)

I have no idea if it's faster, nor do I have a way to test it, for that matter.
It's four instructions instead of six, but pmaddwd with a memory operand is
probably not fast enough on old CPUs.
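
For reference, here is a rough scalar model in C of what the quoted
psrlq/paddw sequence computes. It is only a sketch: paddw64 and hadd_words
are hypothetical names used for illustration and are not part of the patch.
It treats a 64-bit integer as four 16-bit lanes and folds them into lane 0
with the same two shift-and-add steps.

```c
#include <stdint.h>

/* Lane-wise 16-bit add of two 64-bit values, modelling paddw on an
 * MMX register (each lane wraps modulo 2^16). Hypothetical helper. */
static uint64_t paddw64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        r |= (uint64_t)(uint16_t)(x + y) << (16 * i);
    }
    return r;
}

/* Sum of the four 16-bit lanes, modulo 2^16. */
static unsigned hadd_words(uint64_t m6)
{
    m6 = paddw64(m6, m6 >> 32);     /* lanes 0,1 += lanes 2,3 */
    m6 = paddw64(m6, m6 >> 16);     /* lane 0 += lane 1 */
    return (unsigned)(m6 & 0xFFFF); /* movd + movzx: keep the low word */
}
```

Calling hadd_words(0x0004000300020001ULL) adds the lanes 1, 2, 3 and 4.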

> 
>>> +;-------------------------------------------------------------------------------
>>> +; int ff_pixelutils_sad_8x8_mmxext(const uint8_t *src1, ptrdiff_t stride1,
>>> +;                                  const uint8_t *src2, ptrdiff_t stride2);
>>> +;-------------------------------------------------------------------------------
>>> +INIT_MMX mmxext
>>> +cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
>>> +    pxor        m2, m2
>>> +%rep 4
>>> +    mova        m0, [src1q]
>>> +    mova        m1, [src1q + stride1q]
>>> +    psadbw      m0, [src2q]
>>> +    psadbw      m1, [src2q + stride2q]
>>> +    paddw       m2, m0
>>> +    paddw       m2, m1
>>> +    lea         src1q, [src1q + 2*stride1q]
>>> +    lea         src2q, [src2q + 2*stride2q]
>>> +%endrep
>>> +    movd        eax, m2
>>> +    RET
>>
>> Adding sad16x16 mmxext should be a matter of using add instead of lea, changing 
>> the %rep amount, and using 8 instead of stride[12]q for the mova and psadbw.
>>
> 
> Yeah right, added. Thanks.
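
A plain C reference is handy for checking both block sizes against the asm.
The sketch below is an assumption on my side, not code from the patch:
sad_ref is a hypothetical name, and the width and height parameters are
added so the same function can cover 8x8 and 16x16.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences over a w x h block of 8-bit samples.
 * stride1 and stride2 are the distances in bytes between rows. */
static int sad_ref(const uint8_t *src1, ptrdiff_t stride1,
                   const uint8_t *src2, ptrdiff_t stride2,
                   int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1; /* advance both pointers by one row */
        src2 += stride2;
    }
    return sum;
}
```

A test could fill two buffers with known values and compare sad_ref(...)
against the result of the optimized function for the same block size.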
> 
>>> --- /dev/null
>>> +++ b/libavutil/x86/pixelutils.h
>>> @@ -0,0 +1,26 @@
>>> +/*
>>> + * This file is part of FFmpeg.
>>> + *
>>> + * FFmpeg is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2.1 of the License, or (at your option) any later version.
>>> + *
>>> + * FFmpeg is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with FFmpeg; if not, write to the Free Software
>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>> + */
>>> +
>>> +#ifndef AVUTIL_X86_PIXELUTILS_H
>>> +#define AVUTIL_X86_PIXELUTILS_H
>>> +
>>> +#include "libavutil/pixelutils.h"
>>> +
>>> +void ff_pixelutils_init_x86(AVPixelUtils *s);
>>
>> This prototype should be in libavutil/pixelutils.h
>> No need to make a whole new header just for it.
>>
> 
> No, libavutil/pixelutils.h is public, I don't want to have private
> prototypes in it.

Right, I forgot it was public. I had the lavc dsp stuff in mind when I said that.

> 
>> Maybe you could add a quick test for these functions? Look at lavc/motion-test.c and 
>> lavu/float-dsp.c
> 
> Added.
> 
> I'll resubmit a patchset in a moment.
> 
> 
> 