[FFmpeg-devel] [PATCH] dnxhd get_pixels_4x8_sym sse

Baptiste Coudurier baptiste.coudurier
Fri Dec 12 01:21:57 CET 2008


Michael Niedermayer wrote:
> On Wed, Dec 10, 2008 at 06:22:15PM -0800, Baptiste Coudurier wrote:
>> Michael Niedermayer wrote:
>>> On Wed, Dec 10, 2008 at 05:39:05PM -0800, Baptiste Coudurier wrote:
>>>> Hi,
>>>>
>>>> $subject, I don't polute dsputil more since this is used only by dnxhd
>>>> encoder AFAIK.
>>>
>>> [...]
>>>
>>> also isnt it 8x4 instead of 4x8 (that being a seperate fix unrelated to this
>>> patch of course) i just noticed it ...
>> Well it's 4 rows of 8 pixels, I guess it depends on how you see it.
> 
> of course
> i just think the width x height variant is more common in libavcodec and
> various ITU/MPEG specs than height x width
> 
> but besides the bikeshed of which to take, it should be consistant throughout
> libav*

Ok, changed and applied separately.

>> [...]
>>
>> Index: libavcodec/dnxhdenc.c
>> ===================================================================
>> --- libavcodec/dnxhdenc.c	(revision 16051)
>> +++ libavcodec/dnxhdenc.c	(working copy)
> 
>> @@ -30,6 +30,7 @@
>>  #include "dnxhdenc.h"
>>  
>>  int dct_quantize_c(MpegEncContext *s, DCTELEM *block, int n, int qscale, int *overflow);
>> +static void dnxhd_get_pixels_4x8(DCTELEM *restrict block, const uint8_t *pixels, int line_size);
>>  
>>  #define LAMBDA_FRAC_BITS 10
>>  
> 
> maybe functions should be reordered to avoid this? (that of course is a
> seperate issue ...)

Yes, of course, just moved the function separately.

> [...]
>> +static void get_pixels_4x8_sym_sse2(DCTELEM *block, const uint8_t *pixels, int line_size)
>> +{
>> +    __asm__ volatile(
>> +        "pxor %%xmm7,      %%xmm7       \n\t"
>> +        "movq (%0),        %%xmm0       \n\t"
>> +        "movq (%0, %2),    %%xmm1       \n\t"
>> +        "movq (%0, %2,2),  %%xmm2       \n\t"
>> +        "movq (%0, %3),    %%xmm3       \n\t"
>> +        "punpcklbw %%xmm7, %%xmm0       \n\t"
>> +        "punpcklbw %%xmm7, %%xmm1       \n\t"
>> +        "punpcklbw %%xmm7, %%xmm2       \n\t"
>> +        "punpcklbw %%xmm7, %%xmm3       \n\t"
>> +        "movdqa %%xmm0,      (%1)       \n\t"
>> +        "movdqa %%xmm1,    16(%1)       \n\t"
>> +        "movdqa %%xmm2,    32(%1)       \n\t"
>> +        "movdqa %%xmm3,    48(%1)       \n\t"
>> +        "movdqa %%xmm3 ,   64(%1)       \n\t"
>> +        "movdqa %%xmm2 ,   80(%1)       \n\t"
>> +        "movdqa %%xmm1 ,   96(%1)       \n\t"
>> +        "movdqa %%xmm0,   112(%1)       \n\t"
>> +        :
>> +        : "r" (pixels), "r" (block), "r" ((x86_reg)line_size), "r" ((x86_reg)line_size*3)
> 
> maybe 
> "movq (%0),        %%xmm0       \n\t"
> "add  %2,          %0           \n\t"
> "movq (%0),        %%xmm1       \n\t"
> "movq (%0, %2),    %%xmm2       \n\t"
> "movq (%0, %2,2),  %%xmm3       \n\t"
> 
> is faster (it doesnt need line_size*3), then, maybe not
> this would require pixels to be "+r" ...
> 
> whichever is faster the patch (with the faster one) is ok

Your variant seems a little bit faster so I applied yours.

-- 
Baptiste COUDURIER                              GnuPG Key Id: 0x5C1ABAAA
Key fingerprint                 8D77134D20CC9220201FC5DB0AC9325C5C1ABAAA
checking for life_signs in -lkenny... no




More information about the ffmpeg-devel mailing list