[FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

James Darnley jdarnley at obe.tv
Thu Jul 19 18:52:01 EEST 2018


On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> On 19 July 2018 at 15:52, James Darnley <jdarnley at obe.tv> wrote:
> 
>> int32_t *b1, int32_t *b2, int
>>          b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
>>  }
>>
>> +static void dd97_vertical_hi_sse2(int32_t *b0, int32_t *b1, int32_t *b2,
>> +                                  int32_t *b3, int32_t *b4, int width)
>> +{
>> +    int i = width & ~3;
>> +    ff_dd97_vertical_hi_sse2(b0, b1, b2, b3, b4, i);
>> +    for(; i<width; i++)
>> +        b2[i] = COMPOSE_DD97iH0(b0[i], b1[i], b2[i], b3[i], b4[i]);
>> +
>> +}
>>
> 
> 
> This, along with the rest of the patchset: what's up with the hybrid
> implementations? Couldn't you put the second part in the asm code as well?
> Now there are 2 function calls instead of 1.

The 8-bit code does this and I just followed its lead.  I believe this is
done because we cannot write junk data beyond what we think is the end
of the line: this might be one of the higher depths, where the coeffs
for the next level sit beyond the end of the line.
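
To illustrate the constraint, here is a minimal sketch of the hybrid
pattern in plain C.  The names dd97_hi, vertical_hi_blocks and
vertical_hi_hybrid are hypothetical, the block routine is only a scalar
stand-in for the SIMD function, and the arithmetic is my assumption of
what the COMPOSE_DD97iH0 macro computes rather than a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed form of the lifting step; not copied from FFmpeg.
 * Relies on >> being an arithmetic shift for negative values,
 * which holds on the usual compilers. */
static int32_t dd97_hi(int32_t b0, int32_t b1, int32_t b2,
                       int32_t b3, int32_t b4)
{
    return b2 + ((-b0 + 9 * b1 + 9 * b3 - b4 + 8) >> 4);
}

/* Hypothetical stand-in for the SIMD routine: it works on whole
 * groups of 4 coefficients, so width must be a multiple of 4. */
static void vertical_hi_blocks(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    assert(width % 4 == 0);
    for (int i = 0; i < width; i += 4)
        for (int j = 0; j < 4; j++)
            b2[i + j] = dd97_hi(b0[i + j], b1[i + j], b2[i + j],
                                b3[i + j], b4[i + j]);
}

/* Hybrid wrapper: blocks of 4 first, then a scalar tail, so nothing
 * at index >= width is ever written. */
static void vertical_hi_hybrid(const int32_t *b0, const int32_t *b1,
                               int32_t *b2, const int32_t *b3,
                               const int32_t *b4, int width)
{
    int i = width & ~3;   /* round width down to a multiple of 4 */
    vertical_hi_blocks(b0, b1, b2, b3, b4, i);
    for (; i < width; i++)
        b2[i] = dd97_hi(b0[i], b1[i], b2[i], b3[i], b4[i]);
}
```

With width = 6 the block routine covers indices 0 to 3 and the scalar
loop covers 4 and 5, so whatever is stored at b2[6] and b2[7] is left
alone.  Rounding the width up to 8 instead would overwrite those two
values.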

But now it has just occurred to me that maybe you meant "why didn't you
do the scalar operations in SIMD?"  Is that what you meant?  The answer
is that it didn't occur to me at the time.  Aside from that, I always
write do-while loops in assembly because I can usually guarantee 1 run
of the block.
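
As a general illustration of the do-while point (not a claim about this
patch), here is a small C sketch.  The names add_one_blocks and add_one
are hypothetical, and the guard in the caller is my assumption about
how one would make the "at least 1 run" guarantee explicit when the
width can be less than 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block routine written in do-while form, as one would
 * in assembly: the body always executes at least once, so the caller
 * must pass n >= 4 (and a multiple of 4).  Returns the number of
 * blocks processed. */
static int add_one_blocks(int32_t *buf, int n)
{
    int i = 0, runs = 0;
    assert(n >= 4 && n % 4 == 0);
    do {
        for (int j = 0; j < 4; j++)
            buf[i + j] += 1;
        i += 4;
        runs++;
    } while (i < n);
    return runs;
}

/* Caller: skip the block routine when no full block exists, then
 * finish the remaining 0 to 3 elements one at a time. */
static void add_one(int32_t *buf, int width)
{
    int i = width & ~3;
    if (i > 0)
        add_one_blocks(buf, i);
    for (; i < width; i++)
        buf[i] += 1;
}
```

Without the i > 0 check, a width of 3 would still run the do-while body
once and touch buf[3], which is exactly the kind of write past the end
of the line described above.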

I can certainly look at making that change.


