[FFmpeg-devel] libavcodec/proresdec : add qmat dsp with SSE2, AVX2 simd

Ivan Kalvachev ikalvachev at gmail.com
Tue Oct 10 04:16:12 EEST 2017


On 10/9/17, Martin Vignali <martin.vignali at gmail.com> wrote:
> 2017-10-07 18:16 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
>
>> Hi Martin,
>>
>> On Sat, Oct 7, 2017 at 11:49 AM, Martin Vignali <martin.vignali at gmail.com>
>> wrote:
>>
>> > 2017-10-07 17:30 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
>> > > On Sat, Oct 7, 2017 at 10:22 AM, Martin Vignali <martin.vignali at gmail.com>
>> > > wrote:
>> > > > Patch in attach add a new dsp
>> > > > for manipulation of qmat
>> > > >
>> > > > for now, i move this code inside
>> > > >
>> > > > for (i = 0; i < 64; i++) {
>> > > >         qmat_luma_scaled  [i] = ctx->qmat_luma  [i] * qscale;
>> > > >         qmat_chroma_scaled[i] = ctx->qmat_chroma[i] * qscale;
>> > > > }
>> > > >
>> > > > i add a special case for qscale == 1
>> > > > and SSE2, AVX2 optimization
>> > >
>> > > This loop only executes once per slice. We typically do not SIMD-optimize
>> > > at that level, because it won't give significant speed gains...
>> >
>> > Ok didn't know that.
>> > I mostly follow, what there are already done, like in blockdsp.clear_block
>> >
>>
>> Right, so consider that blockdsp is done per block (16x16 pixels), not per
>> slice.
>>
> Ok on principle (only improve, a func which is called quite often)

It's more like: we can't refuse code that makes a measurable improvement.

Also keep in mind that compilers are getting smarter and this code is
a good target for auto-vectorization. Of course, FFmpeg disables it
because of a long history of compiler bugs related to it.
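
To illustrate what I mean by a good target: a rough sketch of the same
loop as a standalone function. The name scale_qmat and the element types
are only assumptions for the example, not what the decoder actually uses.
With a fixed trip count of 64 and restrict-qualified pointers, a
vectorizing compiler (e.g. GCC with -ftree-vectorize) has everything it
needs to emit SIMD on its own.

```c
#include <stdint.h>

/* Hypothetical helper, not an existing FFmpeg function.
 * Assumes every product fits in 16 bits. */
static void scale_qmat(int16_t *restrict dst,
                       const uint8_t *restrict src,
                       int qscale)
{
    /* Constant trip count and non-aliasing pointers: a simple
     * candidate for compiler auto-vectorization. */
    for (int i = 0; i < 64; i++)
        dst[i] = (int16_t)(src[i] * qscale);
}
```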

>> You could remove this entirely from the slice processing code by simply
>> pre-calculating the values in the init function once for the whole stream,
>> there's only 224 qscale values so it's 224*64*2 multiplications, which is
>> (in the context of prores) virtually negligible.
>>
>
> Not sure, we can do that for prores decoder
> the qmat seems to be set on the decode frame header func
> (based on the header of the frame).

You can at least check if the qscale has changed and avoid recalculation.
I think that the lgpl decoder does that.
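
Something along these lines, as a minimal sketch. The struct, the field
names and both functions are made up for illustration; this is not the
actual ProresContext layout and not a copy of what the lgpl decoder does.
It assumes qscale is never 0, so 0 can mark "nothing cached yet".

```c
#include <stdint.h>

/* Hypothetical context holding the frame's matrices and the cached
 * scaled copies. Assumes every product fits in 16 bits. */
typedef struct QmatCache {
    uint8_t qmat_luma[64];
    uint8_t qmat_chroma[64];
    int16_t qmat_luma_scaled[64];
    int16_t qmat_chroma_scaled[64];
    int     prev_qscale;          /* 0 means the cache is invalid */
} QmatCache;

/* Call this whenever the frame header sets new qmat_luma/qmat_chroma,
 * so that stale scaled values are never reused. */
static void invalidate_scaled_qmat(QmatCache *c)
{
    c->prev_qscale = 0;
}

/* Recompute the scaled matrices only if qscale differs from the one
 * used for the previous slice. Returns 1 if it recalculated, 0 if the
 * cached values were kept. */
static int update_scaled_qmat(QmatCache *c, int qscale)
{
    if (c->prev_qscale == qscale)
        return 0;

    for (int i = 0; i < 64; i++) {
        c->qmat_luma_scaled[i]   = (int16_t)(c->qmat_luma[i]   * qscale);
        c->qmat_chroma_scaled[i] = (int16_t)(c->qmat_chroma[i] * qscale);
    }
    c->prev_qscale = qscale;
    return 1;
}
```

The slice code would then call update_scaled_qmat() once per slice and
read the scaled tables from the context. Note that if slices are decoded
in parallel, each thread would need its own cache.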

Best Regards
   Ivan Kalvachev

