[FFmpeg-devel] libavcodec/proresdec : add qmat dsp with SSE2, AVX2 simd

Martin Vignali martin.vignali at gmail.com
Mon Oct 9 22:24:09 EEST 2017


2017-10-07 18:16 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:

> Hi Martin,
>
> On Sat, Oct 7, 2017 at 11:49 AM, Martin Vignali <martin.vignali at gmail.com>
> wrote:
>
> > 2017-10-07 17:30 GMT+02:00 Ronald S. Bultje <rsbultje at gmail.com>:
> > > On Sat, Oct 7, 2017 at 10:22 AM, Martin Vignali <martin.vignali at gmail.com>
> > > wrote:
> > > > The attached patch adds a new DSP
> > > > for qmat manipulation.
> > > >
> > > > For now, I only move this code into it:
> > > >
> > > > for (i = 0; i < 64; i++) {
> > > >         qmat_luma_scaled  [i] = ctx->qmat_luma  [i] * qscale;
> > > >         qmat_chroma_scaled[i] = ctx->qmat_chroma[i] * qscale;
> > > > }
> > > >
> > > > I also add a special case for qscale == 1
> > > > and SSE2 and AVX2 optimizations.
> > >
> > > This loop only executes once per slice. We typically do not SIMD-optimize
> > > at that level, because it won't give significant speed gains...
> >
> > OK, I didn't know that.
> > I mostly followed what is already done, e.g. in blockdsp.clear_block.
> >
>
> Right, so consider that blockdsp is done per block (16x16 pixels), not per
> slice.
>
OK in principle (only optimize a function that is called quite often).


>
> You could remove this entirely from the slice processing code by simply
> pre-calculating the values in the init function once for the whole stream,
> there's only 224 qscale values so it's 224*64*2 multiplications, which is
> (in the context of prores) virtually negligible.
>

I'm not sure we can do that for the prores decoder:
the qmat seems to be set in the function that decodes the frame header
(based on the header of each frame), so it is not known at init time.
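
One possible middle ground, as a rough sketch only: keep the
pre-calculated table, but rebuild it when the frame header carries a
different qmat instead of building it once in init. Everything below is
hypothetical (QmatCache, qmat_cache_update and qmat_cache_get are not
names from the FFmpeg tree). It treats qscale as a plain 1..224
multiplier as quoted above, and it stores int32_t so the sketch makes
no assumption about the value range. The real decoder may differ on
both points.

```c
#include <stdint.h>
#include <string.h>

#define QMAT_SIZE  64
#define MAX_QSCALE 224  /* number of qscale values quoted in this thread */

/* Hypothetical cache, one per matrix (luma, chroma). */
typedef struct QmatCache {
    uint8_t qmat[QMAT_SIZE];                   /* matrix the table was built from */
    int32_t scaled[MAX_QSCALE + 1][QMAT_SIZE]; /* scaled[q][i] = qmat[i] * q      */
    int     valid;                             /* 0 until the first build         */
} QmatCache;

/* Call after parsing the frame header.
 * Rebuilds the table only if the matrix changed.
 * Returns 1 if the table was rebuilt, 0 if it was reused. */
static int qmat_cache_update(QmatCache *c, const uint8_t *qmat)
{
    if (c->valid && !memcmp(c->qmat, qmat, QMAT_SIZE))
        return 0;

    memcpy(c->qmat, qmat, QMAT_SIZE);
    for (int q = 1; q <= MAX_QSCALE; q++)
        for (int i = 0; i < QMAT_SIZE; i++)
            c->scaled[q][i] = qmat[i] * q;
    c->valid = 1;
    return 1;
}

/* Per-slice lookup that would replace the multiply loop.
 * The caller must pass a qscale in 1..MAX_QSCALE. */
static const int32_t *qmat_cache_get(const QmatCache *c, int qscale)
{
    return c->scaled[qscale];
}
```

With this layout each cache is 225*64*4 bytes (about 56 KB), and a
stream whose frames all carry the same qmat pays for the 224*64
multiplications per matrix only once.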

Martin

