[FFmpeg-devel] [PATCH] vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3).

Ronald S. Bultje rsbultje at gmail.com
Wed Jan 15 15:36:17 CET 2014


Hi,

On Wed, Jan 15, 2014 at 9:23 AM, Clément Bœsch <u at pkh.me> wrote:

> On Tue, Jan 14, 2014 at 10:15:55PM -0500, Ronald S. Bultje wrote:
> > Sample timings on ped1080p.webm:
> > iadst_idct:  4672 -> 1175 cycles
> > idct_iadst:  4736 -> 1263 cycles
> > iadst_iadst: 4924 -> 1438 cycles
> > Total decoding time changed from 6.565s to 6.413s.
> > ---
> >  libavcodec/x86/vp9dsp_init.c |  25 +++-
> >  libavcodec/x86/vp9itxfm.asm  | 323 ++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 338 insertions(+), 10 deletions(-)
> >
> [...]
> > +INIT_XMM ssse3
> > +cglobal vp9_idct_iadst_16x16_add, 3, 5, 16, 512, dst, stride, block, eob
>
> Here and following, shouldn't it be 4 instead of 3?
>

Normally yes, but we don't actually use 'eobd', so in this case 3 works as
well.

We'd normally change it once we start using eobd, e.g. when we add a sub8x8
version. (I played with this, but nothing is finished yet.)

> Also, unless you plan to add specific code in those, you could create a
> macro for all the combinations you added (the following code is basically
> duplicated 3x with very small changes).
>
> That would ease the addition of avx btw.


Yes good idea, will do.
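
For reference, a rough sketch of what such a macro could look like, assuming
x86inc.asm-style cglobal and NASM %macro parameter pasting. The macro name
IADST16_COMBO_FN and its parameter layout are placeholders made up for
illustration, and the two pass bodies are left out, so this is not the code
from the patch:

```
; Hypothetical macro: %1 = first transform name, %2 = second transform name,
; %3 = instruction set suffix. Not taken from the patch.
%macro IADST16_COMBO_FN 3
INIT_XMM %3
; cglobal name, args loaded, GPRs, XMM regs, stack bytes, arg names...
; "3" loads only dst, stride and block; eob is named but not used here.
cglobal vp9_%1_%2_16x16_add, 3, 5, 16, 512, dst, stride, block, eob
    ; pass 1 (%1 transform, result to the 512-byte stack buffer) goes here
    ; pass 2 (%2 transform, add result to dst) goes here
    RET
%endmacro

; One instantiation per combination instead of three near-identical copies.
IADST16_COMBO_FN idct,  iadst, ssse3
IADST16_COMBO_FN iadst, idct,  ssse3
IADST16_COMBO_FN iadst, iadst, ssse3
```

With the instruction set passed as a parameter, an avx variant would only
need three more instantiation lines.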

(I won't share the macro with idct_idct, if that's OK. That one has several
subforms, and I don't think this one would benefit as much from them:
idct_idct is mainly used for inter, whereas this one is exclusively intra,
so the eob distribution is very different.)

Ronald

