[FFmpeg-devel] [PATCH] vp9/x86: 16x16 iadst_idct, idct_iadst and iadst_iadst (ssse3).
Ronald S. Bultje
rsbultje at gmail.com
Wed Jan 15 15:36:17 CET 2014
On Wed, Jan 15, 2014 at 9:23 AM, Clément Bœsch <u at pkh.me> wrote:
> On Tue, Jan 14, 2014 at 10:15:55PM -0500, Ronald S. Bultje wrote:
> > Sample timings on ped1080p.webm:
> > iadst_idct: 4672 -> 1175 cycles
> > idct_iadst: 4736 -> 1263 cycles
> > iadst_iadst: 4924 -> 1438 cycles
> > Total decoding time changed from 6.565s to 6.413s.
> > ---
> > libavcodec/x86/vp9dsp_init.c | 25 +++-
> > libavcodec/x86/vp9itxfm.asm | 323
> > 2 files changed, 338 insertions(+), 10 deletions(-)
> > +INIT_XMM ssse3
> > +cglobal vp9_idct_iadst_16x16_add, 3, 5, 16, 512, dst, stride, block, eob
> Here and following, shouldn't it be 4 instead of 3?
Normally yes, but we don't actually use 'eobd', so in this case 3 works as well.
We'd normally change it once we start using eobd, e.g. when we add a sub8x8
version. (I played with this, but nothing finished yet.)
> Also, unless you plan to add specific code in those, you could create a
> macro for all the combination you added (the following code is basically
> duplicated 3x with very small changes).
> That would ease the addition of avx btw.
Yes good idea, will do.
(I won't share the macro with the idct_idct, if that's OK, since that one
has several subforms and I don't think this one will benefit as much from
it, since idct_idct is mainly used for inter, whereas this one is
exclusively intra, so the eob distribution is very different.)
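
For readers unfamiliar with the macro idea discussed above, here is a minimal C sketch of it. It is only an analogy, not the x86inc macro from the patch: one macro body is stamped out once per transform combination, so the three variants share a single source. Everything in it is an assumption made for illustration: the names toy_*, idct_1d and iadst_1d are hypothetical, the block is 4x4 rather than 16x16, and the 1-D functions are placeholders that only mark the data so the pass order is visible. They do not perform a real IDCT or IADST, and which pass maps to rows or columns in the real code is not implied.

```c
#define N 4 /* toy block size; the patch itself handles 16x16 */

/* Hypothetical 1-D stand-ins, NOT real transforms: each one only
 * marks the data so it is visible which pass ran, and in what order. */
static void idct_1d(int *v)  { for (int i = 0; i < N; i++) v[i] += 1; }
static void iadst_1d(int *v) { for (int i = 0; i < N; i++) v[i] *= 2; }

/* One macro generates a complete 2-D function per (a, b) pair:
 * pass 1 runs a##_1d over each row, pass 2 runs b##_1d over each column. */
#define ITXFM_2D(a, b)                                            \
static void toy_##a##_##b(int blk[N][N])                          \
{                                                                 \
    int col[N];                                                   \
    for (int r = 0; r < N; r++)         /* first pass: rows */    \
        a##_1d(blk[r]);                                           \
    for (int c = 0; c < N; c++) {       /* second pass: columns */ \
        for (int r = 0; r < N; r++)                               \
            col[r] = blk[r][c];                                   \
        b##_1d(col);                                              \
        for (int r = 0; r < N; r++)                               \
            blk[r][c] = col[r];                                   \
    }                                                             \
}

/* The three combinations, each a one-line instantiation. */
ITXFM_2D(iadst, idct)
ITXFM_2D(idct, iadst)
ITXFM_2D(iadst, iadst)
```

Adding another instruction set or another combination then costs one extra instantiation line rather than a further copy of the body, which is the same reason a shared macro would ease adding avx.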