[FFmpeg-devel] [PATCH] vp9: initial attempt at a idct_idct_4x4 12bpp x86 simd (sse2) impl.

Ronald S. Bultje rsbultje at gmail.com
Mon Oct 12 16:25:34 CEST 2015


Hi,

On Sat, Oct 10, 2015 at 12:31 PM, Henrik Gramner <henrik at gramner.com> wrote:

> On Tue, Oct 6, 2015 at 9:59 PM, Ronald S. Bultje <rsbultje at gmail.com>
> wrote:
> > +cglobal vp9_idct_idct_4x4_add_12, 4, 4, 6, dst, stride, block, eob
> [...]
> > +    movd                m0, coefd
> > +    punpcklwd           m0, m0
> > +    pshufd              m0, m0, q0000
>
> pshuflw + punpcklqdq is faster on some older CPUs, such as Conroe.


Done.
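
For anyone following along, a rough sketch of why the two sequences are
interchangeable: both broadcast the low 16-bit word of the coefficient to
all eight word lanes. This is C with SSE2 intrinsics rather than the
x86inc asm from the patch, and the function names are made up for
illustration. It only checks that the results match; it says nothing
about the speed difference on Conroe.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Sequence from the original patch: movd, punpcklwd, pshufd q0000.
 * (Hypothetical helper name, not part of FFmpeg.) */
__m128i splat_w0_punpcklwd_pshufd(int32_t coef)
{
    __m128i v = _mm_cvtsi32_si128(coef);  /* movd: coef in dword 0, rest zero */
    v = _mm_unpacklo_epi16(v, v);         /* punpcklwd: w0 w0 w1 w1 0 0 0 0 */
    return _mm_shuffle_epi32(v, 0x00);    /* pshufd q0000: dword 0 to all dwords */
}

/* Sequence suggested in review: movd, pshuflw q0000, punpcklqdq.
 * (Hypothetical helper name, not part of FFmpeg.) */
__m128i splat_w0_pshuflw_punpcklqdq(int32_t coef)
{
    __m128i v = _mm_cvtsi32_si128(coef);  /* movd: coef in dword 0, rest zero */
    v = _mm_shufflelo_epi16(v, 0x00);     /* pshuflw q0000: w0 to the low 4 words */
    return _mm_unpacklo_epi64(v, v);      /* punpcklqdq: low qword copied to high */
}

/* Returns 1 if both sequences produce the same 128-bit register contents. */
int splat_sequences_match(int32_t coef)
{
    __m128i a = splat_w0_punpcklwd_pshufd(coef);
    __m128i b = splat_w0_pshuflw_punpcklqdq(coef);
    return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Calling splat_sequences_match() with a few coefficient values (including
ones with a non-zero upper half) should return 1 each time, since only
the low word survives in either sequence.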

Ronald

