[FFmpeg-devel] [PATCH 5/7] ARM: NEON optimised H.264 8x8 and 16x16 qpel MC

Ian Caulfield ian.caulfield
Mon Dec 8 14:54:24 CET 2008


2008/12/5 Mans Rullgard <mans at mansr.com>:

> +
> +        vshl.i16        q3,  q1,  #4
> +        vshl.i16        q1,  q1,  #2
> +        vshl.i16        q15, q2,  #2
> +        vadd.i16        q1,  q1,  q3
> +        vadd.i16        q2,  q2,  q15
> +
> +        vshl.i16        q3,  q9,  #4
> +        vshl.i16        q9,  q9,  #2
> +        vshl.i16        q15, q10, #2
> +        vadd.i16        q9,  q9,  q3
> +        vadd.i16        q10, q10, q15
> +
> +        vsub.i16        q1,  q1,  q2
> +        vsub.i16        q9,  q9,  q10

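Would a multiply-by-scalar version be any faster? The quoted sequence
works out to q1 = 20*q1 - 5*q2 (and likewise q9 = 20*q9 - 5*q10), so it
could shrink to the four instructions below. I don't know what the
interlocking will be like, nor whether you have a spare register to hold
the scalars (for .i16 they have to sit in d0-d7), or even whether
setting up the scalars would make it slower overall.

vmul.i16       q1,  q1,  <scalar set to 20>
vmul.i16       q9,  q9,  <scalar set to 20>
vmls.i16       q1,  q2,  <scalar set to 5>
vmls.i16       q9,  q10, <scalar set to 5>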

Ian



