[FFmpeg-devel] [Patch]x86/hevc : new idct + ASM
jamrial at gmail.com
Mon Jun 2 22:32:09 CEST 2014
On 02/06/14 6:15 AM, Pierre Edouard Lepere wrote:
> +%macro TRANSFORM_DC_ADD 2
> +cglobal hevc_put_transform%1x%1_dc_add_%2, 4, 6, 4, dst, coeffs, stride, col_limit, temp
This should be 4, 5, 4. You're using only one temp reg, not two.
> + xor tempw, tempw
No need for this. The mov below should clear the reg. Same with the "xor tempq, tempq" and
"pxor m2, m2" a couple instructions below.
> + mov tempw, [coeffsq]
> + add tempw, 1
> + sar tempw, 1
> + add tempw, [add_%2]
Why use constants for a single value when you can use immediates?
%if %2 == 8
    add tempw, 32
%else
    add tempw, 8
%endif
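For reference, the scalar arithmetic behind this sequence can be sketched in C. This is only a sketch: the name dc_value is hypothetical and not from the patch, and it assumes the rounding constant is 1 << (13 - bit_depth), which is what yields 32 for 8-bit and 8 for 10-bit.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the patch: scalar model of the
 * add/sar sequence in the quoted macro. bit_depth plays the role of %2
 * (8 or 10). The asm works on a 16-bit register and can wrap; this
 * model uses int and does not reproduce that wraparound. It also
 * assumes >> on a negative int is an arithmetic shift, as sar is. */
static int dc_value(int16_t coeff, int bit_depth)
{
    int shift = 14 - bit_depth;     /* sar tempw, 14-%2 */
    int round = 1 << (shift - 1);   /* assumed: 32 for 8-bit, 8 for 10-bit */
    int dc    = (coeff + 1) >> 1;   /* add tempw, 1 ; sar tempw, 1 */

    return (dc + round) >> shift;
}
```

Since the rounding term depends only on the bit depth, it is known at assembly time, which is why an immediate is enough.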
> + sar tempw, 14-%2
> + movd m0, tempd
> + punpcklwd m0, m0
> + pshufd m0, m0, 0
Use SPLATW here. It will come in handy if you use mmx registers as Ronald suggested for
the 4x4 case. Just make sure to declare the functions as mmxext and not mmx as the latter
doesn't have pshuf* instructions and will instead expand into four punpck* instructions.
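To illustrate what the broadcast is for, here is a rough scalar sketch in C of a DC add for 8-bit pixels. The names clip_u8 and dc_add_8 are hypothetical, and the clipping to 0..255 is an assumption about the final store rather than something shown in the quoted hunk.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: saturate to the 8-bit pixel range (assumed). */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model: add the same dc value to every pixel of a
 * size x size block. The SIMD version gets the "same value everywhere"
 * part by splatting dc across all 16-bit lanes of a register. */
static void dc_add_8(uint8_t *dst, ptrdiff_t stride, int size, int dc)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}
```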
> + pxor m1, m1
> + xor tempq, tempq
> + mov tempd, %1
> + pxor m2, m2
> +%if %1 == 2 || (%2 == 8 && %1 <= 4)
There doesn't seem to be a %1 == 2 case.
> + movd m2, [dstq] ; load data from source
> +%elif %1 == 4 || (%2 == 8 && %1 <= 8)
> + movq m2, [dstq] ; load data from source
> +%else
> + movdqu m2, [dstq] ; load data from source
> +%endif
You can use movu and movh here. They will expand to movdqu/movq and movq/movd respectively,
depending on whether you're using xmm or mmx registers.
Something like this:
%if %2 == 8 && %1 <= mmsize/2
    movh m2, [dstq]
%else
    movu m2, [dstq]
%endif
Same for the store version at the end of the function.
This applies only if you go with mmx registers for the 4x4 case, of course.
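As a sanity check on that condition, the following C sketch models which instruction each combination would end up with. The function names are hypothetical; mmsize is the register width in bytes (8 for mmx, 16 for xmm), and the movh/movu expansions are the ones described above.

```c
#include <stdbool.h>

/* Hypothetical model of the suggested %if condition: bit_depth is %2,
 * width is %1, mmsize is the register size in bytes. */
static bool uses_half_load(int bit_depth, int width, int mmsize)
{
    return bit_depth == 8 && width <= mmsize / 2;
}

/* Returns the instruction the movh/movu abstraction would expand to. */
static const char *load_insn(int bit_depth, int width, int mmsize)
{
    if (uses_half_load(bit_depth, width, mmsize))
        return mmsize == 8 ? "movd" : "movq";    /* movh */
    return mmsize == 8 ? "movq" : "movdqu";      /* movu */
}
```

With mmx registers for 4x4, this gives movd for 8-bit and movq for 10-bit, and with xmm registers it gives movq for 8-bit 8x8 and movdqu for the wider rows. With xmm registers, a 10-bit 4x4 row would get a full 16-byte movdqu for only 8 bytes of data, which is why the condition only works if the 4x4 case uses mmx.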