[FFmpeg-devel] [PATCH] vp9: minor refactorings in idct ssse3 assembly.

Clément Bœsch u at pkh.me
Mon Dec 2 13:36:41 CET 2013


On Sun, Dec 01, 2013 at 08:17:28PM -0500, Ronald S. Bultje wrote:
> Make register usage in macros explicit; change mulsub_2w_4x to use 2
> instead of 3 temp registers. Use this free'ed register to load 11585x2
> once before the 2 1d idct4s instead of within each 1d idct4.
> ---
>  libavcodec/x86/vp9itxfm.asm | 84 ++++++++++++++++++++++-----------------------
>  1 file changed, 41 insertions(+), 43 deletions(-)
> 
> diff --git a/libavcodec/x86/vp9itxfm.asm b/libavcodec/x86/vp9itxfm.asm
> index 7b9d7df..730a26e 100644
> --- a/libavcodec/x86/vp9itxfm.asm
> +++ b/libavcodec/x86/vp9itxfm.asm
> @@ -51,33 +51,32 @@ SECTION .text
>  %macro VP9_MULSUB_2W_2X 6 ; dst1, dst2, src (unchanged), round, coefs1, coefs2
>      pmaddwd            m%1, m%3, %5
>      pmaddwd            m%2, m%3, %6
> -    paddd              m%1, m%4
> -    paddd              m%2, m%4
> -    psrad              m%1, 14
> -    psrad              m%2, 14
> +    paddd              m%1,  %4
> +    paddd              m%2,  %4
> +    psrad              m%1,  14
> +    psrad              m%2,  14

Can't you keep this function unchanged? It seems this make the calls to
VP9_UNPACK_MULSUB_2W_4X take only one argument with the "m" form (m7),
which sounds a bit inconsistent and seems to create pointless diff.

>  %endmacro
>  
> -%macro VP9_UNPACK_MULSUB_2W_4X 4 ; dst1, dst2, coef1, coef2
> -    punpckhwd           m6, m%2, m%1
> -    VP9_MULSUB_2W_2X     4, 5,  6, 7, [pw_m%3_%4], [pw_%4_%3]
> +%macro VP9_UNPACK_MULSUB_2W_4X 7 ; dst1, dst2, coef1, coef2, rnd, tmp1, tmp2
> +    punpckhwd          m%6, m%2, m%1
> +    VP9_MULSUB_2W_2X    %7,  %6,  %6, %5, [pw_m%3_%4], [pw_%4_%3]
>      punpcklwd          m%2, m%1
> -    VP9_MULSUB_2W_2X    %1, 6, %2, 7, [pw_m%3_%4], [pw_%4_%3]
> -    packssdw           m%1, m4
> -    packssdw            m6, m5
> -    SWAP                %2, 6
> +    VP9_MULSUB_2W_2X    %1,  %2,  %2, %5, [pw_m%3_%4], [pw_%4_%3]
> +    packssdw           m%1, m%7
> +    packssdw           m%2, m%6
>  %endmacro
>  
> -%macro VP9_STORE_2X 2
> -    movh                m6, [dstq]
> -    movh                m7, [dstq+strideq]
> -    punpcklbw           m6, m4
> -    punpcklbw           m7, m4
> -    paddw               m6, %1
> -    paddw               m7, %2
> -    packuswb            m6, m4
> -    packuswb            m7, m4
> -    movh            [dstq], m6
> -    movh    [dstq+strideq], m7

> +%macro VP9_STORE_2X 5 ; reg1, reg2, tmp1, tmp2, zero, zero

one trailing parameter

> +    movh               m%3, [dstq]
> +    movh               m%4, [dstq+strideq]
> +    punpcklbw          m%3, m%5
> +    punpcklbw          m%4, m%5
> +    paddw              m%3, m%1
> +    paddw              m%4, m%2
> +    packuswb           m%3, m%5
> +    packuswb           m%4, m%5
> +    movh            [dstq], m%3
> +    movh    [dstq+strideq], m%4
>  %endmacro
>  
[...]

Rest LGTM, thanks

Note: ideally the parametrization and the 11585 coeff pre-load could be
split, but I won't insist on it.

-- 
Clément B.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 490 bytes
Desc: not available
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20131202/87a24ee4/attachment.asc>


More information about the ffmpeg-devel mailing list