[FFmpeg-devel] [PATCH 2/4] swr: convert resample_{common, linear}_double_sse2 to yasm

Mon Jun 30 13:35:38 CEST 2014

Hi,

On Sun, Jun 29, 2014 at 7:19 PM, James Almer <jamrial at gmail.com> wrote:

> -%macro RESAMPLE_FNS 3 ; format [float or int16], bps, log2_bps
> +%macro RESAMPLE_FNS 5 ; format [float or int16], bps, log2_bps
>

Please document the last 2 parameters (e.g. float_op_suffix [s or d], 1.0
constant).

> @@ -165,21 +166,21 @@ cglobal resample_common_%1, 1, 7, 2, ctx, phase_shift, dst, frac, \
>      lea                      filterq, [min_filter_count_x4q+filterq*%2]
>      mov         min_filter_count_x4q, min_filter_length_x4q
>  %endif
> -%ifidn %1, float
> +%ifidn %1, int16
> +    movd                          m0, [%5]
> +%else ; float/double
>      xorps                         m0, m0, m0
> -%else ; int16
> -    movd                          m0, [pd_0x4000]
>  %endif
>

This doesn't really need to go through a macro argument: the branch is
int16-only, so you can just as well hardcode pd_0x4000. The meanings of
pd_0x4000 (rounding bias) and pf/dbl_1 (1.0, used to turn a division into a
multiplication) aren't really the same anyway. It makes me wonder whether
loading pd_0x4000 into m2 outside the loop and using that to init m0 would
make int16 faster, but anyway. (This is totally unrelated to your patch, so
probably ignore it for this review, but maybe someone wants to test this
later.)
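To illustrate what pd_0x4000 is for, here is a minimal C sketch. It assumes
int16 filter taps in Q15 (scaled by 1 << 15); dot_q15 and clip_int16 are
hypothetical helpers, not swr's actual code. Starting the accumulator at
0x4000 (half of 1 << 15) makes the final >> 15 round to nearest instead of
truncating. No 1.0 constant is involved in this path.

```c
#include <stdint.h>

#define Q15_SHIFT  15
#define ROUND_BIAS (1 << (Q15_SHIFT - 1)) /* 0x4000, the value in pd_0x4000 */

/* Hypothetical helper: saturate a 32-bit accumulator to int16. */
static int16_t clip_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical Q15 dot product. The accumulator starts at the rounding bias
 * rather than 0, so the >> 15 rounds to nearest. (Right-shifting a negative
 * value is implementation-defined in C; an arithmetic shift is assumed.) */
static int16_t dot_q15(const int16_t *src, const int16_t *filter, int n)
{
    int32_t acc = ROUND_BIAS;
    for (int i = 0; i < n; i++)
        acc += src[i] * filter[i];
    return clip_int16(acc >> Q15_SHIFT);
}
```

With src = {3} and filter = {0.5 in Q15}, the exact result 1.5 comes out as 2;
without the bias, truncation would give 1.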

> -%ifidn %1, float
> -    cvtsi2ss                     xm0, src_incrd
> -    movss                        xm4, [pf_1]
> -    divss                        xm4, xm0
> -%else ; int16
> -    movd                          m4, [pd_0x4000]
> +%ifidn %1, int16
> +    movd                          m4, [%5]
> +%else ; float/double
> +    cvtsi2s%4                    xm0, src_incrd
> +    movs%4                       xm4, [%5]
> +    divs%4                       xm4, xm0
>  %endif
>

(That is, computing 1.0 / src_incr once so that the per-sample division
becomes a multiplication. Again, unrelated, so probably ignore.)
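The reciprocal trick, as a rough C sketch. It assumes the linear path weights
the difference of two filter outputs by frac / src_incr; lerp_div and
lerp_mul are hypothetical names. Computing 1.0 / src_incr once (what the
cvtsi2s/movs/divs sequence above does) lets each output sample use a
multiply instead of a divide.

```c
/* Hypothetical reference: divide by src_incr for every output sample. */
static double lerp_div(double v1, double v2, int frac, int src_incr)
{
    return v1 + (v2 - v1) * frac / src_incr;
}

/* Hypothetical optimized form: inv_src_incr = 1.0 / src_incr is computed
 * once outside the loop, so each sample needs only a multiply. */
static double lerp_mul(double v1, double v2, int frac, double inv_src_incr)
{
    return v1 + (v2 - v1) * frac * inv_src_incr;
}
```

When src_incr is a power of two the two forms agree exactly; otherwise they
may differ in the last bit, which is the usual cost of this trick.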

I'd suggest declaring the macro as taking 3-5 arguments, passing 3 for int16
and 5 for float/double, and keeping pd_0x4000 hardcoded in the places where
we use int16.

Ronald