[FFmpeg-devel] [PATCH 5/6] x86: lossless audio: SSE4 madd 32bits

Ronald S. Bultje rsbultje at gmail.com
Wed Apr 20 02:01:08 CEST 2016


Hi,

On Tue, Apr 19, 2016 at 4:42 PM, James Almer <jamrial at gmail.com> wrote:

> On 4/18/2016 6:25 PM, Christophe Gisquet wrote:
> > 2016-04-18 21:18 GMT+02:00 Michael Niedermayer <michael at niedermayer.cc>:
> >> > this breaks (only noise)
> >> > [CCCP]_Mega_Weird_Audio_Test.mkv track 23
> > Worthwhile sample.
> >
> > I rewrote the patch to reduce code duplication, and I fixed the issue
> > (misread a shift).
> >
> > -- Christophe
> >
> >
> > 0005-x86-lossless-audio-SSE4-madd-32bits.patch
> >
> >
> > From a0d4a96c032d73bc0e34fec320497aefafba3c28 Mon Sep 17 00:00:00 2001
> > From: Christophe Gisquet <christophe.gisquet at gmail.com>
> > Date: Mon, 18 Apr 2016 13:20:07 +0200
> > Subject: [PATCH 5/7] x86: lossless audio: SSE4 madd 32bits
> >
> > The unique user so far is wmalossless 24bits. The few samples tested show an
> > order of 8, so more unrolling or an avx2 version do not make sense.
> >
> > Timings: 72 -> 49 cycles
> > ---
> >  libavcodec/x86/lossless_audiodsp.asm    | 31 +++++++++++++++++++++++++------
> >  libavcodec/x86/lossless_audiodsp_init.c |  7 +++++++
> >  2 files changed, 32 insertions(+), 6 deletions(-)
> >
> > diff --git a/libavcodec/x86/lossless_audiodsp.asm b/libavcodec/x86/lossless_audiodsp.asm
> > index 5597dad..d00869b 100644
> > --- a/libavcodec/x86/lossless_audiodsp.asm
> > +++ b/libavcodec/x86/lossless_audiodsp.asm
> > @@ -22,13 +22,17 @@
> >
> >  SECTION .text
> >
> > -%macro SCALARPRODUCT 0
> > +%macro SCALARPRODUCT 1
> >  ; int ff_scalarproduct_and_madd_int16(int16_t *v1, int16_t *v2, int16_t *v3,
> >  ;                                     int order, int mul)
> > -cglobal scalarproduct_and_madd_int16, 4,4,8, v1, v2, v3, order, mul
> > -    shl orderq, 1
> > +; int ff_scalarproduct_and_madd_int32(int32_t *v1, int32_t *v2, int32_t *v3,
> > +;                                     int order, int mul)
> > +cglobal scalarproduct_and_madd_int %+ %1, 4,4,8, v1, v2, v3, order, mul
> > +    shl orderq, (%1/16)
>
> order is int, so maybe it would be better to use orderd here, to make sure
> the upper half of the register is cleared on x86_64.
> Wonder why it was never an issue until now, though.


This is typically only an issue if the data comes from the stack. On win64 as
well as unix64, the 4th argument is never passed on the stack; it is passed
directly in a register instead.
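
For illustration only (this is not code from the patch), the difference
between "shl orderq, 1" and "shl orderd, 1" that the quoted comment refers to
can be modelled in plain C. The helper names and the register contents used
below are hypothetical; the sketch only assumes the usual x86-64 rule that a
32-bit operation zeroes the upper half of its destination register, and it
says nothing about what a given caller actually leaves in that upper half.

```c
#include <stdint.h>

/* Hypothetical model of "shl orderq, n": the shift operates on all 64 bits,
 * so whatever sits in bits 63..32 of the register stays part of the result. */
static uint64_t shl_full_register(uint64_t reg, unsigned n)
{
    return reg << n;
}

/* Hypothetical model of "shl orderd, n": only the low 32 bits take part,
 * and on x86-64 writing a 32-bit register clears bits 63..32. */
static uint64_t shl_low_half(uint64_t reg, unsigned n)
{
    uint32_t low = (uint32_t)reg;   /* drop the upper half */
    return (uint64_t)(uint32_t)(low << n);
}
```

With a register holding order = 8 in its low half and nonzero bits above it
(say 0xDEADBEEF00000008), shl_low_half(reg, 1) gives 16, while
shl_full_register(reg, 1) gives a value that is not a usable byte count.
When the upper half is already zero, both helpers agree.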

Ronald

