[FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm
Michael Niedermayer
michaelni at gmx.at
Mon Jun 2 15:31:09 CEST 2014
On Sun, Jun 01, 2014 at 08:33:16PM -0700, Timothy Gu wrote:
> On Jun 1, 2014 6:36 PM, "Michael Niedermayer" <michaelni at gmx.at> wrote:
> > > +%if %1 == 16
> > > + push pix1q
> > > + push hq
> > > +%endif
> >
>
> > dont use push/pop they can messup the yasm magic macros
> > you can use PUSH/POP but better dont use them either, there should be
> > enough registers
>
> With the other local variable, there will be 6 registers used, which is IMO
> a lot for a function like this. Is there any significant performance
> penalty using PUSH/POP vs. local variable?
using an additional register has no cost, it might in some cases
require the yasm magic to push and pop it to preserve its content
for the caller but thats what you are doing too
also see http://www.agner.org/optimize/#manuals
>
> > > + sub hd, 2
> > > + pxor m7, m7
> > > + pxor m6, m6
> > > + HF_NOISE_PART1 %1, 0, 1, 2, 3
> > > + add pix1q, lsizeq
> > > + HF_NOISE_PART1 %1, 4, 1, 5, 3
> > > + HF_NOISE_PART2 0, 2
> > > + add pix1q, lsizeq
> > > +.loop:
> > > + HF_NOISE_PART1 %1, 0, 1, 2, 3
> > > + HF_NOISE_PART2 4, 5
> > > + add pix1q, lsizeq
> > > + HF_NOISE_PART1 %1, 4, 1, 5, 3
> > > + HF_NOISE_PART2 0, 2
> > > + add pix1q, lsizeq
> > > + sub hd, 2
> > > + jne .loop
> > > +
> > > + mova m0, m6
> > > + punpcklwd m0, m7
> > > + punpckhwd m6, m7
> > > + paddd m6, m0
> > > + mova m0, m6
> > > + psrlq m6, 32
> > > + paddd m0, m6
> > > +%if %1 == 16
> >
> > > + movd ebx, m0 ; ebx = result of hf_noise16;
> >
> > you cant just write into a random register
> > declare a local variable in the cglobal macro above and use it instead
>
> OK. But how about the return value at the end? Is eax specifically designed
> to be clobbered?
yes in functions retunring a 32bit int but
see http://en.wikipedia.org/wiki/X86_calling_conventions
(also maybe someone has a better reference?)
>
> >
> >
> >
> > > + pop hq ; restore h and pix1
> > > + pop pix1q
> > > + ; lsize is unchanged (except movsxd, which hf_noise8 is going to
> do anyway)
> > > + add pix1q, 8 ; pix1 = pix1 + 8;
> >
> > > + call hf_noise8 ; eax = hf_noise8_mmx(pix1, lsize, h);
> >
>
> > dont call cglobal functions, if you do you would have to emulate the
> > calling conventions of all ABIs, x86_32 would pass arguments over the
> > stack for example
>
> Should I then plug in a version of `HF_NOISE 8` without cglobal and stuff?
a macro might be easier if you duplicate it anyway
>
> >
> > also looking at the disassembly of the function with gdb and the
> > register values when it crashes (if it does) or single steping through
> > the code wth gdb should help you understand whats the problem or
> > difference between what you want and what the computer actually does
>
> I spent 2 nights trying to debug this, without any luck. It seems like
> [pix1q+2*lsizeq] is unallocated, as it crashes in the first instruction in
> the loop, the first time it is executed. I can't understand how this would
> happen.
well you can print the address and size of the array you intend to
access with av_log() before the function call and then inspect
in gdb what is actually accessed, it should be outside this printed
rebion if it crashes ...
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <https://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20140602/ec207157/attachment.asc>
More information about the ffmpeg-devel
mailing list