[FFmpeg-devel] [WIP] [PATCH 4/4] x86: dsputilenc: convert hf_noise*_mmx to yasm

Timothy Gu timothygu99 at gmail.com
Mon Jun 2 05:33:16 CEST 2014


On Jun 1, 2014 6:36 PM, "Michael Niedermayer" <michaelni at gmx.at> wrote:
> > +%if %1 == 16
> > +    push pix1q
> > +    push hq
> > +%endif
>

> don't use push/pop; they can mess up the yasm magic macros.
> you can use PUSH/POP, but better not use them either; there should be
> enough registers

With the other local variable, six registers will be in use, which IMO is
a lot for a function like this. Is there any significant performance
penalty in using PUSH/POP vs. a local variable?

> > +    sub        hd, 2
> > +    pxor       m7, m7
> > +    pxor       m6, m6
> > +    HF_NOISE_PART1 %1, 0, 1, 2, 3
> > +    add     pix1q, lsizeq
> > +    HF_NOISE_PART1 %1, 4, 1, 5, 3
> > +    HF_NOISE_PART2     0, 2
> > +    add     pix1q, lsizeq
> > +.loop:
> > +    HF_NOISE_PART1 %1, 0, 1, 2, 3
> > +    HF_NOISE_PART2     4, 5
> > +    add     pix1q, lsizeq
> > +    HF_NOISE_PART1 %1, 4, 1, 5, 3
> > +    HF_NOISE_PART2     0, 2
> > +    add     pix1q, lsizeq
> > +    sub        hd, 2
> > +        jne .loop
> > +
> > +    mova       m0, m6
> > +    punpcklwd  m0, m7
> > +    punpckhwd  m6, m7
> > +    paddd      m6, m0
> > +    mova       m0, m6
> > +    psrlq      m6, 32
> > +    paddd      m0, m6
> > +%if %1 == 16
>
> > +    movd      ebx, m0   ; ebx = result of hf_noise16;
>
> you can't just write into a random register;
> declare a local variable in the cglobal macro above and use it instead

OK. But what about the return value at the end? Is eax specifically allowed
to be clobbered?
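For context on the return-value question: in the cdecl (x86_32), System V x86-64 and Win64 conventions, eax/rax carries the integer return value and is caller-saved, so a function may clobber it, whereas ebx/rbx is callee-saved. The tail of the quoted body (the `mova`/`punpcklwd`/`punpckhwd`/`paddd`/`psrlq` sequence) folds the 16-bit word accumulator m6 into one 32-bit value in the low dword of m0, which a `movd` then moves into a general-purpose register. The C sketch below is a scalar model of just those instructions, assuming m7 is still all zeros at that point (it is cleared before the loop and presumably used only as a zero source); `mmx_words` and `hf_noise_hsum` are illustrative names, not FFmpeg code.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit MMX register viewed as four unsigned 16-bit lanes, lane 0 lowest.
   The word accumulator m6 is modeled this way. */
typedef struct {
    uint16_t w[4];
} mmx_words;

/* Scalar model of the quoted reduction, assuming m7 == 0:
 *   mova m0, m6 / punpcklwd m0, m7 -> m0 = dwords { w0, w1 }
 *   punpckhwd m6, m7               -> m6 = dwords { w2, w3 }
 *   paddd m6, m0                   -> m6 = { w0 + w2, w1 + w3 }
 *   mova m0, m6 / psrlq m6, 32     -> m6 = { w1 + w3, 0 }
 *   paddd m0, m6                   -> low dword of m0 = w0 + w1 + w2 + w3
 * Returns the low dword of m0, i.e. what the following movd extracts. */
static uint32_t hf_noise_hsum(mmx_words acc)
{
    uint32_t m0[2] = { acc.w[0], acc.w[1] };  /* punpcklwd against zero */
    uint32_t m6[2] = { acc.w[2], acc.w[3] };  /* punpckhwd against zero */

    m6[0] += m0[0];                           /* paddd m6, m0 */
    m6[1] += m0[1];

    m0[0] = m6[0];                            /* mova m0, m6 */
    m0[1] = m6[1];
    m6[0] = m6[1];                            /* psrlq m6, 32: high dword */
    m6[1] = 0;                                /* moves down, top becomes 0 */

    m0[0] += m6[0];                           /* paddd m0, m6 */
    m0[1] += m6[1];
    return m0[0];                             /* movd r32, m0 */
}
```

For example, `hf_noise_hsum((mmx_words){ { 1000, 2000, 65535, 7 } })` yields 68542. Widening the lanes to dwords before the final adds is what keeps such a total from wrapping at 16 bits.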

>
>
>
> > +    pop        hq       ; restore h and pix1
> > +    pop     pix1q
> > +    ; lsize is unchanged (except movsxd, which hf_noise8 is going to do anyway)
> > +    add     pix1q, 8    ; pix1 = pix1 + 8;
>
> > +    call    hf_noise8   ; eax = hf_noise8_mmx(pix1, lsize, h);
>

> don't call cglobal functions; if you do, you would have to emulate the
> calling conventions of all ABIs (x86_32 would pass arguments on the
> stack, for example)

Should I then plug in a version of `HF_NOISE 8` without the cglobal wrapper?
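One way to picture that option: keep the 8-wide work as an internal routine shared by both functions instead of calling the exported cglobal symbol, so no ABI-specific argument passing is involved. The C sketch below is only a scalar model under stated assumptions: `part1` and `part2` are hypothetical stand-ins for `HF_NOISE_PART1` and `HF_NOISE_PART2` (whose definitions are not quoted here), and `hf_noise8_body`, `hf_noise8_naive` and `hf_noise16_model` are illustrative names, not FFmpeg functions. `hf_noise8_body` also mirrors the quoted loop shape: a two-row prologue, then two rows per iteration alternating between two buffers (like m0/m2 and m4/m5), with a counter starting at h - 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { W = 8 };  /* block width handled by one 8-wide pass */

/* HYPOTHETICAL stand-in for HF_NOISE_PART1: horizontal differences
   within one 8-pixel row. */
static void part1(const uint8_t *row, int out[W - 1])
{
    for (int x = 0; x < W - 1; x++)
        out[x] = row[x] - row[x + 1];
}

/* HYPOTHETICAL stand-in for HF_NOISE_PART2: sum of |prev - cur|,
   i.e. the vertical change of the horizontal differences. */
static int part2(const int prev[W - 1], const int cur[W - 1])
{
    int s = 0;
    for (int x = 0; x < W - 1; x++)
        s += abs(prev[x] - cur[x]);
    return s;
}

/* Straightforward reference: visits every pair of consecutive rows once. */
static int hf_noise8_naive(const uint8_t *pix, ptrdiff_t lsize, int h)
{
    int prev[W - 1], cur[W - 1], sum = 0;
    part1(pix, prev);
    for (int y = 1; y < h; y++) {
        part1(pix + y * lsize, cur);
        sum += part2(prev, cur);
        for (int x = 0; x < W - 1; x++)
            prev[x] = cur[x];
    }
    return sum;
}

/* Internal, non-exported 8-wide body shaped like the quoted loop.
   The counter must reach exactly 0, so h must be even and >= 4,
   as with the sub/jne pair in the patch. */
static int hf_noise8_body(const uint8_t *pix, ptrdiff_t lsize, int h)
{
    int a[W - 1], b[W - 1], sum = 0;
    int cnt = h - 2;                                  /* sub hd, 2 */

    assert(h >= 4 && h % 2 == 0);
    part1(pix, a);                      pix += lsize; /* prologue */
    part1(pix, b); sum += part2(a, b);  pix += lsize;
    do {                                              /* .loop */
        part1(pix, a); sum += part2(b, a); pix += lsize;
        part1(pix, b); sum += part2(a, b); pix += lsize;
        cnt -= 2;                                     /* sub hd, 2 */
    } while (cnt != 0);                               /* jne .loop */
    return sum;
}

/* 16-wide model following the patch comments: own result plus the 8-wide
   body at pix + 8. SIMPLIFICATION: the left half reuses the 8-wide body;
   the real 16-wide macro may also cover the column pair at the split. */
static int hf_noise16_model(const uint8_t *pix, ptrdiff_t lsize, int h)
{
    int tmp = hf_noise8_body(pix, lsize, h);
    return tmp + hf_noise8_body(pix + 8, lsize, h);
}
```

In yasm terms, the counterpart of `hf_noise8_body` would be a macro expanded inside both cglobal functions, so the 16-wide function never has to set up arguments the way an external caller of `hf_noise8` would on each ABI.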

>
> also, looking at the disassembly of the function in gdb and at the
> register values when it crashes (if it does), or single-stepping through
> the code with gdb, should help you understand what the problem is, i.e. the
> difference between what you want and what the computer actually does

I spent two nights trying to debug this, without any luck. It seems that
[pix1q+2*lsizeq] points to unallocated memory: it crashes on the first
instruction of the loop, the first time it is executed. I can't understand
how this could happen.

I also compared the disassembly of the new code with that of the old inline
asm line by line, but couldn't find anything there either.
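The movsxd mentioned in the quoted patch comment matters for addresses like [pix1q+2*lsizeq]. lsize is a 32-bit int, and on x86-64 the upper half of the 64-bit register carrying an int argument is not guaranteed to hold anything meaningful, so it has to be sign-extended (for example with x86inc's `movsxdifnidn`) before being used as lsizeq. The C sketch below models that; `int_arg_in_reg` and `movsxd_model` are hypothetical helpers, and this is a generic illustration of one thing worth ruling out for a crash on such an access, not a diagnosis of this particular crash.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL model of the 64-bit register holding an int argument such as
   lsize: the low 32 bits carry the value, the upper 32 bits are whatever the
   caller left there (the x86-64 ABIs do not define them). */
static uint64_t int_arg_in_reg(int32_t value, uint32_t upper_bits)
{
    return ((uint64_t)upper_bits << 32) | (uint32_t)value;
}

/* Model of movsxd r64, r32: sign-extend the low 32 bits to 64 bits,
   written without relying on implementation-defined narrowing. */
static int64_t movsxd_model(uint64_t reg)
{
    uint32_t lo = (uint32_t)reg;
    return lo >= 0x80000000u ? (int64_t)lo - (INT64_C(1) << 32)
                             : (int64_t)lo;
}
```

With stale upper bits of 0x12345678 and lsize = 32, using the raw register as lsizeq would offset pix1 by 2 * 0x1234567800000020 instead of 64, while `movsxd_model` recovers 32; for a negative lsize, zero-extension alone would give 4294967264 instead of -32.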

[...]

Timothy

