[FFmpeg-devel] [patch][OpenHEVC]added ASM functions for epel + qpel

Ronald S. Bultje rsbultje at gmail.com
Fri Mar 7 12:43:07 CET 2014


Hi,


On Thu, Mar 6, 2014 at 10:40 AM, Pierre Edouard Lepere <
Pierre-Edouard.Lepere at insa-rennes.fr> wrote:

> new patch, now all in a single, smaller file !
>

Thanks!


> >> +    sub             srcq, 1
> >
> >Why? Just subtract one from src when you dereference it ([srcq-1]
> >instead of [srcq]).
>
> because it's more convenient, having filters start at src whether we are
> in h, v or hv.
>

Right, I understand, but we're writing assembly; convenience isn't exactly
the goal here. I'm fine with leaving it as a FIXME for later, but at least
mark it as such in the code - dropping the sub does save one instruction.
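
To illustrate the point, here is a minimal C sketch (not the patch's code) of
a 4-tap horizontal filter written both ways. The function names are
hypothetical, and the taps are just example values; I'm assuming 8-bit input
and a filter window that starts one sample to the left of src.

```c
#include <assert.h>
#include <stdint.h>

/* Example 4-tap coefficients, chosen only for illustration. */
static const int8_t taps[4] = { -4, 36, 36, -4 };

/* Variant A: move the pointer back once, then index from 0.
 * This mirrors the "sub srcq, 1" at the top of the function. */
static void epel_h_presub(int16_t *dst, const uint8_t *src, int width)
{
    assert(width > 0);
    src -= 1;
    for (int x = 0; x < width; x++)
        dst[x] = taps[0] * src[x]     + taps[1] * src[x + 1] +
                 taps[2] * src[x + 2] + taps[3] * src[x + 3];
}

/* Variant B: leave the pointer alone and fold the -1 into each address,
 * as in [srcq-1]. The constant offset is free in the addressing mode. */
static void epel_h_folded(int16_t *dst, const uint8_t *src, int width)
{
    assert(width > 0);
    for (int x = 0; x < width; x++)
        dst[x] = taps[0] * src[x - 1] + taps[1] * src[x] +
                 taps[2] * src[x + 1] + taps[3] * src[x + 2];
}
```

Both variants read the same samples and produce the same output; the second
one simply has no separate pointer adjustment.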


> > +    EPEL_LOAD         8, src, 1
> > +    EPEL_COMPUTE       8, 2
> > +    PEL_STORE2       dst, m0, m1
> > +    LOOP_END   epel_h_h_2_8, dst, dststride, src, srcstride
> > +    RET
>
> OK, so the actual code. For play, can you show the _actual disassembly_
> that all these macros eventually got us to? I wonder what it actually
> gives.
>

(Still hoping for this one.)


> >I can understand the pmaddwd approach for second pass may be faster for
> >half-registers, since you fill the register up to full width and save one
> >instruction - but did you measure it?
> >
> >Then, for second, you're just spending instructions shuffling. I don't
> >think 2a is faster than 2b, in fact I expect it to be significantly
> >slower.
>
> This was done first with intrinsics, and pmulhw was needed, so it adds
> just too many instructions.


Oh, I guess the intermediates aren't downshifted at all - that sucks. OK,
fine then.
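
For reference, a small C model of why that matters (a sketch under my own
assumptions, not the patch's code): if the first pass keeps its 16-bit
intermediates unshifted, the second-pass products no longer fit in 16 bits,
so a pmullw-only approach loses the high half, whereas pmaddwd multiplies and
accumulates at 32 bits in one instruction. The helper names and the example
taps are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of one 32-bit lane of pmaddwd: two signed 16x16 products summed
 * at 32-bit precision. (The single hardware wraparound case, where all
 * four inputs are -32768, is not modelled here.) */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Hypothetical second (vertical) pass over four unshifted first-pass
 * values: two pmaddwd lanes added together. */
static int32_t second_pass_sum(const int16_t rows[4], const int16_t taps[4])
{
    assert(rows && taps);
    return pmaddwd_lane(rows[0], rows[1], taps[0], taps[1]) +
           pmaddwd_lane(rows[2], rows[3], taps[2], taps[3]);
}

/* Returns 1 if v could be held in a signed 16-bit lane without loss. */
static int fits_int16(int32_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}
```

With taps of {-4, 36, 36, -4} and an intermediate of 18360 (the largest value
those taps can produce from 8-bit input), a single product is already 660960,
well outside the 16-bit range.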

Ronald

