[FFmpeg-devel] [PATCH] move h264 loopfilter strength code to yasm

Michael Niedermayer michaelni
Fri Sep 24 23:15:49 CEST 2010


On Fri, Sep 24, 2010 at 03:20:49PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Fri, Sep 24, 2010 at 12:26 PM, Daniel Verkamp <daniel at drv.nu> wrote:
> > On Fri, Sep 24, 2010 at 9:04 AM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> >> So removing pand (which doesn't do anything in the one case, and can
> >> be replaced by a pxor in the other). With the attached patch #2, I get
> >> this:
> >> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:315:bad register name `%%mm0'
> >> /var/folders/Rz/RzQTCSLsFPWQeOEO5EXsJE+++TI/-Tmp-//cc8uAjPS.s:520:bad register name `%%mm0'
> >>
> >> What does that mean?
> >
> > If you omit all of the optional colon-separated arguments to asm, the
> > % symbols before register names in the asm no longer need to be
> > escaped with a second % (I suppose since there can be no substitution
> > when there are no operand constraints). You can add an empty : or
> > just drop the doubled % to avoid this.
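A minimal sketch of the escaping rule described above, assuming GCC-compatible inline asm on x86 (the function names are hypothetical and not taken from either patch). With no colons at all the statement is basic asm and the template is emitted verbatim, so a register takes a single %. As soon as one colon is present, even an empty one, it is extended asm: % introduces an operand reference such as %0, and a literal register must be written as %%.

```c
/* Hypothetical helpers illustrating '%' vs '%%' in GCC inline asm. */
#if defined(__i386__) || defined(__x86_64__)

/* Basic asm (no colons): the template reaches the assembler verbatim,
 * so registers use a single '%'.  Basic asm cannot declare clobbers;
 * this is only safe because nothing else is live in mm0/x87 here. */
static void clear_mm0_basic(void)
{
    __asm__ volatile ("pxor %mm0, %mm0\n\t"
                      "emms");
}

/* Extended asm with an empty output list (the "empty :" fix):
 * '%' now introduces operands, so the literal register needs '%%'. */
static void clear_mm0_extended(void)
{
    __asm__ volatile ("pxor %%mm0, %%mm0\n\t"
                      "emms"
                      :);
}

/* Extended asm with a real operand: %0 is substituted with x's register. */
static int add_one(int x)
{
    __asm__ ("addl $1, %0" : "+r"(x));
    return x;
}

#else /* non-x86 fallback so the sketch still compiles */
static void clear_mm0_basic(void) {}
static void clear_mm0_extended(void) {}
static int add_one(int x) { return x + 1; }
#endif
```

Writing `%%mm0` in the first function, or `%mm0` in the second, reproduces the kind of "bad register name" error quoted above.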
> 
> OK, that fixes it. Oddly, it's the same speed, even though the
> instruction count is lower. OK, so next then. The attached patch is
> supposed to be part of a patch that reduces the insane number of
> registers used for temporaries that could be loaded directly (so
> instead of doing (%0) where %0="m"(var[idx1]), use (%0,%1) with
> %0="r"(var) and %1="r"(idx1)). This works and is not slower
> (eventually it will be faster, once it saves a few registers; this is
> work-in-progress).
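A sketch of the two addressing styles contrasted above, assuming GCC-compatible inline asm on x86 (function names are hypothetical). The index is passed as intptr_t, which is my own choice: on x86-64 the base and index registers in (%1,%2,4) must have the same 64-bit width, and a plain int index would be given a 32-bit register, producing an address the assembler rejects.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* "m" style: GCC forms the address of var[idx] itself and substitutes
 * the resulting memory reference for %1. */
static int load_m(const int *var, intptr_t idx)
{
    int out;
    __asm__ ("movl %1, %0" : "=r"(out) : "m"(var[idx]));
    return out;
}

/* Base+index style: pass the pointer and the index in registers and
 * address the element as (base,index,scale) inside the template. */
static int load_base_index(const int *var, intptr_t idx)
{
    int out;
    __asm__ ("movl (%1,%2,4), %0"
             : "=r"(out)
             : "r"(var), "r"(idx)
             : "memory"); /* reads memory not named as an operand */
    return out;
}
#else /* non-x86 fallback so the sketch still compiles */
static int load_m(const int *var, intptr_t idx) { return var[idx]; }
static int load_base_index(const int *var, intptr_t idx) { return var[idx]; }
#endif
```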
> 
> The second patch ("test") tries to use d_idx as a global (which it is,
> in effect). Why doesn't this work?
> 
> -                "por  (%0,%1), %%mm1 \n" // nnz[b] || nnz[bn]
> +                "por  %1(%0), %%mm1 \n" // nnz[b] || nnz[bn]
>                  ::"r"(nnz+b_idx),
> -                  "r"(d_idx)
> +                  "g"(d_idx)

For %1(%0),
%1 must be a constant; it is not one in the code, so this cannot work.

Either you have a for loop, in which case this needs to be a register, or you
can manually unroll it, in which case it can be a constant.

That's a limitation of x86, as you know ;)
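To illustrate the point above, a sketch assuming GCC-compatible inline asm on x86 (names are hypothetical). In AT&T syntax a displacement such as 8(%rdi) must be an assembly-time constant, so a "g" operand that GCC is free to place in a register or in memory expands to something like %eax(%rdx), which does not assemble. When the loop is unrolled by hand, the offset can be an "i" operand, and GCC's x86 %c operand modifier prints it without the $ prefix; inside a real loop the offset has to go in a register and be used as an index.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Unrolled form: the offset is a compile-time constant, passed with the
 * "i" constraint and printed by %c2 without '$', e.g. "movl 8(%rdi), %eax". */
#define LOAD_AT_CONST(base, idx, out)                      \
    __asm__ ("movl %c2(%1), %0"                            \
             : "=r"(out)                                   \
             : "r"(base), "i"((idx) * sizeof(int))         \
             : "memory")

/* Loop form: the offset is only known at run time, so it must live in a
 * register and be used as (base,index,scale). */
static int load_at_var(const int *base, intptr_t idx)
{
    int out;
    __asm__ ("movl (%1,%2,4), %0"
             : "=r"(out)
             : "r"(base), "r"(idx)
             : "memory");
    return out;
}
#else /* non-x86 fallback so the sketch still compiles */
#define LOAD_AT_CONST(base, idx, out) ((out) = (base)[idx])
static int load_at_var(const int *base, intptr_t idx) { return base[idx]; }
#endif

/* Hand-unrolled sum of four ints: every displacement is a constant. */
static int sum4_unrolled(const int *v)
{
    int a, b, c, d;
    LOAD_AT_CONST(v, 0, a);
    LOAD_AT_CONST(v, 1, b);
    LOAD_AT_CONST(v, 2, c);
    LOAD_AT_CONST(v, 3, d);
    return a + b + c + d;
}

/* Loop version: the index varies, so it is passed in a register. */
static int sum_loop(const int *v, intptr_t n)
{
    int total = 0;
    for (intptr_t i = 0; i < n; i++)
        total += load_at_var(v, i);
    return total;
}
```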


The case where unrolling is left to gcc, and gcc then chooses between register
and constant depending on this, can probably be done with av_builtin_constant_p,
but I suspect that would be a huge mess and really not a good idea.


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I have never wished to cater to the crowd; for what I know they do not
approve, and what they approve I do not know. -- Epicurus