[Ffmpeg-devel] [PATCH] H.264 deblocking mmx

Michael Niedermayer michaelni
Mon Apr 25 01:32:21 CEST 2005


Hi

On Monday 25 April 2005 00:39, Loren Merritt wrote:
> I noticed that the inloop deblocking filter was taking a large fraction of
> the decode time, and it is inherently parallel, so...
>
> Benchmarks on my Athlon-XP:
> C:
> 4182 dezicycles in filter_mb_edgecv, 4193308 runs, 996 skips
> 4004 dezicycles in filter_mb_edgech, 4193305 runs, 999 skips
> 9930 dezicycles in filter_mb_edgev, 4191771 runs, 2533 skips
> 11200 dezicycles in filter_mb_edgeh, 4191510 runs, 2794 skips
>
> MMX:
> 2197 dezicycles in filter_mb_edgecv, 4193544 runs, 760 skips
> 1714 dezicycles in filter_mb_edgech, 4193733 runs, 571 skips
> 4928 dezicycles in filter_mb_edgev, 4192872 runs, 1432 skips
> 3977 dezicycles in filter_mb_edgeh, 4193087 runs, 1217 skips
>
> total: +17% decode speed
>
> ... however, I have reports that this patch crashes on some systems and
> doesn't even compile on amd64. So I'm offering it for anyone who wants to
> figure out what's broken.

[...]

>+        :: "r"(pix-3*stride), "r"(pix), "r"(stride),
>+           "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4),
>+           "m"(tmp0), "m"(tmp1)

tmp0/tmp1 are listed here as input operands, but the asm writes into them, so
they must be read-write ("+m") output operands. stride also needs to be 64-bit
on amd64. This should be:

        :  "+m"(tmp0), "+m"(tmp1)
        :  "r"(pix-3*stride), "r"(pix), "r"((long)stride),
           "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4)
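
To illustrate both points outside the patch, here is a minimal standalone
sketch. The function load_via_scratch and its asm body are made up for this
example and are not part of the deblocking code. It assumes GCC-style extended
asm on x86 and an LP64 target, where long is as wide as a pointer. The scratch
variable is written by the asm, so it is declared "+m", and stride is cast to
long so that the index register has the same width as the base pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns pix[stride], going through a scratch
 * variable that the asm block writes. */
static int load_via_scratch(const uint8_t *pix, int stride)
{
    int tmp = 0;
    assert(pix != NULL);
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ volatile(
        "movzbl (%1,%2), %%eax \n\t"  /* load the byte at pix + stride */
        "movl   %%eax, %0      \n\t"  /* store it into the scratch variable */
        : "+m"(tmp)                   /* written by the asm, so not an input */
        : "r"(pix), "r"((long)stride) /* index widened to pointer size */
        : "eax", "memory"             /* asm reads memory behind pix */
    );
#else
    /* Plain C fallback for other compilers and architectures. */
    tmp = pix[stride];
#endif
    return tmp;
}
```

Without the cast, an int stride would be substituted as a 32-bit register
next to a 64-bit base pointer on amd64, and the assembler would reject that
addressing mode. Calling load_via_scratch(buf, 3) on a byte array buf
returns buf[3]; a negative stride works as well because the cast
sign-extends.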

btw, commit it if it works on your computer; we will fix amd64 and any other
issues as people provide bug reports ...

[...]
-- 
Michael

"nothing is evil in the beginning. Even Sauron was not so." -- Elrond




