[Ffmpeg-devel] [PATCH] H.264 deblocking mmx

Aurelien Jacobs aurel
Mon Apr 25 14:11:39 CEST 2005


On Mon, 25 Apr 2005 01:32:21 +0200
Michael Niedermayer <michaelni at gmx.at> wrote:

> Hi
> 
> On Monday 25 April 2005 00:39, Loren Merritt wrote:
> > I noticed that the inloop deblocking filter was taking a large
> > fraction of the decode time, and it is inherently parallel, so...
> >
> > Benchmarks on my Athlon-XP:
> > C:
> > 4182 dezicycles in filter_mb_edgecv, 4193308 runs, 996 skips
> > 4004 dezicycles in filter_mb_edgech, 4193305 runs, 999 skips
> > 9930 dezicycles in filter_mb_edgev, 4191771 runs, 2533 skips
> > 11200 dezicycles in filter_mb_edgeh, 4191510 runs, 2794 skips
> >
> > MMX:
> > 2197 dezicycles in filter_mb_edgecv, 4193544 runs, 760 skips
> > 1714 dezicycles in filter_mb_edgech, 4193733 runs, 571 skips
> > 4928 dezicycles in filter_mb_edgev, 4192872 runs, 1432 skips
> > 3977 dezicycles in filter_mb_edgeh, 4193087 runs, 1217 skips
> >
> > total: +17% decode speed
> >
> > ... however, I have reports that this patch crashes on some systems
> > and doesn't even compile on amd64. So I'm offering it for anyone who
> > wants to figure out what's broken.
> 
> [...]
> 
> >+        :: "r"(pix-3*stride), "r"(pix), "r"(stride),
> >+           "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4),
> >+           "m"(tmp0), "m"(tmp1)
> 
> tmp0/tmp1 are listed here as input operands, but the asm writes to
> them. stride also needs to be 64-bit on amd64. This should be:
> 
>         :  "+m"(tmp0), "+m"(tmp1)
>         :  "r"(pix-3*stride), "r"(pix), "r"((long)stride),
>            "r"(tc0), "r"(alpha), "r"(beta), "m"(ff_pw_4)
> 
> btw, commit it if it works on your computer; we will fix amd64 and
> any other issues as people provide bug reports ...
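
To illustrate the two constraint fixes quoted above, here is a minimal
sketch, not the code from the patch. bump_scratch is a hypothetical helper
made up for this example. The asm path assumes GCC-style inline asm on an
LP64 x86-64 target (64-bit long); every other target takes the plain C
fallback.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): adds stride to *scratch. */
static void bump_scratch(uint64_t *scratch, int stride)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__LP64__)
    __asm__ volatile(
        "addq %1, %0 \n\t"
        /* "+m": the asm both reads and writes *scratch, so it must be
         * an output operand, not a plain "m" input. */
        : "+m"(*scratch)
        /* (long) makes GCC pick a 64-bit register for %1. */
        : "r"((long)stride)
        : "cc");
#else
    /* Portable fallback with the same wrap-around arithmetic. */
    *scratch += (uint64_t)(long)stride;
#endif
}
```

If *scratch were passed as a plain "m" input, the compiler would be free to
assume the asm leaves it unchanged. Without the (long) cast, GCC would
substitute a 32-bit register for %1, which does not match the 64-bit addq
and would typically be rejected by the assembler.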

I've just tested the committed version on amd64. It compiles fine and
seems to work (I only played a few H.264 videos, so I don't know whether
that is enough of a test).

Aurel




