[Ffmpeg-devel] [PATCH] fix mpeg4 lowres chroma bug and increase h264/mpeg4 MC speed

Michael Niedermayer michaelni
Mon Feb 12 03:52:05 CET 2007


Hi

On Sun, Feb 11, 2007 at 03:40:39PM -0800, Trent Piepho wrote:
> On Fri, 9 Feb 2007, Michael Niedermayer wrote:
> > > Do you disagree with me that avg_h264_chroma_mc4_mmx2 is completely broken?
> 
> How come you never answer this?

well, its brokenness depends upon what it is supposed to do, i didnt write
that code ....

ideally it
should just write 2 bytes of course but that is too slow. reading 4 bytes,
changing 2 and writing the 2 changed and 2 unchanged is only slightly better
than writing 2 random extra bytes. multithreaded decoding still could be
affected and it could very well still crash your program if there is something
after the array which has changed in between the read and write ...
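
A minimal sketch of the hazard described above, in plain C rather than MMX. The helper names (store2_exact, rmw4_load, rmw4_store) are hypothetical and are not taken from the ffmpeg source; the read-modify-write is split into two calls only so that a change made "in between the read and write" can be simulated in a single thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for illustration only; none of this is ffmpeg code. */

/* Ideal store: touches exactly the 2 bytes that belong to the block. */
static void store2_exact(uint8_t *dst, const uint8_t px[2])
{
    assert(dst && px);
    memcpy(dst, px, 2);
}

/* First half of a 4-byte read-modify-write: snapshot 4 bytes at dst. */
static uint32_t rmw4_load(const uint8_t *dst)
{
    uint32_t snapshot;
    assert(dst);
    memcpy(&snapshot, dst, 4);   /* reads 2 bytes past the 2-pixel block */
    return snapshot;
}

/* Second half: merge 2 new bytes into the snapshot and write all 4 back. */
static void rmw4_store(uint8_t *dst, uint32_t snapshot, const uint8_t px[2])
{
    uint8_t tmp[4];
    assert(dst && px);
    memcpy(tmp, &snapshot, 4);
    memcpy(tmp, px, 2);          /* replace the 2 bytes we own */
    memcpy(dst, tmp, 4);         /* bytes 2..3 are rewritten from the snapshot */
}
```

If another thread updates bytes 2..3 between rmw4_load() and rmw4_store(), that update is silently overwritten with the stale snapshot, whereas store2_exact() never touches those bytes. The 4-byte access also requires that 2 readable and writable bytes exist after the block.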

we could add a requirement that some extra bytes must be allocated after the
buffer, but that might cause problems for some users of ffmpeg and wont help
with the multithreading. also it doesnt seem like the correct solution for this
rather minor internal issue

maybe using the plain C version of the code for the rightmost column would be
an option ...
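
For reference, a sketch of what such a plain C fallback for a 2 pixel wide block could look like. The function name is hypothetical and this is not the existing ffmpeg C routine; the weights are the standard H.264 bilinear chroma weights for an eighth-pel offset (x, y). The point is that each row stores exactly 2 bytes, so nothing to the right of the block is read back or rewritten.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; a sketch of a 2-pixel-wide bilinear chroma "put". */
static void put_chroma_mc2_c_sketch(uint8_t *dst, const uint8_t *src,
                                    int stride, int h, int x, int y)
{
    /* Bilinear weights for an eighth-pel offset (x, y); they sum to 64. */
    const int A = (8 - x) * (8 - y);
    const int B = x * (8 - y);
    const int C = (8 - x) * y;
    const int D = x * y;
    int i;

    assert(x >= 0 && x < 8 && y >= 0 && y < 8 && h > 0);

    for (i = 0; i < h; i++) {
        /* Reads 3 pixels from this row and the next; stores exactly 2 bytes. */
        dst[0] = (uint8_t)((A * src[0] + B * src[1] +
                            C * src[stride] + D * src[stride + 1] + 32) >> 6);
        dst[1] = (uint8_t)((A * src[1] + B * src[2] +
                            C * src[stride + 1] + D * src[stride + 2] + 32) >> 6);
        dst += stride;
        src += stride;
    }
}
```

Note that src must provide h+1 rows of at least 3 pixels, since the filter reads one pixel to the right and one row below the block.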


> 
> > > Estimated relative speed improvement against current svn:
> > > fixed	       -5.68%
> > > svn		0.00%
> > > table	       16.04%
> > > fixed+table    14.77%
> > >
> > > The version with both patches is only 1.51% slower than if just the table
> > > lookup is applied (which will not fix the bugs), and is still 14.77% faster
> > > than what is in svn now.
> >
> > ive benchmarked it too, and the table version alone is slower than svn
> 
> What processor?  I'm using Athlon-XP and gcc 4.0.1.  

AMD Duron / gcc 4.1.2
ill maybe test it on a few other cpus tomorrow


> Could you use a
> publicly available clip, so that the benchmark can be replicated?

i will retry with a clip for which i know a public url


> 
> > static void put_h264_chroma_mc2_mmx2_wrap(uint8_t *dst/*align 2*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y){
> > START_TIMER
> >     put_h264_chroma_mc2_mmx2(dst,src,stride,h,x,y);
> > STOP_TIMER("put_h264_chroma_mc2_mmx2")
> > }
> >
> > and *_h264_chroma_mc2_mmx2 marked with attribute((noinline))
> > file was a 512x256 movie trailer i had laying around decoded with
> > ffplay -lowres 2
> 
> I re-did the benchmarks the same way and got different results than my
> initial benchmark.  I think the problem may have been that I was counting
> the total cycles in a global variable that was close to the xtimesy table
> in memory, and that changed the cache behaviour, making the table lookup
> cheaper than it should have been.
> 
> Why do you discard some times in your TIMER code?  Is the goal just to
> discard those times in which an interrupt occurred?  

yes
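
A rough sketch of the idea, for readers who have not looked at the timer code: a sample is dropped when it is far above the running average, on the theory that an interrupt or similar disturbance landed inside the measured region. The struct, the function name and the factor 8 are assumptions made for illustration and should not be read as the exact logic of the START_TIMER/STOP_TIMER macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator; the factor 8 below is an assumed threshold. */
typedef struct {
    uint64_t sum;      /* total of accepted samples, in cycles */
    unsigned count;    /* number of accepted samples */
    unsigned skipped;  /* number of rejected outliers */
} TimerStats;

static void timer_add(TimerStats *t, uint64_t cycles)
{
    assert(t);
    /* Always accept the first two samples, then reject anything that is
     * at least 8 times the current average of the accepted samples. */
    if (t->count < 2 || cycles < 8 * t->sum / t->count) {
        t->sum += cycles;
        t->count++;
    } else {
        t->skipped++;
    }
}
```

The reported figure would then be sum/count, with skipped shown alongside it so that a high rejection rate is visible.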


[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch