[Ffmpeg-devel] [PATCH] fix mpeg4 lowres chroma bug and increase h264/mpeg4 MC speed

Michael Niedermayer michaelni
Fri Feb 9 12:47:46 CET 2007


Hi

On Fri, Feb 09, 2007 at 12:03:47AM -0800, Trent Piepho wrote:
> On Thu, 8 Feb 2007, Michael Niedermayer wrote:
> > On Thu, Feb 08, 2007 at 02:48:53AM -0800, Trent Piepho wrote:
> > [...]
> > > Anyway, there is an obvious way to make it faster that we both missed the
> > > first time:
> > >
> > > #define H264_CHROMA_OP2(S,D,T) "punpcklwd 2+" #S ", " #D "\n\t"
> > >
> > > This is about 4.38% faster than my first patch, and 17.4% faster than
> > > the original code.
> >
> > but slower than what is in svn, which is what matters (it slows h.264 down)
> 
> No, it's not slower, it's faster.  I calculated that last percentage
> incorrectly, it should be 15.4%.  That means, it is 15.4% faster than what
> is in svn now, in addition to working correctly, which the svn code does
> not.
> 
> Do you disagree with me that avg_h264_chroma_mc4_mmx2 is completely broken?
> 
> put_h264_chroma_mc4_mmx2() can work for h264, but you clobber random
> memory after the end of the image.  If you increase the stride and pad the
> end of the image, you use more memory (cache effects, vo's that don't like
> gaps between lines), which could make things slower too.
> 
> IMHO, it's not obvious that the padding method will be faster than what
> I've come up with.
> 
> > could you send separate patches for each separate change
> 
> Ok, I'm sending two patches.  The first patch fixes the functions so they
> both work.  The second uses a table lookup instead of a multiply to speed
> up the calculation of x*y.  I benchmarked put_h264_chroma_mc4_mmx2() with
> rdtsc, using 1200 frames (fits in disk cache) of the Elephants Dream 1024
> MPEG4 avi with lowres=2.  mplayer (with correct options) was run 50 times
> in a row for each version, then that was repeated 3 times (total 150 runs
> per version).
> 
> Estimated relative speed improvement against current svn:
> fixed          -5.68%
> svn             0.00%
> table          16.04%
> fixed+table    14.77%
> 
> The version with both patches is only 1.51% slower than if just the table
> lookup is applied (which will not fix the bugs), and is still 14.77% faster
> than what is in svn now.
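
For reference, the table idea can be sketched as follows. This is a minimal
sketch and not the code from the patch: the names xy_table and chroma_weights
are hypothetical, and it assumes x and y are the 1/8-pel chroma offsets in the
range 0..7, so that the four bilinear weights sum to 64.

```c
#include <stdint.h>

/* Hypothetical lookup table: xy_table[y][x] == x*y for x, y in 0..7. */
static const uint8_t xy_table[8][8] = {
    { 0, 0,  0,  0,  0,  0,  0,  0 },
    { 0, 1,  2,  3,  4,  5,  6,  7 },
    { 0, 2,  4,  6,  8, 10, 12, 14 },
    { 0, 3,  6,  9, 12, 15, 18, 21 },
    { 0, 4,  8, 12, 16, 20, 24, 28 },
    { 0, 5, 10, 15, 20, 25, 30, 35 },
    { 0, 6, 12, 18, 24, 30, 36, 42 },
    { 0, 7, 14, 21, 28, 35, 42, 49 },
};

/* Bilinear chroma weights for offset (x, y), derived from one lookup.
 * w[0] = (8-x)*(8-y), w[1] = x*(8-y), w[2] = (8-x)*y, w[3] = x*y.
 * Multiplying by 8 is a shift, so no general multiply is needed. */
static void chroma_weights(int x, int y, int w[4])
{
    int xy = xy_table[y][x];

    w[0] = 64 - 8 * x - 8 * y + xy;
    w[1] = 8 * x - xy;
    w[2] = 8 * y - xy;
    w[3] = xy;
}
```

Whether the lookup beats the multiply depends on the CPU and on cache
behaviour, which is why it has to be measured rather than assumed.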

I've benchmarked it too, and the table version alone is slower than svn

3 runs of the old code:
622 dezicycles in avg_h264_chroma_mc2_mmx2, 130876 runs, 196 skips
643 dezicycles in put_h264_chroma_mc2_mmx2, 523609 runs, 679 skips

609 dezicycles in avg_h264_chroma_mc2_mmx2, 130837 runs, 235 skips
651 dezicycles in put_h264_chroma_mc2_mmx2, 523550 runs, 738 skips

614 dezicycles in avg_h264_chroma_mc2_mmx2, 130881 runs, 191 skips
642 dezicycles in put_h264_chroma_mc2_mmx2, 523481 runs, 807 skips

3 runs of your table code:
638 dezicycles in avg_h264_chroma_mc2_mmx2, 130851 runs, 221 skips
670 dezicycles in put_h264_chroma_mc2_mmx2, 523632 runs, 656 skips

631 dezicycles in avg_h264_chroma_mc2_mmx2, 130839 runs, 233 skips
671 dezicycles in put_h264_chroma_mc2_mmx2, 523588 runs, 700 skips

638 dezicycles in avg_h264_chroma_mc2_mmx2, 130835 runs, 237 skips
675 dezicycles in put_h264_chroma_mc2_mmx2, 523555 runs, 733 skips
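
Averaging the three runs of each version makes the difference easier to read.
The sketch below only redoes the arithmetic on the numbers listed above; the
helper names are made up for illustration. It gives a slowdown of roughly 3.4%
for avg and 4.1% for put with the table code.

```c
/* Dezicycle counts copied from the three runs of each version above. */
static const int old_avg[3]   = { 622, 609, 614 };
static const int old_put[3]   = { 643, 651, 642 };
static const int table_avg[3] = { 638, 631, 638 };
static const int table_put[3] = { 670, 671, 675 };

/* Arithmetic mean of three measurements. */
static double mean3(const int v[3])
{
    return (v[0] + v[1] + v[2]) / 3.0;
}

/* Relative slowdown of new_runs against old_runs, in percent.
 * A positive result means the new code needs more cycles. */
static double slowdown_pct(const int old_runs[3], const int new_runs[3])
{
    return (mean3(new_runs) / mean3(old_runs) - 1.0) * 100.0;
}
```

For example, slowdown_pct(old_put, table_put) compares a mean of 645.33
dezicycles against a mean of 672.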

benchmark done with:
static void avg_h264_chroma_mc2_mmx2_wrap(uint8_t *dst/*align 2*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y){
START_TIMER
    avg_h264_chroma_mc2_mmx2(dst,src,stride,h,x,y);
STOP_TIMER("avg_h264_chroma_mc2_mmx2")
}

static void put_h264_chroma_mc2_mmx2_wrap(uint8_t *dst/*align 2*/, uint8_t *src/*align 1*/, int stride, int h, int x, int y){
START_TIMER
    put_h264_chroma_mc2_mmx2(dst,src,stride,h,x,y);
STOP_TIMER("put_h264_chroma_mc2_mmx2")
}

with both *_h264_chroma_mc2_mmx2 functions marked __attribute__((noinline)).
The file was a 512x256 movie trailer I had lying around, decoded with
ffplay -lowres 2
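
The "runs" and "skips" columns in the output come from the timer discarding
outliers (for example calls interrupted by a context switch) before averaging.
A minimal sketch of that kind of bookkeeping follows. It is an assumption about
how such a timer can work, not a copy of the START_TIMER/STOP_TIMER macros:
the struct, the function names, and the factor-of-8 threshold are all
hypothetical.

```c
#include <stdint.h>

/* Hypothetical accumulator for one timed function. */
struct timer_stats {
    uint64_t sum;   /* total cycles over all accepted samples */
    int      runs;  /* number of accepted samples */
    int      skips; /* number of samples rejected as outliers */
};

/* Add one measurement. The first two samples are always accepted; after
 * that, a sample is accepted only if it is below 8 times the current mean. */
static void timer_add(struct timer_stats *t, uint64_t elapsed)
{
    if (t->runs < 2 || elapsed < 8 * t->sum / t->runs) {
        t->sum += elapsed;
        t->runs++;
    } else {
        t->skips++;
    }
}

/* Mean of the accepted samples in tenths of a cycle ("dezicycles"). */
static uint64_t timer_dezicycles(const struct timer_stats *t)
{
    return t->runs ? t->sum * 10 / t->runs : 0;
}
```

Feeding it the samples 60, 64, 62, 5000, 66 accepts four of them and skips
the 5000-cycle one, so a single slow call does not distort the average.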

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle