[Ffmpeg-devel] [RFC] smallcpy for h264

Michael Niedermayer michaelni
Sat Oct 7 13:37:54 CEST 2006


On Sat, Oct 07, 2006 at 01:25:59PM +0200, Luca Barbato wrote:
> Michael Niedermayer wrote:
> > Hi
> > 
> > 
> > gcc on x86 replaces memcpy(constant) by inlined and fast code IIRC so
> > anything like the proposed patch needs to be very carefully benchmarked
> > on x86
> that's why I'm posting it
> > 
> > additionally due to call overhead compared to inlined double/uint64_t based
> > copy most of this cannot be faster even if the functions could do the copy
> > in 0 cpu cycles
> so do you think macros are better?

yes, and there should be 3 user selectable cases
1. always use memcpy and leave it to gcc
2. use generic uint64_t based copy
3. use cpu specific tricks which of course will break runtime cpu selection
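
A rough sketch of how the three cases could sit behind one macro. SMALLCPY_MODE and SMALLCPY8 are hypothetical names used only for illustration, not existing options or functions, and the SSE2 branch is just one possible cpu specific variant:

```c
#include <stdint.h>
#include <string.h>

/* SMALLCPY_MODE is a hypothetical build-time switch, not an existing option. */
#ifndef SMALLCPY_MODE
#define SMALLCPY_MODE 1
#endif

#if SMALLCPY_MODE == 1
/* 1. plain memcpy with a constant size; the compiler may inline it */
#define SMALLCPY8(dst, src) memcpy((dst), (src), 8)

#elif SMALLCPY_MODE == 2
/* 2. generic 64-bit copy; assumes both pointers are 8-byte aligned and
 *    refer to storage that may legally be accessed as uint64_t */
#define SMALLCPY8(dst, src) \
    (*(uint64_t *)(dst) = *(const uint64_t *)(src))

#elif SMALLCPY_MODE == 3 && defined(__SSE2__)
/* 3. cpu specific: SSE2 64-bit load/store, chosen at compile time,
 *    so no runtime cpu selection is possible for this path */
#include <emmintrin.h>
#define SMALLCPY8(dst, src) \
    _mm_storel_epi64((__m128i *)(dst), _mm_loadl_epi64((const __m128i *)(src)))

#else
#error "unsupported SMALLCPY_MODE for this target"
#endif
```

Since the choice is made by the preprocessor, there is no call overhead in any of the three cases, and case 1 leaves the decision entirely to the compiler.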

but before i will agree to this i want
1. to know why we spend a significant amount of time doing small memcpys
2. to know why ppc doesn't inline memcpy like x86 does

furthermore these alignment related changes must be split, reviewed
and applied before any benchmarking makes sense (= your benchmark
of misaligned arrays with memcpy vs. your code with aligned arrays
might show more of the speed difference due to alignment and less of
that due to the actual code)
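
One way to separate the two effects is to time the same copy routine on aligned and on deliberately misaligned buffers first. A minimal sketch, assuming clock() is precise enough for this; time_copies and alignment_baseline are hypothetical helper names and no results are implied:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iters` copies of `n` bytes, return seconds. */
static double time_copies(uint8_t *dst, const uint8_t *src, size_t n, long iters)
{
    volatile uint8_t sink = 0;
    clock_t start = clock();
    for (long i = 0; i < iters; i++) {
        memcpy(dst, src, n);
        sink ^= dst[(size_t)i % n];  /* volatile read-modify-write so the loop is not dropped */
    }
    clock_t end = clock();
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical driver: identical copy code, only the alignment differs. */
static void alignment_baseline(long iters, double *aligned_s, double *misaligned_s)
{
    static uint64_t src_store[4], dst_store[4];  /* 8-byte aligned storage */
    uint8_t *src = (uint8_t *)src_store;
    uint8_t *dst = (uint8_t *)dst_store;

    *aligned_s    = time_copies(dst,     src,     8, iters);
    *misaligned_s = time_copies(dst + 1, src + 1, 8, iters);
}
```

Any gap between the two timings comes from alignment alone, which gives a baseline to subtract before comparing memcpy against a replacement.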

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In the past you could go to a library and read, borrow or copy any book
Today you'd get arrested for mere telling someone where the library is
