[Ffmpeg-devel] [PATCH] Little optimization to fft_sse.c
Mon Mar 6 13:28:35 CET 2006
On Mon, Mar 06, 2006 at 04:30:16PM +0800, Zuxy Meng wrote:
> 2006/3/6, Diego Biurrun <diego at biurrun.de>:
> > On Mon, Mar 06, 2006 at 03:15:55AM +0800, Zuxy Meng wrote:
> > >
> > > I have also written a 3DNow! version of fft. Is that still needed?
> > Sure, those old processors have not all been thrown away yet...
> Attached are FFT routines that can be used just the same as
> ff_fft_calc_sse (the external interfaces are all the same).
> For my Athlon XP 2800+, the 3DNow! version is about 5% slower than the
> SSE version but 50% faster than the FPU version. The speedup might be
> more pronounced on a K6-2/III because it lacks a fully pipelined FPU.
> However, the fastest is the Extended 3DNow! version, which is a further 33%
> faster than the SSE version. So for an "original" K7 without SSE, the
> speedup is beyond 100%.
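The "beyond 100%" figure can be checked with a little arithmetic. This is only a sketch: it assumes every percentage above is a throughput ratio (so "5% slower" means 0.95 times the SSE throughput), which the mail does not state, and it normalises the FPU version to 1.

```latex
\begin{align*}
T_{\mathrm{FPU}} &= 1 \\
T_{\mathrm{3DNow}} &= 1.5\,T_{\mathrm{FPU}} = 1.5 \\
T_{\mathrm{SSE}} &= \frac{T_{\mathrm{3DNow}}}{0.95} \approx 1.58 \\
T_{\mathrm{3DNowExt}} &= 1.33\,T_{\mathrm{SSE}} \approx 2.10
\end{align*}
```

Under that reading the Extended 3DNow! version runs at roughly 2.1 times the FPU throughput, i.e. a speedup of about 110%, which agrees with the claim.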
> Two reasons why I send complete source files instead of patches:
> 1. I'm not very familiar with ffmpeg's policy for dealing with ISA-specific
> optimizations. Are macros like HAVE_3DNOWEX, HAVE_SSE,
> RUNTIME_CPUDETECTION etc. valid here? Changes must be made to
RUNTIME_CPUDETECTION: treat it as always true
HAVE_*: treat them as always true if HAVE_MMX is set
> libavcodec\fft.c, but I don't know the proper way to do it.
i would say
if(mm_support() & MM_SSE)
    s->fft_calc = ff_fft_calc_sse;
else if(mm_support() & MM_3DNOW)
    s->fft_calc = ff_fft_calc_3dnow;
or something similar in fft.c
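A slightly fuller, self-contained sketch of that kind of runtime dispatch follows. It is not the actual fft.c code. The struct layout, the stub functions, the `fft_select` helper, the `MM_3DNOWEXT` name and all flag values are assumptions for illustration, and the CPU flags are passed in as a parameter instead of being read from mm_support() so the selection can be exercised on any machine. Checking Extended 3DNow! before SSE is likewise an assumption, based only on the timings reported earlier in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* CPU capability bits. MM_SSE and MM_3DNOW are the names used in the
   mail above; MM_3DNOWEXT and all numeric values are assumptions made
   for this sketch. */
#define MM_3DNOW    0x0004
#define MM_SSE      0x0008
#define MM_3DNOWEXT 0x0020

typedef struct FFTComplex { float re, im; } FFTComplex;

typedef struct FFTContext {
    /* Chosen once at init time, then called for every transform. */
    void (*fft_calc)(struct FFTContext *s, FFTComplex *z);
} FFTContext;

/* Empty stubs standing in for the real routines (hypothetical names). */
void fft_calc_c_stub(FFTContext *s, FFTComplex *z)        { (void)s; (void)z; }
void fft_calc_sse_stub(FFTContext *s, FFTComplex *z)      { (void)s; (void)z; }
void fft_calc_3dnow_stub(FFTContext *s, FFTComplex *z)    { (void)s; (void)z; }
void fft_calc_3dnowext_stub(FFTContext *s, FFTComplex *z) { (void)s; (void)z; }

/* Pick one routine based on the capability bits. cpu_flags plays the
   role of the value returned by mm_support(). */
void fft_select(FFTContext *s, int cpu_flags)
{
    assert(s != NULL);
    if (cpu_flags & MM_3DNOWEXT)
        s->fft_calc = fft_calc_3dnowext_stub;
    else if (cpu_flags & MM_SSE)
        s->fft_calc = fft_calc_sse_stub;
    else if (cpu_flags & MM_3DNOW)
        s->fft_calc = fft_calc_3dnow_stub;
    else
        s->fft_calc = fft_calc_c_stub;  /* plain C fallback */
}
```

Callers would then go through `s->fft_calc(s, z)` only, so adding another ISA-specific routine touches just the selection function.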
> 2. These two files are written in intrinsics like the original
> fft_sse.c. However, they require the mm3dnow.h instead of xmmintrin.h.
> The former seems to be absent in gcc3. Although gcc4's mm3dnow.h can
> be used with any gcc version that supports the 3DNow! builtins, a
> corresponding change must be made to the configure script, or we may
> include the header in the ffmpeg package. Again I don't know what's the
> proper way.
well, either don't use intrinsics or find a way so that the code works
everywhere, i mean either fix the gcc3 issue or disable the code for gcc3
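One way the "disable the code for gcc3" option might look is a preprocessor guard around the header. This is a sketch only. `USE_3DNOW_INTRINSICS` and `fft_3dnow_intrinsics_available` are hypothetical names, not existing ffmpeg or configure symbols, and the guard assumes the compiler defines `__3dNOW__` when 3DNow! code generation is enabled (as gcc does with -m3dnow).

```c
/* Only pull in mm3dnow.h when the compiler is gcc 4 or newer and 3DNow!
   code generation is enabled; everywhere else the intrinsic path is
   compiled out. USE_3DNOW_INTRINSICS is a hypothetical macro. */
#if defined(__GNUC__) && (__GNUC__ >= 4) && defined(__3dNOW__)
#  include <mm3dnow.h>
#  define USE_3DNOW_INTRINSICS 1
#else
#  define USE_3DNOW_INTRINSICS 0
#endif

/* Reports at run time which way the guard went at compile time. */
int fft_3dnow_intrinsics_available(void)
{
    return USE_3DNOW_INTRINSICS;
}
```

The 3DNow! FFT source file would wrap its body in `#if USE_3DNOW_INTRINSICS`, and the init code in fft.c would only reference the routine under the same condition.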
> And more tests are of course necessary.
regression tests like we do for codecs and containers would be nice too