[FFmpeg-devel] [PATCH 0/4] fftw exploration (WIP)
Ronald S. Bultje
rsbultje at gmail.com
Fri Mar 25 21:40:55 CET 2016
On Fri, Mar 25, 2016 at 3:55 PM, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> On Fri, Mar 25, 2016 at 12:11 PM, Paul B Mahol <onemda at gmail.com> wrote:
> > On 3/25/16, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
> >> On Fri, Mar 25, 2016 at 9:36 AM, Nicolas George <george at nsup.org> wrote:
> >>> On sextidi 6 Germinal, year CCXXIV, Ganesh Ajjanagadde wrote:
> >>>> Depends on whether it is small or not. Yes, in many codecs, FFTs are
> >>>> short, e.g. length 512. However, at long lengths, e.g. 8192+, as seen
> >>>> from the benches, there are sometimes 2x variations at the moment.
> >>> And how much of the actual total decoding is spent in the FFT? Even a
> >>> speedup would be useless if it is for a function that never amounts to
> >>> more than 0,01% of the actual time. The FFT is probably not that
> >>> negligible, but this is not a *50 speedup either, and I have no idea how
> >>> frequent are lengths.
> >> Paul had some interest in 2^17 FFTs at one point.
> > And it was done in avfft. So feel free to improve our avfft instead.
> Just to be clear: I won't be working on improving avfft, but of course
> I won't oppose patches generally.
> Basically, it boils down to the current asm code being a mess
"I don't understand the code" is not the same as "the code is a mess".
The code is not a mess, it's highly optimized and you could ask the person
that wrote it (see copyright line) for details instead of loudly
complaining that you can't understand it.
> I can't even identify really what algo is being used.
Check the C code.
> If anyone cares here, I do not know why we can't use inline asm or
> intrinsics. The chief benefit of intrinsics is that the hard part (for
> humans) of register allocation/checks is taken care of, but optimized
> instructions get used in a readable fashion. Anyway, I know there are
> a ton of arguments against it in FFmpeg, almost none of which I buy in
> 2016 with modern toolchains.
Inline asm doesn't solve any of the problems you just mentioned; in fact, it
makes things worse, because it doesn't work on half of our supported compilers.
Intrinsics generally perform worse (or at the very least less consistently)
than the same sequence of operations written out in hand-written asm.
If you're interested in getting actual help in learning "hand-written"
assembly, let us know and we'll help you move on.