[FFmpeg-devel] [PATCH 0/4] fftw exploration (WIP)
gajjanag at gmail.com
Fri Mar 25 20:55:02 CET 2016
On Fri, Mar 25, 2016 at 12:11 PM, Paul B Mahol <onemda at gmail.com> wrote:
> On 3/25/16, Ganesh Ajjanagadde <gajjanag at gmail.com> wrote:
>> On Fri, Mar 25, 2016 at 9:36 AM, Nicolas George <george at nsup.org> wrote:
>>> On sextidi 6 Germinal, year CCXXIV, Ganesh Ajjanagadde wrote:
>>>> Depends on whether it is small or not. Yes, in many codecs the FFTs are
>>>> short ones, e.g. 512. However, at long lengths, e.g. 8192+, as seen
>>>> from the benches, there are sometimes 2x variations at the moment.
>>> And how much of the actual total decoding is spent in the FFT? Even a *50
>>> speedup would be useless if it is for a function that never amounts to
>>> more than 0.01% of the actual time. The FFT is probably not that negligible,
>>> this is not a *50 speedup either, and I have no idea how frequent are long
>> Paul had some interest in 2^17 FFTs at one point.
> And it was done in avfft. So feel free to improve our avfft instead.
Just to be clear: I won't be working on improving avfft, but of course
I won't oppose patches generally.
Basically, it boils down to the current asm code being such a mess that
I can't even really identify which algorithm is being used. Sure, I can go
and optimize locally: insert FMAs, move some 128-bit SIMD to 256-bit
SIMD, etc. But it is not even clear that the approach in FFmpeg is the
best among currently known FFT approaches; the only marker I
see is "based on ideas from libdjbfft".
If anyone cares here, I do not know why we can't use inline asm or
intrinsics. The chief benefit of intrinsics is that the hard part (for
humans) of register allocation/checks is taken care of by the compiler,
while the optimized instructions still get used in a readable fashion.
Anyway, I know there are a ton of arguments against them in FFmpeg,
almost none of which I buy in 2016 with modern toolchains.
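
To make the intrinsics point concrete, here is a minimal sketch of a
twiddle-free radix-2 butterfly (a' = a + b, b' = a - b) written with SSE
intrinsics. It is not taken from FFmpeg or from the patch set: the name
butterfly_f32 and the HAVE_SSE_SKETCH macro are hypothetical, and the scalar
tail is only there so the sketch also builds on non-x86 targets. No registers
are named anywhere; the compiler picks them.

```c
#include <stddef.h>

/* Assumption for this sketch: use SSE when the compiler advertises it,
 * otherwise fall back to plain C. HAVE_SSE_SKETCH is a made-up macro. */
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_SKETCH 1
#else
#define HAVE_SSE_SKETCH 0
#endif

/* Hypothetical helper (not an FFmpeg function): in-place butterfly with a
 * unit twiddle factor, a[i] <- a[i] + b[i] and b[i] <- a[i] - b[i]. */
static void butterfly_f32(float *a, float *b, size_t n)
{
    size_t i = 0;
#if HAVE_SSE_SKETCH
    /* Four floats per iteration; unaligned loads keep the sketch simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
        _mm_storeu_ps(b + i, _mm_sub_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when SSE is unavailable. */
    for (; i < n; i++) {
        float sum  = a[i] + b[i];
        float diff = a[i] - b[i];
        a[i] = sum;
        b[i] = diff;
    }
}
```

Moving this to 256-bit SIMD would mostly mean swapping in the corresponding
AVX types and intrinsics from <immintrin.h> and building with AVX enabled;
whether that actually beats hand-written asm would have to be benchmarked.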