[FFmpeg-devel] [PATCH 2/4] armv6: Accelerate ff_fft_calc for general case (nbits != 4)

Michael Niedermayer michaelni at gmx.at
Fri Jul 11 02:44:44 CEST 2014


On Fri, Jul 11, 2014 at 12:14:29AM +0100, Ben Avison wrote:
> The previous implementation targeted DTS Coherent Acoustics, which only
> requires nbits == 4 (fft16()). This case was (and still is) linked directly
> rather than being indirected through ff_fft_calc_vfp(), but now the full
> range from radix-4 up to radix-65536 is available. This benefits other codecs
> such as AAC and AC3.
> 
> The implementation is based upon the C version, with each routine larger than
> radix-16 calling a hierarchy of smaller FFT functions, then performing a
> post-processing pass. This pass benefits a lot from loop unrolling to
> counter the long pipelines in the VFP. A relaxed calling standard also
> reduces the overhead of the call hierarchy, and avoiding the excessive
> inlining performed by GCC probably helps with I-cache utilisation too.
> 
> I benchmarked the result by measuring the number of gperftools samples that
> hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
> specifically in the FFT routines (fft4() to fft512() and pass()) for the
> same sample AAC stream:
> 
>               Before          After
>               Mean   StdDev   Mean   StdDev  Confidence  Change
> Audio decode  2245.5 53.1     1599.6 43.8    100.0%      +40.4%
> FFT routines  940.6  22.0     348.1  20.8    100.0%      +170.2%
> ---
>  libavcodec/arm/fft_init_arm.c |    8 +-
>  libavcodec/arm/fft_vfp.S      |  261 ++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 252 insertions(+), 17 deletions(-)
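
For reference, the Change column in the quoted table is consistent with treating the sample counts as proportional to time spent and reporting before/after - 1 as a percentage. A minimal sketch of that arithmetic, assuming this is how the figures were derived (the helper name `speedup_percent` is hypothetical, not part of the patch):

```c
/* Percentage speed-up implied by a drop in profiler sample counts.
 * Assumption: samples are proportional to time spent, so the speed-up
 * factor is before/after. The name speedup_percent is illustrative only. */
double speedup_percent(double before_samples, double after_samples)
{
    return (before_samples / after_samples - 1.0) * 100.0;
}
```

Calling it with the mean values from the table, (2245.5, 1599.6) and (940.6, 348.1), gives roughly 40.4 and 170.2, matching the quoted figures.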

It seems this fails to build with:
./configure  --cross-prefix=/usr/arm-linux-gnueabi/bin/ --cc='ccache arm-linux-gnueabi-gcc-4.5' --extra-cflags='-mfpu=neon -mfloat-abi=softfp' --cpu=cortex-a8 --arch=armv7 --target-os=linux --enable-cross-compile && make -j12

AS      libavcodec/arm/fft_vfp.o
ffmpeg/libavcodec/arm/fft_vfp.S: Assembler messages:
ffmpeg/libavcodec/arm/fft_vfp.S:512: Error: cannot use register index with PC-relative addressing -- `ldr ip,[pc,ip,lsl#2]'
make: *** [libavcodec/arm/fft_vfp.o] Error 1
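
The rejected instruction, `ldr ip,[pc,ip,lsl#2]`, loads a 32-bit word from the address PC + (ip << 2), which is the shape of an indexed table lookup. As a rough C analogue, and assuming the table holds one routine per transform size selected by nbits (the message does not show the table itself), the lookup might be sketched as below. All names here (`Complex`, `fft_fn`, `dispatch`, `fft_calc_sketch`, the `*_stub` functions) are hypothetical stand-ins, not the patch's code:

```c
#include <stddef.h>

/* Hypothetical stand-in for the codec's complex sample type. */
typedef struct { float re, im; } Complex;
typedef void (*fft_fn)(Complex *z);

/* Stubs that only record which size was selected; no transform is done. */
static void fft4_stub(Complex *z)  { z[0].re = 4.0f;  }
static void fft8_stub(Complex *z)  { z[0].re = 8.0f;  }
static void fft16_stub(Complex *z) { z[0].re = 16.0f; }

/* Assumed layout: entry i handles a transform of 2^(i + 2) points. */
static const fft_fn dispatch[] = { fft4_stub, fft8_stub, fft16_stub };

/* Index the table by nbits - 2 and call the entry.
 * Returns 0 on success, -1 if nbits is outside the table. */
int fft_calc_sketch(int nbits, Complex *z)
{
    size_t count = sizeof dispatch / sizeof dispatch[0];

    if (nbits < 2 || (size_t)(nbits - 2) >= count)
        return -1;
    dispatch[nbits - 2](z);
    return 0;
}
```

In assembly, the same lookup can be done without a PC-relative register index by first materialising the table address in a register and then indexing from that register.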

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 