[FFmpeg-devel] [PATCH] Altivec split-radix FFT
Wed Aug 26 00:07:05 CEST 2009
On Mon, Aug 24, 2009 at 1:43 PM, Loren Merritt<lorenm at u.washington.edu> wrote:
> Fixed oprofile thanks to Måns.
> Now measures 1.85x FFT speedup, on the sizes vorbis uses.
> On Mon, 24 Aug 2009, Guillaume POIRIER wrote:
>>> I used raw asm rather than intrinsics because gcc adds a ginormous
>>> to each function call. Is there anything I need to do to make it work on
>>> ppc64, if it doesn't already?
>> I'll look into this. I think all you need to do is avoid referring to
>> the general purpose registers' names explicitly.
> You mean using registers' number instead? That's the only thing gas
I have no idea what I'm talking about ;-)
Your ASM looks OK at first glance, but something in it is not
PPC64-compatible:
Starting program: /home/guillaume/ffmpeg-svn/libavcodec/fft-test
FFT 512 test
Program received signal SIGSEGV, Segmentation fault.
0x1001115800000000 in ?? ()
#1 0x00000000100107d0 in .ff_fft_calc_altivec ()
(gdb) disassemble $pc-32 $pc+32
Dump of assembler code from 0x100107b0 to 0x100107f0:
0x00000000100107b0 <.ff_fft_calc_altivec+524>: ld r9,-31728(r2)
0x00000000100107b4 <.ff_fft_calc_altivec+528>: rldicr r0,r0,3,60
0x00000000100107b8 <.ff_fft_calc_altivec+532>: add r9,r9,r0
0x00000000100107bc <.ff_fft_calc_altivec+536>: ld r11,0(r9)
0x00000000100107c0 <.ff_fft_calc_altivec+540>: mtctr r11
0x00000000100107c4 <.ff_fft_calc_altivec+544>: stw r2,-4(r1)
0x00000000100107c8 <.ff_fft_calc_altivec+548>: li r2,16
0x00000000100107cc <.ff_fft_calc_altivec+552>: bctrl
0x00000000100107d0 <.ff_fft_calc_altivec+556>: lwz r2,-4(r1)
0x00000000100107d4 <.ff_fft_calc_altivec+560>: ld r9,608(r31)
0x00000000100107d8 <.ff_fft_calc_altivec+564>: lwz r0,0(r9)
0x00000000100107dc <.ff_fft_calc_altivec+568>: extsw r0,r0
0x00000000100107e0 <.ff_fft_calc_altivec+572>: cmpwi cr7,r0,4
0x00000000100107e4 <.ff_fft_calc_altivec+576>: bgt- cr7,0x10010810
0x00000000100107e8 <.ff_fft_calc_altivec+580>: ld r11,616(r31)
0x00000000100107ec <.ff_fft_calc_altivec+584>: ld r9,608(r31)
End of assembler dump.
(gdb) print $r2
$1 = 16
(gdb) print $r1
$2 = 17359809783712
I need to look into this. I've never ported code to PPC64, so now's a
good time to start...
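
One possible lead, offered as a hypothesis and not a confirmed diagnosis:
around the bctrl, the dump saves r2 with stw and restores it with lwz.
Those are 32-bit accesses, while on ppc64 the registers are 64 bits wide
and r2 is used as the TOC base (see the ld r9,-31728(r2) at +524). The
Python sketch below only models what a 32-bit save/restore does to a
64-bit value. The function names are made up for illustration, and the
sample value is $r1 from the gdb session, standing in for the real r2
contents, which the log does not show.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def save_restore_32(reg: int) -> int:
    """Model `stw rX,-4(r1)` followed by `lwz rX,-4(r1)` on a 64-bit register."""
    stored = reg & MASK32  # stw writes only the low-order 32 bits
    return stored          # lwz loads that word and zero-extends it


def save_restore_64(reg: int) -> int:
    """Model a doubleword save/restore (std then ld): all 64 bits survive."""
    return reg & MASK64


# Stand-in 64-bit value: $r1 as printed by gdb in the session above.
# The actual r2 (TOC pointer) before `li r2,16` is not in the log.
sample = 17359809783712

print("original:        ", hex(sample))
print("after stw + lwz: ", hex(save_restore_32(sample)))
print("after std + ld:  ", hex(save_restore_64(sample)))
```

If the upper half of the saved register is non-zero, the 32-bit round trip
loses it, whereas a doubleword save and restore would not. Whether this is
what actually breaks fft-test here remains to be checked.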
>> Your patch doesn't apply cleanly here:
>> patch -p1 --dry-run < ../fft_altivec.diff
>> Did I miss something?
> That command works for me, on top of svn-r19689.
The problem was that my version of "patch" was confused by the CR/LF
line endings. Fixed locally.
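
For anyone hitting the same thing, a minimal sketch of stripping the
carriage returns from a diff before handing it to patch. The helper name
crlf_to_lf is hypothetical, and this is not necessarily how the file was
fixed here.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Turn CR/LF line endings into plain LF, leaving other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


# Small self-contained demonstration on an in-memory diff fragment.
sample_diff = b"--- a/fft.c\r\n+++ b/fft.c\r\n@@ -1 +1 @@\r\n"
print(crlf_to_lf(sample_diff))
```

To use it on a real file, read the diff in binary mode, pass the bytes
through crlf_to_lf, and write the result back out in binary mode so that
Python does not translate the line endings again.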
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
Ogden Nash - "The trouble with a kitten is that when it grows up,
it's always a cat." -