[FFmpeg-devel] [PATCH] Some ARM VFP optimizations (vector_fmul, vector_fmul_reverse, float_to_int16)
Sun Apr 20 23:54:30 CEST 2008
On Sun, Apr 20, 2008 at 11:41:57PM +0200, Michael Niedermayer wrote:
> On Sun, Apr 20, 2008 at 11:33:43PM +0300, Siarhei Siamashka wrote:
> > Now about the problems why this code is still not in FFmpeg SVN. The most
> > important problem that stalled everything is 64-bit alignment. In order to
> > perform dual loads, data needs to be 64-bit (8 byte) aligned. This applies
> > to constants loading and accessing temporary data on stack. Constants loading
> > requires some kind of '.align' directive in the code. The problem is that
> > '.align' is nonportable and '.balign' or '.p2align' are not supported by
> > broken Apple toolchain on x86 and apparently on ARM too (actually it has a
> > lot more issues in addition to this one). One more problem is related to
> > stack, because legacy systems and toolchains (those that use OABI) support
> > only 4 byte stack alignment. There was a tweak in the code that performs
> > stack realignment when IDCT code is built for these broken legacy systems
> > with absolutely no overhead on modern systems (this code does not get compiled
> > in for modern EABI systems that already provide 64-bit stack alignment), but
> > it was rejected by Michael Niedermayer. I strongly disagree with this decision
> > and breaking support for legacy systems for no reason is not a valid option
> > for me.
> I do not remember exactly but there was a serious problem with your suggested
> "solution", IIRC it caused some serious speedloss on some systems.
> 1. Why do you need a aligned stack? That is why cant you use the heap?
> 2. Assuming you do need one, where was the problem with using a recent gcc
> which supports maintaining stack alignment?
> 3. What effect does your solution have on systems which do align the stack
> aka a recent gcc on pre EABI. Or even a non gcc compiler.
To awnser 3.
huge speedloss, and thats why this isnt a solution
> > The last (actually minor) problem is related to crop table that is defined in
> > C code and used from assembly code. If anybody would ever change MAX_NEG_CROP
> > constant in C code for whatever reason, assembly implementation of ARMv5TE
> > IDCT will be silently broken (it will compile, but will break at runtime). I
> > would prefer to have compilation failure here. A good example is lowres
> > decoding support that is now broken for many less popular architectures
> > (lowres decoding was added at some time in the past, but nobody tweaked sparc
> > or alpha code to properly support it). Another option (which I initially used)
> > is just to have a copy of crop table stored in assembly source file, it will
> > take extra space, but will make code reliable and not dependent on any other
> > stuff from outside.
> > I really would like to hear some constructive feedback instead of "this is
> > unacceptable" with no alternatives suggested :)
> #if MAX_NEG_CROP != 12345
> Please fix code in file foobar
or better add a comment to the #define setting MAX_NEG_CROP
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 189 bytes
Desc: Digital signature
More information about the ffmpeg-devel