[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM
Wed Jul 2 20:57:01 CEST 2008
Siarhei Siamashka wrote:
> On Monday 30 June 2008, matthieu castet wrote:
>>> Could you or anybody else having compatible ARM device just do some
>>> benchmarking to confirm my results (I posted benchmarks here multiple
>>> times already). It would be a really good help. Because I feel that
>>> some people here still doubt that it provides a major performance
>> For dct-test (yes I know it is not a benchmark) on an arm926ejs, the svn
>> implementation got 126.7 kdct/s, yours 154.6 kdct/s.
>>> Once/if the performance improvement is confirmed, a help with integration
>>> would be really needed. That's not a joke, I really fail to see any
>>> problems with the "balign/ASMALIGN/stack alignment" stuff, so I can't fix
>>> them. A good example of a solution (a working patch) is very much
>> Could you list the integration problems that remain?
> AFAIK the known problems are only alignment related. But I may be wrong.
Maybe also a way to keep MAX_NEG_CROP synchronised.
An easy way could be to include dsputils.h and use #ifndef __ASSEMBLY__
to hide the C declarations.
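
A minimal sketch of that idea, assuming a hypothetical shared header: the constant stays visible to both the C compiler and the preprocessed assembly, while the C declarations are wrapped in #ifndef __ASSEMBLY__. The names crop_table and crop_table_size and the value 1024 are assumptions made for the example, and __ASSEMBLY__ is assumed to be passed by the build system (for example with -D__ASSEMBLY__) when assembling .S files.

```c
/* ---- shared part: plain preprocessor, safe for C and for .S files ---- */
#define MAX_NEG_CROP 1024   /* assumed value, for illustration only */

#ifndef __ASSEMBLY__
/* ---- C-only part: the assembler cannot parse these declarations ---- */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table whose size depends on the shared constant. */
uint8_t crop_table[256 + 2 * MAX_NEG_CROP];

/* Helper used only to show that C code sees the same constant. */
static size_t crop_table_size(void)
{
    return sizeof(crop_table);
}
#endif /* __ASSEMBLY__ */
```

With this layout the assembly source includes the same header and only sees the #define, so the value cannot drift between the C and the assembly code.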
>> For the stack alignment, maybe for old eabi you could use ldm/stm
>> instead of double load/store instructions, but still use double load/store
>> instructions on EABI.
Another solution could be to use your code only on EABI and use the
old code when the stack is not aligned.
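
A rough sketch of how such a selection could look on the C side, assuming the compiler predefines __ARM_EABI__ for EABI targets (GCC does) and that the EABI calling convention keeps the stack 8-byte aligned at function boundaries. The names use_ldrd_path and is_aligned8 are hypothetical and only illustrate the idea.

```c
#include <stdint.h>

/* Returns 1 when the LDRD/STRD code may be used, 0 when the old
 * code should be kept. On a non-EABI build this always returns 0. */
static int use_ldrd_path(void)
{
#if defined(__ARM_EABI__)
    return 1;   /* EABI: stack is 8-byte aligned at public interfaces */
#else
    return 0;   /* no such guarantee assumed: keep the old code */
#endif
}

/* Checks the doubleword alignment that LDRD/STRD expect on ARMv5TE. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}
```

The compile-time test picks the code path once for the whole build; is_aligned8 is only a debugging aid for checking a given buffer or stack address.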
>> For the memory pool, why don't you use only one memory pool?
>> With good packing, this could avoid lots of balign.
> The problem is that normal LDR/STR instructions can have +-4095 as immediate
> offset when addressing memory. But LDRD/STRD can only have +-255 as immediate
> offset. When using pc-relative addressing, it means that memory pool needs to
> be very close to the code using it. So having several pools is required when
> using LDRD/STRD instructions here.
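
The difference in reach can be written down as a small range check. This is only a sketch: it assumes the ARM-state encodings, where LDR/STR carry a 12-bit immediate and LDRD/STRD an 8-bit one, each with a separate add/subtract bit, so the largest magnitudes are 4095 and 255 bytes. The function names are made up for the example.

```c
#include <stdbool.h>

enum {
    LDR_IMM_BITS  = 12,  /* LDR/STR immediate field width   */
    LDRD_IMM_BITS = 8    /* LDRD/STRD immediate field width */
};

/* True if 'offset' (in bytes) fits an immediate of 'imm_bits' bits
 * combined with a separate sign (add/subtract) bit. */
static bool offset_fits(int offset, int imm_bits)
{
    int max = (1 << imm_bits) - 1;
    return offset >= -max && offset <= max;
}

static bool ldr_can_reach(int offset)
{
    return offset_fits(offset, LDR_IMM_BITS);
}

static bool ldrd_can_reach(int offset)
{
    return offset_fits(offset, LDRD_IMM_BITS);
}
```

For instance, a constant 300 bytes away is fine for LDR but out of range for LDRD, which is why the pools have to sit so close to the code.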
Well, you never exploit the fact that the offset is signed.
I managed to make only one pool by:
- making idct_two_col_armv5te a function instead of a macro (I know I
introduce some extra cycles for call/ret, but there is less code and data,
so it fits better in cache).
- putting all constants between idct_two_col_armv5te and idct_rows_armv5te
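
To illustrate why a pool placed between two functions helps, here is a small model of the pc-relative reach of LDRD. It is a sketch under assumptions: the addresses are invented, the helper name ldrd_reaches is hypothetical, and it relies on the ARM-state rule that pc reads as the instruction address plus 8.

```c
#include <stdbool.h>

/* Largest magnitude of an LDRD/STRD immediate offset (8-bit field). */
#define LDRD_MAX_OFFSET 255

/* Layout being modelled (byte addresses, purely illustrative):
 *   [code of first function][constant pool][code of second function]
 * Code before the pool uses positive offsets, code after it negative ones. */
static bool ldrd_reaches(unsigned long insn_addr, unsigned long const_addr)
{
    /* In ARM state, pc reads as the instruction address plus 8. */
    long offset = (long)const_addr - ((long)insn_addr + 8);
    return offset >= -LDRD_MAX_OFFSET && offset <= LDRD_MAX_OFFSET;
}
```

With the pool in the middle, instructions on both sides share the same constants, so the usable window is roughly twice what it would be with the pool placed only after the code.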