[FFmpeg-devel] [PATCH] Fix mm_flags, mm_support for ARM

Laurent Desnogues laurent.desnogues
Tue Jul 1 12:56:13 CEST 2008

On Tue, Jul 1, 2008 at 12:25 PM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
>> What you say is not always true:  when you have data close
>> to instructions, you pollute your Icache with data, and your
>> Dcache with instructions;  on top of that you make sure you
>> need one Itlb *plus* one Dtlb entry.
> In order to reduce instruction/data cache pollution, data and code can
> be aligned at cache line boundaries, hence the use of .balign
> directives.

Doing so does not prevent the above-mentioned TLB
thrashing;  running short of TLB entries is something you
really don't want, and it can hurt performance by a big factor.
And you will still lose some cache words due to forced
alignment.
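
To put a number on the cache words lost to padding, here is a
minimal sketch.  The helper name balign_padding is hypothetical
(not from FFmpeg); it only models the padding a .balign directive
would insert at a given offset, assuming a power-of-two alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Number of padding bytes a ".balign line_size" directive would insert
 * when the current section offset is "offset".
 * Assumption: line_size is a non-zero power of two. */
static size_t balign_padding(size_t offset, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    /* Distance to the next multiple of line_size, or 0 if already aligned. */
    return (line_size - (offset & (line_size - 1))) & (line_size - 1);
}
```

For instance, code ending at offset 36 followed by a 64-byte
alignment leaves 28 bytes (seven 32-bit words) of the cache line
unused.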

> Do you know any way of generating code for ARM which would not
> intermix instructions with data? You should keep in mind that all the
> ARM instructions (I'm not considering thumb here) have fixed size
> which is 32-bit. You can't fit any arbitrary constant immediate
> operand in it. Moreover, you can't encode some absolute address into
> instruction and get it fixed by applying relocations. So absolute
> addresses are always stored intermixed with code and accessed using
> pc-relative addressing. Please try to compile something like the
> following fragment to see what is generated (pay attention to how
> external variables are accessed so that this code can be linked with
> other object files):
> extern int x;
> extern int y;
> extern int z;
> void set_global_variables()
> {
>    x = 0x12345678;
>    y = 0x1234;
>    z = 0x12;
> }

I know ARM well enough, thanks :-)

On ARMv7 you have movw/movt instructions to load 32-bit
constants using two instructions (or unsigned 16-bit
constants using one instruction, movw alone).  And ARM ELF
defines relocation types for these (R_ARM_MOVW_ABS_NC and
R_ARM_MOVT_ABS, plus PC-relative variants).

Latest CSL gcc release uses these instructions.
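
As a rough illustration of how a movw/movt pair builds a constant
without a literal pool, here is a small C model.  The function
names are hypothetical and this only simulates the register
effect of the two instructions; it is not compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Model of MOVW: writes a 16-bit immediate and zeroes the upper half. */
static uint32_t model_movw(uint16_t imm16)
{
    return (uint32_t)imm16;
}

/* Model of MOVT: replaces the upper half and keeps the lower half. */
static uint32_t model_movt(uint32_t reg, uint16_t imm16)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm16 << 16);
}

/* Build a 32-bit constant the way a movw/movt pair would. */
static uint32_t load_const32(uint32_t value)
{
    uint32_t reg = model_movw((uint16_t)(value & 0xFFFFu));
    reg = model_movt(reg, (uint16_t)(value >> 16));
    assert(reg == value); /* the pair always reconstructs the constant */
    return reg;
}

/* Instructions needed with movw/movt only: one if the value fits in
 * 16 bits (movw alone), otherwise two. */
static int const32_insn_count(uint32_t value)
{
    return value <= 0xFFFFu ? 1 : 2;
}
```

With the constants from the quoted fragment, 0x12345678 needs
both instructions, while 0x1234 needs only movw.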

>> I think both approaches have to be benchmarked in real
>> life situation, and on several processors.
> Please do it. Any improvements are very much welcome. Based on your
> previous posts, I assume that you have ARM hardware to run these
> tests.

Heh, I was just commenting on a claim you made that's
not proven.  I know some ARM designs well enough to know
that in some places loading constants with movt/movw is
better.  If I had access to older designs such as ARM11
or ARM9, I would benchmark on them.

>> Also when loading from memory, if your data side is blocking
>> then you are basically stalling your pipeline while the data is
>> loaded.
> When all the data fits into a single cache line, adding one more
> constant so that this data set still fits cache line, will not
> introduce extra cache misses. Is there anything wrong in this
> statement (except for my English grammar)? Cache line is 32 bytes on
> ARM9/ARM11 and 64 bytes on Cortex-A8

Yeah I agree.
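
The point can be checked with a little arithmetic.  This is a
minimal sketch with a hypothetical helper, cache_lines_touched,
assuming a power-of-two line size and a non-empty data set; the
32- and 64-byte line sizes are the ones quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines covered by "size" bytes starting at "addr".
 * Assumptions: size > 0 and line_size is a non-zero power of two. */
static size_t cache_lines_touched(uintptr_t addr, size_t size,
                                  size_t line_size)
{
    assert(size > 0);
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t first = addr / line_size;              /* line of first byte */
    uintptr_t last  = (addr + size - 1) / line_size; /* line of last byte  */
    return (size_t)(last - first + 1);
}
```

Seven 32-bit constants aligned to a 32-byte line occupy one line,
and adding an eighth still fits.  A ninth spills into a second
line with 32-byte lines but not with 64-byte lines.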

Don't take what I said as criticism of you. I just want
to be sure people don't take for granted that what is true
on a given processor is also true on following generations.

