[FFmpeg-devel] pre discussion around Blackfin dct_quantize_bfin routine

Marc Hoffman mmhoffm
Tue Jun 12 17:47:33 CEST 2007

On 6/12/07, Reimar Doeffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de> wrote:
> Hello,
> On Tue, Jun 12, 2007 at 09:27:22AM -0400, Marc Hoffman wrote:
> [...]
> > O really, I have never seen such a problem interesting.  Anyways I'm
> > sure it exists, however this is for a specific machine which I know
> > this works for.  Based on that fact what are your thought, or do you
> > have a suggestion other than the wasteful use of electrons:  hi<<32ll
> > | lo?
> Have you tested the code this generates? Because at least for the Atmel
> 8 bit stuff gcc optimizes this perfectly, so no need to avoid it.

I guess the compiler needs some support around DImode and the asm stuff.

unsigned long long read_time (void)
{
  unsigned long long t0;
  unsigned lo,hi;
  asm volatile ("%0=cycles; %1=cycles2;" : "=d" (lo), "=d" (hi));
  t0 = lo;
  t0 |= (unsigned long long)hi << 32;
  return t0;
}

This generates the following output. I would have thought that the
compiler would have done something a little more efficient than this:

        .align 4
.global _read_time;
.type _read_time, STT_FUNC;
        LINK 0;
        R0=cycles; R2=cycles2;
        R1 = 0 (X);
        R3 = R2;
        R2 = 0 (X);
        R0 = R0 | R2;
        R1 = R1 | R3;
        .size   _read_time, .-_read_time
        .ident  "GCC: (GNU) 4.1.1 (ADI 07R1)"

I'm not sure why the compiler feels compelled to use a logical OR here.

The structure-based method produces better code, but it's still not optimal.

unsigned long long read_time1 (void)
{
  union {
    struct {
      unsigned lo;
      unsigned hi;
    } p;
    unsigned long long c;
  } t;
  asm ("%0=cycles; %1=cycles2;" : "=d" (t.p.lo), "=d" (t.p.hi));
  return t.c;
}

.type _read_time1, STT_FUNC;
        LINK 0;
        R2=cycles; R3=cycles2;
        R0 = R2;
        R1 = R3;
        .size   _read_time1, .-_read_time1
        .ident  "GCC: (GNU) 4.1.1 (ADI 07R1)"

And at the end of the day all we really want is this:

        LINK 0;
        R0=cycles; R1=cycles2;

With all that said, can I use the struct/union on Blackfin? It is
supported correctly and produces the correct results.  We are arguing
about something that is not relevant here, because the union is being
used to do a hardware-specific mapping, which I believe is acceptable.

Once the compiler is fixed I will change it.  The code that should
work is something like this:

static inline uint64_t read_time (void)
{
  unsigned long long t0;
  asm volatile ("%0=cycles; %H0=cycles2;" : "=D" (t0));
  return t0;
}

Which unfortunately doesn't work atm.
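
For anyone who wants to compare the two ways of building the 64-bit
value without a Blackfin at hand, here is a minimal portable sketch. It
contains no asm, the function names are made up for illustration, and it
assumes the struct of two 32-bit members has no padding. The union
version only matches the shift/OR version when the low word is stored
first (little-endian), which is the hardware-specific mapping discussed
above.

```c
#include <assert.h>
#include <stdint.h>

/* Portable: combine two 32-bit halves with shift and OR. */
uint64_t combine_shift(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Layout-dependent: overlay the two halves on a 64-bit value.
   Agrees with combine_shift only when the low word comes first
   in memory (little-endian). */
uint64_t combine_union(uint32_t lo, uint32_t hi)
{
    union {
        struct {
            uint32_t lo;
            uint32_t hi;
        } p;
        uint64_t c;
    } t;
    t.p.lo = lo;
    t.p.hi = hi;
    return t.c;
}

/* Returns 1 when the least significant byte is stored first. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical self-check: the union result is only compared
   on hosts where the layout assumption holds. */
void check_combine(void)
{
    uint32_t lo = 0x89abcdefu, hi = 0x01234567u;
    assert(combine_shift(lo, hi) == 0x0123456789abcdefULL);
    if (host_is_little_endian())
        assert(combine_union(lo, hi) == combine_shift(lo, hi));
}
```

Calling check_combine() on a little-endian host exercises both paths; on
a big-endian host only the shift/OR version is checked, which is the
portability concern raised earlier in the thread.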


