[FFmpeg-devel] pre discussion around Blackfin dct_quantize_bfin routine

Marc Hoffman mmh
Tue Jun 12 12:17:11 CEST 2007


On Jun 12, 2007, at 5:28 AM, Reimar Doeffinger wrote:

> Hello,
> On Mon, Jun 11, 2007 at 10:58:00PM -0400, Marc Hoffman wrote:
>> This is my rough draft it works well with the small precision errors
>> due to 16bit arithmetic, actually its the same (as far as I can tell)
>> as the MMX quantizer. I ran my reference fixpoint against the mmx and
>> it seems to work on the cases I tried, its 1 bit off which seems
>> about right considering the truncation of the coeffs from 22 to
>> 16bits.  Can someone look over my shoulder and give a peak. I know I
>> have to clean up the codes a bit but I wanted to get someone else to
>> review my work up until now.  I have used inline asm, Michael this
>> should be pretty straight forward for you to get your head around.
>> This doubles the performance roughly.
> Hmm. Ok, most is cleanup suggestions so I might be telling you things
> you already know, still...

Actually any help and suggestions are more than welcome they make me  

>> #define clock()      ({ int _t; asm volatile ("%0=cycles;" :  
>> "=d" (_t)); _t; })
> If you leave that in the final version you should probably use a less
> collision-freindly name like bfin_clock or so.
> Also that ({ }) construction is a nasty gcc-ism, you could avoid it by
> making it a static always_inline fuction, though that increases the  
> risk
> of the compiler messing up...
> And actually, maybe you could just implement START_TIMER and  
> in libavutil (or does that do something else)?
> If not, maybe naming it START_PROF and END_PROF is at least more
> consistent, and least the EPROF name is not quite obvious in its  
> meaning
> ;-)
This makes perfect sense considering thats what I care about. How do  
you enable this level of performance analysis?

> [...]
>>            av_log (0,0,"%-20s: %12.4f\t%12.4f\n", TelemNames 
>> [i],v,v/64);
>>    av_log (0,0,"%-20s: %12.4f\t%12.4f\n%20.4f\t%d\n", "total",s/ 
>> TelemCnt,s/TelemCnt/64,s,TelemCnt);
> it might have the same result but using NULL and one of the loglevel
> defines as first av_log arguments instead of 0 looks probably better
Ok I will use

av_log (NULL, 0, ....)

>> #define abs(x) (((x)<0)?-(x):(x))
> I think FFABS from libavutil does exactly the same...

>> #define L1CODE __attribute__ ((l1_text))
> for consistency, I'd name it e.g. attribute_l1_text

I will circle back and change all of these, actually is there a place  
I can put this type of define globally because I now have it in 3  
different places in the code?

> [...]
>>    /*  for(i=start_i; i<64; i++) {                           */
>>    /*      sign     = (block[i]>>15)|1;                      */
>>    /*      level    = ((abs(block[i])+bias[0])*qmat[i])>>16; */
>>    /*      if (level < 0) level = 0;                         */
>>    /*      max     |= level;                                 */
>>    /*      level    = level * sign;                          */
>>    /*      block[i] = level;                                 */
>>    /*  } */
> below you put the C code to the right of the asm. I'd say make it
> consistent (possibly just remove it, but if not C is not  
> necessarily the
> best way to annotate it).
> Unfortunately I don't know what it does so I'm not sure it really  
> makes
> sense, but I'd suggest splitting the long asm lines, e.g.:
>>                  "     r1=abs r1  
>> (v)                                    || r2=[%2++];\n\t"
> vs.
>>                   "     r1=abs r1 (v)"
>>                   "       || r2=[%2++];\n\t"
> or
>>                  "     r1.h=(a1 =r1.h*r2.h), r1.l=(a0 =r1.l*r2.l)  
>> (tfu);             \n\t"
> vs.
>>                  "     r1.h=(a1 =r1.h*r2.h),
>>                  "     r1.l=(a0 =r1.l*r2.l)
>>                  "(tfu);             \n\t"
> The best way of doing this of course depends on the actual meaning of
> this, and my suggestions might be rather stupid thus ;-)

Not stupid at all Reimar, actually its my personal taste to keep the  
instruction stream on a single line.  This has been a point of  
argument in the past.  My only reason for liking the spliting as you  
suggested is to keep things with in 80 char lines.  But to me the  
body seems so simple to read this way.  Lets look at your way.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: reimar.txt
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070612/465e9b3c/attachment.txt>
-------------- next part --------------

I kind of like your idea here, looks real nice with the commentary  
integrated into the body.


More information about the ffmpeg-devel mailing list