[FFmpeg-devel] [PATCH] avoid mb_xy recalculation in h264

Alexander Strange astrange
Sat May 10 09:09:04 CEST 2008


On May 7, 2008, at 10:42 PM, Alexander Strange wrote:

>
> On May 7, 2008, at 3:15 PM, Michael Niedermayer wrote:
>
>> On Wed, May 07, 2008 at 02:49:20PM -0400, Alexander Strange wrote:
>>>
>>> On May 7, 2008, at 8:53 AM, Michael Niedermayer wrote:
>>>
>>>> On Wed, May 07, 2008 at 03:05:47AM -0400, Alexander Strange wrote:
>>>>> This line appears in a lot of h264.c functions:
>>>>> const int mb_xy= s->mb_x + s->mb_y*s->mb_stride;
>>>>>
>>>>> It only changes once per MB, so we can add it to the context and  
>>>>> only
>>>>> recalculate it in decode_slice().
>>>>> I put it at the end of H264Context, but it could be moved up  
>>>>> earlier if
>>>>> someone likes it there. The position doesn't seem to affect  
>>>>> performance
>>>>> much; it can't be moved next to mb_x/mb_y since that would put  
>>>>> it in
>>>>> MpegEncContext, and none of the other codecs use it.
>>>>>
>>>>> Patches:
>>>>> 1- does that, updating it for MBAFF when needed
>>>>> 2- removes some newly unused variables
>>>>> 3- does the same thing to svq3.c
>>>>>
>>>>> Before: avg 8.799s max 8.8s min 8.794s
>>>>> After: avg 8.645s max 8.651s min 8.641s
>>>>
>>>> Knowing the cpu and compiler would also be usefull.
>>>
>>> gcc 4.0.1, core 2 x86-32 Darwin
>>>
>>> I guess it's about the same as the usual distro compiler, but this  
>>> isn't
>>> really a sensitive messing-with-the-register-allocator patch.
>>
>> How does the change compare to a splited (litterally duplicated)
>> decode_cabac_residual()
>> with the mb_xy only calculated for the "cat"s which actually need it?
>> That is one for cat=0/3 and one for the others
>> or
>> with the mb_xy calc moved in the if(cat=0/3) which use it?
>
> Actually, the compiled decode_cabac_residual is barely different at  
> all. gcc loads h->mb_xy, immediately spills it to the same place in  
> the stack, and the rest of the function is the same. I suppose  
> changing mb_xy to h->mb_xy for the DC cats might be an improvement.
>
> Diffing the asm shows:
> * less imull
> * hl_motion is different (mb_x load moved into an if())
> * write_back_intra_pred_mode() is inlined and removed
> * slightly different stack sizes (hard to know if that means anything)
>
> Making write_back_intra_pred_mode() always_inline doesn't change  
> cavlc time, but explains about half the speed change for cabac.
>
>> Iam not at all against putting mb_xy in the context but i somehow  
>> doubt
>> that this is what causes the speed difference.
>>
>> Also it would be very nice if you had a few more compilers installed
>> if you want to do more optimizations & benchmarks.
>
> Yeah, I'll make a linux vm for gcc 2/3/the other ccs - the 64bit one  
> I have now doesn't support so many compilers. If they don't inline  
> that after the patch (4.4 inlines it as it is) I'll mess with  
> always_inline on some of the smaller functions.

4.0 and 3.3 are the only ones that didn't already inline  
write_back_intra_pred_mode.

nm h264.o | wc -l (approximately)
      154 LLVM
      161 apple gcc 4.2
      162 gcc 4.1
      163 gcc 4.2
      175 gcc 4.4.0 20080509
      181 apple gcc 4.0
      188 gcc 2.95
      197 gcc 3.3
      216 icc (it added some extra symbols and never inlines much)

The only bad and commonly used one is 4.0 - 2.95 still inlines all the  
hot functions. Besides that, adding more av_always_inline is really  
unrelated to this patch, I think.




More information about the ffmpeg-devel mailing list