[FFmpeg-soc] [soc]: r5419 - als/alsdec.c

Thilo Borgmann thilo.borgmann at googlemail.com
Thu Oct 22 11:29:55 CEST 2009


Thilo Borgmann wrote:
> Michael Niedermayer wrote:
>> On Wed, Oct 21, 2009 at 12:33:21PM +0200, Thilo Borgmann wrote:
>>> Michael Niedermayer wrote:
>>>> On Tue, Oct 20, 2009 at 03:00:40PM +0200, thilo.borgmann wrote:
>>>>> Author: thilo.borgmann
>>>>> Date: Tue Oct 20 15:00:40 2009
>>>>> New Revision: 5419
>>>>>
>>>>> Log:
>>>>> Splits reading of block data and decoding of block data.
>>>>> Introduces ALSBlockData struct.
>>>> You are missing the "why" part; that should be explained in the commit
>>>> message
>>> Yes, sorry.
>>>
>>>> also this needs a benchmark as there are many additional dereferences
>>>> added
>>> It is a necessary evil to support MCC. If the "old" way turns out to be
>>> faster for non-MCC files, would that be reason enough to keep both: a
>>> split read & decode function pair and an all-in-one function?
>> I think a benchmark is useful to judge whether we should spend time
>> thinking about alternatives to the many dereferences or not
>>
> 
> The combined (old) function:
> 
> 848450 dezicycles in combined, 1 runs, 0 skips
> 436625 dezicycles in combined, 2 runs, 0 skips
> 422562 dezicycles in combined, 4 runs, 0 skips
> 251822 dezicycles in combined, 8 runs, 0 skips
> 275631 dezicycles in combined, 16 runs, 0 skips
> 244726 dezicycles in combined, 32 runs, 0 skips
> 206217 dezicycles in combined, 64 runs, 0 skips
> 179422 dezicycles in combined, 119 runs, 9 skips
> 179422 dezicycles in combined, 119 runs, 137 skips
> 
> The separate (new) functions:
> 
> 984100 dezicycles in separate, 1 runs, 0 skips
> 499555 dezicycles in separate, 2 runs, 0 skips
> 534420 dezicycles in separate, 4 runs, 0 skips
> 369905 dezicycles in separate, 8 runs, 0 skips
> 340817 dezicycles in separate, 16 runs, 0 skips
> 280026 dezicycles in separate, 32 runs, 0 skips
> 263883 dezicycles in separate, 64 runs, 0 skips
> 231872 dezicycles in separate, 119 runs, 9 skips
> 231872 dezicycles in separate, 119 runs, 137 skips
> 
> This is roughly a 30% difference, which makes me want to try such
> alternatives.
> 
> What comes to mind is to use local copies, thus dereferencing each field
> of *bd just twice: once at the top and once at the bottom of the
> function.
> 

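To make the idea concrete, here is a minimal sketch of the two variants. The
struct, its fields and the function names are made up for illustration; this
is not the real ALSBlockData and not the code in alsdec.c. The loop is just a
simple running sum standing in for the actual decoding work.

```c
#include <stdint.h>

/* Hypothetical stand-in for a block-data struct (NOT the real ALSBlockData). */
typedef struct BlockData {
    int32_t     *samples;  /* sample buffer, modified in place       */
    unsigned int length;   /* number of samples in the block         */
    int32_t      prev;     /* state carried over to the next block   */
} BlockData;

/* Variant A: every access goes through bd->. */
static void integrate_deref(BlockData *bd)
{
    for (unsigned int i = 0; i < bd->length; i++) {
        bd->samples[i] += bd->prev;
        bd->prev        = bd->samples[i];
    }
}

/* Variant B: read each field once at the top, work on locals,
 * and write the changed state back once at the bottom. */
static void integrate_local(BlockData *bd)
{
    int32_t           *samples = bd->samples;
    const unsigned int length  = bd->length;
    int32_t            prev    = bd->prev;

    for (unsigned int i = 0; i < length; i++) {
        samples[i] += prev;
        prev        = samples[i];
    }

    bd->prev = prev;  /* single write-back of the modified field */
}
```

Both variants compute the same result. Whether B is actually faster depends
on how much of the reloading the compiler already removes on its own, so it
has to be measured.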
I tested using local copies instead of dereferencing:

10823450 dezicycles in local copies, 1 runs, 0 skips
6122845 dezicycles in local copies, 2 runs, 0 skips
4420565 dezicycles in local copies, 4 runs, 0 skips
3557323 dezicycles in local copies, 8 runs, 0 skips
2553006 dezicycles in local copies, 16 runs, 0 skips
2554690 dezicycles in local copies, 32 runs, 0 skips
2424406 dezicycles in local copies, 64 runs, 0 skips
2535575 dezicycles in local copies, 128 runs, 0 skips
2242664 dezicycles in local copies, 256 runs, 0 skips

69085900 dezicycles in dereferences, 1 runs, 0 skips
35455330 dezicycles in dereferences, 2 runs, 0 skips
19061607 dezicycles in dereferences, 4 runs, 0 skips
10732197 dezicycles in dereferences, 8 runs, 0 skips
6036062 dezicycles in dereferences, 16 runs, 0 skips
3893601 dezicycles in dereferences, 32 runs, 0 skips
3105304 dezicycles in dereferences, 64 runs, 0 skips
2732319 dezicycles in dereferences, 128 runs, 0 skips
2333672 dezicycles in dereferences, 256 runs, 0 skips

That's only about a 4% gain, so I think local copies don't pay off...
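
For reference, the percentages come from the last line of each run. The
helper below is only a sketch of that arithmetic (the function name is made
up); it expresses the difference relative to the second argument.

```c
/* Difference between a and b, in percent of b (hypothetical helper). */
static double percent_diff(double a, double b)
{
    return (a - b) / b * 100.0;
}

/* Separate vs. combined, relative to combined:
 *   percent_diff(231872.0, 179422.0)   is about 29.2
 * Dereferences vs. local copies, as a saving relative to dereferences:
 *   -percent_diff(2242664.0, 2333672.0) is about 3.9
 */
```

Note that the second pair of numbers was still dropping as the run count
doubled, so the 4% should be taken as a rough figure.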

Other alternatives?

-Thilo

