[FFmpeg-devel] [PATCH 1/2] lavc/pcm_tablegen: slight speedup of table generation

Ganesh Ajjanagadde gajjanag at mit.edu
Sun Jan 3 17:21:23 CET 2016


On Sun, Jan 3, 2016 at 6:13 AM, Michael Niedermayer
<michael at niedermayer.cc> wrote:
> On Wed, Dec 30, 2015 at 08:34:55PM -0800, Ganesh Ajjanagadde wrote:
>> This gets rid of some branches to speed up table generation slightly
>> (impact higher on mulaw than alaw). Tables are identical to before,
>> tested with FATE.
>>
>> Sample benchmark (Haswell, GNU/Linux+gcc):
>> old:
>>  313494 decicycles in build_alaw_table,    4094 runs,      2 skips
>>  315959 decicycles in build_alaw_table,    8190 runs,      2 skips
>>
>>  323599 decicycles in build_ulaw_table,    4095 runs,      1 skips
>>  318849 decicycles in build_ulaw_table,    8188 runs,      4 skips
>>
>> new:
>>  261902 decicycles in build_alaw_table,    4096 runs,      0 skips
>>  266519 decicycles in build_alaw_table,    8192 runs,      0 skips
>>
>>  209657 decicycles in build_ulaw_table,    4096 runs,      0 skips
>>  232656 decicycles in build_ulaw_table,    8192 runs,      0 skips
>>
>> Signed-off-by: Ganesh Ajjanagadde <gajjanagadde at gmail.com>
>> ---
>>  libavcodec/pcm_tablegen.h | 24 ++++++++++++------------
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/libavcodec/pcm_tablegen.h b/libavcodec/pcm_tablegen.h
>> index 1387210..7269977 100644
>> --- a/libavcodec/pcm_tablegen.h
>> +++ b/libavcodec/pcm_tablegen.h
>> @@ -87,21 +87,21 @@ static av_cold void build_xlaw_table(uint8_t *linear_to_xlaw,
>>  {
>>      int i, j, v, v1, v2;
>>
>> -    j = 0;
>> -    for(i=0;i<128;i++) {
>> -        if (i != 127) {
>> -            v1 = xlaw2linear(i ^ mask);
>> -            v2 = xlaw2linear((i + 1) ^ mask);
>> -            v = (v1 + v2 + 4) >> 3;
>> -        } else {
>> -            v = 8192;
>> -        }
>> -        for(;j<v;j++) {
>> +    j = 1;
>> +    linear_to_xlaw[8192] = mask;
>> +    for(i=0;i<127;i++) {
>> +        v1 = xlaw2linear(i ^ mask);
>> +        v2 = xlaw2linear((i + 1) ^ mask);
>> +        v = (v1 + v2 + 4) >> 3;
>> +        for(;j<v;j+=1) {
>> +            linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>              linear_to_xlaw[8192 + j] = (i ^ mask);
>> -            if (j > 0)
>> -                linear_to_xlaw[8192 - j] = (i ^ (mask ^ 0x80));
>>          }
>>      }
>> +    for(;j<8192;j++) {
>> +        linear_to_xlaw[8192 - j] = (127 ^ (mask ^ 0x80));
>> +        linear_to_xlaw[8192 + j] = (127 ^ mask);
>> +    }
>
> Removing the if(j>0) and replacing it with the direct init beforehand
> is fine.
> Do the other changes have any significant speed effect?
> I think they make the code harder to read, and this is not really
> speed-critical code.

It is still "speed critical" enough for people to retain
CONFIG_HARDCODED_TABLES. My goal here is simple: I want to get cycle
count down enough so that hardcoded tables can be removed here.

If patch 2 is fine as is, i.e., if the current code is fast enough, then
I will just commit with the removal of if(j > 0).
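
For anyone who wants to check the "tables are identical" claim outside of
FATE, here is a rough standalone sketch comparing three variants of the
loop: the pre-patch code, a version that only hoists the j == 0 case (one
possible reading of "just the removal of if(j > 0)"), and the full patch.
Assumptions on my part, not taken from the tree: the two decoders are
textbook G.711 implementations standing in for the xlaw2linear callback,
the masks 0xff (mulaw) and 0xd5 (alaw) are what I believe the callers pass,
and the names build_old, build_min, build_new, tables_match and
check_equivalence are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int (*xlaw_fn)(unsigned char);

/* Assumption: textbook G.711 decoders, stand-ins for xlaw2linear. */
static int ulaw2linear(unsigned char u)
{
    int t;
    u = ~u;
    t = ((u & 0x0f) << 3) + 0x84;
    t <<= (u & 0x70) >> 4;
    return (u & 0x80) ? (0x84 - t) : (t - 0x84);
}

static int alaw2linear(unsigned char a)
{
    int t, seg;
    a ^= 0x55;
    t   = a & 0x0f;
    seg = (a & 0x70) >> 4;
    if (seg) t = (t + t + 1 + 32) << (seg + 2);
    else     t = (t + t + 1) << 3;
    return (a & 0x80) ? t : -t;
}

/* Pre-patch loop, as in the '-' lines of the hunk. */
static void build_old(uint8_t *tab, xlaw_fn f, int mask)
{
    int i, j = 0, v;
    for (i = 0; i < 128; i++) {
        if (i != 127)
            v = (f(i ^ mask) + f((i + 1) ^ mask) + 4) >> 3;
        else
            v = 8192;
        for (; j < v; j++) {
            tab[8192 + j] = i ^ mask;
            if (j > 0)
                tab[8192 - j] = i ^ (mask ^ 0x80);
        }
    }
}

/* Hypothetical minimal variant: only the j == 0 case is hoisted. */
static void build_min(uint8_t *tab, xlaw_fn f, int mask)
{
    int i, j = 1, v;
    tab[8192] = mask;
    for (i = 0; i < 128; i++) {
        if (i != 127)
            v = (f(i ^ mask) + f((i + 1) ^ mask) + 4) >> 3;
        else
            v = 8192;
        for (; j < v; j++) {
            tab[8192 - j] = i ^ (mask ^ 0x80);
            tab[8192 + j] = i ^ mask;
        }
    }
}

/* Full patch, as in the '+' lines: i == 127 peeled into a tail loop. */
static void build_new(uint8_t *tab, xlaw_fn f, int mask)
{
    int i, j = 1, v;
    tab[8192] = mask;
    for (i = 0; i < 127; i++) {
        v = (f(i ^ mask) + f((i + 1) ^ mask) + 4) >> 3;
        for (; j < v; j++) {
            tab[8192 - j] = i ^ (mask ^ 0x80);
            tab[8192 + j] = i ^ mask;
        }
    }
    for (; j < 8192; j++) {
        tab[8192 - j] = 127 ^ (mask ^ 0x80);
        tab[8192 + j] = 127 ^ mask;
    }
}

/* Returns 1 if all three variants fill the 16384-entry table identically.
 * Entry 0 is not written by any of these loops and stays zero here. */
static int tables_match(xlaw_fn f, int mask)
{
    static uint8_t a[16384], b[16384], c[16384];
    memset(a, 0, sizeof(a));
    memset(b, 0, sizeof(b));
    memset(c, 0, sizeof(c));
    build_old(a, f, mask);
    build_min(b, f, mask);
    build_new(c, f, mask);
    return !memcmp(a, b, sizeof(a)) && !memcmp(a, c, sizeof(a));
}

static void check_equivalence(void)
{
    assert(tables_match(ulaw2linear, 0xff));
    assert(tables_match(alaw2linear, 0xd5));
}
```

Calling check_equivalence() from a small main() should pass silently if the
variants agree. Note that hoisting the j == 0 store is only equivalent when
the first threshold v is at least 1, which holds for both decoders above.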

>
> [...]
> --
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Avoid a single point of failure, be that a person or equipment.