[FFmpeg-devel] [PATCH] Split reading and decoding of blocks in ALS

Justin Ruggles justin.ruggles
Tue Nov 24 00:24:00 CET 2009


Thilo Borgmann wrote:

> Michael Niedermayer schrieb:
>> On Mon, Nov 23, 2009 at 11:28:35AM +0100, Thilo Borgmann wrote:
>>> Michael Niedermayer schrieb:
>>>> On Sat, Nov 14, 2009 at 02:15:30PM +0100, Thilo Borgmann wrote:
>>>>> Hi,
>>>>>
>>>>> in order to support multi-channel correlation, reading and decoding of a
>>>>> block has to be separated. This patch introduces ALSBockData struct for
>>>>> that purpose.
>>>> [...]
>>>>> +/** Decodes the block data for a constant block
>>>>> + */
>>>>> +static void decode_const_block_data(ALSDecContext *ctx, ALSBlockData *bd)
>>>>> +{
>>>>> +    int smp;
>>>>> +
>>>>>      // write raw samples into buffer
>>>>> -    for (k = 0; k < block_length; k++)
>>>>> -        raw_samples[k] = const_val;
>>>>> +    for (smp = 0; smp < bd->block_length; smp++)
>>>>> +        bd->raw_samples[smp] = bd->const_val;
>>>>>  }
>>>> memset32(dst, val, len){
>>>> }
>>>> would possibly be faster due to not doing bd-> in the loop
>>> Ok will do...
>>>
>>>
>>>> [...]
>>>>> @@ -553,115 +579,132 @@
>>>>>      }
>>>>>  
>>>>>      if (get_bits1(gb))
>>>>> -        *shift_lsbs = get_bits(gb, 4) + 1;
>>>>> +        bd->shift_lsbs = get_bits(gb, 4) + 1;
>>>>>  
>>>>> -    store_prev_samples = (*js_blocks && raw_other) || *shift_lsbs;
>>>>> +    bd->store_prev_samples = (bd->js_blocks && bd->raw_other) || bd->shift_lsbs;
>>>>>  
>>>>> -
>>>>>      if (!sconf->rlslms) {
>>>>>          if (sconf->adapt_order) {
>>>>> -            int opt_order_length = av_ceil_log2(av_clip((block_length >> 3) - 1,
>>>>> +            int opt_order_length = av_ceil_log2(av_clip((bd->block_length >> 3) - 1,
>>>>>                                                  2, sconf->max_order + 1));
>>>>> -            opt_order            = get_bits(gb, opt_order_length);
>>>>> +            bd->opt_order        = get_bits(gb, opt_order_length);
>>>>>          } else {
>>>>> -            opt_order = sconf->max_order;
>>>>> +            bd->opt_order = sconf->max_order;
>>>>>          }
>>>>>  
>>>>> -        if (opt_order) {
>>>>> +        if (bd->opt_order) {
>>>>>              int add_base;
>>>>>  
>>>>>              if (sconf->coef_table == 3) {
>>>>>                  add_base = 0x7F;
>>>>>  
>>>>>                  // read coefficient 0
>>>>> -                quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>> +                bd->quant_cof[0] = 32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>>  
>>>>>                  // read coefficient 1
>>>>> -                if (opt_order > 1)
>>>>> -                    quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>> +                if (bd->opt_order > 1)
>>>>> +                    bd->quant_cof[1] = -32 * parcor_scaled_values[get_bits(gb, 7)];
>>>>>  
>>>>>                  // read coefficients 2 to opt_order
>>>>> -                for (k = 2; k < opt_order; k++)
>>>>> -                    quant_cof[k] = get_bits(gb, 7);
>>>>> +                for (k = 2; k < bd->opt_order; k++)
>>>>> +                    bd->quant_cof[k] = get_bits(gb, 7);
>>>>>              } else {
>>>>>                  int k_max;
>>>>>                  add_base = 1;
>>>>>  
>>>>>                  // read coefficient 0 to 19
>>>>> -                k_max = FFMIN(opt_order, 20);
>>>>> +                k_max = FFMIN(bd->opt_order, 20);
>>>>>                  for (k = 0; k < k_max; k++) {
>>>>>                      int rice_param = parcor_rice_table[sconf->coef_table][k][1];
>>>>>                      int offset     = parcor_rice_table[sconf->coef_table][k][0];
>>>>> -                    quant_cof[k] = decode_rice(gb, rice_param) + offset;
>>>>> +                    bd->quant_cof[k] = decode_rice(gb, rice_param) + offset;
>>>>>                  }
>>>>>  
>>>>>                  // read coefficients 20 to 126
>>>>> -                k_max = FFMIN(opt_order, 127);
>>>>> +                k_max = FFMIN(bd->opt_order, 127);
>>>>>                  for (; k < k_max; k++)
>>>>> -                    quant_cof[k] = decode_rice(gb, 2) + (k & 1);
>>>>> +                    bd->quant_cof[k] = decode_rice(gb, 2) + (k & 1);
>>>>>  
>>>>>                  // read coefficients 127 to opt_order
>>>>> -                for (; k < opt_order; k++)
>>>>> -                    quant_cof[k] = decode_rice(gb, 1);
>>>>> +                for (; k < bd->opt_order; k++)
>>>>> +                    bd->quant_cof[k] = decode_rice(gb, 1);
>>>>>  
>>>>> -                quant_cof[0] = 32 * parcor_scaled_values[quant_cof[0] + 64];
>>>>> +                bd->quant_cof[0] = 32 * parcor_scaled_values[bd->quant_cof[0] + 64];
>>>>>  
>>>>> -                if (opt_order > 1)
>>>>> -                    quant_cof[1] = -32 * parcor_scaled_values[quant_cof[1] + 64];
>>>>> +                if (bd->opt_order > 1)
>>>>> +                    bd->quant_cof[1] = -32 * parcor_scaled_values[bd->quant_cof[1] + 64];
>>>>>              }
>>>>>  
>>>>> -            for (k = 2; k < opt_order; k++)
>>>>> -                quant_cof[k] = (quant_cof[k] << 14) + (add_base << 13);
>>>>> +            for (k = 2; k < bd->opt_order; k++)
>>>>> +                bd->quant_cof[k] = (bd->quant_cof[k] << 14) + (add_base << 13);
>>>> it might make sense to have commonly used variables like opt_order on the
>>>> stack intead of just accessable through a pointer
>>> Yes.
>>>
>>>> besides does this patch lead to any slowdown?
>>> The benchmark results showed a little slowdown but at least hardly
>>> noticable on the command line...
>>> We had this issue at the soc list. Ended up with me providing some
>>> assembler file grep results for checking inlining done by the compiler.
>> was the slowdown fixed or not?
>> "not" is bad
> 
> It was not. You blamed the many bd-> to be a reason for the slowdown and
> requested a check for wrong inlining before considering other
> possibilities to make this faster. I'm not able to see anything about
> the inlining stuff in assembler files so I provided a grep result you
> asked for.
> 
> See:
> https://lists.mplayerhq.hu/pipermail/ffmpeg-soc/2009-October/008364.html
> 
> I'm ready to fix that slowdown issue, all I need is a little advise
> about how to... the obvious way to use local variables to store
> bd->something and removing many dereferences that way did not yield a
> significant reduction of the slowdown.

grepping for "call" and removing duplicates gives:

combined:
        call    _parse_bs_info
        call    _decode_end
        call    _read_block_data
        call    _zero_remaining

separate:
        call    _parse_bs_info
        call    _decode_end
        call    _decode_var_block_data
        call    _zero_remaining
        call    _read_var_block_data
        call    _decode_blocks_ind

It seems that decode_blocks_ind is inlined in the combined version but
not in the separate version.

-Justin



More information about the ffmpeg-devel mailing list