[FFmpeg-soc] [PATCH] AMR-WB Decoder

Mon Sep 13 02:00:32 CEST 2010

On 09/13/2010 12:21 AM, Marcelo Galvão Póvoa wrote:
> On 12 September 2010 14:39, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>> On 09/12/2010 04:34 PM, Marcelo Galvão Póvoa wrote:
>>>
>>> On 12 September 2010 06:28, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>>
>>>> On 09/12/2010 12:25 AM, Marcelo Galvão Póvoa wrote:
>>>>>
>>>>> On 11 September 2010 18:54, Vitor Sessak<vitor1001 at gmail.com>      wrote:
>>>>>>
>>>>>> On 09/11/2010 06:08 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>
>>>>>>> 2010/9/11 Marcelo Galvão Póvoa<marspeoplester at gmail.com>:
>>>>>>>>
>>>>>>>> On 11 September 2010 12:37, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>   wrote:
>>>>>>>>>
>>>>>>>>> On 09/11/2010 04:47 PM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>
>>>>>>>>>> On 9 September 2010 16:11, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>   wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 09/07/2010 02:05 AM, Marcelo Galvão Póvoa wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> On 6 September 2010 11:31, Vitor Sessak<vitor1001 at gmail.com>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09/06/2010 02:46 AM, Marcelo Galvăo Póvoa wrote:
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>>>>> Can celp_filters.c:ff_celp_circ_addf() simplify this?
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I already gave some thought to that but I couldn't figure out how
>>>>>>>>>>>> to
>>>>>>>>>>>> use ff_celp_circ_addf() there. I wrote the algorithm the simplest
>>>>>>>>>>>> way
>>>>>>>>>>>> I could think of.
>>>>>>>>>>>
>>>>>>>>>>> Try
>>>>>>>>>>>
>>>>>>>>>>>         for (i = 0; i<          AMRWB_SFR_SIZE; i++)
>>>>>>>>>>>             if (fixed_vector[i])
>>>>>>>>>>>                 ff_celp_circ_addf(buf, buf, coef, i,
>>>>>>>>>>> fixed_vector[i],
>>>>>>>>>>>                                   AMRWB_SFR_SIZE);
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That's right indeed, thanks!
>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>>   * AMR wideband decoder
>>>>>>>>>>   * Copyright (c) 2010 Marcelo Galvao Povoa
>>>>>>>>>>   *
>>>>>>>>>>   * This file is part of FFmpeg.
>>>>>>>>>>   *
>>>>>>>>>>   * FFmpeg is free software; you can redistribute it and/or
>>>>>>>>>>   * modify it under the terms of the GNU Lesser General Public
>>>>>>>>>>   * License as published by the Free Software Foundation; either
>>>>>>>>>>   * version 2.1 of the License, or (at your option) any later
>>>>>>>>>> version.
>>>>>>>>>>   *
>>>>>>>>>>   * FFmpeg is distributed in the hope that it will be useful,
>>>>>>>>>>   * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>>>>>>>>   * MERCHANTABILITY or FITNESS FOR A particular PURPOSE.  See the
>>>>>>>>>> GNU
>>>>>>>>>>   * Lesser General Public License for more details.
>>>>>>>>>>   *
>>>>>>>>>>   * You should have received a copy of the GNU Lesser General Public
>>>>>>>>>>   * License along with FFmpeg; if not, write to the Free Software
>>>>>>>>>>   * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>>>>>>>>> 02110-1301 USA
>>>>>>>>>>   */
>>>>>>>>>>
>>>>>>>>>> #include "libavutil/lfg.h"
>>>>>>>>>>
>>>>>>>>>> #include "avcodec.h"
>>>>>>>>>> #include "get_bits.h"
>>>>>>>>>> #include "lsp.h"
>>>>>>>>>> #include "celp_math.h"
>>>>>>>>>> #include "celp_filters.h"
>>>>>>>>>> #include "acelp_filters.h"
>>>>>>>>>> #include "acelp_vectors.h"
>>>>>>>>>> #include "acelp_pitch_delay.h"
>>>>>>>>>>
>>>>>>>>>> #include "amrwbdata.h"
>>>>>>>>>
>>>>>>>>> No need to include amr.h?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Oops, fixed
>>>>>>>>
>>>>>>>
>>>>>>> Sorry, I forgot to define the table type before the include. Fixed
>>>>>>
>>>>>> Just one last think I just saw:
>>>>>>
>>>>>>
>>>>>>>     ctx->fr_cur_mode = unpack_bitstream(ctx, buf, buf_size);
>>>>>>>     expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>      3) +
>>>>>>> 1;
>>>>>>>
>>>>>>>     if (buf_size<      expected_fr_size) {
>>>>>>>         av_log(avctx, AV_LOG_ERROR,
>>>>>>>             "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>>>>>>         *data_size = 0;
>>>>>>>         return buf_size;
>>>>>>>     }
>>>>>>
>>>>>> This check is not useful like that. Imagine that buf_size == 1. Then,
>>>>>> unpack_bitstream will try to read buf[2] and will segfault. This check
>>>>>> should be done before the call of ff_amr_bit_reorder().
>>>>>>
>>>>>
>>>>> I thought about that, but how I would calculate expected_fr_size if I
>>>>> don't know of which mode the frame is? One solution would be doing
>>>>> this check inside ff_amr_bit_reorder() just after decoding the header.
>>>>
>>>> Another solution would be to get rid of the unpack_bitstrem() function
>>>> and
>>>> move its code to amrwb_decode_frame(). It is already almost a wrapper
>>>> around
>>>> ff_amr_bit_reorder(). It would also allows to avoid checking
>>>> (ctx->fr_cur_mode<    MODE_SID) twice...
>>>>
>>>
>>> Interesting, but how about instead of moving all the code to
>>> amrwb_decode_frame(), we keep a function to decode the header since
>>> there are more than one type of it. I wrote the simpler (and most
>>> common I guess) "MIME/storage" header format but in case someone wants
>>> to implement the others this way would be easier. See the patch I
>>> attached.
>>
>> I like it. Just a last nitpick:
>>
>>> +    header_size      = decode_mime_header(ctx, buf);
>>> +    expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>  3) + 1;
>>> +
>>> +    if (buf_size<  expected_fr_size) {
>>> +        av_log(avctx, AV_LOG_ERROR,
>>> +            "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>> +        *data_size = 0;
>>> +        return buf_size;
>>> +    }
>>> +
>>
>>> +    if (!ctx->fr_quality || ctx->fr_cur_mode>  MODE_SID) {
>>> +        av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>>> frame\n");
>>> +    }
>>
>> Is there any chance of decoding that frame if ctx->fr_cur_mode>  MODE_SID?
>> Better to bail out with an error in this case.
>>
>>> +    if (ctx->fr_cur_mode<  MODE_SID) { /* Normal speech frame */
>>> +        ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>> +            buf + header_size,
>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>> +    }
>>> +    else if (ctx->fr_cur_mode == MODE_SID) { /* Comfort noise frame */
>>> +        av_log_missing_feature(avctx, "SID mode", 1);
>>> +        return -1;
>>> +    }
>>
>> I think it all be clearer in the following way:
>>
>>     header_size      = decode_mime_header(ctx, buf);
>>     expected_fr_size = ((cf_sizes_wb[ctx->fr_cur_mode] + 7)>>  3) + 1;
>>
>>     if (buf_size<  expected_fr_size) {
>>         av_log(avctx, AV_LOG_ERROR,
>>             "Frame too small (%d bytes). Truncated file?\n", buf_size);
>>         *data_size = 0;
>>         return buf_size;
>>     }
>>
>>     if (!ctx->fr_quality || ctx->fr_cur_mode>  MODE_SID)
>>         av_log(avctx, AV_LOG_ERROR, "Encountered a bad or corrupted
>> frame\n");
>>
>>     if (ctx->fr_cur_mode == MODE_SID) /* Comfort noise frame */
>>         av_log_missing_feature(avctx, "SID mode", 1);
>>
>>     if (ctx->fr_cur_mode>= MODE_SID)
>>         return -1;
>>
>>    ff_amr_bit_reorder((uint16_t *)&ctx->frame, sizeof(AMRWBFrame),
>>         buf + header_size, amr_bit_orderings_by_mode[ctx->fr_cur_mode]);
>>
>> In that way, the ff_amr_bit_reorder() call is not in the middle of the error
>> handling. Finally, after fixing this, if Rob agree, I think you can move
>> this discussion to ffmpeg-devel.
>>
>
> Fixed.
>
> Ok, there are still some of those "XXX" notes however, did you check them?

Most of them looks like differences between the spec and the ref 
decoder. Such things happens all the time, and one should follow the ref 
decoder in those cases. About voice_factor(), I understand that you 
tested this function thoroughly when tracking that old quality-affecting 
bug and found no difference from the reference. Finally, about the 
truncation of the excitation, this was done in AMRNB more to reduce the 
differences due to our implementation been floating-point based while 
the reference being fixed-point. But for AMRNB those differences were 
causing audible artifacts. If you want to be sure, you can check if the 
stddev with the pre-encoding input increases if you add such a 
truncation (this time using the right shift of -190).

-Vitor