[FFmpeg-soc] [PATCH] AMR-WB Decoder

Vitor Sessak vitor1001 at gmail.com
Thu Sep 2 23:11:35 CEST 2010


On 09/02/2010 09:30 PM, Marcelo Galvão Póvoa wrote:
> Sorry about the delay to reply
>
> On 30 August 2010 12:39, Vitor Sessak<vitor1001 at gmail.com>  wrote:
>    
>> On 08/30/2010 04:44 PM, Marcelo Galvão Póvoa wrote:
>>      
>>> On 30 August 2010 09:31, Vitor Sessak<vitor1001 at gmail.com>    wrote:
>>>
>>>        
>>>> On 08/27/2010 11:13 PM, Marcelo Galvão Póvoa wrote:
>>>>
>>>>          
>>>>> On 26 August 2010 18:55, Vitor Sessak<vitor1001 at gmail.com>      wrote:
>>>>>
>>>>>
>>>>>            
>>>>>> Marcelo Galvão Póvoa wrote:
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> Fixed excitation array incorrect length.
>>>>>>>
>>>>>>> Should now be applicable cleanly to the latest ffmpeg revision
>>>>>>> (fd151a5f8bd152c456a).
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> First look:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>> [...]
>>>>
>>>>
>>>>          
>>>>>>> +/**
>>>>>>> + * Convert an ISF vector into an ISP vector
>>>>>>> + *
>>>>>>> + * @param[in] isf                  Isf vector
>>>>>>> + * @param[out] isp                 Output isp vector
>>>>>>> + * @param[in] size                 Isf/isp size
>>>>>>> + */
>>>>>>> +static void isf2isp(const float *isf, double *isp, int size)
>>>>>>> +{
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    for (i = 0; i<      size - 1; i++)
>>>>>>> +        isp[i] = cos(2.0 * M_PI * isf[i]);
>>>>>>> +
>>>>>>> +    isp[size - 1] = cos(4.0 * M_PI * isf[size - 1]);
>>>>>>> +}
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> Almost duplication of amrnbdec.c:lsf2lsp().
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>> Yes, but this last element doubling diverges between AMR-NB and AMR-WB
>>>>> reference sources.
>>>>>
>>>>>
>>>>>            
>>>> Can't you just do "isp[size - 1] *=  2" or it is used elsewhere?
>>>>
>>>>
>>>>          
>>> It's not the last element output that is doubled but the input. Since
>>> it is a cosine, I would have to calculate it again for the last
>>> element I guess.
>>>
>>>        
>> I mean doubling the last element of the input before calling the function,
>> if it doesn't make the rest of the code more complex.
>>
>>      
> Well, I only call this function in two places, so it seems to worth
> reusing this way
>    

nice

>>>> [...]
>>>>
>>>>
>>>>          
>>>>>>> +        for(j = i - 1; j>      1; j--)
>>>>>>> +            f[j] += f[j - 1] * val + f[j - 2];
>>>>>>> +        f[1] += 0.25 * val;
>>>>>>> +    }
>>>>>>> +}
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>> Hmm, can't this function be replaced by ff_lsp2polyf() with some
>>>>>> rescaling?
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>> Actually I realize now that the difference between these functions is
>>>>> just the entire output scaled down by 0.25. That was silly because I
>>>>> was scaling by 4.0 after this function. It is weird why the reference
>>>>> decoder does this process. Maybe it should worth someone checking it
>>>>> out to be sure about this.
>>>>>
>>>>>
>>>>>            
>>>> I suppose you checked that after this change the output of your decoder
>>>> didn't change, no?
>>>>
>>>>
>>>>          
>>> Actually I cannot compare the output bitwise because the high band
>>> involves a random excitation, I am afraid that it is not the same for
>>> each run (or it is?).
>>>        
>> Yes, you always send 1 as seed, so the prng output is always the same (and
>> this is a good thing), but since you use floats output might not be
>> bitexact. I suggest you do:
>>
>> ./tests/tiny_psnr output_ref.wav output_patched.wav 2
>>
>> And see if stddev is ~ 0.1
>>
>>      
> By "output_ref.wav" you mean the earlier output from my decoder or the
> fixed-point reference implementation? I did both tests comparing my
> recent decoder with a 1kHz sine wave to:
> - 3GPP:
> stddev:  340.60 PSNR: 45.68 MAXDIFF: 1130 bytes:   959360/   959360
> - Mine before the reviews:
> stddev:    0.00 PSNR:999.99 MAXDIFF:    0 bytes:   959360/   959360
>
> I don't know how to analyze all these values
>    

stddev = 1/N sqrt(sum[i=0,...,N-1] (file1[i] - file2[i])*(file1[i] - 
file2[i]))

PSNR = log(stddev)

thus,

stddev == 0 means the files are identical

stddev < 0.04 means the files are really close, just a rare +- 1 difference

stddev < 1 means the files are very close, no audible difference

But I suggest you do not test with a sine-wave, it might amplify some 
problems of minor differences adding up to make a big difference.

>> [...]
>>
>>      
>>> +/**
>>>        
>>>>> + * Parses a speech frame, storing data in the Context
>>>>> + *
>>>>> + * @param[in,out] ctx              The context
>>>>> + * @param[in] buf                  Pointer to the input buffer
>>>>> + * @param[in] buf_size             Size of the input buffer
>>>>> + *
>>>>> + * @return The frame mode
>>>>> + */
>>>>> +static enum Mode unpack_bitstream(AMRWBContext *ctx, const uint8_t
>>>>> *buf,
>>>>> +                                  int buf_size)
>>>>> +{
>>>>> +    GetBitContext gb;
>>>>> +    uint16_t *data;
>>>>> +
>>>>> +    init_get_bits(&gb, buf, buf_size * 8);
>>>>> +
>>>>> +    /* decode frame header (1st octet) */
>>>>> +    skip_bits(&gb, 1);  // padding bit
>>>>> +    ctx->fr_cur_mode  = get_bits(&gb, 4);
>>>>> +    ctx->fr_quality   = get_bits1(&gb);
>>>>> +    skip_bits(&gb, 2);  // padding bits
>>>>> +
>>>>> +    // XXX: We are using only the "MIME/storage" format
>>>>> +    // used by libopencore. This format is simpler and
>>>>> +    // does not carry the auxiliary information of the frame
>>>>> +
>>>>> +    data = (uint16_t *)&ctx->frame;
>>>>> +    memset(data, 0, sizeof(AMRWBFrame));
>>>>> +    buf++;
>>>>> +
>>>>> +    if (ctx->fr_cur_mode<      MODE_SID) { /* Normal speech frame */
>>>>> +        const uint16_t *perm =
>>>>> amr_bit_orderings_by_mode[ctx->fr_cur_mode];
>>>>> +        int field_size;
>>>>> +
>>>>> +        while ((field_size = *perm++)) {
>>>>> +            int field = 0;
>>>>> +            int field_offset = *perm++;
>>>>> +            while (field_size--) {
>>>>> +               uint16_t bit_idx = *perm++;
>>>>> +               field<<= 1;
>>>>> +               /* The bit index inside the byte is reversed (MSB->LSB)
>>>>> */
>>>>> +               field |= BIT_POS(buf[bit_idx>>      3], 7 - (bit_idx&
>>>>>   7));
>>>>> +            }
>>>>> +            data[field_offset] = field;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    return ctx->fr_cur_mode;
>>>>> +}
>>>>>
>>>>>
>>>>>            
>>>> Can't you reorder the bit indexes in a way it is in the LSB->MSB so you
>>>> can
>>>> use just the usual bitstream functions?
>>>>
>>>>
>>>>          
>>> Which bitstream functions?
>>>        
>> Those in libavcodec/get_bits.h (get_bits(), etc).
>>
>>      
> I am following the unpack_bitstream() from AMR-NB
>    

Sorry, I remembered wrong how unpack_bitstream() was coded in amrnb.

>>> If the tables were in LSB->MSB order, it
>>> would be the same algorithm as in AMR-NB, the only difference is that
>>> I do (7 - index) to reverse the order.
>>>
>>>        
>> Can't you change the tables in a way you can reuse the unpack function of
>> AMRNB?
>>
>>      
> I guess so, after all it's an interesting method of reordering and
> assigning the variables
>
> So, until now we have three functions that can be reused and would
> need to move their code to an appropriate header file:
>    

Not only the header, but the actual code should be moved to the 
associated .c files.

> - sipr.c:lsp2lpc_sipr() ->  lsp.h: Do you have a suggestion to a new
> name for this? There is already a function called ff_acelp_lsp2lpc()
>    

I suggest some AMR-centric name like ff_amrwb_lsp2lpc(). SIPR probably 
copied from AMRWB and not the other way around.

> - amrnbdec.c:lsf2lsp() ->  lsp.h
>    

Looks reasonable. I suggest ff_acelp_lsf2lspd() as name.

> - internal part of amrnbdec.c:unpack_bitstream() ->  get_bits.h (I guess)
>    

I think the internal part of unpack_bitstream() is still too AMR 
specific to be in such widely used file. Unless someone came up with a 
better idea, I'd say to create a amr.c for code shared between both.

> I will try to create patches for these modifications and send to ffmpeg-devel
>    

Great!

-Vitor


More information about the FFmpeg-soc mailing list