[FFmpeg-soc] [PATCH] AMR-WB Decoder

Thu Sep 9 02:50:40 CEST 2010

On 8 September 2010 06:54, Vitor Sessak <vitor1001 at gmail.com> wrote:
> On 09/07/2010 03:21 AM, Marcelo Galvão Póvoa wrote:
>>
>> Hello,
>>
>> On 6 September 2010 10:13, Ronald S. Bultje <rsbultje at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> On Mon, Sep 6, 2010 at 5:54 AM, Vitor Sessak <vitor1001 at gmail.com> wrote:
>>>>
>>>> On 09/06/2010 02:46 AM, Marcelo Galvão Póvoa wrote:
>>>>>
>>>>> Ok, fortunately I've found the bug!
>>>>>
>>>>> It was just a MIN_ISF_SPACING parameter which I extracted from the
>>>>> reference code but was unsure about its Q level. After some time, I
>>>>> thought I had it figured out, but I was wrong. Now I know the answer
>>>>> the hard way...
>>>>>
>>>>> The clipping and the sharp peaks are gone, the waveforms are really
>>>>> close now!
>>>>
>>>> That's great news!
>>>>
>>>>> Also, the stddev against the reference decoder decreased a
>>>>> lot (it was ~884 before):
>>>>> all_men.awb stddev:   51.72 PSNR: 62.05 MAXDIFF: 1089 bytes:   473600/   473600
>>>>
>>>> stddev of 51 looks pretty good to me for this case.
>>>
>>> Maxdiff of 1089 looks like a lot to me; together with a low stddev,
>>> it suggests that one particular part is off. Can you trace which part
>>> is off and why (phase shift vs. actual bug)?
>>>
>>
>> Sorry, but can you suggest a way of doing it?
>
> Your method of inverting one sample and summing in Audacity would show
> where it is happening (some point will have an amplitude of 1089).
> To know if it is a phase shift or a bug, you will have to compare both
> waves visually.
>

I don't know exactly how to detect a phase shift this way, but I think
the difference waveform I obtained [1] has some peaks at the sibilant
parts, probably just where the high band is louder.
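One way to tell a constant phase shift from an actual bug, sketched here as an assumption rather than anything tiny_psnr does, is a brute-force lag search: slide one decoded signal against the other and keep the offset that minimizes the mean squared difference over the overlap. The helper name best_lag() is hypothetical, and it expects both files already decoded to signed 16-bit samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFmpeg): return the lag, in samples,
 * in [-max_lag, max_lag] that minimizes the mean squared difference
 * between a[i] and b[i + lag] over the overlapping region.
 * A positive result means b is delayed relative to a. */
static long best_lag(const int16_t *a, const int16_t *b, size_t n, long max_lag)
{
    long best = 0;
    double best_err = -1.0;  /* -1 marks "no lag evaluated yet" */

    assert(max_lag >= 0 && (size_t)max_lag < n);
    for (long lag = -max_lag; lag <= max_lag; lag++) {
        double sse = 0.0;
        size_t count = 0;
        for (size_t i = 0; i < n; i++) {
            long j = (long)i + lag;  /* compare a[i] with b[i + lag] */
            if (j < 0 || j >= (long)n)
                continue;            /* outside the overlap */
            double d = (double)a[i] - (double)b[j];
            sse += d * d;
            count++;
        }
        /* count > 0 because |lag| < n */
        double err = sse / (double)count;
        if (best_err < 0.0 || err < best_err) {
            best_err = err;
            best = lag;
        }
    }
    return best;
}
```

The search is O(n × lags), so on a 40-minute file it is better run on a short excerpt around one of the peaks. If a nonzero lag wins clearly, the two decoders are probably just offset in time; if lag 0 stays best, the peaks are more likely a real decoding difference.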

>> Also, what do MAXDIFF
>> and the "2" at the end of the command line mean?
>
> MAXDIFF is the biggest difference between two corresponding samples. The
> "2" at the end of the command line says to read two-byte (16-bit)
> integers. If you were comparing video pixels, you would use "1". You can
> also look at the source of tiny_psnr.c; it is pretty simple.
>
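For reference, the statistics described above can be computed roughly like this. This is a hypothetical sketch, not the actual tiny_psnr.c source: compare_s16le() reads both buffers as signed little-endian 16-bit words (what the "2" selects), accumulates the squared error, and records MAXDIFF together with the sample index where it first occurs, which helps locate the spot Ronald asked about. It assumes any WAV header has already been skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error statistics for word size 2 (16-bit samples). */
typedef struct {
    uint64_t sse;     /* sum of squared sample differences */
    size_t   count;   /* number of 16-bit samples compared */
    int      maxdiff; /* largest |a - b| over all samples (MAXDIFF) */
    size_t   maxpos;  /* sample index where maxdiff was first seen */
} DiffStats;

/* Decode one signed 16-bit little-endian sample from two bytes. */
static int16_t read_s16le(const uint8_t *p)
{
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

static DiffStats compare_s16le(const uint8_t *a, const uint8_t *b, size_t nbytes)
{
    DiffStats s = {0, 0, 0, 0};

    assert(nbytes % 2 == 0); /* whole 16-bit words only */
    for (size_t i = 0; i < nbytes; i += 2) {
        int d = read_s16le(a + i) - read_s16le(b + i);
        if (d < 0)
            d = -d;
        s.sse += (uint64_t)d * (uint64_t)d;
        if (d > s.maxdiff) {
            s.maxdiff = d;
            s.maxpos = i / 2;
        }
        s.count++;
    }
    return s;
}
```

Dividing maxpos by 16000, the AMR-WB output sample rate, gives the time of the peak in seconds. sqrt(sse / count) is the RMS difference, which as far as I can tell is what tiny_psnr prints as stddev.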
>> This sample has long silent parts, and I'm comparing my floating-point
>> implementation to the 16-bit fixed-point reference. How close do you
>> think they should be?
>
> A very small stddev (< 1.00) would assure there is no bug in your decoder,
> but a large one does not mean there is one.
>
> I suggest you do the following test:
>
> a) Get a biggish file (> 30 minutes)
> b) Convert it to a WAV with the sample rate and number of channels the
> AMR encoder takes as input
> c) Encode the file obtained in (b) with the reference encoder
> d) Decode the file obtained in (c) with the reference decoder
> e) Decode the file obtained in (c) with ffamrwb
> f) Compare the stddev of the files obtained in (b) and (d) with that of (b)
> and (e). If the file decoded with ffamrwb is as close to the original as
> the one decoded with the reference decoder, it's good.
>

Results:
$ ./tests/tiny_psnr ~/ref_pod.wav ~/orig_pod.wav 2
stddev: 2599.69 PSNR: 28.03 MAXDIFF:39592 bytes: 76480640/ 76480660
$ ./tests/tiny_psnr ~/my_pod.wav ~/orig_pod.wav 2
stddev: 2600.02 PSNR: 28.03 MAXDIFF:39653 bytes: 76480640/ 76480660
$ ./tests/tiny_psnr ~/my_pod.wav ~/ref_pod.wav 2
stddev:   95.62 PSNR: 56.72 MAXDIFF: 5529 bytes: 76480640/ 76480640

By removing the fractional part of the excitation as AMR-NB does, the
result was slightly better:
$ ./tests/tiny_psnr ~/my_pod2.wav ~/orig_pod.wav 2
stddev: 2589.73 PSNR: 28.06 MAXDIFF:39639 bytes: 76480640/ 76480660
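The PSNR column appears to follow directly from stddev, assuming a 16-bit peak value of 65535 (this is inferred from the numbers above, not taken from tiny_psnr.c):

```latex
\mathrm{PSNR} = 20\log_{10}\frac{65535}{\mathrm{stddev}}\ \mathrm{dB},
\qquad
20\log_{10}\frac{65535}{95.62} \approx 56.72,
\qquad
20\log_{10}\frac{65535}{2599.69} \approx 28.03,
\qquad
20\log_{10}\frac{65535}{2589.73} \approx 28.06
```

Under that relation, the 0.33 gap in stddev between ref_pod.wav and my_pod.wav against the original is only 20 log10(2600.02 / 2599.69), about 0.001 dB, which is why both print 28.03.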

I think this is good news! It was much closer than I expected.
The test file was a 39:50-long podcast transcoded from MP3.

[1] http://www.students.ic.unicamp.br/~ra082115/all_men_diff.wav

-- 
Marcelo