[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Tue Oct 7 19:48:40 CEST 2008

On Tue, Oct 7, 2008 at 1:01 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Mon, Oct 06, 2008 at 11:30:56PM -0400, Alex Converse wrote:
>> On Mon, Oct 6, 2008 at 10:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> > On Mon, Oct 06, 2008 at 09:52:51PM -0400, Alex Converse wrote:
>> >> On Mon, Oct 6, 2008 at 9:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> > On Mon, Oct 06, 2008 at 08:52:06PM -0400, Alex Converse wrote:
>> >> >> On Mon, Oct 6, 2008 at 8:22 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
>> >> >> > On Mon, Oct 06, 2008 at 03:46:55PM -0400, Alex Converse wrote:
>> >> >> >> On Tue, Sep 30, 2008 at 11:25 PM, Alex Converse <alex.converse at gmail.com> wrote:
>> >> >> >> > Hi,
>> >> >> >> >
>> >> >> >> > The attached patch should fix AAC PNS scaling [Issue 664]. It will not
>> >> >> >> > fix PNS conformance.
>> >> >> >>
>> >> >> >> Here's a sightly updated patch (sqrtf instead of sqrt). The current
>> >> >> >> method of PNS will never conform because sample energy simpl doesn't
>> >> >> >> converge to it's mean fast enough. The spec explicitly states that PNS
>> >> >> >> should be normalized per band. Not doing it that way causes PNS-1
>> >> >> >> conformance to fail for 45 bands.
>> >> >> >
>> >> >> > elaborate, what part of the spec says what?
>> >> >>
>> >> >> 14496-3:2005/4.6.13.3 p184 (636 of the PDF)
>> >> >>
>> >> >> > what is PNS-1 conformance ?
>> >> >>
>> >> >> 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF)
>> >> >> 14496-5/conf_pns folder
>> >> >
>> >> > do you happen to have URLs for these?
>> >> >
>> >> >
>> >> >>
>> >> >> > the part that feels a little odd is normalizing random data on arbitrary
>> >> >> > and artificial bands, this simply makes things non random.
>> >> >> > This would be most extreemly vissibly with really short bands of 1 or 2
>> >> >> > coeffs ...
>> >> >> > another way to see the issue is to take 100 coeffs and split them into
>> >> >> > 10 bands, if you now normalize litterally these 10 bands then the 100
>> >> >> > coeffs will no longer be random at all, they will be significantly
>> >> >> > correlated. This may be inaudible, it may or may not sound better and
>> >> >> > may or may not be what the spec wants but still it feels somewhat wrong
>> >> >> > to me ...
>> >> >> >
>> >> >>
>> >> >> Ralph Sperschneider from FhG/MPEG spelled it all out:
>> >> >> http://lists.mpegif.org/pipermail/mp4-tech/2003-June/002358.html
>> >> >>
>> >> >> I'm not saying it's a smart way to design a CODEC but it's what MPEG picked.
>> >> >
>> >> > yes, so i guess the most sensible solution would be to precalculate
>> >> > a second of noise normalized to the band sizes and randomly pick from
>> >> > these.
>> >> >
>> >>
>> >> That sounds messy and overly complex. What's wrong with doing it the
>> >> way MPEG tells us to?
>> >
>> > that is what mpeg tells us to do, they do not mandate any specific way
>> > to calculate random values. And i do not like doing sqrt() per band ...
>> >
>>
>> One sqrtf() per band isn't that intense. To stick with the current
>> approach we still need to do a sqrt on the band size. We could even
>> use one of those fast 1/sqrt algorithms.
>
> we do not need to do a sqrt() on the band size, not in the current
> approuch and not with the other variant. A small LUT will do fine
> considering the small number of band sizes. And even that is not
> needed in all cases ...
>
>
>>
>> >
>> >> Or just sticking with what we have it sounds
>> >> fine and is fast.
>> >
>> > well if its conformant thats fine, but it seemed to me that it is not
>>
>> It isn't but it seemed to me that strict conformance is not always
>> required for ffmpeg (RE: block switching)?
>
> the way i understood the spec when i read the relevant pat last time is
> that this specific kind of block switching is invalid, thus outside the
> spec and no special behavior is mandated.
>
>
>>
>> > though iam still waiting for a quote from the spec to confirm this.
>> >
>>
>> I sent you section and page numbers. It's not a one liner. The [PNS-1]
>> section of 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF) is fairly straight
>> forward.
>
> I will not pay iso for a document just to confirm the claim.
> Luckily though ive found 14496-3:2005 on my hd (sorry, i do have a mess)
> It says:
> The noise substitution decoding process for one channel is defined by the following pseudo code:
> nrg = global_gain - NOISE_OFFSET - 256;
> for (g=0; g<num_window_groups; g++) {
>   /* Decode noise energies for this group */
>   for (sfb=0; sfb<max_sfb; sfb++) {
>        if (is_noise(g,sfb)) {
>           nrg += dpcm_noise_nrg[g][sfb];
>           noise_nrg[g][sfb] = nrg;
>        }
>    }
>     /* Do perceptual noise substitution decoding */
>     for (b=0; b<window_group_length[g]; b++) {
>         for (sfb=0; sfb<max_sfb; sfb++) {
>             if (is_noise(g,sfb)) {
>                 size = swb_offset[sfb+1] - swb_offset[sfb];
>                 /* Generate random vector */
>                 gen_rand_vector( &spec[g][b][sfb][0], size );
>                 nrg=0;
>                 for (i=0; i<size; i++) {
>                    nrg+= spec[g][b][sfb][i] * spec[g][b][sfb][i];
>                 }
>                 sqrt_nrg = sqrt (nrg);
>                 scale *= 2.0^(0.25*noise_nrg [g][sfb]) / sqrt_nrg;
>                 /* scale random vector to desired target energy */
>                 for (i=0; i<size; i++) {
>                    spec[g][b][sfb][i] *= scale;
>                 }
>             }
>         }
>     }
> }
> The constant NOISE_OFFSET is used to adapt the range of average noise energy values to the usual range of
> scalefactors and has a value of 90.
> The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values whereas
> their sum of squares is unequal to zero. A suitable random number generator can be realized using one
> multiplication/accumulation per random value.
> ---------------
>
> So it seems the spec has been changed to use the more idiotic normalization.
> And that confirms your claim.
>
>
>>
>>
>> >
>> >>
>> >> >
>> >> >>
>> >> >> >
>> >> >> >>
>> >> >> >> However with this patch there appears to be no audible difference
>> >> >> >> between the approaches.
>> >> >> >
>> >> >> >> I don't know the ideal mean energy so I'm
>> >> >> >> using the sample mean energy for 1024 iterations of the LCG.
>> >> >> >
>> >> >> > i assume cpu cycles got more expensive if people can only spare a few
>> >> >> > thousand
>> >> >> >
>> >> >>
>> >> >> How many do you propose then? I tried running it over the whole period
>> >> >> and the result seemed low, I think it's a classic case of adding too
>> >> >> many equal size floating point values.
>> >> >
>> >> > real mathematicans tend not to use floats that are bound to rounding errors
>> >> >
>> >> > try:
>> >> > for(i=min; i<=max; i++){
>> >> >    uint64_t a= i*i;
>> >> >    var += a;
>> >> >    if(var < a){
>> >> >            var2++;
>> >> >    }
>> >> > }
>> >> >
>> >>
>> >> That will only hold 5 or 6 big values 2^64/((2^31)^2) = 4.
>> >
>> > read the code again please, especially var2
>> > also, just to make sure you have the types correct
>> >
>> > int min= -(1<<31)+1;
>> > int max=  (1<<31)-1;
>> > int64_t i;
>> > uint64_t var=0;
>> > uint64_t var2=0;
>> >
>>
>> Why are we leaving out (int)0x80000000?
>
> because otherwise the mean would be -0.5.

But isn't it in the output space? We're trying to find the energy of
one period of the LCG no? If we are going to ignore it shouldn't the
LCG be written to explicitly reject it? And adding logic to explictly
reject it seems nutty for such a small deviation in the mean. It has
far more significance in energy.

[...]

--Alex