[FFmpeg-devel] [Issue 664] [PATCH] Fix AAC PNS Scaling

Michael Niedermayer michaelni
Tue Oct 7 19:01:35 CEST 2008


On Mon, Oct 06, 2008 at 11:30:56PM -0400, Alex Converse wrote:
> On Mon, Oct 6, 2008 at 10:20 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> > On Mon, Oct 06, 2008 at 09:52:51PM -0400, Alex Converse wrote:
> >> On Mon, Oct 6, 2008 at 9:39 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> > On Mon, Oct 06, 2008 at 08:52:06PM -0400, Alex Converse wrote:
> >> >> On Mon, Oct 6, 2008 at 8:22 PM, Michael Niedermayer <michaelni at gmx.at> wrote:
> >> >> > On Mon, Oct 06, 2008 at 03:46:55PM -0400, Alex Converse wrote:
> >> >> >> On Tue, Sep 30, 2008 at 11:25 PM, Alex Converse <alex.converse at gmail.com> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> > The attached patch should fix AAC PNS scaling [Issue 664]. It will not
> >> >> >> > fix PNS conformance.
> >> >> >>
> >> >> Here's a slightly updated patch (sqrtf instead of sqrt). The current
> >> >> method of PNS will never conform because the sample energy simply doesn't
> >> >> converge to its mean fast enough. The spec explicitly states that PNS
> >> >> should be normalized per band. Not doing it that way causes PNS-1
> >> >> conformance to fail for 45 bands.
> >> >> >
> >> >> > elaborate, what part of the spec says what?
> >> >>
> >> >> 14496-3:2005/4.6.13.3 p184 (636 of the PDF)
> >> >>
> >> >> > what is PNS-1 conformance ?
> >> >>
> >> >> 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF)
> >> >> 14496-5/conf_pns folder
> >> >
> >> > do you happen to have URLs for these?
> >> >
> >> >
> >> >>
> >> >> > the part that feels a little odd is normalizing random data on arbitrary
> >> >> > and artificial bands, this simply makes things non random.
> >> >> > This would be most extremely visible with really short bands of 1 or 2
> >> >> > coeffs ...
> >> >> > another way to see the issue is to take 100 coeffs and split them into
> >> >> > 10 bands, if you now normalize literally these 10 bands then the 100
> >> >> > coeffs will no longer be random at all, they will be significantly
> >> >> > correlated. This may be inaudible, it may or may not sound better and
> >> >> > may or may not be what the spec wants but still it feels somewhat wrong
> >> >> > to me ...
> >> >> >
> >> >>
> >> >> Ralph Sperschneider from FhG/MPEG spelled it all out:
> >> >> http://lists.mpegif.org/pipermail/mp4-tech/2003-June/002358.html
> >> >>
> >> >> I'm not saying it's a smart way to design a CODEC but it's what MPEG picked.
> >> >
> >> > yes, so I guess the most sensible solution would be to precalculate
> >> > a second of noise normalized to the band sizes and randomly pick from
> >> > these.
> >> >
> >>
> >> That sounds messy and overly complex. What's wrong with doing it the
> >> way MPEG tells us to?
> >
> > that is what MPEG tells us to do; they do not mandate any specific way
> > to calculate random values. And I do not like doing sqrt() per band ...
> >
> 
> One sqrtf() per band isn't that intense. To stick with the current
> approach we still need to do a sqrt on the band size. We could even
> use one of those fast 1/sqrt algorithms.

we do not need to do a sqrt() on the band size, not in the current
approach and not with the other variant. A small LUT will do fine
considering the small number of band sizes. And even that is not
needed in all cases ...


> 
> >
> >> Or just sticking with what we have it sounds
> >> fine and is fast.
> >
> > well if it's conformant that's fine, but it seemed to me that it is not
> 
> It isn't but it seemed to me that strict conformance is not always
> required for ffmpeg (RE: block switching)?

the way I understood the spec when I read the relevant part last time is
that this specific kind of block switching is invalid, thus outside the
spec, and no special behavior is mandated.


> 
> > though I am still waiting for a quote from the spec to confirm this.
> >
> 
> I sent you section and page numbers. It's not a one liner. The [PNS-1]
> section of 14496-4:2004/6.6.1.2.2.4 p94 (102 PDF) is fairly
> straightforward.

I will not pay ISO for a document just to confirm the claim.
Luckily though I've found 14496-3:2005 on my HD (sorry, I do have a mess).
It says:
The noise substitution decoding process for one channel is defined by the following pseudo code:
nrg = global_gain - NOISE_OFFSET - 256;
for (g=0; g<num_window_groups; g++) {
    /* Decode noise energies for this group */
    for (sfb=0; sfb<max_sfb; sfb++) {
        if (is_noise(g,sfb)) {
            nrg += dpcm_noise_nrg[g][sfb];
            noise_nrg[g][sfb] = nrg;
        }
    }
    /* Do perceptual noise substitution decoding */
    for (b=0; b<window_group_length[g]; b++) {
        for (sfb=0; sfb<max_sfb; sfb++) {
            if (is_noise(g,sfb)) {
                size = swb_offset[sfb+1] - swb_offset[sfb];
                /* Generate random vector */
                gen_rand_vector( &spec[g][b][sfb][0], size );
                nrg=0;
                for (i=0; i<size; i++) {
                    nrg += spec[g][b][sfb][i] * spec[g][b][sfb][i];
                }
                sqrt_nrg = sqrt(nrg);
                scale *= 2.0^(0.25*noise_nrg[g][sfb]) / sqrt_nrg;
                /* scale random vector to desired target energy */
                for (i=0; i<size; i++) {
                    spec[g][b][sfb][i] *= scale;
                }
            }
        }
    }
}
The constant NOISE_OFFSET is used to adapt the range of average noise energy values to the usual range of
scalefactors and has a value of 90.
The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values whereas
their sum of squares is unequal to zero. A suitable random number generator can be realized using one
multiplication/accumulation per random value.
---------------

So it seems the spec has been changed to use the more idiotic normalization.
And that confirms your claim.


> 
> 
> >
> >>
> >> >
> >> >>
> >> >> >
> >> >> >>
> >> >> >> However with this patch there appears to be no audible difference
> >> >> >> between the approaches.
> >> >> >
> >> >> >> I don't know the ideal mean energy so I'm
> >> >> >> using the sample mean energy for 1024 iterations of the LCG.
> >> >> >
> >> >> > I assume CPU cycles got more expensive if people can only spare a few
> >> >> > thousand
> >> >> >
> >> >>
> >> >> How many do you propose then? I tried running it over the whole period
> >> >> and the result seemed low, I think it's a classic case of adding too
> >> >> many equal size floating point values.
> >> >
> >> > real mathematicians tend not to use floats that are bound to rounding errors
> >> >
> >> > try:
> >> > for(i=min; i<=max; i++){
> >> >    uint64_t a= i*i;
> >> >    var += a;
> >> >    if(var < a){
> >> >            var2++;
> >> >    }
> >> > }
> >> >
> >>
> >> That will only hold about 4 big values: 2^64/((2^31)^2) = 4.
> >
> > read the code again please, especially var2
> > also, just to make sure you have the types correct
> >
> > int min= INT_MIN+1;
> > int max= INT_MAX;
> > int64_t i;
> > uint64_t var=0;
> > uint64_t var2=0;
> >
> 
> Why are we leaving out (int)0x80000000?

because otherwise the mean would be -0.5 (the full range -2^31 .. 2^31-1 is
asymmetric around zero).

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Its not that you shouldnt use gotos but rather that you should write
readable code and code with gotos often but not always is less readable


