[FFmpeg-devel] [PATCH] AAC Decoder - Round 2.
Robert Swain
robert.swain
Sun Jun 29 17:21:52 CEST 2008
2008/6/29 Michael Niedermayer <michaelni at gmx.at>:
> On Sun, Jun 29, 2008 at 03:19:24PM +0100, Robert Swain wrote:
>> 2008/6/29 Robert Swain <robert.swain at gmail.com>:
>> > 2008/6/29 Michael Niedermayer <michaelni at gmx.at>:
>> >> On Sun, Jun 29, 2008 at 10:42:40AM +0100, Robert Swain wrote:
>> >>> 2008/6/27 Michael Niedermayer <michaelni at gmx.at>:
>> >>> > On Fri, Jun 27, 2008 at 03:35:06PM +0100, Robert Swain wrote:
>> >>> >> 2008/6/23 Michael Niedermayer <michaelni at gmx.at>:
>> >>> >> > On Mon, Jun 23, 2008 at 02:10:56PM +0100, Robert Swain wrote:
>> >>> >> >> 2008/6/20 Michael Niedermayer <michaelni at gmx.at>:
>> >>> >> >> > On Thu, Jun 19, 2008 at 04:22:57PM +0100, Robert Swain wrote:
>> >>> >> >> > [...]
>> >>> >> >> >> +
>> >>> >> >> >> + for (g = 0; g < ics->num_window_groups; g++) {
>> >>> >> >> >> + for (i = 0; i < ics->max_sfb; i++) {
>> >>> >> >> >> + if (cb[g][i] == NOISE_HCB) {
>> >>> >> >> >> + for (group = 0; group < ics->group_len[g]; group++) {
>> >>> >> >> >> + float energy = 0;
>> >>> >> >> >> + float scale = 1.;// / (float)(offsets[i+1] - offsets[i]);
>> >>> >> >> >> + for (k = offsets[i]; k < offsets[i+1]; k++)
>> >>> >> >> >> + energy += (float)icoef[group*128+k] * icoef[group*128+k];
>> >>> >> >> >> + scale *= sf[g][i] / sqrt(energy);
>> >>> >> >> >
>> >>> >> >> > are you sure that the random values have to be normalized like that?
>> >>> >> >> > I suspect energy is supposed tp be a constant.
>> >>> >> >>
>> >>> >> >> That's how it is in the spec. From section 4.6.13 Perceptual Noise
>> >>> >> >> Substitution (PNS):
>> >>> >> >
>> >>> >> > Ive checked the spec before my reply, and i belive your code is wrong.
>> >>> >> >
>> >>> >> >> The energy information for percpetual noise substitution decoding is
>> >>> >> >> represented by a "noise energy" value indicating the overall power of
>> >>> >> >> the substituted spectral coefficients in steps of 1.5 dB. If noise
>> >>> >> >> substitution coding is active for a particular group and scalefactor
>> >>> >> >> band, a noise energy value is transmitted instead of the scalefactor
>> >>> >> >> of the respective channel.
>> >>> >> >
>> >>> >> > Doesnt say that the output from the random number generator should be choped
>> >>> >> > up in bands and each independantly renormalized.
>> >>> >> >
>> >>> >> > Heres what the spec says:
>> >>> >> > /* Decode noise energies for this group */
>> >>> >> > for (sfb=0; sfb<max_sfb; sfb++)
>> >>> >> > if (is_noise(g,sfb))
>> >>> >> > noise_nrg[g][sfb] = nrg += dpcm_noise_nrg[g][sfb];
>> >>> >> > /* Do perceptual noise substitution decoding */
>> >>> >> > for (b=0; b<window_group_length[g]; b++) {
>> >>> >> > for (sfb=0; sfb<max_sfb; sfb++) {
>> >>> >> > if (is_noise(g,sfb)) {
>> >>> >> > offs = swb_offset[sfb];
>> >>> >> > size = swb_offset[sfb+1] - offs;
>> >>> >> > /* Generate random vector */
>> >>> >> > gen_rand_vector( &spec[g][b][sfb][0], size );
>> >>> >> > scale = 1/(size * sqrt(MEAN_NRG));
>> >>> >> > scale *= 2.0^(0.25*noise_nrg [g][sfb]);
>> >>> >> > /* Scale random vector to desired target energy */
>> >>> >> > for (i=0; i<len; i++)
>> >>> >> > spec[g][b][sfb][i] *= scale;
>> >>> >> > }
>> >>> >> > }
>> >>> >> > }
>> >>> >> > ...
>> >>> >> > The function gen_rand_vector( addr, size ) generates a vector of length <size> with signed random values of
>> >>> >> > average energy MEAN_NRG per random value. A suitable random number generator can be realized using one
>> >>> >> > multiplication/accumulation per random value.
>> >>> >> >
>> >>> >> > No weird renormalization!
>> >>> >> > also the size factor is commented out in our code, i guess to cancel the
>> >>> >> > incorrect normalization mostly out.
>> >>> >>
>> >>> >> OK, I understand what you mean now. I see that the current code we
>> >>> >> have calculates the energy of the band in question (hence there's no
>> >>> >> need for /size) and scales to that energy rather than scaling to
>> >>> >> 1/(size * MEAN_NRG). I have a few questions:
>> >>> >>
>> >>> >
>> >>> >> - Is the av_random() & 0xFFFF code OK or should the 32-bit random
>> >>> >> value rather be scaled to 16-bit? I suspect it should be scaled (i.e.
>> >>> >> >> 16).
>> >>> >
>> >>> > Why dont you use the whole 32bit value? Or was the array int16_t in
>> >>> > which case >>16 is as good as &0xFFFF given the random number generator
>> >>> > is good.
>> >>>
>> >>> I might as well and I'll probably move the av_random() calls to
>> >>> quant_to_spec_tool().
>> >>>
>> >>> >> - How does one analytically calculate the mean energy of the noise we
>> >>> >> generate such that this value can be defined as a constant and used in
>> >>> >> our code?
>> >>> >
>> >>> > How do you define energy?
>> >>>
>> >>> For a discrete signal x_i for i = [0, N], the energy of the signal is
>> >>> sum(x_i * x_i, i=0, i=N).
>> >>>
>> >>> > What is the distribution of the values av_random() outputs?
>> >>>
>> >>> /** generates a random number on [0,0xffffffff]-interval */
>> >>>
>> >>> If it claims to generate a random number on that interval then without
>> >>> stating any bias in the distribution, it should mean generates values
>> >>> in that interval with equal probability.
>> >>>
>> >>> >> After assuming white noise and dabbling about a bit with
>> >>> >> some maths it seems it should be 1/3 * maximum possible energy i.e.
>> >>> >> #define MEAN_NRG sqrt(3.0) * 32768.0 or something like that. Is this
>> >>> >> correct?
>> >>> >
>> >>> > If you know how you define energy then you can easily check this by
>> >>> > calculating the exact nummerical energy of many values from av_random()
>> >>> > your calculation should be close to the value returned ...
>> >>>
>> >>> I tried my sum for large values of N with x_i on the interval [0.0,
>> >>> 1.0] and it came close to 1/3.
>> >>>
>> >>> Ideally, for the mean energy, one would use the limit as N->inf of the
>> >>> above-mentioned sum / N
>> >>>
>> >>> > If you further assume random() has a flat equi distibuted output than
>> >>> > a simple
>> >>> > for(i=0; i<=0xFFFF; i++)
>> >>> > find_energy(&energy, i);
>> >>> > will give you the exact value
>> >>> > ... the analytical one, assuming you solved the correct integral ;) is
>> >>> > actually just an approximation for the continuous case ...
>> >>>
>> >>> Right, and I was just wondering if you could clarify the continuous case. :)
>> >>>
>> >>> By the way, I've just looked at the FAAD and 3GPP implementations and
>> >>> they both scale per random vector, i.e. per band as the current code
>> >>> does. They both actually do the scaling within the random vector
>> >>> generation function though I guess FAAD is based off the 3GPP
>> >>> reference code.
>> >>>
>> >>> So, which should we use?
>> >>>
>> >>> - Scaling per band as in the reference code
>> >>> - Scaling with constant mean energy
>> >>
>> >> the second, because
>> >> 1. thats what the spec we have says
>> >> 2. its faster
>> >> 3. its simpler
>> >> 4. its random, rescaling subsets of numbers "dynamically" is not random
>> >> anymore.
>> >>
>> >> If of course all files we had with thsis feature sounded better with the
>> >> faad/3gpp variant then we should choose the first but IIRC we dont have
>> >> any such files ...
>> >>
>> >>
>> >>>
>> >>> If the latter, what precision should be used for the sqrt(3.0) * 2^31 constant?
>> >>
>> >> as the output is float, a float constant seems logic ...
>> >
>> > See attached.
>>
>> Damnit. Why do I always spot small errors right after I've decided I'm
>> happy with a patch and have submitted it? The return of av_random is
>> unsigned and it needs to be signed so cast to int32_t. See attached
>> again. Sorry for the noise.
>
> ok
Applied.
Rob
More information about the ffmpeg-devel
mailing list