[FFmpeg-devel] Nellymoser encoder

Michael Niedermayer michaelni
Thu Aug 28 14:43:23 CEST 2008


On Thu, Aug 28, 2008 at 12:53:50PM +0200, Bartlomiej Wolowiec wrote:
> Thursday 28 August 2008 00:11:20 Michael Niedermayer napisa?(a):
> [...]
> > > +    DSPContext      dsp;
> > > +    MDCTContext     mdct_ctx;
> > > +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);
> > > +    DECLARE_ALIGNED_16(float, buf[2 * NELLY_SAMPLES]);     ///< sample
> > > buffer +} NellyMoserEncodeContext;
> > > +
> > >
> > > +static DECLARE_ALIGNED_16(float, sine_window[NELLY_SAMPLES]);
> >
> > duplicate of ff_sine_windows and sine_window form nellymoserdec
> 
> not really, sine_window from nellymoserdec is just a half of it. I haven't 
> compared efficiency, but it seems to me that vector_fmul may be quicker than 
> overlap_and_window? from nellymoserdec using half of sine_window. Or maybe I 
> miss some details...

you miss vector_fmul_reverse


> 
> [...]
> > > +/**
> > > + * Searching index in table with size table_size, where
> > > + * |val-table[best_idx]| is minimal.
> > > + * It assumes that table elements are in increasing order and uses
> > > binary search. + */
> > > +#define find_best_value(val, table, table_size, best_idx) \
> > > +{ \
> > > +    int first=0, last=table_size-1, mid; \
> > > +    while(first<=last){ \
> > > +        mid=(first+last)/2; \
> > > +        if(val > table[mid]){ \
> > > +            first = mid + 1; \
> > > +        }else{ \
> > > +            last = mid - 1; \
> > > +        } \
> > > +    } \
> > > +    if(!first || (first!=table_size && table[first]-val <
> > > val-table[last])) \ +        best_idx = first; \
> > > +    else \
> > > +        best_idx = last; \
> > > +}
> >
> > This can be done faster with a look up table
> > and a single right value vs. left value check
> 
> Ok, I may do it for ff_nelly_init_table and ff_nelly_delta_table, but I don't 
> really now how to do it for float type (ff_nelly_dequantization_table)

idx= LUT[lrintf(val*A+B)]
if(fabs(val - tab[idx]) > fabs(val - tab[idx+1])
    idx++;

with A and B being appropriate constants


> 
> > > +
> > > +/**
> > > + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 *
> > > NELLY_BUF_LEN values + *  @param s               encoder context
> > > + *  @param output          output buffer
> > > + *  @param output_size     size of output buffer
> > > + *  @param samples         input samples
> > > + */
> > > +static void encode_block(NellyMoserEncodeContext *s,
> > > +                         unsigned char *output, int output_size, float
> > > *samples) +{
> > > +    PutBitContext pb;
> > > +    int i, band, block, best_idx, power_idx = 0;
> > > +    float power_val, power_candidate, coeff, coeff_sum;
> > > +    int band_start, band_end;
> > > +
> > > +    apply_mdct(s, samples, s->mdct_out);
> > > +    apply_mdct(s, samples + NELLY_BUF_LEN, s->mdct_out + NELLY_BUF_LEN);
> > > +
> > > +    init_put_bits(&pb, output, output_size * 8);
> > > +
> > > +    band_start = 0;
> > > +    band_end = ff_nelly_band_sizes_table[0];
> > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > +        coeff_sum = 0;
> > > +        for (i = band_start; i < band_end; i++) {
> > >
> > > +            for (block = 0; block < 2; block++) {
> > > +                coeff = s->mdct_out[i + block * NELLY_BUF_LEN];
> > > +                coeff_sum += coeff * coeff;
> > > +            }
> >
> > id unroll that by hand to
> > coeff_sum += s->mdct_out[i                ]*s->mdct_out[i                ];
> >             +s->mdct_out[i + NELLY_BUF_LEN]*s->mdct_out[i + NELLY_BUF_LEN];
> >
> > > +        }
> > > +        power_candidate =
> > > +            (log(FFMAX(64.0, coeff_sum /
> > > (ff_nelly_band_sizes_table[band] << 1))) - +             log(64.0)) *
> > > 1024.0 / M_LN2;
> >
> > log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) *
> > 1024.0 / M_LN2;
> >
> > also this is based on
> > (sum(0..N) ABS(coeff)^2/N)^(1/2)
> >
> > it would be interresting to try
> > C*(sum(0..N) ABS(coeff)^D/N)^(1/D) for different values of C and D
> >
> > maybe you could try
> > C={0.9,1.0,1.1}
> > D={1.9,2.0,2.1}
> > at first and see if any improves distortion
> 
> Hmm... How should I check distortion? 

listening is one option ...


> I've listened to few recorgings and in 
> my opinion differences are insignificant - sometimes D=2.0 is better, 
> sometimes D=2.3... C!=1.0 in my opinion doesn't give better effects. 

have you tried different C for D=2.3 ?
also what about larger D like 3.0 ?
besides can you post a patch so other can (if they want) also experiment with
this?


> 
> > > +
> > > +        if (band) {
> > > +            power_candidate -= power_idx;
> > > +            find_best_value(power_candidate, ff_nelly_delta_table, 32,
> > > best_idx); +            put_bits(&pb, 5, best_idx);
> > > +            power_idx += ff_nelly_delta_table[best_idx];
> > > +        } else {
> > > +            //base exponent
> > > +            find_best_value(power_candidate, ff_nelly_init_table, 64,
> > > best_idx); +            put_bits(&pb, 6, best_idx);
> > > +            power_idx = ff_nelly_init_table[best_idx];
> > > +        }
> >
> > I wish i knew how to optimally assign these values, sadly i do not.
> > Suggestions would be welcome of course in case anyone has an idea on how
> > to optimally select them, the tricky part is that these not only scale the
> > signal, they also are the basis upon which the bits per band and thus
> > encoding is selected.
> >
> > Still they could be made to closer match the "power_candidate" values from
> > above using viterbi though arguably it would just be closer to a guess.
> >
> > An alternative may be to just retry the whole encode_block with slightly
> > changed power_candidate values for each band and pick what end up with the
> > least distortion (that is least difference to the input signal)
> > This should be rather easy to try ...
> 
> slightly change ? What do you exactly? mean?

power_candidate += random


> And again a problem how to 
> measure distortion - common difference mdct won't give a good effect.

Well, kostyas psy model if it where finished could be used ...


> 
> > > +
> > > +        if (power_idx >= 0) {
> > > +            power_val = pow_table[power_idx & 0x7FF] / (1 << (power_idx
> > > >> 11)); +        } else {
> > > +            power_val = -pow(2, -power_idx / 2048.0 - 3.0);
> > > +        }
> >
> > power_idx can be <0 ?
> 
> Yes. In this encoder code it can be, possibly in original decoder too.

pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11)+C));

with appropriately changed pow_table may avoid it

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080828/df7daf69/attachment.pgp>



More information about the ffmpeg-devel mailing list