[FFmpeg-devel] Nellymoser encoder

Michael Niedermayer michaelni
Sun Aug 31 23:49:23 CEST 2008


On Sun, Aug 31, 2008 at 10:07:22PM +0200, Bartlomiej Wolowiec wrote:
> Sunday 31 August 2008 15:53:23 Michael Niedermayer wrote:
> > On Sun, Aug 31, 2008 at 01:07:15PM +0200, Bartlomiej Wolowiec wrote:
> > > Saturday 30 August 2008 18:10:41 Michael Niedermayer wrote:
> > > > On Sat, Aug 30, 2008 at 03:42:37PM +0200, Bartlomiej Wolowiec wrote:
> > > > > Friday 29 August 2008 22:36:10 Michael Niedermayer wrote:
> > > > > > > > > > > +
> > > > > > > > > > > +void apply_mdct(NellyMoserEncodeContext *s, float *in, float *coefs)
> > > > > > > > > > > +{
> > > > > > > > > > > +    DECLARE_ALIGNED_16(float, in_buff[NELLY_SAMPLES]);
> > > > > > > > > > > +
> > > > > > > > > > > +    memcpy(&in_buff[0], &in[0], NELLY_SAMPLES * sizeof(float));
> > > > > > > > > > > +    s->dsp.vector_fmul(in_buff, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > +    s->dsp.vector_fmul_reverse(in_buff + NELLY_BUF_LEN, in_buff + NELLY_BUF_LEN, ff_sine_128, NELLY_BUF_LEN);
> > > > > > > > > > > +    ff_mdct_calc(&s->mdct_ctx, coefs, in_buff);
> > > > > > > > > > > +}
> > > > > > > > > >
> > > > > > > > > > The data is copied once in encode_frame and twice here.
> > > > > > > > > > There is no need to copy the data 3 times.
> > > > > > > > > > vector_fmul can be used with a single memcpy to get the data
> > > > > > > > > > into any destination, and vector_fmul_reverse doesn't even
> > > > > > > > > > need 1 memcpy, so overall a single memcpy is enough.
> > > > > > > > >
> > > > > > > > > Hope that you meant something similar to my solution.
> > > > > > > >
> > > > > > > > No, you still do 2 memcpy() calls, but now the code is really
> > > > > > > > messy as well.
> > > > > > > >
> > > > > > > > What you should do is, for each block of samples you get from
> > > > > > > > the user:
> > > > > > > > 1. apply one half of the window onto it with
> > > > > > > >    vector_fmul_reverse, with some internal buffer as destination
> > > > > > > > 2. memcpy it into the 2nd destination and apply the other half
> > > > > > > >    of the window onto it with vector_fmul
> > > > > > > > 3. run the mdct as appropriate on the internal buffers.
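
A rough sketch of the three steps above, not taken from the patch. It
assumes each incoming block is the second half of the current MDCT frame
and the first half of the next one, and that the window table holds the
rising half of the window. BLOCK_LEN, WindowState, window_block,
fmul_inplace and fmul_reverse are hypothetical names; the two fmul helpers
are plain C stand-ins that mirror how dsp.vector_fmul and
dsp.vector_fmul_reverse are called in the quoted code.

```c
#include <string.h>

#define BLOCK_LEN 4 /* hypothetical; the patch would use NELLY_BUF_LEN */

typedef struct WindowState {
    float frame[2][2 * BLOCK_LEN]; /* two MDCT input buffers, used alternately */
    int   cur;                     /* index of the frame being completed */
} WindowState;

/* stand-in for dsp.vector_fmul as called in the patch: dst[i] *= src[i] */
static void fmul_inplace(float *dst, const float *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] *= src[i];
}

/* stand-in for dsp.vector_fmul_reverse: dst[i] = src0[i] * src1[len - 1 - i] */
static void fmul_reverse(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* Consumes one block of samples and returns the windowed frame that is now
 * complete and ready for the MDCT (step 3 is left to the caller). */
static const float *window_block(WindowState *st, const float *block, const float *win)
{
    float *cur  = st->frame[st->cur];
    float *next = st->frame[st->cur ^ 1];

    /* step 1: falling half of the window, written straight into the second
     * half of the current frame, so no memcpy is needed */
    fmul_reverse(cur + BLOCK_LEN, block, win, BLOCK_LEN);

    /* step 2: the only memcpy, into the first half of the next frame,
     * followed by the rising half of the window applied in place */
    memcpy(next, block, BLOCK_LEN * sizeof(*next));
    fmul_inplace(next, win, BLOCK_LEN);

    st->cur ^= 1;
    return cur;
}
```

With a zero-initialised WindowState, the first returned frame has an all-zero
first half; from the second call on, every frame combines the previous block
(rising window) with the current one (falling window).
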
> > > > > > >
> > > > > > > Hmm, I considered it, but I don't understand exactly what I
> > > > > > > should change... In the code I copy the data two times:
> > > > > > > a) in encode_frame - I convert int16_t to float and copy the
> > > > > > >    data to s->buf - I need to do it somewhere because
> > > > > > >    vector_fmul requires float *. Additionally, part of the data
> > > > > > >    is needed for the next call of encode_frame
> > > > > > > b) in apply_mdct - here I think that some additional part of
> > > > > > >    the buffer is needed.
> > > > > > > If I understood correctly I have to get rid of a), but how do I
> > > > > > > get access to the old data when the next call of encode_frame
> > > > > > > is performed, and how do I call vector_fmul on int16_t?
> > > > > >
> > > > > > Have you tried setting AVCodec.sample_fmts to SAMPLE_FMT_FLT?
> > > > > > I think ffmpeg should support this already. If it does not work
> > > > > > then we can keep int16 for now, which would imply more copying.
> > > > >
> > > > > Hmm... I tried to use SAMPLE_FMT_FLT, but something doesn't work. I
> > > > > made only these changes:
> > > > >
> > > > > float *samples = data;
> > > > > ...
> > > > > for (i = 0; i < avctx->frame_size; i++) {
> > > > >     s->buf[s->bufsel][i] = samples[i]*(1<<15);
> > > > > }
> > > > > ...
> > > > > .sample_fmts = (enum SampleFormat[]){SAMPLE_FMT_FLT,SAMPLE_FMT_NONE},
> > > >
> > > > hmm
> > >
> > > Any idea? or should I leave it as it is?
> >
> > does PCM float work for you? if so what is the difference to your encoder?
> 
> pcm_f32le doesn't work - because it isn't hacked in ffmpeg.c. Nellymoser 
> probably for the same reason...

[...]
> > > +
> > > +    apply_mdct(s);
> > > +
> > > +    init_put_bits(&pb, output, output_size * 8);
> > > +
> > > +    i = 0;
> > > +    for (band = 0; band < NELLY_BANDS; band++) {
> > > +        coeff_sum = 0;
> > > +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> > > +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> > > +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> > > +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> > > +        }
> > > +        cand[band] =
> > > +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> > > +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> >
> > the MAX should maybe be done after the correction for D
> 
> I don't know exactly what you mean...

Forget it, I've misread the order of the () somehow.


> 
> -- 
> Bartlomiej Wolowiec

> Index: nellymoserenc.c
> ===================================================================
> --- nellymoserenc.c	(revision 15126)
> +++ nellymoserenc.c	(working copy)
> @@ -45,11 +45,18 @@
>  #define POW_TABLE_SIZE (1<<11)
>  #define POW_TABLE_OFFSET 3
>  
> +#undef NDEBUG
> +#include <assert.h>
> +
>  typedef struct NellyMoserEncodeContext {
>      AVCodecContext  *avctx;
>      int             last_frame;
> +    int             bufsel;


> +    int             have_saved;
>      DSPContext      dsp;
>      MDCTContext     mdct_ctx;
> +    DECLARE_ALIGNED_16(float, mdct_out[NELLY_SAMPLES]);

ok


[...]

> @@ -146,6 +169,212 @@
>      if (fabs(val - table[best_idx]) > fabs(val - table[best_idx + 1])) \
>          best_idx++;
>  
> +static void get_exponent_greedy(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> +    int band, best_idx, power_idx = 0;
> +    float power_candidate;
> +
> +    //base exponent
> +    find_best(cand[0], ff_nelly_init_table, sf_lut, -20, 96);
> +    idx_table[0] = best_idx;
> +    power_idx = ff_nelly_init_table[best_idx];
> +
> +    for (band = 1; band < NELLY_BANDS; band++) {
> +        power_candidate = cand[band] - power_idx;
> +        find_best(power_candidate, ff_nelly_delta_table, sf_delta_lut, 37, 78);
> +        idx_table[band] = best_idx;
> +        power_idx += ff_nelly_delta_table[best_idx];
> +    }
> +}

ok


> +
> +#define OPT_SIZE ((1<<15) + 3000)
> +
> +static inline float distance(float x, float y, int band)
> +{
> +    //return pow(fabs(x-y), 2.0);
> +    float tmp = x - y;
> +    return tmp * tmp;
> +}
> +
> +static void get_exponent_dynamic(NellyMoserEncodeContext *s, float *cand, int *idx_table)
> +{
> +    int i, j, band, best_idx;
> +    float power_candidate, best_val;
> +
> +    float opt[NELLY_BANDS][OPT_SIZE];
> +    int path[NELLY_BANDS][OPT_SIZE];
> +
> +    for (i = 0; i < NELLY_BANDS * OPT_SIZE; i++) {
> +        opt[0][i] = INFINITY;
> +    }
> +
> +    for (i = 0; i < 64; i++) {
> +        opt[0][ff_nelly_init_table[i]] = distance(cand[0], ff_nelly_init_table[i], 0);
> +        path[0][ff_nelly_init_table[i]] = i;
> +    }
> +
> +    for (band = 1; band < NELLY_BANDS; band++) {
> +        int q, c = 0;
> +        float tmp;
> +        int idx_min, idx_max, idx;
> +        power_candidate = cand[band];
> +        for (q = 1000; !c && q < OPT_SIZE; q <<= 2) {
> +            idx_min = FFMAX(0, cand[band] - q);
> +            idx_max = FFMIN(OPT_SIZE, cand[band - 1] + q);
> +            for (i = FFMAX(0, cand[band - 1] - q); i < FFMIN(OPT_SIZE, cand[band - 1] + q); i++) {
> +                if ( isinf(opt[band - 1][i]) )
> +                    continue;
> +                for (j = 0; j < 32; j++) {
> +                    idx = i + ff_nelly_delta_table[j];
> +                    if (idx > idx_max)
> +                        break;
> +                    if (idx >= idx_min) {
> +                        tmp = opt[band - 1][i] + distance(idx, power_candidate, band);
> +                        if (opt[band][idx] > tmp) {
> +                            opt[band][idx] = tmp;
> +                            path[band][idx] = j;
> +                            c = 1;
> +                        }
> +                    }
> +                }
> +            }
> +        }
> +        assert(c); //FIXME
> +    }
> +
> +    best_val = INFINITY;
> +    best_idx = -1;
> +    band = NELLY_BANDS - 1;
> +    for (i = 0; i < OPT_SIZE; i++) {
> +        if (best_val > opt[band][i]) {
> +            best_val = opt[band][i];
> +            best_idx = i;
> +        }
> +    }
> +    for (band = NELLY_BANDS - 1; band >= 0; band--) {
> +        idx_table[band] = path[band][best_idx];
> +        if (band) {
> +            best_idx -= ff_nelly_delta_table[path[band][best_idx]];
> +        }
> +    }
> +}

This could be improved a bit, but when it doesn't help quality there's no
point, so it's OK too.
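
To illustrate the difference between the two exponent searches, here is a
toy sketch with a made-up 3-entry delta table and 4 bands. It is
deliberately simplified: every band applies a delta to a given start level,
whereas the patch takes band 0 from ff_nelly_init_table. All names
(delta_tab, quant_greedy, quant_dp, N_BANDS, N_DELTAS, MAX_LEVEL) are
hypothetical.

```c
#include <float.h>

#define N_BANDS   4
#define N_DELTAS  3
#define MAX_LEVEL 16

/* hypothetical delta table, far smaller than ff_nelly_delta_table */
static const int delta_tab[N_DELTAS] = { -2, 0, 3 };

static float sq(float x) { return x * x; }

/* Greedy: per band, take the delta that lands closest to the target. */
static float quant_greedy(const float *target, int start, int *choice)
{
    int level = start;
    float err = 0.0f;

    for (int b = 0; b < N_BANDS; b++) {
        int best = 0;
        for (int d = 1; d < N_DELTAS; d++)
            if (sq(target[b] - (level + delta_tab[d])) <
                sq(target[b] - (level + delta_tab[best])))
                best = d;
        choice[b] = best;
        level    += delta_tab[best];
        err      += sq(target[b] - level);
    }
    return err;
}

/* Dynamic programming: cost[b][l] is the smallest total squared error of
 * any delta sequence that reaches level l after b bands. */
static float quant_dp(const float *target, int start, int *choice)
{
    float cost[N_BANDS + 1][MAX_LEVEL];
    int   from[N_BANDS + 1][MAX_LEVEL];
    int   best = 0;
    float total;

    for (int l = 0; l < MAX_LEVEL; l++)
        cost[0][l] = FLT_MAX;          /* FLT_MAX marks "unreachable" */
    cost[0][start] = 0.0f;

    for (int b = 0; b < N_BANDS; b++) {
        for (int l = 0; l < MAX_LEVEL; l++)
            cost[b + 1][l] = FLT_MAX;
        for (int l = 0; l < MAX_LEVEL; l++) {
            if (cost[b][l] == FLT_MAX)
                continue;
            for (int d = 0; d < N_DELTAS; d++) {
                int nl = l + delta_tab[d];
                if (nl < 0 || nl >= MAX_LEVEL)
                    continue;
                float c = cost[b][l] + sq(target[b] - nl);
                if (c < cost[b + 1][nl]) {
                    cost[b + 1][nl] = c;
                    from[b + 1][nl] = d;
                }
            }
        }
    }

    /* pick the cheapest end level, then walk the chosen deltas backwards */
    for (int l = 1; l < MAX_LEVEL; l++)
        if (cost[N_BANDS][l] < cost[N_BANDS][best])
            best = l;
    total = cost[N_BANDS][best];
    for (int b = N_BANDS; b > 0; b--) {
        choice[b - 1] = from[b][best];
        best -= delta_tab[from[b][best]];
    }
    return total;
}
```

With targets { 5.5, 9, 9, 9 } and start level 4, the greedy search keeps
level 4 for the first band and then needs several bands to catch up, while
the DP search accepts the same error on the first band by jumping to 7 and
ends with a lower total error. Whether that lower error is audible is a
separate question, which is the point made above.
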


> +
> +/**
> + * Encodes NELLY_SAMPLES samples. It assumes, that samples contains 3 * NELLY_BUF_LEN values
> + *  @param s               encoder context
> + *  @param output          output buffer
> + *  @param output_size     size of output buffer
> + */
> +static void encode_block(NellyMoserEncodeContext *s, unsigned char *output, int output_size)
> +{
> +    PutBitContext pb;
> +    int i, j, band, block, best_idx, power_idx = 0;
> +    float power_val, coeff, coeff_sum;
> +    float pows[NELLY_FILL_LEN];
> +    int bits[NELLY_BUF_LEN], idx_table[NELLY_BANDS];
> +    float cand[NELLY_BANDS];
> +
> +    const float C = 1.0;
> +    const float D = 2.0;
> +
> +    apply_mdct(s);
> +
> +    init_put_bits(&pb, output, output_size * 8);
> +
> +    i = 0;
> +    for (band = 0; band < NELLY_BANDS; band++) {
> +        coeff_sum = 0;
> +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> +            //coeff_sum += s->mdct_out[i                ] * s->mdct_out[i                ]
> +            //           + s->mdct_out[i + NELLY_BUF_LEN] * s->mdct_out[i + NELLY_BUF_LEN];
> +            coeff_sum += pow(fabs(s->mdct_out[i]), D) + pow(fabs(s->mdct_out[i + NELLY_BUF_LEN]), D);
> +        }
> +        cand[band] =
> +            //log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / M_LN2;
> +            C * log(FFMAX(1.0, coeff_sum / (ff_nelly_band_sizes_table[band] << 7))) * 1024.0 / log(D);
> +    }
> +
> +    if (s->avctx->trellis) {
> +        get_exponent_dynamic(s, cand, idx_table);
> +    } else {
> +        get_exponent_greedy(s, cand, idx_table);
> +    }
> +
> +    i = 0;
> +    for (band = 0; band < NELLY_BANDS; band++) {
> +        if (band) {
> +            power_idx += ff_nelly_delta_table[idx_table[band]];
> +            put_bits(&pb, 5, idx_table[band]);
> +        } else {
> +            power_idx = ff_nelly_init_table[idx_table[0]];
> +            put_bits(&pb, 6, idx_table[0]);
> +        }
> +        power_val = pow_table[power_idx & 0x7FF] / (1 << ((power_idx >> 11) + POW_TABLE_OFFSET));
> +        for (j = 0; j < ff_nelly_band_sizes_table[band]; i++, j++) {
> +            s->mdct_out[i] *= power_val;
> +            s->mdct_out[i + NELLY_BUF_LEN] *= power_val;
> +            pows[i] = power_idx;
> +        }
> +    }
> +
> +    ff_nelly_get_sample_bits(pows, bits);
> +
> +    for (block = 0; block < 2; block++) {
> +        for (i = 0; i < NELLY_FILL_LEN; i++) {
> +            if (bits[i] > 0) {
> +                const float *table = ff_nelly_dequantization_table + (1 << bits[i]) - 1;
> +                coeff = s->mdct_out[block * NELLY_BUF_LEN + i];
> +                best_idx =
> +                    quant_lut[av_clip (
> +                            coeff * quant_lut_mul[bits[i]] + quant_lut_add[bits[i]],
> +                            quant_lut_offset[bits[i]],
> +                            quant_lut_offset[bits[i]+1] - 1
> +                            )];
> +                if (fabs(coeff - table[best_idx]) > fabs(coeff - table[best_idx + 1]))
> +                    best_idx++;
> +
> +                put_bits(&pb, bits[i], best_idx);
> +            }
> +        }
> +        if (!block)
> +            put_bits(&pb, NELLY_HEADER_BITS + NELLY_DETAIL_BITS - put_bits_count(&pb), 0);
> +    }
> +}

As the C/D stuff turned out to be useless you can remove it again; apart from that, OK.

the rest of the patch is ok as well (except the #undef NDEBUG)
unless you want to fix ffmpeg to work with floats in which case the rest
can be simplified.

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If a bugfix only changes things apparently unrelated to the bug with no
further explanation, that is a good sign that the bugfix is wrong.