[FFmpeg-devel] [RFC] Monkey's Audio decoder

Kostya kostya.shishkov
Wed Sep 12 07:08:26 CEST 2007


On Tue, Sep 11, 2007 at 08:23:09PM +0200, Michael Niedermayer wrote:
> Hi
> 
> On Tue, Sep 11, 2007 at 10:22:54AM +0300, Kostya wrote:
> > On Tue, Sep 11, 2007 at 12:10:54AM +0200, Michael Niedermayer wrote:
> > > Hi
> > > 
> > > On Mon, Sep 10, 2007 at 03:28:59PM +0300, Kostya wrote:
> > > > Here's Monkey's Audio decoder made from libdemac by Benjamin Zores
> > > > and made working by me.
> > > > 
> > > > Please comment on decoder/demuxer structure
> > > > (ape.c is demuxer, apedec.c is decoder).
> > > 
> > > just an incomplete review as the code is too messy for my taste and this
> > > wasn't a real submission but just an RFC ...
> > 
> > Now here's an updated version with most issues addressed (alignment, comments,
> > some functions; extradata is also in a portable format now).
> > 
> > And notes on design:
> >   The APE container resembles Musepack in its design - data is stored in
> > 32-bit little-endian words and there is no padding between frames, so in
> > order to decode, the demuxer must start reading several bytes before the
> > actual frame start and send the offset to the decoder, which bswap()s the
> > input, skips the tail of the previous frame and begins decoding.
> >   Another distinct feature is that a single frame may contain several
> > megabytes of audio, which is clearly impractical to handle in one call. So
> > this decoder (like demac) decodes the frame in blocks of 4608*channels
> > samples. As it is impossible to detect block boundaries within the frame
> > data, the demuxer sends the first packet with the frame data and then empty
> > packets, and the decoder decodes only this fixed block of data per call.
> 
> why does the code need these dummy packets? the application should call the
> decoder again with not yet used data

a) the data should be bswap32()ed before use
b) the offset into the data should be passed to the decoder

While the demuxer takes care of this, it is hard to make the application do so.
The alternatives are handling whole frames at once, or decoding blocks just to
determine their boundaries (which would require duplicating half of the decoder code).
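
To illustrate a) and b): the demuxer starts reading at the previous 32-bit
boundary and prepends two values, so the decoder knows how many blocks the frame
contains and how many bytes of the previous frame's tail to skip. A rough sketch
(the helper name and layout are illustrative; they just mirror what
ape_decode_frame() reads back):

/* Sketch of the demuxer side of the scheme. The two prefix values are stored
 * little-endian so that after the decoder bswap_buf()s the whole packet they
 * come out right when read with bytestream_get_be32(). Uses AV_WL32() and
 * memcpy(); error handling omitted. */
static void ape_build_packet_sketch(uint8_t *pkt, uint32_t nblocks,
                                    uint32_t skip, const uint8_t *raw,
                                    int raw_size)
{
    AV_WL32(pkt,     nblocks);      /* audio blocks in this frame              */
    AV_WL32(pkt + 4, skip);         /* bytes of previous-frame tail to skip    */
    memcpy(pkt + 8, raw, raw_size); /* raw words, read from a 32-bit boundary  */
}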

The only problem with the current approach is that it triggers an error in
decode_audio2(), because the input frame size is larger than the output audio
buffer (*frame_size_ptr < buf_size), but I am sure that condition can be dropped
without any bad consequences.

> the problem with your approach is that if the stuff is stream copied then the
> file will be full of these dummy packets

I can't imagine this situation - it's another case of codec/container lock-in.
 
> [...]
> > +/* Total size of all predictor histories - 50 * sizeof(int32_t) */
> > +#define PREDICTOR_SIZE 50
> 
> not doxygen compatible
> 
> 
> [...]
> > +typedef struct APEPredictor {
> > +    /* Filter histories */
> > +    int32_t *buf;
> 
> also not doxygen compatible

will doxify 
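For example (just the doxygen form, exact wording to be decided):

/** Total size of all predictor histories - 50 * sizeof(int32_t) */
#define PREDICTOR_SIZE 50

typedef struct APEPredictor {
    int32_t *buf;               ///< pointer into the filter history buffer
    /* ... rest of the fields, each with its own ///< comment ... */
} APEPredictor;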

> [...]
> > +    APEFilter filter_l0[2], filter_l1[2], filter_l2[2];
> 
> APEFilter filter[3][2];
> 
> [...]
> > +static void init_predictor_decoder(APEContext * ctx)
> > +{
> > +    APEPredictor *p = &ctx->predictor;
> > +
> > +    /* Zero the history buffers */
> > +    memset(p->historybuffer, 0, PREDICTOR_SIZE * sizeof(int32_t));
> > +    p->buf = p->historybuffer;
> > +
> > +    /* Initialise and zero the co-efficients */
> > +    memcpy(p->YcoeffsA, initial_coeffs, sizeof(initial_coeffs));
> > +    memcpy(p->XcoeffsA, initial_coeffs, sizeof(initial_coeffs));
> > +    memset(p->YcoeffsB, 0, sizeof(p->YcoeffsB));
> > +    memset(p->XcoeffsB, 0, sizeof(p->XcoeffsB));
> > +
> > +    p->YfilterA = 0;
> > +    p->YfilterB = 0;
> > +    p->YlastA = 0;
> > +
> > +    p->XfilterA = 0;
> > +    p->XfilterB = 0;
> > +    p->XlastA = 0;
> 
> that ABXY stuff could be [0][0],[0][1],... and that could be a loop
> 
> [...]
> > +static void predictor_decode_stereo(APEContext * ctx, int count)
> > +{
> > +    int32_t predictionA, predictionB;
> > +    APEPredictor *p = &ctx->predictor;
> > +    int32_t *decoded0 = ctx->decoded0;
> > +    int32_t *decoded1 = ctx->decoded1;
> > +
> > +    while (count--) {
> > +        /* Predictor Y */
> > +        p->buf[YDELAYA]           = p->YlastA;
> > +        p->buf[YADAPTCOEFFSA]     = APESIGN(p->buf[YDELAYA]);
> > +        p->buf[YDELAYA - 1]       = p->buf[YDELAYA] - p->buf[YDELAYA - 1];
> > +        p->buf[YADAPTCOEFFSA - 1] = APESIGN(p->buf[YDELAYA - 1]);
> > +
> 
> > +        predictionA = (p->buf[YDELAYA]     * p->YcoeffsA[0]) +
> > +                      (p->buf[YDELAYA - 1] * p->YcoeffsA[1]) +
> > +                      (p->buf[YDELAYA - 2] * p->YcoeffsA[2]) +
> > +                      (p->buf[YDELAYA - 3] * p->YcoeffsA[3]);
> 
> the () are superfluous
> 
> 
> > +
> > +        /*  Apply a scaled first-order filter compression */
> > +        p->buf[YDELAYB]           = p->XfilterA - ((p->YfilterB * 31) >> 5);
> > +        p->buf[YADAPTCOEFFSB]     = APESIGN(p->buf[YDELAYB]);
> > +        p->YfilterB = p->XfilterA;
> > +
> > +        p->buf[YDELAYB - 1]       = p->buf[YDELAYB] - p->buf[YDELAYB - 1];
> > +        p->buf[YADAPTCOEFFSB - 1] = APESIGN(p->buf[YDELAYB - 1]);
> > +
> > +        predictionB = (p->buf[YDELAYB]     * p->YcoeffsB[0]) +
> > +                      (p->buf[YDELAYB - 1] * p->YcoeffsB[1]) +
> > +                      (p->buf[YDELAYB - 2] * p->YcoeffsB[2]) +
> > +                      (p->buf[YDELAYB - 3] * p->YcoeffsB[3]) +
> > +                      (p->buf[YDELAYB - 4] * p->YcoeffsB[4]);
> > +
> > +        p->YlastA = *decoded0 + ((predictionA + (predictionB >> 1)) >> 10);
> > +        p->YfilterA = p->YlastA + ((p->YfilterA * 31) >> 5);
> > +
> > +        /* Predictor X */
> > +
> > +        p->buf[XDELAYA]           = p->XlastA;
> > +        p->buf[XADAPTCOEFFSA]     = APESIGN(p->buf[XDELAYA]);
> > +        p->buf[XDELAYA - 1]       = p->buf[XDELAYA] - p->buf[XDELAYA - 1];
> > +        p->buf[XADAPTCOEFFSA - 1] = APESIGN(p->buf[XDELAYA - 1]);
> > +
> > +        predictionA = (p->buf[XDELAYA]     * p->XcoeffsA[0]) +
> > +                      (p->buf[XDELAYA - 1] * p->XcoeffsA[1]) +
> > +                      (p->buf[XDELAYA - 2] * p->XcoeffsA[2]) +
> > +                      (p->buf[XDELAYA - 3] * p->XcoeffsA[3]);
> > +
> > +        /*  Apply a scaled first-order filter compression */
> > +        p->buf[XDELAYB]           = p->YfilterA - ((p->XfilterB * 31) >> 5);
> > +        p->buf[XADAPTCOEFFSB]     = APESIGN(p->buf[XDELAYB]);
> > +        p->XfilterB = p->YfilterA;
> > +        p->buf[XDELAYB - 1]       = p->buf[XDELAYB] - p->buf[XDELAYB - 1];
> > +        p->buf[XADAPTCOEFFSB - 1] = APESIGN(p->buf[XDELAYB - 1]);
> > +
> > +        predictionB = (p->buf[XDELAYB]     * p->XcoeffsB[0]) +
> > +                      (p->buf[XDELAYB - 1] * p->XcoeffsB[1]) +
> > +                      (p->buf[XDELAYB - 2] * p->XcoeffsB[2]) +
> > +                      (p->buf[XDELAYB - 3] * p->XcoeffsB[3]) +
> > +                      (p->buf[XDELAYB - 4] * p->XcoeffsB[4]);
> > +
> > +        p->XlastA = *decoded1 + ((predictionA + (predictionB >> 1)) >> 10);
> > +        p->XfilterA = p->XlastA + ((p->XfilterA * 31) >> 5);
> > +
> > +        if (*decoded0 > 0) {
> > +            p->YcoeffsA[0] -= p->buf[YADAPTCOEFFSA];
> > +            p->YcoeffsA[1] -= p->buf[YADAPTCOEFFSA - 1];
> > +            p->YcoeffsA[2] -= p->buf[YADAPTCOEFFSA - 2];
> > +            p->YcoeffsA[3] -= p->buf[YADAPTCOEFFSA - 3];
> > +
> > +            p->YcoeffsB[0] -= p->buf[YADAPTCOEFFSB];
> > +            p->YcoeffsB[1] -= p->buf[YADAPTCOEFFSB - 1];
> > +            p->YcoeffsB[2] -= p->buf[YADAPTCOEFFSB - 2];
> > +            p->YcoeffsB[3] -= p->buf[YADAPTCOEFFSB - 3];
> > +            p->YcoeffsB[4] -= p->buf[YADAPTCOEFFSB - 4];
> > +        } else if (*decoded0 < 0) {
> > +            p->YcoeffsA[0] += p->buf[YADAPTCOEFFSA];
> > +            p->YcoeffsA[1] += p->buf[YADAPTCOEFFSA - 1];
> > +            p->YcoeffsA[2] += p->buf[YADAPTCOEFFSA - 2];
> > +            p->YcoeffsA[3] += p->buf[YADAPTCOEFFSA - 3];
> > +
> > +            p->YcoeffsB[0] += p->buf[YADAPTCOEFFSB];
> > +            p->YcoeffsB[1] += p->buf[YADAPTCOEFFSB - 1];
> > +            p->YcoeffsB[2] += p->buf[YADAPTCOEFFSB - 2];
> > +            p->YcoeffsB[3] += p->buf[YADAPTCOEFFSB - 3];
> > +            p->YcoeffsB[4] += p->buf[YADAPTCOEFFSB - 4];
> > +        }
> > +
> > +        *(decoded0++) = p->YfilterA;
> > +
> > +        if (*decoded1 > 0) {
> > +            p->XcoeffsA[0] -= p->buf[XADAPTCOEFFSA];
> > +            p->XcoeffsA[1] -= p->buf[XADAPTCOEFFSA - 1];
> > +            p->XcoeffsA[2] -= p->buf[XADAPTCOEFFSA - 2];
> > +            p->XcoeffsA[3] -= p->buf[XADAPTCOEFFSA - 3];
> > +
> > +            p->XcoeffsB[0] -= p->buf[XADAPTCOEFFSB];
> > +            p->XcoeffsB[1] -= p->buf[XADAPTCOEFFSB - 1];
> > +            p->XcoeffsB[2] -= p->buf[XADAPTCOEFFSB - 2];
> > +            p->XcoeffsB[3] -= p->buf[XADAPTCOEFFSB - 3];
> > +            p->XcoeffsB[4] -= p->buf[XADAPTCOEFFSB - 4];
> > +        } else if (*decoded1 < 0) {
> > +            p->XcoeffsA[0] += p->buf[XADAPTCOEFFSA];
> > +            p->XcoeffsA[1] += p->buf[XADAPTCOEFFSA - 1];
> > +            p->XcoeffsA[2] += p->buf[XADAPTCOEFFSA - 2];
> > +            p->XcoeffsA[3] += p->buf[XADAPTCOEFFSA - 3];
> > +
> > +            p->XcoeffsB[0] += p->buf[XADAPTCOEFFSB];
> > +            p->XcoeffsB[1] += p->buf[XADAPTCOEFFSB - 1];
> > +            p->XcoeffsB[2] += p->buf[XADAPTCOEFFSB - 2];
> > +            p->XcoeffsB[3] += p->buf[XADAPTCOEFFSB - 3];
> > +            p->XcoeffsB[4] += p->buf[XADAPTCOEFFSB - 4];
> > +        }
> > +
> > +        *(decoded1++) = p->XfilterA;
> > +
> 
> all the code above looks like it's duplicated between X and Y

Yes, it's symmetric; I will redo it as a function.
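Something along these lines, just a sketch against the names in the current patch
(the exact parameter list can of course be shuffled around):

static int32_t predictor_update_filter(APEPredictor *p, const int32_t decoded,
                                        const int delayA, const int adaptA,
                                        const int delayB, const int adaptB,
                                        int32_t *coeffsA, int32_t *coeffsB,
                                        int32_t *lastA, int32_t *filterA,
                                        int32_t *filterB, const int32_t otherA)
{
    int32_t predictionA, predictionB, sign;
    int i;

    p->buf[delayA]     = *lastA;
    p->buf[adaptA]     = APESIGN(p->buf[delayA]);
    p->buf[delayA - 1] = p->buf[delayA] - p->buf[delayA - 1];
    p->buf[adaptA - 1] = APESIGN(p->buf[delayA - 1]);

    predictionA = p->buf[delayA    ] * coeffsA[0] +
                  p->buf[delayA - 1] * coeffsA[1] +
                  p->buf[delayA - 2] * coeffsA[2] +
                  p->buf[delayA - 3] * coeffsA[3];

    /* scaled first-order filter compression, fed from the other channel */
    p->buf[delayB]     = otherA - ((*filterB * 31) >> 5);
    p->buf[adaptB]     = APESIGN(p->buf[delayB]);
    *filterB           = otherA;
    p->buf[delayB - 1] = p->buf[delayB] - p->buf[delayB - 1];
    p->buf[adaptB - 1] = APESIGN(p->buf[delayB - 1]);

    predictionB = p->buf[delayB    ] * coeffsB[0] +
                  p->buf[delayB - 1] * coeffsB[1] +
                  p->buf[delayB - 2] * coeffsB[2] +
                  p->buf[delayB - 3] * coeffsB[3] +
                  p->buf[delayB - 4] * coeffsB[4];

    *lastA   = decoded + ((predictionA + (predictionB >> 1)) >> 10);
    *filterA = *lastA + ((*filterA * 31) >> 5);

    /* the +/- adaptation branches collapse into one loop driven by the
     * sign of the input sample */
    sign = APESIGN(decoded);
    for (i = 0; i < 4; i++)
        coeffsA[i] -= p->buf[adaptA - i] * sign;
    for (i = 0; i < 5; i++)
        coeffsB[i] -= p->buf[adaptB - i] * sign;

    return *filterA;
}

and in predictor_decode_stereo() each sample becomes

    *decoded0 = predictor_update_filter(p, *decoded0, YDELAYA, YADAPTCOEFFSA,
                                        YDELAYB, YADAPTCOEFFSB,
                                        p->YcoeffsA, p->YcoeffsB,
                                        &p->YlastA, &p->YfilterA,
                                        &p->YfilterB, p->XfilterA);
    decoded0++;

with the symmetric call for the X channel (passing p->YfilterA as the last argument).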
 
> [...]
> > +static void init_frame_decoder(APEContext * ctx)
> > +{
> > +    init_entropy_decoder(ctx);
> > +    init_predictor_decoder(ctx);
> > +
> > +    switch (ctx->compression_level) {
> > +    case COMPRESSION_LEVEL_NORMAL:
> > +        init_filter(ctx, ctx->filter_l0, ctx->filterbuf32, 16);
> > +        break;
> > +
> > +    case COMPRESSION_LEVEL_HIGH:
> > +        init_filter(ctx, ctx->filter_l0, ctx->filterbuf256, 64);
> > +        break;
> > +
> > +    case COMPRESSION_LEVEL_EXTRA_HIGH:
> > +        init_filter(ctx, ctx->filter_l1, ctx->filterbuf256, 256);
> > +        init_filter(ctx, ctx->filter_l0, ctx->filterbuf32, 32);
> > +        break;
> > +
> > +    case COMPRESSION_LEVEL_INSANE:
> > +        init_filter(ctx, ctx->filter_l2, ctx->filterbuf1280, 1280);
> > +        init_filter(ctx, ctx->filter_l1, ctx->filterbuf256, 256);
> > +        init_filter(ctx, ctx->filter_l0, ctx->filterbuf32, 16);
> > +    }
> 
> orders[4][3]={ {16,0,0}, {64,0,0}, {32,256,0}, {16,256,1280}};
> 
> i= ctx->compression_level/1000;
> init_filter(ctx, ctx->filter[0], orders[i][0]);
> init_filter(ctx, ctx->filter[1], orders[i][1]);
> init_filter(ctx, ctx->filter[2], orders[i][2]);
> 
> and the filterbuf can be av_malloc()ed saving a few bytes in the context
> 
> 
> [...]
> > +static int ape_decode_frame(AVCodecContext * avctx,
> > +                            void *data, int *data_size,
> > +                            uint8_t * buf, int buf_size)
> > +{
> > +    APEContext *s = avctx->priv_data;
> > +    int16_t *samples = data;
> > +    int nblocks;
> > +    int i, n;
> > +    int blockstodecode;
> > +
> > +    if (buf_size == 0 && !s->samples) {
> > +        *data_size = 0;
> > +        return 0;
> > +    }
> > +
> > +    if(buf_size > 4){
> > +        s->data = av_realloc(s->data, (buf_size + 3) & ~3);
> > +        s->dsp.bswap_buf(s->data, buf, buf_size >> 2);
> > +        s->ptr = s->data;
> > +        s->data_end = s->data + buf_size;
> > +
> > +        nblocks = s->samples = bytestream_get_be32(&s->ptr);
> 
> > +        n =  bytestream_get_be32(&s->ptr);
> > +        s->ptr += n;
> 
> please check the validity of that

ok 
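Probably something along these lines (a sketch; the exact error message and
return value are up for discussion):

        n = bytestream_get_be32(&s->ptr);
        if (n < 0 || n > s->data_end - s->ptr) {
            av_log(avctx, AV_LOG_ERROR, "Invalid frame offset %d\n", n);
            return -1;
        }
        s->ptr += n;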
 
> > +
> > +        s->currentframeblocks = nblocks;
> > +        buf += 4;
> > +        if (!s->samples) {
> > +            *data_size = 0;
> > +            return buf_size;
> > +        }
> > +
> > +
> > +        memset(s->decoded0,  0, sizeof(s->decoded0));
> > +        memset(s->decoded1,  0, sizeof(s->decoded1));
> > +
> > +        /* Initialize the frame decoder */
> > +        init_frame_decoder(s);
> > +    }
> > +
> > +    nblocks = s->samples;
> > +    blockstodecode = FFMIN(BLOCKS_PER_LOOP, nblocks);
> > +
> > +    if ((s->channels == 1) || (s->frameflags & APE_FRAMECODE_PSEUDO_STEREO))
> > +        ape_unpack_mono(s, blockstodecode);
> > +    else
> > +        ape_unpack_stereo(s, blockstodecode);
> > +
> > +    for (i = 0; i < blockstodecode; i++) {
> > +        *samples++ = s->decoded0[i];
> > +        if(s->channels == 2)
> > +            *samples++ = s->decoded1[i];
> > +    }
> 
> you should check that there is enough space in samples before writing
> into it

Will add the check.
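Something like this before the loop that writes samples (a sketch, assuming the
old convention that *data_size holds the size of the output buffer in bytes on
input):

    if (*data_size < blockstodecode * s->channels * 2 /* sizeof(int16_t) */) {
        av_log(avctx, AV_LOG_ERROR, "Output buffer is too small\n");
        return -1;
    }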
 
> [...]
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB



