[FFmpeg-devel] [PATCH] G.722 decoder

Fri Aug 6 20:24:23 CEST 2010

Hi,

The G.722 decoder patch that Kenan Gillet took a far way through the 
review process last year seemed to have stalled, I'm trying to pick it up 
and get it committed.

On Fri, 10 Apr 2009, Michael Niedermayer wrote:

> On Thu, Apr 09, 2009 at 08:14:58AM -0700, Kenan Gillet wrote:
> [...]
> > + * @note For the 56000bps and 48000bps bitrates, the respectively 7 and 6 bits
> > + *       codeword might be packed, so unpacking might be needed either
> > + *       internally or as a separate parser.
> 
> do non packed cases exist anywhere in the wild?
> do packed cases exist anywhere in the wild?
> 
> it seems to me that filling 30% of the bits with 0 is not particlarely
> likely in a real world environment

According to the spec, the point of the lower bitrate versions is to be 
able to use 1 or 2 bits per byte as an auxillary data channel. The encoder 
always outputs the 64 kbps mode, but you can choose to overwrite the 
lowest two bits if you want to.

The decoder is able to cope with these bits being overwritten with random 
bits, but you get slightly better results if you tell the decoder that 
this has happened and so it should disregard those bits.

With that in mind, as far as I can interpret the spec, it's the lowest 
bits that should be skipped for those modes, but the source for this 
decoder seems to think otherwise. The implementation in spandsp (that 
has some common heritage with this code base, iirc, does things this 
way, too.

> > +    /**
> > +     * The band[0] and band[1] correspond respectively to the lower band and higher band.
> > +     */
> > +    struct G722Band {
> 
> > +        int16_t s_predictor;         ///< predictor output value
> 
> ok that makes sense
> 
> 
> > +        int32_t s_zero;              ///< zero section output signal
> 
> what is that?
> 
> 
> > +        int8_t  part_reconst_mem[2]; ///< partially reconstructed signal memory
> 
> and that?

Tried to clarify these doxy comments, to the best of my understanding of 
the spec.

> > +        int16_t prev_qtzd_reconst;   ///< previous quantized reconstructed signal
> 
> this is not true, this uses a different dequantization table in some cases
> i even wonder if this is a bug

No, it's explicitly stated that it should be done this way in the spec. I 
figure it's part of the "scalability" feature - the predictor updating is 
always done using the 4-bit dequantizer so that it evolves in the same way 
regardless of the mode it is decoded in. Each individual output sample is 
calculated using the more precise dequantizer if that is chosen, but the 
predictor is always updated using the lower precision dequantizer.

> 
> > +        int16_t pole_mem[2];         ///< second-order pole section coefficient buffer
> > +        int32_t diff_mem[6];         ///< quantizer difference signal memory
> > +        int16_t zero_mem[6];         ///< Seventh-order zero section coefficient buffer
> 
> > +        int16_t log_factor;          ///< delayed logarithmic quantizer factor
> 
> log what? log_e log_2 log_10 ?

log_2, clarified

> 
> > +        int16_t scale_factor;        ///< delayed quantizer scale factor
> > +    } band[2];
> > +} G722Context;
> > +
> > +
> > +static const int8_t sign_lookup[2] = { -1, 1 };
> > +
> > +static const int16_t ilb[32] = {
> > +  2048, 2093, 2139, 2186, 2233, 2282, 2332, 2383,
> > +  2435, 2489, 2543, 2599, 2656, 2714, 2774, 2834,
> > +  2896, 2960, 3025, 3091, 3158, 3228, 3298, 3371,
> > +  3444, 3520, 3597, 3676, 3756, 3838, 3922, 4008
> > +};
> 
> > +static const int16_t wh[2]   = { 798, -214 };
> 
> 2 letters is better than 1 ... well sometimes its not much better though

Tried to give this table a better name

> > +static const int16_t qm2[4]  = { -7408, -1616,  7408,   1616 };
> 
> > +/**
> > + * qm3[index] == wl[rl42[index]]
> > + */
> > +static const int16_t qm3[16] = {
> > +   -60, 3042, 1198, 538, 334, 172,  58, -30,
> > +  3042, 1198,  538, 334, 172,  58, -30, -60
> > +};
> > +static const int16_t qm4[16] = {
> > +      0, -20456, -12896,  -8968, -6288,  -4240,  -2584,  -1200,
> > +  20456,  12896,   8968,  6288,   4240,   2584,   1200,      0
> > +};
> 
> what is qm?
> i know these are dequantization tables, and at that poorly designed ones
> (yeah the duplicate entries)
> but i cant relate "qm" to "dequantization"

Tried to give better names to these tables, too

> > +
> > +/**
> > + * quadrature mirror filter (QMF) coefficients
> > + *
> > + * ITU-T G.722 Table 11
> > + */
> > +static const int16_t qmf_coeffs[12] = {
> > +  3, -11, 12, 32, -210, 951, 3876, -805, 362, -156, 53, -11,
> > +};
> > +
> > +
> 
> > +/**
> > + * adaptive predictor
> > + *
> > + * @note On x86 using the MULL macro in a loop is slower than not using the macro.
> > + */
> > +static void do_adaptive_prediction(struct G722Band *band, const int cur_diff)
> 
> missing doxy for cur_diff

Added some sort of doxy for this parameter

> [...]
> > +static int inline scale(const int log_factor, int shift)
> > +{
> > +    const int wd1 = ilb[(log_factor >> 6) & 31];
> > +    shift -= log_factor >> 11;
> > +    return (shift < 0 ? wd1 << -shift : wd1 >> shift) << 2;
> > +}
> 
> i belive we already have some integer exp2()

Didn't find any common method to do shifts where the shift can be 
negative - only found av_shr_i in libavutil/integer.h, that works on 
arbitrary precision integers.

> also this one lacks documentation and a name that is related to what it 
> does if there is a reason why common code cannot be used

Renamed it to a slightly more descriptive name

> [...]
> > +static void apply_qmf(int16_t *prev_samples, int *xout1, int *xout2)
> > +{
> > +    int i;
> > +
> > +    *xout1 = 0;
> > +    *xout2 = 0;
> > +    for (i = 0;  i < 12;  i++) {
> > +        MAC16(*xout2, prev_samples[2*i  ], qmf_coeffs[i   ]);
> > +        MAC16(*xout1, prev_samples[2*i+1], qmf_coeffs[11-i]);
> > +    }
> 
> > +    memmove(prev_samples, prev_samples + 2, 22*sizeof(prev_samples[0]));
> 
> please find a solution that does not need to move the array by 2 samples
> after each 2 samples.

Changed to use a longer internal buffer for these samples so that memmoves 
are done much more seldom (every 1024 samples).

> [...]
> > +static int g722_decode_frame(AVCodecContext *avctx, void *data, int *data_size, AVPacket *avpkt)
> > +{
> > +    G722Context *c = avctx->priv_data;
> > +    int16_t *out_buf = data;
> > +    const uint8_t *buf = avpkt->data;
> > +    int j, out_len = 0;
> > +    const int shift = 8 - c->bits_per_sample;
> > +    const int16_t *quantizer_table = qms[shift];
> > +
> > +    for (j = 0;  j < avpkt->size; j++) {
> > +        const int ilow = buf[j] & (0x3F >> shift);
> > +        const int rlow = av_clip(MULL(c->band[0].scale_factor, quantizer_table[ilow], 15) +
> > +                                 c->band[0].s_predictor, -16384, 16383);
> > +
> > +        update_low_predictor(&c->band[0], ilow >> (2 - shift));
> > +
> > +        if (avctx->sample_rate == 16000) {
> > +            const int ihigh = (buf[j] >> (6 - shift)) &  0x03;
> 
> all that looks a litte obfuscated, get_bits() seems like the proper choice

Changed to use get_bits

// Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Initial-G.722-decoder-patch.patch
Type: text/x-diff
Size: 15355 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100806/9df5fe52/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-G.722-muxer-demuxer.patch
Type: text/x-diff
Size: 1729 bytes
Desc: 
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100806/9df5fe52/attachment-0001.patch>