[FFmpeg-devel] [RFC] AAC Encoder

Michael Niedermayer michaelni
Thu Aug 14 23:42:44 CEST 2008


On Thu, Aug 14, 2008 at 08:48:42AM +0300, Kostya wrote:
> On Wed, Aug 13, 2008 at 04:44:18PM +0200, Michael Niedermayer wrote:
> > On Wed, Aug 13, 2008 at 04:42:56PM +0300, Kostya wrote:
> > > On Wed, Aug 13, 2008 at 02:57:50PM +0200, Michael Niedermayer wrote:
> > [...]
> > > 
> > > > > 3. based on psy model suggestions, encoder performs windowing and MDCT
> > > > 
> > > > ok
> > > > 
> > > > 
> > > > > 4. encoder feeds coefficients to psy model
> > > > > 5. psy model by some magic determines scalefactors and use them to convert
> > > > > coefficients into integer form
> > > > > 6. encoder encodes obtained scalefactors and integer coefficients
> > > > > 
> > > > > There are 11 codebooks for AAC, each designed to code either pairs or quads
> > > > > of values with sign coded separately or incorporated into value,
> > > > > each has a maximum value limit.
> > > > > While it's feasible to find the best encoding (like take raw coeff, quantize
> > > > > it and round up or down, then see which vector takes less bits), I feel
> > > > > it would be too slow.
> > > > 
> > > > That's fine; you already have the fast variant implemented, and I do not
> > > > suggest removing it. What we need is a high quality variant. The encoder
> > > > should be better than other encoders ...
> > > > Also, the max value you mentioned is another example of where your code
> > > > fails fatally: a single +3 that would sound nearly as good when encoded as +2
> > > > could force a less efficient codebook to be chosen. Also, the +3 could be
> > > > encoded as a pulse. I don't remember whether your code chooses optimally
> > > > between pulse and normal codebook encodings?
> > > 
> > > not optimally, unfortunately, but it can search for pulses and encode them
> > > 
> > > in any case, here's a new encoder version
> > 
> > please commit the parts I've OKed and/or send a patch without them
> 
> done (there were okayed parts only in aacenc.c)
[...]

psy model review below

> /*
>  * AAC encoder psychoacoustic model
>  * Copyright (C) 2008 Konstantin Shishkov
>  *
>  * This file is part of FFmpeg.
>  *
>  * FFmpeg is free software; you can redistribute it and/or
>  * modify it under the terms of the GNU Lesser General Public
>  * License as published by the Free Software Foundation; either
>  * version 2.1 of the License, or (at your option) any later version.
>  *
>  * FFmpeg is distributed in the hope that it will be useful,
>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>  * Lesser General Public License for more details.
>  *
>  * You should have received a copy of the GNU Lesser General Public
>  * License along with FFmpeg; if not, write to the Free Software
>  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>  */
> 
> #ifndef FFMPEG_AACPSY_H
> #define FFMPEG_AACPSY_H
> 
> #include "avcodec.h"
> #include "aac.h"
> #include "lowpass.h"
> 
> enum AACPsyModelType{
>     AAC_PSY_NULL,              ///< do nothing with frequencies
>     AAC_PSY_NULL8,             ///< do nothing with frequencies but work with short windows
>     AAC_PSY_3GPP,              ///< model following recommendations from 3GPP TS 26.403
> 
>     AAC_NB_PSY_MODELS          ///< total number of psychoacoustic models, since it's not a part of the ABI new models can be added freely
> };

ok


> 
> enum AACPsyModelMode{
>     PSY_MODE_CBR,              ///< follow bitrate as closely as possible
>     PSY_MODE_ABR,              ///< try to achieve bitrate but actual bitrate may differ significantly
>     PSY_MODE_QUALITY,          ///< try to achieve set quality instead of bitrate
> };
> 
> #define PSY_MODEL_MODE_MASK  0x0000000F ///< bit fields for storing mode (CBR, ABR, VBR)

Please use bitrate tolerance / bitrate / max and min bitrate / buffer size / ...
from AVCodecContext to select the mode.
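
A rough sketch of what deriving the mode from rate control settings could look like. RateParams is a hypothetical stand-in that mirrors a few AVCodecContext field names (bit_rate, rc_max_rate, rc_min_rate, rc_buffer_size); fixed_quality, PsyMode and pick_mode() are made up for illustration, and the decision rule itself is only an assumption, not what the encoder does.

```c
#include <stdint.h>

/* Hypothetical stand-in for the rate control part of AVCodecContext;
 * only the members used below are mirrored here. */
typedef struct RateParams {
    int64_t bit_rate;       /* target bitrate in bits/s */
    int64_t rc_max_rate;    /* maximum bitrate, 0 if not set */
    int64_t rc_min_rate;    /* minimum bitrate, 0 if not set */
    int     rc_buffer_size; /* decoder buffer size in bits, 0 if not set */
    int     fixed_quality;  /* hypothetical: nonzero if constant quality was requested */
} RateParams;

enum PsyMode { MODE_CBR, MODE_ABR, MODE_QUALITY };

/* Assumed rule: constant quality wins; a buffer plus min == max == target
 * means CBR; everything else is treated as ABR. */
static enum PsyMode pick_mode(const RateParams *p)
{
    if (p->fixed_quality)
        return MODE_QUALITY;
    if (p->rc_buffer_size > 0 &&
        p->rc_max_rate == p->bit_rate &&
        p->rc_min_rate == p->bit_rate)
        return MODE_CBR;
    return MODE_ABR;
}
```

With something like this the PSY_MODE_* bits in the flags would not be needed at all.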


> #define PSY_MODEL_NO_PULSE   0x00000010 ///< disable pulse searching
> #define PSY_MODEL_NO_SWITCH  0x00000020 ///< disable window switching
> #define PSY_MODEL_NO_ST_ATT  0x00000040 ///< disable stereo attenuation
> #define PSY_MODEL_NO_LOWPASS 0x00000080 ///< disable low-pass filtering

How does the user pass these to the codec?
I suspect through AVCodecContext; if so, the flags above are redundant and
unneeded, as AVCodecContext is available to the psy model.

Also, I think the choice of how to encode a coefficient, that is, as a
pulse or not, is not a psychoacoustic question but one of entropy coding:
whichever way needs fewer bits has the better RD.
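
To illustrate the RD point, here is a minimal sketch of picking between candidate encodings (say, pulse vs. plain codebook) by the usual cost D + lambda * R. Candidate, rd_cost() and pick_best() are hypothetical names, and how distortion and bits get measured is left open.

```c
/* Hypothetical description of one way to encode a band. */
typedef struct Candidate {
    float distortion; /* squared error after decoding */
    int   bits;       /* bits needed for this encoding */
} Candidate;

/* Lagrangian rate distortion cost: D + lambda * R. */
static float rd_cost(const Candidate *c, float lambda)
{
    return c->distortion + lambda * (float)c->bits;
}

/* Return the index of the cheapest candidate (n must be >= 1).
 * Ties go to the lower index. */
static int pick_best(const Candidate *cand, int n, float lambda)
{
    int i, best = 0;
    float best_cost = rd_cost(&cand[0], lambda);

    for (i = 1; i < n; i++) {
        float cost = rd_cost(&cand[i], lambda);
        if (cost < best_cost) {
            best_cost = cost;
            best      = i;
        }
    }
    return best;
}
```

When both candidates decode to the same values the distortion terms cancel, and this reduces to "take whichever needs fewer bits".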


> 
> #define PSY_MODEL_NO_PREPROC (PSY_MODEL_NO_ST_ATT | PSY_MODEL_NO_LOWPASS)
> 
> #define PSY_MODEL_MODE(a)  ((a) & PSY_MODEL_MODE_MASK)
> 
> /**
>  * context used by psychoacoustic model
>  */
> typedef struct AACPsyContext {
>     AVCodecContext *avctx;            ///< encoder context
> 
>     int flags;                        ///< model flags

>     const uint8_t *bands1024;         ///< scalefactor band sizes for long (1024 samples) frame
>     int num_bands1024;                ///< number of scalefactor bands for long frame
>     const uint8_t *bands128;          ///< scalefactor band sizes for short (128 samples) frame
>     int num_bands128;                 ///< number of scalefactor bands for short frame

This is a little AAC-specific, but then it is called AACPsyContext,
so I am not sure. Is the code supposed to be a generic psychoacoustic model
or an AAC-specific one?

[...]
> /**
>  * Convert coefficients to integers.
>  * @return sum of coefficients
>  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
>  */
> static inline int convert_coeffs(float *in, int *out, int size, int scale_idx)

Rename this to quantize_coeffs, and replace scale_idx with a quantization
factor.


> {
>     int i, sign, sum = 0;
>     for(i = 0; i < size; i++){
>         sign = in[i] > 0.0;
>         out[i] = (int)(pow(FFABS(in[i]) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);

use fabs() instead of FFABS() here


>         out[i] = av_clip(out[i], 0, 8191);
>         sum += out[i];
>         if(sign) out[i] = -out[i];
>     }
>     return sum;
> }



> 
> static inline float unquant(int q, int scale_idx){
>     return (FFABS(q) * cbrt(q*1.0)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS];
> }

Also, please replace scale_idx with a factor here. Repeatedly doing these
lookups is likely inefficient, and it is inflexible with respect to non-AAC codecs.


> static inline float calc_distortion(float *c, int size, int scale_idx)
> {
>     int i;
>     int q;
>     float coef, unquant, sum = 0.0f;
>     for(i = 0; i < size; i++){
>         coef = FFABS(c[i]);
>         q = (int)(pow(FFABS(coef) * ff_aac_pow2sf_tab[200 - scale_idx + SCALE_ONE_POS - SCALE_DIV_512], 0.75) + 0.4054);
>         q = av_clip(q, 0, 8191);
>         unquant = (q * cbrt(q)) * ff_aac_pow2sf_tab[200 + scale_idx - SCALE_ONE_POS + SCALE_DIV_512];
>         sum += (coef - unquant) * (coef - unquant);
>     }
>     return sum;
> }

I think this and the previous functions share some common code that can be
factored out.


[...]
> static void psy_null8_process(AACPsyContext *apc, int tag, int type, ChannelElement *cpe)
> {
>     int start;
>     int w, ch, g, i;
>     int chans = type == ID_CPE ? 2 : 1;
> 
>     //detect M/S
>     if(chans > 1 && cpe->common_window){
>         start = 0;
>         for(w = 0; w < cpe->ch[0].ics.num_windows; w++){
>             for(g = 0; g < cpe->ch[0].ics.num_swb; g++){
>                 float diff = 0.0f;
> 
>                 for(i = 0; i < cpe->ch[0].ics.swb_sizes[g]; i++)
>                     diff += fabs(cpe->ch[0].coeffs[start+i] - cpe->ch[1].coeffs[start+i]);
>                 cpe->ms.mask[w][g] = diff == 0.0;
>             }
>         }
>     }

Ideally, the mid/side bits should also be decided by encoding both ways
and choosing by rate distortion.

The above really looks a little lame; one should at least calculate either
bits or distortion and choose based on that if both are not ...


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle