[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Michael Niedermayer michaelni
Sat Aug 30 16:51:10 CEST 2008


On Sat, Aug 30, 2008 at 01:21:54PM +0300, Kostya wrote:
> On Thu, Aug 28, 2008 at 10:36:57PM +0200, Michael Niedermayer wrote:
> > On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> [...]
> > > /**
> > >  * windowing related information
> > >  */
> > > typedef struct FFWindowInfo{
> > 
> > >     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous
> > 
> > How is this "transitional" going to work with many different frame lengths?
> > is there 1? N*N ?
>  
> that's for AAC (i.e. requires a bit of different windowing),
> encoder will set that to internal value

I think the psy model should not bother with what a specific format may or
may not do or need.
There are short blocks and there are long blocks in AAC; furthermore, AAC
is restricted to having short blocks in consecutive multiples of 8. Other
codecs do not have such restrictions.
Also if AAC needs to specially mark long blocks before and after short
ones that is the problem of the AAC encoder, not the psy model.
The window shape of a block surely depends on the next and previous block,
that is not AAC specific.
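
To illustrate the split I have in mind, here is a rough sketch. All names
are made up for this mail and are not from the patch. The psy model only
says "long" or "short" per block, and the encoder derives the window shape
from the neighbouring decisions. The set of shapes below is a generic
assumption, not the AAC window sequence list.

```c
#include <assert.h>

/* Hypothetical: all the psy model reports per block. */
enum PsyBlockSize { PSY_BLOCK_LONG, PSY_BLOCK_SHORT };

/* Hypothetical encoder-side window shapes, derived from the neighbours. */
enum EncWindow {
    ENC_WIN_LONG,            /* long block between long blocks       */
    ENC_WIN_LONG_START,      /* long block followed by a short block */
    ENC_WIN_SHORT,           /* short block                          */
    ENC_WIN_LONG_STOP,       /* long block preceded by a short block */
    ENC_WIN_LONG_STOP_START  /* long block between two short blocks  */
};

/* Derive the shape of block i from the psy decisions for blocks 0..n-1.
 * Blocks outside the array are treated as long. */
static enum EncWindow enc_window_shape(const enum PsyBlockSize *blk, int n, int i)
{
    enum PsyBlockSize prev, next;

    assert(n > 0 && i >= 0 && i < n);
    prev = i > 0     ? blk[i - 1] : PSY_BLOCK_LONG;
    next = i + 1 < n ? blk[i + 1] : PSY_BLOCK_LONG;

    if (blk[i] == PSY_BLOCK_SHORT)
        return ENC_WIN_SHORT;
    if (prev == PSY_BLOCK_SHORT && next == PSY_BLOCK_SHORT)
        return ENC_WIN_LONG_STOP_START;
    if (prev == PSY_BLOCK_SHORT)
        return ENC_WIN_LONG_STOP;
    if (next == PSY_BLOCK_SHORT)
        return ENC_WIN_LONG_START;
    return ENC_WIN_LONG;
}
```

A codec with extra restrictions would apply them in its own version of
such a function, without the psy model knowing about them.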



> 
> [...] 
> > > /**
> > >  * Get psychoacoustic model suggestion about coding two bands as M/S
> > >  */
> > > enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);
> > 
> > iam a little unsure about this one, but iam not objecting ...
>  
> dropped for now, may revive later
> 
> Here's another draft - it's a psychoacoustic model interface with a
> partial implementation (there are some inaccuracies and debug code in
> there, but this is an RFC, not a final patch).
> 
> I plan to use it this way with my encoder.
> 
> General flow:
> 

> init
> while(frame){
>   suggest window()
>   [encoder may ignore that]
>   set band info() = calculate thresholds for all bands with provided window type

so far i have no objections


>   psy analyze() = get distortions and weight for band quantized with a series of
>                   quantizers, my encoder will use that for RD-aware quantization

The distortion is only known after the RD "aware" quantization, while the
weight is needed before it, so I am somewhat confused by what you suggest.
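
To make the ordering concrete, here is a minimal sketch of how I would
expect an RD search over one band to look. Everything in it is an
assumption for illustration: the names, the uniform quantizer (not AAC's
power-law one) and the toy bit cost. The point is only that the weight is
an input to the search, while the distortion of each candidate is measured
inside it.

```c
#include <assert.h>
#include <float.h>

/* Toy bit cost (an assumption, not any real codebook): 1 bit for zero,
 * otherwise 2 bits per binary digit of |q|. */
static int sketch_bits(int q)
{
    int a = q < 0 ? -q : q, bits = 2;
    if (!a)
        return 1;
    while (a >>= 1)
        bits += 2;
    return bits;
}

/* Round to nearest, written out so the sketch does not need libm. */
static int sketch_round(float x)
{
    return (int)(x + (x >= 0 ? 0.5f : -0.5f));
}

/* Pick the step size minimizing weight*distortion + lambda*bits.
 * 'weight' must come from the psy model before this is called;
 * the distortion of the chosen step is returned through dist_out. */
static int sketch_rd_pick_step(const float *coef, int size,
                               float weight, float lambda,
                               const float *steps, int nb_steps,
                               float *dist_out)
{
    int s, i, best = 0;
    float best_cost = FLT_MAX;

    assert(size > 0 && nb_steps > 0);
    for (s = 0; s < nb_steps; s++) {
        float dist = 0.0f, cost;
        int bits = 0;
        for (i = 0; i < size; i++) {
            int   q   = sketch_round(coef[i] / steps[s]);
            float err = coef[i] - q * steps[s];
            dist += err * err;
            bits += sketch_bits(q);
        }
        cost = weight * dist + lambda * bits;
        if (cost < best_cost) {
            best_cost = cost;
            best      = s;
            *dist_out = dist;
        }
    }
    return best;
}
```

With a larger weight the same band ends up with a finer step, which is
what I would expect the psy model to control.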

> }
> 

[...]
> /**
>  * single band psychoacoustic information
>  */
> typedef struct FFPsyBand{
>     int   bits;
>     float energy;
>     float threshold;
>     float distortion;
>     float perceptual_weight;
> }FFPsyBand;

It should be possible to provide perceptual_weight per coefficient instead
of per band in the future.
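
One way to keep that door open, as a sketch only (the function and its
parameters are hypothetical, not part of the proposed interface): let the
distortion code accept an optional per coefficient weight array and fall
back to the band weight when none is given.

```c
#include <assert.h>
#include <stddef.h>

/* Weighted squared error of one band.
 * coef_weight may be NULL, in which case band_weight applies to
 * every coefficient. */
static float sketch_weighted_dist(const float *orig, const float *rec,
                                  int size, float band_weight,
                                  const float *coef_weight)
{
    float sum = 0.0f;
    int i;

    assert(orig && rec && size > 0);
    for (i = 0; i < size; i++) {
        float d = orig[i] - rec[i];
        float w = coef_weight ? coef_weight[i] : band_weight;
        sum += w * d * d;
    }
    return sum;
}
```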


[...]
> #ifdef ENABLE_AAC_ENCODER
> #include "aac.h"
> #include "aactab.h"
> 
> /**
>  * Quantize one coefficient.
>  * @return absolute value of the quantized coefficient
>  * @see 3GPP TS26.403 5.6.2 "Scalefactor determination"
>  */
> static av_always_inline int quant(float coef, const float Q)
> {
>     return av_clip((int)(pow(fabsf(coef) * Q, 0.75) + 0.4054), 0, 8191);
> }
> 
> static inline float psy_aac_get_approximate_quant_error(const float *c, int size,
>                                                         const float Q, const float IQ)
> {

I would prefer that the psy model not be full of #if AAC or if(aac).



>     int i;
>     int q;
>     float coef, unquant, sum = 0.0f;
>     for(i = 0; i < size; i++){
>         coef = fabs(c[i]);
>         q = quant(c[i], Q);
>         unquant = (q * cbrt(q)) * IQ;
>         sum += (coef - unquant) * (coef - unquant);
>     }
>     return sum * 1.0 / 512.0;
> }
> 

> //XXX: stub
> static inline int psy_aac_get_approximate_bits(const float *c, int size, const float Q)
> {
>     int i, bits = 0;
>     for(i = 0; i < size; i += 2){
>         int idx = 0, j, q;
>         for(j = 0; j < 2; j++){
>             q = quant(c[i+j], Q);
>             q = FFABS(q);
>             if(q)
>                 bits++;
>             if(q > 16)
>                 bits += av_log2(q)*2 - 4 + 1;
>             idx = idx*17 + FFMIN(q, 16);
>         }
>         bits += ff_aac_spectral_bits[10][idx];
>     }
>     return bits;
> }

This does not belong in the psy model: different numbers of bits do not
sound different. Besides, format-specific things could be callbacks if
they are needed.
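
A rough sketch of what such a callback could look like. The struct and
function names are invented for this mail and do not exist in the patch
or in libavcodec. The psy code never includes a codec header; it calls
whatever the encoder registered, if anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table of format-specific hooks, filled in by the encoder. */
typedef struct SketchPsyCodecOps {
    /* Estimate the bits needed for 'size' coefficients at scale Q.
     * May be NULL if the codec has no estimator. */
    int (*approx_bits)(void *codec_priv, const float *coef, int size, float Q);
} SketchPsyCodecOps;

typedef struct SketchPsyContext {
    const SketchPsyCodecOps *ops;  /* may be NULL */
    void *codec_priv;              /* passed back to the callbacks */
} SketchPsyContext;

/* Generic side: returns -1 when no estimate is available. */
static int sketch_psy_band_bits(const SketchPsyContext *ctx,
                                const float *coef, int size, float Q)
{
    assert(ctx && coef && size > 0);
    if (!ctx->ops || !ctx->ops->approx_bits)
        return -1;
    return ctx->ops->approx_bits(ctx->codec_priv, coef, size, Q);
}

/* Codec side: a toy estimator, 1 bit for a coefficient that quantizes
 * to zero and 2 bits otherwise. */
static int toy_codec_bits(void *codec_priv, const float *coef, int size, float Q)
{
    int i, bits = 0;

    (void)codec_priv;
    for (i = 0; i < size; i++) {
        float a = coef[i] * Q;
        if (a < 0)
            a = -a;
        bits += a >= 0.5f ? 2 : 1;
    }
    return bits;
}
```

The AAC encoder would register its own estimator built on its codebook
tables, and the psy model would stay free of #ifdefs.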

[...]

> /**
>  * Calculate Bark value for given line.
>  */
> static inline float calc_bark(float f)
> {
>     return 13.3f * atanf(0.00076f * f) + 3.5f * atanf((f / 7500.0f) * (f / 7500.0f));
> }

This is not speed critical, rather the opposite: it should be av_cold,
since it is used only during init.
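
Roughly what I mean, as a self-contained sketch. SKETCH_COLD is a local
stand-in for av_cold, the struct and function names are hypothetical, and
I use Traunmueller's rational Bark approximation instead of the atanf()
formula from the patch only so that the example builds without libm. The
structure is the point: the formula runs once per line at init, and the
per-frame code only reads the table.

```c
#include <assert.h>

#if defined(__GNUC__)
#    define SKETCH_COLD __attribute__((cold))  /* stand-in for av_cold */
#else
#    define SKETCH_COLD
#endif

#define SKETCH_MAX_LINES 1024

typedef struct SketchPsyTables {
    int   nb_lines;
    float bark[SKETCH_MAX_LINES];  /* Bark value per spectral line */
} SketchPsyTables;

/* Traunmueller's approximation: z = 26.81*f / (1960 + f) - 0.53.
 * Used here as a libm-free stand-in for the patch's calc_bark(). */
static SKETCH_COLD float sketch_calc_bark(float f)
{
    return 26.81f * f / (1960.0f + f) - 0.53f;
}

/* Init only: fill the table once. */
static SKETCH_COLD void sketch_psy_init(SketchPsyTables *t, int nb_lines,
                                        int sample_rate)
{
    int i;

    assert(t && nb_lines > 0 && nb_lines <= SKETCH_MAX_LINES);
    t->nb_lines = nb_lines;
    for (i = 0; i < nb_lines; i++) {
        /* centre frequency of spectral line i */
        float f = (i + 0.5f) * sample_rate / (2.0f * nb_lines);
        t->bark[i] = sketch_calc_bark(f);
    }
}

/* Per-frame path: a plain table lookup. */
static inline float sketch_bark_of_line(const SketchPsyTables *t, int line)
{
    return t->bark[line];
}
```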


> 
> #define ATH_ADD 4
> /**
>  * Calculate ATH value for given frequency.
>  * Borrowed from Lame.
>  */
> static inline float ath(float f, float add)
> {
>     f /= 1000.0f;
>     return   3.64 * pow(f, -0.8)
>             - 6.8  * exp(-0.6  * (f - 3.4) * (f - 3.4))
>             + 6.0  * exp(-0.15 * (f - 8.7) * (f - 8.7))
>             + (0.6 + 0.04 * add) * 0.001 * f * f * f * f;
> }

Same here: it is used only during init, so it should be av_cold.


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire