[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Michael Niedermayer michaelni
Thu Aug 28 22:36:57 CEST 2008


On Thu, Aug 28, 2008 at 08:10:26PM +0300, Kostya wrote:
> On Wed, Aug 27, 2008 at 04:33:17PM +0200, Michael Niedermayer wrote:
> > On Wed, Aug 27, 2008 at 11:35:20AM +0300, Kostya wrote:
> > > Here's my first attempt to define codec-agnostic psy model.
> > > Here's an interface for it. I'm not sure about AC3, but
> > > it should be possible to use it with DCA, Vorbis,
> > > MPEG Audio Layers I-III and NBC, maybe WMA too.
> > > In case somebody codes an implementation, of course.
> > > Personally I plan to make my encoder use it backed with
> > > already implemented 3GPP model.
> > 
> > [...]
> > > /**
> > >  * windowing related information
> > >  */
> > > typedef struct FFWindowInfo{
> > >     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous
> > >     int window_shape;                 ///< window shape (sine/KBD/whatever)
> > 
> > >     void *additional_info;            ///< codec-dependent window information
> > 
> > passing opaque data from psy to encoder is not clean, it requires
> > both to maintain a "hidden" compatible API
>  
> Of course, unless we can decide on what will be needed for all encoders. 

whenever an encoder needs something that isn't there, it can be added.


[...]
> /*
>  * audio encoder psychoacoustic model
>  * Copyright (C) 2008 Konstantin Shishkov
>  *
>  * This file is part of FFmpeg.
>  *
>  * FFmpeg is free software; you can redistribute it and/or
>  * modify it under the terms of the GNU Lesser General Public
>  * License as published by the Free Software Foundation; either
>  * version 2.1 of the License, or (at your option) any later version.
>  *
>  * FFmpeg is distributed in the hope that it will be useful,
>  * but WITHOUT ANY WARRANTY; without even the implied warranty of
>  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>  * Lesser General Public License for more details.
>  *
>  * You should have received a copy of the GNU Lesser General Public
>  * License along with FFmpeg; if not, write to the Free Software
>  * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>  */
> 
> #ifndef FFMPEG_AACPSY_H
> #define FFMPEG_AACPSY_H
> 
> #include "avcodec.h"
> 
> /** maximum possible number of bands */
> #define MAX_BANDS 128

ok

[...]
> /**
>  * windowing related information
>  */
> typedef struct FFWindowInfo{

>     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous

How is this "transitional" type going to work with many different frame lengths?
Is there 1 transitional type? N*N of them?
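
To make the N*N case concrete, here is a rough sketch. It assumes a window is
described by the (previous, current) frame length pair rather than by a single
"transitional" type. SKETCH_NUM_LENS and both function names are hypothetical
and not part of the proposed header.

```c
/* Hypothetical: number of distinct frame lengths the codec supports. */
#define SKETCH_NUM_LENS 3

/* Map a (previous, current) frame length index pair to one of N*N window ids. */
static int sketch_window_id(int prev_len_idx, int cur_len_idx)
{
    return prev_len_idx * SKETCH_NUM_LENS + cur_len_idx;
}

/* Recover the (previous, current) pair from a window id. */
static void sketch_window_pair(int id, int *prev_len_idx, int *cur_len_idx)
{
    *prev_len_idx = id / SKETCH_NUM_LENS;
    *cur_len_idx  = id % SKETCH_NUM_LENS;
}
```

With N frame lengths this gives N*N ids, of which the N with prev == cur are
the non-transitional ones.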


>     int window_shape;                 ///< window shape (sine/KBD/whatever)
>     void *additional_info;            ///< codec-dependent window information, should be consistent between encoder and psy model
> }FFWindowInfo;
> 

> /**
>  * context used by psychoacoustic model
>  */
> typedef struct FFPsyContext{
>     AVCodecContext *avctx;            ///< encoder context
> 
>     FFPsyBand bands[MAX_BANDS];       ///< frame bands information
>     FFWindowInfo *win_info;           ///< frame window info
> 

>     const uint8_t *bands;             ///< scalefactor band sizes for possible fram sizes

fram?


>     const int     *num_bands;         ///< number of scalefactor bands for possible frame sizes
>     const uint8_t *short_bands;       ///< scalefactor band sizes for short frame
>     int num_short_bands;              ///< number of scalefactor bands for short frame

this looks a little odd and inconsistent; why the special short_bands?
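
One possible way to avoid the special case, as a sketch only: treat the short
frame as one more entry in a table of per-frame-length layouts. The struct and
function names below are hypothetical, not taken from the patch.

```c
#include <stdint.h>

/* Hypothetical: one entry per supported frame length, short frames included. */
typedef struct SketchBandLayout {
    const uint8_t *sizes;  /* scalefactor band sizes for this frame length */
    int            count;  /* number of scalefactor bands */
} SketchBandLayout;

/* Sum of the band sizes, i.e. the number of coefficients the layout covers. */
static int sketch_layout_coeffs(const SketchBandLayout *layout)
{
    int i, total = 0;
    for (i = 0; i < layout->count; i++)
        total += layout->sizes[i];
    return total;
}
```

The psy model would then index the table by frame length and never need to
know which entry is the "short" one.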




> 
>     void* model_priv_data;            ///< psychoacoustic model implementation private data
> }FFPsyContext;
> 

> /**
>  * Initialize psychoacoustic model.
>  *
>  * @param ctx        model context
>  * @param avctx      codec context
>  * @param bands      scalefactor band lengths for all frame lengths
>  * @param num_bands  number of scalefactor bands for all frame lengths
>  *
>  * @return zero if successful, a negative value if not
>  */
> int ff_psy_init(FFPsyContext *ctx, AVCodecContext *avctx,
>                 const uint8_t **bands, const int* num_bands);

isn't that missing the number of entries in num_bands?
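
Something along these lines, as a minimal sketch: the init function takes an
explicit entry count and can validate both arrays against it. The
PsySketchContext struct, psy_sketch_init() and the num_lens parameter are
hypothetical stand-ins, not the proposed FFPsyContext/ff_psy_init().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for FFPsyContext; only the fields needed here. */
typedef struct PsySketchContext {
    const uint8_t **bands;     /* bands[i]: band sizes for frame length i */
    const int      *num_bands; /* num_bands[i]: band count for frame length i */
    int             num_lens;  /* number of entries in both arrays */
} PsySketchContext;

/* Returns zero if successful, a negative value on invalid arguments. */
static int psy_sketch_init(PsySketchContext *ctx, int num_lens,
                           const uint8_t **bands, const int *num_bands)
{
    int i;

    if (!ctx || !bands || !num_bands || num_lens <= 0)
        return -1;
    /* With the count known, every entry can be checked up front. */
    for (i = 0; i < num_lens; i++)
        if (!bands[i] || num_bands[i] <= 0)
            return -1;

    ctx->bands     = bands;
    ctx->num_bands = num_bands;
    ctx->num_lens  = num_lens;
    return 0;
}
```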


> 
> /**
>  * Suggest window sequence for channel.
>  *
>  * @param ctx       model context
>  * @param audio     samples for the current frame
>  * @param la        lookahead samples (NULL when unavailable)
>  * @param channel   number of channel element to analyze
>  * @param prev_type previous window type
>  *
>  * @return suggested window information in a structure
>  */
> FFWindowInfo ff_psy_suggest_window(FFPsyContext *ctx,
>                                    const int16_t *audio, const int16_t *la,
>                                    int channel, int prev_type);
> 

> /**
>  * Get psychoacoustic model suggestion about coding two bands as M/S
>  */
> enum FFPsyMSDecision ff_psy_suggest_ms(FFPsyContext *ctx, FFPsyBand *left, FFPsyBand *right);

I am a little unsure about this one, but I am not objecting ...


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I count him braver who overcomes his desires than him who conquers his
enemies for the hardest victory is over self. -- Aristotle