[FFmpeg-devel] [RFC] Generic psychoacoustic model interface

Michael Niedermayer michaelni
Wed Aug 27 16:33:17 CEST 2008


On Wed, Aug 27, 2008 at 11:35:20AM +0300, Kostya wrote:
> Here's my first attempt to define codec-agnostic psy model.
> Here's an interface for it. I'm not sure about AC3, but
> it should be possible to use it with DCA, Vorbis,
> MPEG Audio Layers I-III and NBC, maybe WMA too.
> In case somebody codes an implementation, of course.
> Personally I plan to make my encoder use it backed with
> already implemented 3GPP model.

[...]
> /**
>  * windowing related information
>  */
> typedef struct FFWindowInfo{
>     int window_type[2];               ///< window type (short/long/transitional, etc.) - current and previous
>     int window_shape;                 ///< window shape (sine/KBD/whatever)

>     void *additional_info;            ///< codec-dependent window information

Passing opaque data from the psy model to the encoder is not clean; it
requires both sides to maintain a compatible "hidden" API.
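
A minimal sketch of the alternative, assuming the codec-dependent data can be expressed as ordinary fields declared in the shared header. The struct name FFWindowInfoExplicit, the fields num_windows and grouping, and the helper function are hypothetical and not part of the proposed patch:

```c
#include <stdint.h>

/* Hypothetical variant of FFWindowInfo without the void pointer:
 * everything the encoder may read is declared in the shared header. */
typedef struct FFWindowInfoExplicit {
    int     window_type[2]; /* current and previous window type */
    int     window_shape;   /* window shape (sine/KBD/...) */
    int     num_windows;    /* assumed field: number of windows in the frame */
    uint8_t grouping[8];    /* assumed field: length of each window group */
} FFWindowInfoExplicit;

/* Returns 1 when the group lengths add up to the number of windows.
 * Because the layout is public, the encoder can check this itself
 * instead of trusting an undocumented blob. */
static int sketch_grouping_is_complete(const FFWindowInfoExplicit *wi)
{
    int covered = 0;
    for (int i = 0; i < 8; i++)
        covered += wi->grouping[i];
    return covered == wi->num_windows;
}
```

With explicit fields the compiler checks that the model and the encoder agree on the layout, which a void pointer cannot do.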



> }FFWindowInfo;
> 
> /**
>  * context used by psychoacoustic model
>  */
> typedef struct FFPsyContext{
>     AVCodecContext *avctx;            ///< encoder context
> 
>     FFPsyBand bands[MAX_BANDS];       ///< frame bands information
>     FFWindowInfo *win_info;           ///< frame window info
> 

>     const uint8_t *long_bands;        ///< scalefactor band sizes for long frame
>     int num_long_bands;               ///< number of scalefactor bands for long frame
>     const uint8_t *short_bands;       ///< scalefactor band sizes for short frame
>     int num_short_bands;              ///< number of scalefactor bands for short frame

Having only 2 band lists would be a problem for any codec that has more
than 2 window lengths (like WMA).
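
One possible way around this, sketched under the assumption that the context stores an array of band tables, one per window length, instead of the fixed long/short pair. FFPsyBandTableSketch and sketch_find_band_table are hypothetical names used only for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-window-length band table. */
typedef struct FFPsyBandTableSketch {
    int            window_size; /* window length this table applies to */
    const uint8_t *bands;       /* band sizes for that window length */
    int            num_bands;   /* number of entries in bands */
} FFPsyBandTableSketch;

/* Linear search for the table matching a window length.
 * Returns NULL when the codec registered no table for that length. */
static const FFPsyBandTableSketch *
sketch_find_band_table(const FFPsyBandTableSketch *tables, int num_tables,
                       int window_size)
{
    for (int i = 0; i < num_tables; i++)
        if (tables[i].window_size == window_size)
            return &tables[i];
    return NULL;
}
```

An AAC-like codec would register two tables, while a codec with more window lengths registers as many as it needs, and the model interface stays the same.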


[...]
> /**
>  * Suggest window sequence for channel.
>  *
>  * @param ctx       model context
>  * @param audio     samples for the current frame
>  * @param la        lookahead samples (NULL when unavailable)
>  * @param channel   number of channel element to analyze
>  * @param prev_type previous window type
>  *
>  * @return suggested window information in a structure
>  */
> FFWindowInfo* ff_psy_suggest_window(AACPsyContext *ctx, int16_t *audio, int16_t *la,
>                                     int channel, int prev_type);

The name should rather be ...get/find/calculate_suggested...
audio & la should be const.

and maybe the return should be FFWindowInfo instead of FFWindowInfo* to 
avoid memleak issues ...
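
A minimal sketch of both points: const input pointers and a struct returned by value, so there is nothing for the caller to free. All names here are hypothetical, the extra sample-count parameter is an assumption, and the peak comparison is only a toy stand-in for a real transient detector, not the 3GPP model:

```c
#include <stdint.h>
#include <stdlib.h>

enum { SKETCH_LONG_WINDOW = 0, SKETCH_SHORT_WINDOW = 1 };

/* Hypothetical cut-down window info, small enough to return by value. */
typedef struct FFWindowInfoValue {
    int window_type[2]; /* [0] = suggested current type, [1] = previous type */
    int window_shape;
} FFWindowInfoValue;

/* Largest absolute sample value in the buffer. */
static int sketch_peak(const int16_t *samples, int n)
{
    int peak = 0;
    for (int i = 0; i < n; i++) {
        int v = abs(samples[i]); /* int16_t promotes to int, so -32768 is safe */
        if (v > peak)
            peak = v;
    }
    return peak;
}

/* Toy heuristic: suggest a short window when the lookahead is much louder
 * than the current frame. The inputs are const because the function only
 * reads them; the result is returned by value. */
static FFWindowInfoValue sketch_suggest_window(const int16_t *audio,
                                               const int16_t *la,
                                               int n, int prev_type)
{
    FFWindowInfoValue wi = { { SKETCH_LONG_WINDOW, prev_type }, 0 };
    if (la && sketch_peak(la, n) > 4 * sketch_peak(audio, n))
        wi.window_type[0] = SKETCH_SHORT_WINDOW;
    return wi;
}
```

The caller keeps the result in a local variable or copies it into its own context, so ownership of the returned data is never in question.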


> 
> /**
>  * Perform psychoacoustic analysis and set band info.
>  *
>  * @param ctx   model context
>  * @param tag   number of channel element to analyze
>  * @param type  channel element type (e.g. ID_SCE or ID_CPE)
>  * @param cpe   pointer to the current channel element
>  */
> void ff_psy_analyze(AACPsyContext *ctx, int tag, int type, ChannelElement *cpe);

ChannelElement is AAC specific
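
A codec-agnostic variant could take plain coefficient and band-size arrays instead of an AAC structure. The sketch below assumes that per-band energy is one of the values the model fills in; FFPsyBandSketch, its energy field, and sketch_psy_analyze are hypothetical, since the real FFPsyBand layout is not shown in the quoted part:

```c
#include <stdint.h>

/* Hypothetical band info; the actual FFPsyBand fields are not quoted here. */
typedef struct FFPsyBandSketch {
    float energy; /* assumed field: sum of squared coefficients in the band */
} FFPsyBandSketch;

/* Compute per-band energy from a flat coefficient array.
 * Nothing here depends on AAC: the caller passes its own band sizes. */
static void sketch_psy_analyze(FFPsyBandSketch *bands, const float *coefs,
                               const uint8_t *band_sizes, int num_bands)
{
    int start = 0;
    for (int b = 0; b < num_bands; b++) {
        float e = 0.0f;
        for (int i = 0; i < band_sizes[b]; i++)
            e += coefs[start + i] * coefs[start + i];
        bands[b].energy = e;
        start += band_sizes[b];
    }
}
```

Any encoder that can present its spectrum as an array plus a band layout could call such a function without exposing its internal element structures.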


[...]
> /**
>  * Preprocess several channel in audio frame in order to compress it better.
>  *
>  * @param ctx      preprocessing context
>  * @param audio    samples to preprocess
>  * @param dest     place to put filtered samples
>  * @param tag      number of channel group
>  * @param channels number of channel to preprocess (some additional work may be done on stereo pair)
>  */
> void ff_aac_psy_preprocess(struct FFPsyPreprocessContext *ctx, int16_t *audio, int16_t *dest, int tag, int channels);

audio is missing a const

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

There will always be a question for which you do not know the correct answer.