[FFmpeg-soc] AAC Encoding - Where we stand, what's left

Kostya kostya.shishkov at gmail.com
Wed Jul 8 06:37:14 CEST 2009


On Wed, Jul 08, 2009 at 12:05:50AM -0400, Alex Converse wrote:
> On Tue, Jul 7, 2009 at 4:45 AM, Kostya<kostya.shishkov at gmail.com> wrote:
> > On Mon, Jul 06, 2009 at 09:14:00PM -0400, Alex Converse wrote:
> >> I'd like to take a minute to discuss the status of the AAC encoder and
> >> where it is going.
> >>
> >> In SoC svn:
> >> --Applies cleanly to SVN HEAD
> >> --The most egregious of the artifacting is gone (sections being
> >> replaced by silence or having the wrong volume, etc.)
> >> --Lacks TNS
> >
> >> --Lacks multichannel support
> >
> > Ahem, I added it a long time ago.
> >
> 
> $ ./ffmpeg -i ../../Canyon-5.1-48khz-448kbit.ac3 canyon5.1.m4a
> FFmpeg version git-04fe5e6, Copyright (c) 2000-2009 Fabrice Bellard, et al.
>   configuration: --enable-gpl --disable-ffserver
>   libavutil     50. 3. 0 / 50. 3. 0
>   libavcodec    52.32. 0 / 52.32. 0
>   libavformat   52.36. 0 / 52.36. 0
>   libavdevice   52. 2. 0 / 52. 2. 0
>   libswscale     0. 7. 1 /  0. 7. 1
>   built on Jul  7 2009 23:49:58, gcc: 4.3.3
> Input #0, ac3, from '../../Canyon-5.1-48khz-448kbit.ac3':
>   Duration: 00:00:37.98, bitrate: 448 kb/s
>     Stream #0.0: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
> File 'canyon5.1.m4a' already exists. Overwrite ? [y/N] y
> Output #0, ipod, to 'canyon5.1.m4a':
>     Stream #0.0: Audio: aac, 48000 Hz, 5.1, s16, 64 kb/s
> Stream mapping:
>   Stream #0.0 -> #0.0
> Press [q] to stop encoding
> Segmentation fault

Something is broken in the preprocessing. If you change line 482 of
aacenc.c as in this patch, which forces the plain copy path and so
disables the IIR filtering, encoding will work:

Index: aacenc.c
===================================================================
--- aacenc.c	(revision 4653)
+++ aacenc.c	(working copy)
@@ -479,7 +479,7 @@
     if(s->last_frame)
         return 0;
     if(data){
-        if(!s->psypp){
+        if(1){
             memcpy(s->samples + 1024 * avctx->channels, data, 1024 * avctx->channels * sizeof(s->samples[0]));
         }else{
             start_ch = 0;
 
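
For anyone not familiar with this part of the encoder, here is a minimal
standalone sketch of what the forced branch does: it appends 1024 new
samples per channel, unfiltered, behind the previous frame in the sample
buffer. The helper name, the FRAME_LEN macro and the int16_t sample type
are assumptions made for illustration; they are not taken from aacenc.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_LEN 1024  /* samples per channel in one frame (assumed name) */

/*
 * Hypothetical helper, not part of aacenc.c: copy one frame of interleaved
 * input into the second half of `buf`, with no preprocessing applied.
 * `buf` must have room for 2 * FRAME_LEN * channels samples; the first
 * half is left untouched and is assumed to hold the previous frame.
 */
static void append_frame_unfiltered(int16_t *buf, const int16_t *data,
                                    int channels)
{
    assert(channels > 0);
    memcpy(buf + FRAME_LEN * channels, data,
           FRAME_LEN * channels * sizeof(buf[0]));
}
```

With the patch applied every frame goes through this kind of plain copy,
so the filtering code in the else branch is never reached.
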
> >> --Lacks rate control
> >> --Lacks SBR
> >> --Produces illegal bitstreams by violating the maximum frame size
> >
> > This one could be fixed.
> >
> 
> Could be fixed but depends on rate control
> 
> >> --Below faac quality
> >> --Well below the quality of competitive encoders
> >>
> >> In my tree:
> >> --Ruggles' PARCOR
> >> --Rudimentary TNS support based on ISO 13818-7 Annex C
> >> --TNS coefficient compressor
> >> --Various performance opts
> >> --Different value for CLIPPED_ESCAPE (165113.5f * IQ)
> >> --Substantial rate control related re-factoring
> >> --Pseudo ABR rate control
> >> --Maximum frame size enforcement
> >
> >> --VBR rate control that forces comically high bitrate output.
> >
> > Heh, do you mean it's always maximum frame size?
> >
> 
> Not quite but many frames do saturate or get close.
> 
> >> TNS is not helpful at the moment. Sharp attacks are losing most of
> >> their power before we get to the TNS stage. I believe this may be
> >> psy model related.
> >>
> >> To be frank, at this point it seems like it might be prudent for me to
> >> stop working on this and move to either replacing the
> >> non-redistributable parts of faac (to get something legal and faac
> >> quality) or improving the 3GPP code (to get something awesome but not
> >> distributable). At this point both code bases offer better quality and
> >> more features (including SBR support from 3GPP). Dsputil is awesome
> >> but developing this encoder inside ffmpeg is constricting to say the
> >> least.
> >
> > I'm strongly against it. It seems to me that it's easier to backport the
> > FAAC psy model and codebook selection to our encoder to get comparable
> > output - IIRC the non-LGPL parts of libfaac are exactly the basic stuff I
> > implemented.
> >
> 
> Needing replacement: bitstream.[ch], channels.c, filtbank.c,
> huffman.[ch], tns.[ch]
> Can be eliminated: backpred.[ch], ltp.[ch]

Let's see:
bitstream.[ch] - bitstream writing. Of course we have that.
channels.[ch] - something related to multichannel. We have that.
filtbank.c - MDCT. We have that.
huffman.[ch] - codebook coding. We have our own methods for coding.
tns.[ch] - you have an alternative implementation.

What's left there:
aacquant.[ch] - we need to port it
backpred.[ch] - can be eliminated
fft.[ch] - we have one
frame.[ch] - nothing important
ltp.[ch] - can be eliminated
midside.[ch] - M/S coding. We have that already.
psychkni.[ch] - we need to port it
util.[ch] - useless

So only two or three files out of the whole bunch are actually useful.

> --Alex

