[FFmpeg-soc] AAC Encoding - Where we stand, what's left

Wed Jul 8 07:12:20 CEST 2009

On Wed, Jul 8, 2009 at 12:37 AM, Kostya<kostya.shishkov at gmail.com> wrote:
> On Wed, Jul 08, 2009 at 12:05:50AM -0400, Alex Converse wrote:
>> On Tue, Jul 7, 2009 at 4:45 AM, Kostya<kostya.shishkov at gmail.com> wrote:
>> > On Mon, Jul 06, 2009 at 09:14:00PM -0400, Alex Converse wrote:
>> >> I'd like to take a minute to discuss the status of the AAC encoder and
>> >> where it is going.
>> >>
>> >> In SoC svn:
>> >> --Applies cleanly to SVN HEAD
>> >> --The most egregious of the artifacting is gone (sections being
>> >> replaced by silence or having the wrong volume, etc.)
>> >> --Lacks TNS
>> >
>> >> --Lacks multichannel support
>> >
>> > Ahem, I added it a long time ago.
>> >
>>
>> $ ./ffmpeg -i ../../Canyon-5.1-48khz-448kbit.ac3 canyon5.1.m4a
>> FFmpeg version git-04fe5e6, Copyright (c) 2000-2009 Fabrice Bellard, et al.
>>   configuration: --enable-gpl --disable-ffserver
>>   libavutil     50. 3. 0 / 50. 3. 0
>>   libavcodec    52.32. 0 / 52.32. 0
>>   libavformat   52.36. 0 / 52.36. 0
>>   libavdevice   52. 2. 0 / 52. 2. 0
>>   libswscale     0. 7. 1 /  0. 7. 1
>>   built on Jul  7 2009 23:49:58, gcc: 4.3.3
>> Input #0, ac3, from '../../Canyon-5.1-48khz-448kbit.ac3':
>>   Duration: 00:00:37.98, bitrate: 448 kb/s
>>     Stream #0.0: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
>> File 'canyon5.1.m4a' already exists. Overwrite ? [y/N] y
>> Output #0, ipod, to 'canyon5.1.m4a':
>>     Stream #0.0: Audio: aac, 48000 Hz, 5.1, s16, 64 kb/s
>> Stream mapping:
>>   Stream #0.0 -> #0.0
>> Press [q] to stop encoding
>> Segmentation fault
>
> Something is broken in preprocessing. If you change line 482 in aacenc.c
> like in this patch to disable IIR filtering, it will work:
>
> Index: aacenc.c
> ===================================================================
> --- aacenc.c    (revision 4653)
> +++ aacenc.c    (working copy)
> @@ -479,7 +479,7 @@
>     if(s->last_frame)
>         return 0;
>     if(data){
> -        if(!s->psypp){
> +        if(1){
>             memcpy(s->samples + 1024 * avctx->channels, data, 1024 * avctx->channels * sizeof(s->samples[0]));
>         }else{
>             start_ch = 0;
>

$ ./ffmpeg_g -i ../../Canyon-5.1-48khz-448kbit.ac3 out.aac
...
$ mp4audec_mc out.aac out.wav 2>&1 | grep WARN | sort | uniq
WARNING: only long windows are allowed in LFEs (winseq=1)
WARNING: only long windows are allowed in LFEs (winseq=2)
WARNING: only long windows are allowed in LFEs (winseq=3)
WARNING: only sine shaped windows are allowed in LFEs (winshape=1)
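
The warnings above suggest we pick window sequences and shapes for the
LFE channel the same way as for any other channel. A minimal sketch of
the constraint, assuming the usual bitstream numbering (winseq 0 is
the only-long sequence, winshape 0 is the sine window, 1 is KBD). The
WindowChoice struct and constrain_window() are hypothetical names for
illustration, not anything that exists in aacenc.c:

```c
#include <stdbool.h>

/* Window sequence codes as numbered in the AAC bitstream syntax. */
enum WindowSequence {
    ONLY_LONG_SEQUENCE   = 0,
    LONG_START_SEQUENCE  = 1,
    EIGHT_SHORT_SEQUENCE = 2,
    LONG_STOP_SEQUENCE   = 3
};

/* Window shape codes: 0 is sine, 1 is Kaiser-Bessel derived. */
enum WindowShape {
    SINE_WINDOW = 0,
    KBD_WINDOW  = 1
};

/* Hypothetical per-channel window decision (not a struct from aacenc.c). */
typedef struct {
    enum WindowSequence sequence;
    enum WindowShape    shape;
} WindowChoice;

/*
 * Override whatever the psy model asked for when the channel is an LFE:
 * only long, sine-shaped windows are allowed there. Other channels keep
 * the requested choice unchanged.
 */
static WindowChoice constrain_window(WindowChoice wanted, bool is_lfe)
{
    if (is_lfe) {
        wanted.sequence = ONLY_LONG_SEQUENCE;
        wanted.shape    = SINE_WINDOW;
    }
    return wanted;
}
```

Since the LFE never leaves the long sequence under this rule, it never
needs start/stop transition windows either.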

>> >> --Lacks rate control
>> >> --Lacks SBR
>> >> --Produces illegal bitstreams by violating the maximum frame size
>> >
>> > This one could be fixed.
>> >
>>
>> Could be fixed but depends on rate control
>>
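
For reference, the check itself is trivial; the hard part is what the
rate control does when it fails. A rough sketch, assuming the commonly
quoted limit of 6144 bits per channel and leaving it to the caller to
decide which channels count toward it (check ISO 13818-7 for the exact
rule). Both function names are made up for illustration:

```c
#include <stdbool.h>

/* Commonly quoted AAC limit; treated here as an assumption. */
#define AAC_MAX_BITS_PER_CHANNEL 6144

/* Largest legal frame, in bits, for the given number of counted channels. */
static int max_frame_bits(int counted_channels)
{
    return AAC_MAX_BITS_PER_CHANNEL * counted_channels;
}

/* True if a frame of frame_bits bits stays within the limit. */
static bool frame_is_legal(int frame_bits, int counted_channels)
{
    return frame_bits <= max_frame_bits(counted_channels);
}
```

When frame_is_legal() returns false, the encoder has to requantize the
frame more coarsely and count bits again, which is why this belongs
with rate control.
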
>> >> --Below faac quality
>> >> --Well below the quality of competitive encoders
>> >>
>> >> In my tree:
>> >> --Ruggles' PARCOR
>> >> --Rudimentary TNS support based on ISO 13818-7 Annex C
>> >> --TNS coefficient compressor
>> >> --Various performance optimizations
>> >> --Different value for CLIPPED_ESCAPE (165113.5f * IQ)
>> >> --Substantial rate control related re-factoring
>> >> --Pseudo ABR rate control
>> >> --Maximum frame size enforcement
>> >
>> >> --VBR rate control that forces comically high bitrate output.
>> >
>> > Heh, do you mean it's always maximum frame size?
>> >
>>
>> Not quite but many frames do saturate or get close.
>>
>> >> TNS is not helpful at the moment. Sharp attacks are losing most of
>> >> their power before we get to the TNS stage. I believe this may be
>> >> psy model related.
>> >>
>> >> To be frank, at this point it seems like it might be prudent for me to
>> >> stop working on this and move to either replacing the
>> >> non-redistributable parts of faac (to get something legal and faac
>> >> quality) or improving the 3GPP code (to get something awesome but not
>> >> distributable). At this point both code bases offer better quality and
>> >> more features (including SBR support from 3GPP). Dsputil is awesome
>> >> but developing this encoder inside ffmpeg is constricting to say the
>> >> least.
>> >
>> > I'm strongly against it. It seems to me that it's easier to backport the
>> > FAAC psy model and codebook selection to our encoder to get comparable
>> > output - IIRC the non-LGPL parts of libfaac are exactly the basic stuff I
>> > implemented.
>> >
>>
>> Needing replacement: bitstream.[ch], channels.c, filtbank.c,
>> huffman.[ch], tns.[ch]
>> Can be eliminated: backpred.[ch], ltp.[ch]
>
> Let's see:
> bitstream.[ch] - bitstream writing. Of course we have that.
> channels.[ch] - something related to multichannel. We have that.
> filtbank.c - MDCT. We have that.
> huffman.[ch] - codebook coding. We have our own methods for coding.
> tns.[ch] - you have an alternative implementation.
>
> What's left there:
> aacquant.[ch] - we need to port it
> backpred.[ch] - can be eliminated
> fft.[ch] - we have one
> frame.[ch] - nothing important
> ltp.[ch] - can be eliminated
> midside.[ch] - M/S coding. We have that already.

Does our mid/side work? It's not turned on.
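
For anyone following along, the transform itself is the easy part. A
minimal sketch, assuming the usual convention M = (L + R) / 2 and
S = (L - R) / 2, with the decoder reconstructing L = M + S and
R = M - S. ms_encode() and ms_decode() are illustrative names, not
functions from our encoder or from faac, and a real encoder applies
this per scalefactor band on the MDCT coefficients and still has to
decide where it pays off:

```c
#include <stddef.h>

/* In place: left[] becomes mid, right[] becomes side. */
static void ms_encode(float *left, float *right, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float m = (left[i] + right[i]) * 0.5f;
        float s = (left[i] - right[i]) * 0.5f;
        left[i]  = m;
        right[i] = s;
    }
}

/* In place: mid[] becomes left, side[] becomes right. */
static void ms_decode(float *mid, float *side, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float l = mid[i] + side[i];
        float r = mid[i] - side[i];
        mid[i]  = l;
        side[i] = r;
    }
}
```

The round trip is lossless before quantization, so any problem with
our M/S would be in the per-band decision or in the quantizer, not in
the transform.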

> psychkni.[ch] - we need to port it
> util.[ch] - useless
>
> So, it's two or three useful files out of the whole bunch.
>

One other big thing: improvements to faac help many people right now.
People do (sadly) use faac for things. Improving the lavc encoder
really only helps you and me until it's fit to merge, and who knows
when that could be.

--Alex