[FFmpeg-trac] #8066(avcodec:open): Bad quality encoding of high compressed audio by AAC encoder

FFmpeg trac at avcodec.org
Tue May 5 18:13:59 EEST 2020

#8066: Bad quality encoding of high compressed audio by AAC encoder
             Reporter:  Lirk        |                    Owner:  Lynne
                 Type:  defect      |                   Status:  open
             Priority:  normal      |                Component:  avcodec
              Version:  git-master  |               Resolution:
             Keywords:  aac         |               Blocked By:
             Blocking:              |  Reproduced by developer:  1
Analyzed by developer:  1           |

Comment (by Lynne):

 I didn't say a cascaded encode wasn't useful or relevant to any scenario,
 I pointed out it stresses a single component of the encoder, and that
 specific part hasn't been touched since at least 1996 (it's a line-by-line
 copy of the 3GPP example encoder AAC spec). As for why it got so far: no
 one really noticed until the encoder started getting used by OBS, which
 took 3-4 years after most of the work on the encoder was done.

 I'm working on a replacement psychoacoustic system for both the Opus
 encoder and the AAC encoder, but I need raw samples at both 44.1 and 48 kHz
 that display artifacts. Obviously, the higher the bitrate and the lower
 the amount of cascading needed, the better. I'd prefer 48 kHz (could
 someone tell the OBS people to make that the default? Most streams use
 44.1 kHz, which, aside from the obvious reasons, doesn't produce nice Bark
 bands and makes writing a psychoacoustic model less general and more
 samplerate-dependent).

 Having said that, I would like to point out that, in the real world, where
 mixing and sub-frame-sample-offsets happen, sterile cascading tests could
 potentially give highly misleading results, especially with good encoders
 like libopus.
 The performance of cascading encoders depends highly on whether each
 decoded frame is given sample-aligned to the encoder. Even a small
 alignment difference for each successive encode can ruin the result. For
 example, if a transient AAC frame (1024 samples, split into 8 smaller
 transforms) is given to an encoder with a 64-sample offset, the block
 boundary of each smaller decoded transform, where most MDCT codec
 artifacts happen, will be in the middle of the encoder's frame. Which, if
 it decides to encode as a transient (very possible, given the artifacts
 increase the energy) will produce very annoying results after no more than
 5-6 encodes, regardless of the bitrate.
 Coincidences like that happen and are somewhat out of your control, unless
 you like to inject discontinuities and latency into your stream and assume
 frame sizes.
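
 To make the 64-sample example concrete, here is a minimal sketch. It
 assumes a simplified model in which a decoded transient frame carries its
 artifacts at every 128-sample short-block boundary and the re-encoder uses
 the same 1024/128 framing. The function names are illustrative only and
 are not part of any FFmpeg or libopus API.

```python
FRAME = 1024         # AAC frame length in samples (figure from the comment)
SHORT = FRAME // 8   # 128: one of the 8 short transforms in a transient frame


def artifact_positions(offset, frame=FRAME, short=SHORT):
    """Positions, relative to the re-encoder's frame start, of the previous
    encode's short-block boundaries when the input is shifted by `offset`."""
    first = (-offset) % short
    return list(range(first, frame, short))


def distance_to_encoder_boundary(pos, short=SHORT):
    """Distance from `pos` to the nearest short-block boundary of the
    re-encoder's own grid (0 means the two boundaries coincide)."""
    r = pos % short
    return min(r, short - r)


for offset in (0, 64):
    dists = [distance_to_encoder_boundary(p) for p in artifact_positions(offset)]
    print(offset, dists)
```

 In this model an offset of 0 puts every old boundary on a new one, while
 an offset of 64 puts every old boundary halfway into one of the
 re-encoder's short transforms, which is the worst case.
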
 As for Opus, it uses 120-sample overlaps on 960-sample frames, rather than
 AAC's 512-sample overlaps on 1024-sample frames. With such a low amount of
 overlap (a quarter of AAC's, relative to frame length), there are even
 stronger artifacts at the frame boundary, even with non-transient frames
 (Opus too splits transient frames into 8 smaller transforms), and even
 worse, it does TF switching (recombines/uncombines smaller transforms),
 which is highly sensitive to the signal (and artifacts), so Opus really
 benefits from
 "cheating". Thankfully, some of this can be kept under control due to its
 lossless signalling of band energy levels (so low-frequency artifacts can
 overwhelm the signal acceptably).
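
 The overlap comparison can be checked with a few lines of arithmetic. This
 is only a sketch that takes the overlap and frame sizes quoted above at
 face value; `overlap_fraction` is a hypothetical helper, not a codec API.

```python
from fractions import Fraction


def overlap_fraction(overlap_samples, frame_samples):
    """Overlap expressed as a fraction of the frame length."""
    return Fraction(overlap_samples, frame_samples)


# Figures as quoted in the comment, not independently verified here.
opus = overlap_fraction(120, 960)    # Opus: 120-sample overlap, 960-sample frame
aac = overlap_fraction(512, 1024)    # AAC: 512-sample overlap, 1024-sample frame

print("Opus overlap fraction:", opus)
print("AAC overlap fraction:", aac)
print("Opus relative to AAC:", opus / aac)
```

 Under those figures the Opus overlap is one eighth of a frame and the AAC
 overlap is one half, so Opus has a quarter of AAC's relative overlap.
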
 In conclusion, while cascading does give you a good idea of how the
 encoder deals with codec artifacts, don't assume it won't spazz out on you
 in the field. Not saying it isn't useful, just not very useful for that
 exact case where your frames aren't aligned.

 ffmpeg is hardly an organized, focused, single-entity project where each
 contributor forms a part of a swarm mind and can work and judge anything,
 or is responsible for the actions of another. You shouldn't take some
 random contributor's word for much, let alone a bug tracker janitor. I
 don't even read the bug tracker unless something in the title strikes me
 as relevant to my field and I by chance read it.
 Certainly, while people who do research on encoders for fun (and for free)
 are few and far between, everyone knows everyone in this field, so you
 only needed to ask literally anywhere (especially on freenode) to find the
 most appropriate person who would pay attention to it, rather than do an
 angry broadcast and hope someone listens.
 As it turns out, some motivated people did exactly that, and somehow I got
 very unwelcome and demotivating private messages. While perhaps such
 things are not entirely responsible for ffmpeg developers' overall
 reputation of being unwelcoming, they '''really''' don't help.
 Especially when it becomes a personal attack like now, since there's
 really only a single person who would do this work. Shaming companies
 where people share duties and responsibilities (or even lack such, since
 they're paid) is one thing, but when it comes to open source, there's
 usually just a single person behind a given feature.
 I spoke with 2 "leaders" behind 2 other large projects and was told to
 just ignore such messages, as every project this size gets random "you
 suck" complaints on a daily basis (seriously).
 Regarding the "best h264 encoder and worst aac encoder" comment, x264
 literally got millions from huge companies, still does to this day, and
 had several full time developers. Can't say the latter got anything.
 Regarding the "developers must be held responsible for not maintaining
 such a widely used project" comment: I can name addresses and address
 names of who to send glitter bombs to. Unfortunately, one of those
 resolves as 'NULL', at '0xffffffffffffffff', which is dependent on
 ~~undefined behavior~~, metaphysical interpretation.

Ticket URL: <https://trac.ffmpeg.org/ticket/8066#comment:14>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker
