[FFmpeg-devel] [RFC v2 3/3] daaladec: Implement a native Daala decoder

Ganesh Ajjanagadde gajjanag at mit.edu
Wed Dec 30 03:12:57 CET 2015

On Tue, Dec 29, 2015 at 5:09 PM, Ronald S. Bultje <rsbultje at gmail.com> wrote:
> Hi,
> On Tue, Dec 29, 2015 at 6:54 PM, Andreas Cadhalpun <
> andreas.cadhalpun at googlemail.com> wrote:
>> Hi,
>> On 29.12.2015 19:46, Ronald S. Bultje wrote:
>> > On Tue, Dec 29, 2015 at 11:49 AM, Andreas Cadhalpun <
>> > andreas.cadhalpun at googlemail.com> wrote:
>> >> Do you have a sample causing overflows in the vp9 decoder?
>> >
>> >
>> > Nope, but I'm going to assume they're not hard to create, just use a few
>> > high same-sign quantized coefficients (e.g. SHORT_MAX) and a high
>> quantizer
>> > (qidx=255, dq_ac=1828). Both dequant into int16_t (1828*SHORT_MAX doesn't
>> > fit in 16 bits) as well as the idct results themselves (because you're
>> > adding and subtracting from an already near-edge value which should fit
>> in
>> > 16bits) will overflow.
>> I guess hard depends on who you ask. I've fuzzed the vp9 decoder and
>> haven't yet come across a sample that triggers a signed integer overflow
>> there.
>> In the case of this daala decoder, 3/4 of the fuzzed samples do...
>> >> (Overflows in dsp code are typically not a security concern.)
>> >>
>> >> Well, the overflows in the imdct calculation of the aac_fixed decoder
>> >> ultimately
>> >> caused crashes.
>> >
>> >
>> > I would prefer if for video codecs, unless specifically proven otherwise
>> > for that codec with specific content, we assume by default that overflows
>> > in performance-sensitive dsp code (specifically idct) are OK. ubsan may
>> not
>> > like us on fuzzed content, but so be it.
>> I prefer that input for such performance-sensitive dsp code gets sanitized,
>> so that overflows are (nearly) impossible.
> I don't think it makes sense to make the decoder 5% slower just so that
> ubsan is happy. We don't do that for other video decoders either. And yes,
> the coefficient decoding loop is typically a major hotspot in video codecs.

Basically, I think we need to strike a very fine balance here, and I
offer my own thoughts on this below:
1. On the one hand, it is difficult to reason how undefined behavior
in a tight loop can impact other places in the code. It may or may not
be safe in a practical context. If things can be changed to be safe at
negligible performance impact, it should be done as a defensive
programming measure IMHO. Theoretically, undefined means that anything
can happen, but practically the range of possibilities is far more
limited (no one runs ubsan in production). We already do not take into
account all theoretical aspects: for example, nothing dictates the
choice of the IEEE-754 floating point format, but some code does
assume it in FFmpeg. As such, we are anyway operating in a "practical"
context, and theoretical purity must be balanced with practical
impact, something I forget myself a lot.

2. If things can't be transformed to be safe without leading to
significant loss of performance, such things must be clearly
documented as comments, fully justified, and very carefully
scrutinized IMHO.

3. Note that integer overflow is a hard problem, with endless debate
all round: https://lwn.net/Articles/547649/,
- even Rust essentially cops out by default on release builds. The
only thing is that they took more responsibility regarding the
semantics than the C standard - although C leaves it undefined, in
practice Rust and C will generate the same code here AFAIK (note: I
have not studied Rust, just casually read about it). The real problem
comes in what clients do with the outputs of such overflow: in Rust
for instance array bounds derived from such arithmetic will be
checked, in C it will lead to big trouble. Then again, the funny thing
is: just having "safety" is not enough, the burden is anyway on the
programmer to handle such "weird" cases via some form of checking
code, regardless of the language, or careful thought to ensure that
such things don't happen (e.g the naive (min+max)/2 vs min +
(max-min)/2 for binary search). No language can do the "right thing"
automatically: the "right thing" varies depending on the situation.
See the next point.

4. Even with integer overflow checks, note that the method of handling
can vary significantly, as can be seen in FFmpeg. It is not easy to
audit every line, and create brittle checks that make hard to
understand code even harder to understand. Wish there were
alternatives, but I see none: some things are inherently not simple.

> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

More information about the ffmpeg-devel mailing list