[FFmpeg-devel] [PATCH] avcodec/golomb: Mask shift amount before use in get_ue_golomb()
gajjanag at mit.edu
Fri Dec 11 22:24:43 CET 2015
On Fri, Dec 11, 2015 at 4:14 PM, Andreas Cadhalpun
<andreas.cadhalpun at googlemail.com> wrote:
> On 07.12.2015 00:27, Ganesh Ajjanagadde wrote:
>> On Sun, Dec 6, 2015 at 6:12 PM, Andreas Cadhalpun
>> <andreas.cadhalpun at googlemail.com> wrote:
>>> On 06.12.2015 22:48, Michael Niedermayer wrote:
>>>> my concern is more on h264 (CAVLC) and hevc speed
>>> I tested with 444_8bit_cavlc.h264 added 100 together with the concat demuxer,
>>> and couldn't see a measurable speed difference caused by this error check.
>> Ok, so here is my understanding of the situation.
>> I think both of you are right, but have different perspectives on this.
>> So in practice a log < 7 may be usually predicted correctly, and the
>> compiler in all likelihood will continue to inline the function. Thus,
>> excepting the first run, there should not be an issue, and maybe the
>> compiler even feeds in the "likely" information for the first run
>> Nevertheless, I also understand Michael's perspective: h264 is
>> arguably one of the most important codecs as supplied by FFmpeg. Even
>> a 0.01% speedloss in some place should be done with extreme caution,
>> since over time these may accumulate to something more sizable, say
>> 0.5%. There should be a very good justification for it, and thus
>> Michael spends effort trying to ensure that the security issue is
>> fixed at identical asm.
> I wouldn't call this a security issue, it's just undefined behavior.
Meant really from a theoretical perspective, since undefined means
anything can happen. Of course, in practice a distinction may be
drawn. But then again, I consider even these worthy of backport.
>> I personally agree with Michael here due to h264's importance (note:
>> this is modulo Michael's fix being correct, which I can't comment
> Michael's proposed patch is more of a 'disable the warning' than
> 'fix the warning': It is still not good if the condition
> happens, but no warning is emitted anymore.
>> I would have thought differently 6 months back, but with more
>> work with FFmpeg, I have shifted my views.
>> To understand this better, see e.g commit how b7fb7c45 by me: I got
>> rid of undefined behavior so as to not trigger ubsan stuff, and
>> substituted with implementation defined behavior at zero cost that
>> should be consistent across platforms in practice. Yes, one could
>> insert a branch, but there is minimal utility: anyone feeding such
>> extreme values is fuzzing or placing an irregular file in any case.
> In contrast to the change of b7fb7c45, this undefined behavior is not
> only triggered by fuzzed samples, but even by a FATE sample.
> Silencing has the disadvantage that is not noticeable anymore, if
> a correct sample triggers this condition.
Ok, I just wanted this studied and explored carefully, that is all.
>> Also, to understand Michael's views better, see e.g
>> Even superficially "cosmetic" effects can have a cost. See also commit
>> by me: 68e79b27a5ed7. Even small things can matter.
>> Really these things should be done with START/STOP timers since the
>> change is in a tight inner construct.
> OK, so I did test with START/STOP timers in get_ue_golomb, one for the
> first branch (A) and one for the second (B).
> For h264 I used again 444_8bit_cavlc.h264, but 1000 times concat'ed
> With the check in the B branch:
> 976 decicycles in get_ue_golomb B, 2047 runs, 1 skips
> 747 decicycles in get_ue_golomb B, 1024 runs, 0 skips
> 337 decicycles in get_ue_golomb A,16777054 runs, 162 skips
> Without the check:
> 922 decicycles in get_ue_golomb B, 2046 runs, 2 skips
> 2309 decicycles in get_ue_golomb B, 1022 runs, 2 skips
> 341 decicycles in get_ue_golomb A,16777036 runs, 180 skips
> (I shortened this as the low runs are anyway not significant.
> Also don't get confused by the fact that the B run count apparently
> decreases, it's just that get_ue_golomb gets inlined in several
> places, so there are several counters. However, the A counter
> with the second most runs has factor 1000 less runs, so is basically
> This shows multiple things:
> * The B branch is not executed often enough to get reproducible timings.
> * The A branch is executed about 5000 times more often than the B branch.
> (That's what I meant with less likely branch.)
> I also checked the cavs decoder, using the 100 times concat'ed cavs.mpg.
> With the check in the B branch:
> 629 decicycles in get_ue_golomb B, 4194260 runs, 44 skips
> 433 decicycles in get_ue_golomb A,268434102 runs, 1354 skips
> Without the check:
> 624 decicycles in get_ue_golomb B, 4194273 runs, 31 skips
> 433 decicycles in get_ue_golomb A,268434203 runs, 1253 skips
> This shows:
> * The B branch gets about 0.8% slower with the check.
> * The A branch is executed about 70 times more often than the B branch.
> * So on average, get_ue_golomb gets 0.8%/70 = 0.01% slower for cavs.
> * The B branch is already about 50% slower than the A branch.
> Extrapolating this to the h264 decoder, it would get slower by
> about 0.8%/5000=0.00016%, which is negligible.
> And that's only the runtime of get_ue_golomb, but the decoder
> also does other things than calling this function.
> Thus I still think that adding the error check here has practically
> no speed cost and is thus the better alternative.
If it saves effort, maybe the level of detail of the analysis can be
reduced somewhat for the future :). Thanks for your effort.
> Best regards,
> ffmpeg-devel mailing list
> ffmpeg-devel at ffmpeg.org
More information about the ffmpeg-devel