[FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

Christophe Gisquet christophe.gisquet at gmail.com
Tue Feb 2 12:52:15 CET 2016


Hi,

as a motus operandi for this review, I have no time for a proper one,
or at least not fitting with John's timeframe. I'll try to close as
many pending discussions, and would prefer if someone else completed
the review/validation/commit.

2016-01-22 19:33 GMT+01:00 John Cox <jc at kynesim.co.uk>:
> Fair enough - though given that your slowdowns are almost certainly
> cache-related the whole may be quite different from the sum of the
> parts.

True, they don't always translate to anything noticeable, but that's
the best tool we have to objectively decide.

>>cosmetics?
>
> I renamed the variable, because c_idx can have values 0..2 and c_idx_nz
> is a boolean with 0..1 and in the rewrite of the inc var it is important
> that we are using the _nz variant so having the var named appropriately
> seemed sensible.

I don't really mind mixing some form of cosmetics (=supposedly without
code generation consequences) although other people prefer splitting
for easier review and regression testing.

This is not a blocking item for me, just that finding the most
appropriate commit would be nice.

>>I suppose branch prediction could help here, but less likely than for
>>get_cabac_sig_coeff_flag_idxs.
>>
>>Also for this and some others: why inline over av_always_inline?
> No particularly good reason for this one - though for any fn that might
> be called from multiple places there is a strong argument for just
> "inline" as it allows the compiler to make a judgment call based on how
> big L1 cache is and how bad the call penalty.

Anyway, those kinds of micro-optimizations I'm suggesting need more
testing (sequences, platforms), so let's roll with this.

>>AV_WN64 of 0x0101010101010101ULL, or even a memset, as it would be
>>inlined by the compiler, given its size, and done the best possible
>>way.
>
> levels is int *, not char *

Ok, sorry, then 0x0000000100000001ULL. But you can ignore this, it'll
probably make no difference outside of a micro-benchmark.

>>Saturation, could be a separate patch, with which I would be ok.

btw and iirc, a comment indicated assumptions on what is a "legit"
(instead of conforming ) bitstream/coeffs, making a conscious
decision.

I know Ronald, ffvp9's author, specifically decided to handle
equivalent cases in bitstreams (hint) from Argon Designs. I have no
opinion, but others might.

>>Related to but not strictly bypass ?
>
> Not bypass per se, more the general optimisation of abs_level_remaining
> - it is pulled out because I had a slightly better arm asm version of
> the fn.  So it could go in that patch, but this allows other asm to
> override it if they so desire.

What I meant: would better be there than in another commit.

>>Doing:
>>    if (get_cabac(c, state0 + ctx_map[n]))
>>        *p++ = n;
>>    while (--n != 0) {
>>        if (get_cabac(c, state0 + ctx_map[n]))
>>            *p++ = n;
>>    }
>>is most likely faster, probably also on arm, if the branch prediction is good.
>
> Not convinced.  That will increase code size (as get_cabac will inline)
> which can be pure poison as you have found out with USE_N_END_1.

That's 300B, not 1.5KB. And I *know* it can help, just not on all
platforms and sequences. The same decision was made for ffh264's
equivalent, iirc.

But this would need more validated results, so let's ignore that discussion.

>>boolean? (not sure)
> I saw no booleans anywhere in the rest of this code so assumed they were
> (at best) deprecated.  But if there is an official ffmpeg boolean then
> that is what it should be.

Sorry for the imprecision, ffmpeg doesn't have a boolean type. I meant
this to be possibly split as its own kind of modification, not fitting
with anything else. Or not, this shouldn't be a blocking item.

>>So we decided to have #define USE_N_END_1 ARCH_ARM. As said in another
>>mail, there's a 10-20% loss for the codeblock.
>>
>>USE_BY22_DIV also strangely provides no benefit.
> Slightly surprised by that, but DIV is going to produce smaller code so
> should be preferred on general grounds as it puts off cache disasters.

There maybe pessimization (spelling?) there, if the compiler doesn't
manage to generate the idiv + picking edx for the shifted result.
I haven't checked the disassembled code.

BR,
-- 
Christophe


More information about the ffmpeg-devel mailing list