[FFmpeg-devel] [PATCH] h264.c/decode_cabac_residual optimization

Michael Niedermayer michaelni
Tue Jul 1 20:44:44 CEST 2008


On Tue, Jul 01, 2008 at 01:14:56PM -0400, Alexander Strange wrote:
>
> 1 - Don't check for encoding in hl_decode_mb, since there's no encoder.
> 2 - Turn off cloning hl_decode_mb under CONFIG_SMALL.
> 3 - Clone decode_cabac_residual depending on cat. This gets rid of several 
> cat checks and the if (qmul) for every coeff.
>
> On Core 2:
> Before-
> 24205 dezicycles in decode_cabac_residual, 1048505 runs, 71 skips
> 23079 dezicycles in decode_cabac_residual, 2096992 runs, 160 skips
> 22186 dezicycles in decode_cabac_residual, 4193933 runs, 371 skips
> 22444 dezicycles in decode_cabac_residual, 8387862 runs, 746 skips
>
> After-
> 24035 dezicycles in decode_cabac_residual, 1048502 runs, 74 skips
> 22922 dezicycles in decode_cabac_residual, 2096981 runs, 171 skips
> 22037 dezicycles in decode_cabac_residual, 4193955 runs, 349 skips
> 22293 dezicycles in decode_cabac_residual, 8387927 runs, 681 skips
>
> cabac_residual's time is definitely not normally distributed, so I wouldn't 
> trust the cycle counts for anything, but at least it went down.
>
> 4 - reindention
> 5 - Simplify "for( coeff_count--; coeff_count >= 0; coeff_count-- )".
> 6 - Reorder the cat checks based on frequency, mostly from counting block 
> types in x264 output. gcc statically predicts if == to be not taken, so the 
> more common one should be in the else block. Moving cat == 5 to the top of 
> the if/elses also makes it more obvious that it can be merged into the 
> significance map.
> 7 - Use get_cabac_bypass_sign in both branches of the loop.
>
> Before-
> 24043 dezicycles in decode_cabac_residual, 524159 runs, 129 skips
> 24462 dezicycles in decode_cabac_residual, 1048335 runs, 241 skips
> 23323 dezicycles in decode_cabac_residual, 2096708 runs, 444 skips
> 22416 dezicycles in decode_cabac_residual, 4193483 runs, 821 skips
>
> After-
> 23318 dezicycles in decode_cabac_residual, 524112 runs, 176 skips
> 23712 dezicycles in decode_cabac_residual, 1048286 runs, 290 skips
> 22608 dezicycles in decode_cabac_residual, 2096607 runs, 545 skips
> 21731 dezicycles in decode_cabac_residual, 4193412 runs, 892 skips
>
> There are some interesting things to look at after these-
> - gcc pointlessly unrolls the "while( coeff_abs < 15 && get_cabac( CC, ctx 
> ) )" loop into taking up about half of the compiled function when x86 asm 
> is used, so it could be rewritten in asm to fix that.
> - get_cabac_bypass* could use cmov, and refill() is so small it might take 
> less time than a branch mispredict penalty.
> - The asm operands for get_cabac itself could be relaxed (not requiring 
> output to eax, using +m/+g, that kind of thing). I tried this earlier and 
> got bad results, but maybe it'll work better after this.
> - refill* should use AV_RB16.
> - I'm not sure what CABAC_ON_STACK is used for.
>

> But I don't want to spend too much time not doing SoC - someone else can 
> try those if they want.

fine, whoever wants to work on this should post
proper benchmarks, not 1 run after half a dozen changes


>
>

> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/libavcodec/h264.c
> --- ffmpeg-/libavcodec/h264.c	2008-06-30 14:47:21.000000000 -0400
> +++ ffmpeg/libavcodec/h264.c	2008-06-30 14:47:36.000000000 -0400
> @@ -2726,10 +2726,7 @@
>      MpegEncContext * const s = &h->s;
>      const int mb_xy= h->mb_xy;
>      const int mb_type= s->current_picture.mb_type[mb_xy];
> -    int is_complex = FRAME_MBAFF || MB_FIELD || IS_INTRA_PCM(mb_type) || s->codec_id != CODEC_ID_H264 || (ENABLE_GRAY && (s->flags&CODEC_FLAG_GRAY)) || s->encoding;
> -
> -    if(!s->decode)
> -        return;
> +    int is_complex = FRAME_MBAFF || MB_FIELD || IS_INTRA_PCM(mb_type) || s->codec_id != CODEC_ID_H264 || (ENABLE_GRAY && (s->flags&CODEC_FLAG_GRAY));

the if(!decode) should be under a #ifdef h264_encoder


[...]

> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/libavcodec/h264.c
> --- ffmpeg-/libavcodec/h264.c	2008-06-30 14:47:53.000000000 -0400
> +++ ffmpeg/libavcodec/h264.c	2008-06-30 14:47:59.000000000 -0400
> @@ -5517,7 +5517,7 @@
>          }
>      }
>  
> -    for( coeff_count--; coeff_count >= 0; coeff_count-- ) {
> +    while( coeff_count-- ) {
>          uint8_t *ctx = coeff_abs_level1_ctx[node_ctx] + abs_level_m1_ctx_base;
>  
>          int j= scantable[index[coeff_count]];

ok if faster or same speed


[...]
> diff -ru --exclude='*svn*' ffmpeg-/libavcodec/h264.c ffmpeg/libavcodec/h264.c
> --- ffmpeg-/libavcodec/h264.c	2008-06-30 14:48:05.000000000 -0400
> +++ ffmpeg/libavcodec/h264.c	2008-06-30 14:48:11.000000000 -0400
> @@ -5552,11 +5552,9 @@
>              }
>  
>              if( is_dc ) {
> -                if( get_cabac_bypass( CC ) ) block[j] = -coeff_abs;
> -                else                                block[j] =  coeff_abs;
> +                block[j] = get_cabac_bypass_sign( CC, -coeff_abs );
>              }else{
> -                if( get_cabac_bypass( CC ) ) block[j] = (-coeff_abs * qmul[j] + 32) >> 6;
> -                else                                block[j] = ( coeff_abs * qmul[j] + 32) >> 6;
> +                block[j] = (get_cabac_bypass_sign( CC, -coeff_abs ) * qmul[j] + 32) >> 6;
>              }

ok if faster or same speed

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Democracy is the form of government in which you can choose your dictator
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Digital signature
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20080701/d9c56e45/attachment.pgp>



More information about the ffmpeg-devel mailing list