[FFmpeg-devel] [PATCH] use AV_RB16 in cabac refill

Måns Rullgård mans
Thu Mar 25 15:11:32 CET 2010


Alexander Strange <astrange at ithinksw.com> writes:

> Measured 1 cycle faster decode_cabac_residual on x86-64. Didn't try
> anywhere else, but I'd be a little interested in what arm does.
>
>
> From 539b4c39981a32f4de2c0cbccc54bf540bda398f Mon Sep 17 00:00:00 2001
> From: Alexander Strange <astrange at ithinksw.com>
> Date: Wed, 17 Mar 2010 06:06:15 -0400
> Subject: [PATCH 3/4] cabac: Use AV_RB16 instead of two byte loads in refill
>
> Less than 1 cycle faster.
> ---
>  libavcodec/cabac.h |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/libavcodec/cabac.h b/libavcodec/cabac.h
> index 2794626..3aed9fb 100644
> --- a/libavcodec/cabac.h
> +++ b/libavcodec/cabac.h
> @@ -262,7 +262,7 @@ static void put_cabac_ueg(CABACContext *c, uint8_t * state, int v, int max, int
>  
>  static void refill(CABACContext *c){
>  #if CABAC_BITS == 16
> -        c->low+= (c->bytestream[0]<<9) + (c->bytestream[1]<<1);
> +        c->low+= AV_RB16(c->bytestream)<<1;
>  #else
>          c->low+= c->bytestream[0]<<1;
>  #endif
> @@ -280,7 +280,7 @@ static void refill2(CABACContext *c){
>      x= -CABAC_MASK;
>  
>  #if CABAC_BITS == 16
> -        x+= (c->bytestream[0]<<9) + (c->bytestream[1]<<1);
> +        x+= AV_RB16(c->bytestream)<<1;
>  #else
>          x+= c->bytestream[0]<<1;
>  #endif

This is probably faster on machines with unaligned load support.  On
machines without it, I'm not so sure.  If the compiler is clever enough,
it shouldn't make a difference, but you know, gcc...

On the other hand, many of those systems are probably not particularly
relevant here.
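
For anyone who wants to convince themselves the two forms are
interchangeable, here is a rough sketch.  It is not the FFmpeg code:
rb16_bytes(), rb16_load(), refill_two_loads(), refill_rb16() and
check_refill_equivalence() are hypothetical names made up for
illustration, and rb16_load() only assumes that a memcpy-based 16-bit
load plus a conditional byte swap is a fair model of a big-endian
16-bit read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two byte loads in the old refill(). */
static inline unsigned rb16_bytes(const uint8_t *p)
{
    return ((unsigned)p[0] << 8) | p[1];
}

/* Hypothetical stand-in for a single 16-bit load: memcpy keeps the
 * access legal at any alignment, and the swap is only needed when the
 * host stores the low byte first. */
static inline unsigned rb16_load(const uint8_t *p)
{
    const uint16_t probe = 1;
    uint8_t first;
    uint16_t v;

    memcpy(&first, &probe, 1);      /* 1 on little-endian hosts */
    memcpy(&v, p, sizeof(v));
    if (first)
        v = (uint16_t)((v >> 8) | (v << 8));
    return v;
}

/* Both refill forms from the patch, applied to a plain int. */
static inline int refill_two_loads(int low, const uint8_t *p)
{
    return low + (p[0] << 9) + (p[1] << 1);
}

static inline int refill_rb16(int low, const uint8_t *p)
{
    return low + (int)(rb16_load(p) << 1);
}

/* Exhaustive check over all byte pairs, read at offset 1 into buf. */
static void check_refill_equivalence(void)
{
    uint8_t buf[3] = { 0 };
    for (unsigned b0 = 0; b0 < 256; b0++) {
        for (unsigned b1 = 0; b1 < 256; b1++) {
            buf[1] = (uint8_t)b0;
            buf[2] = (uint8_t)b1;
            assert(rb16_load(buf + 1) == rb16_bytes(buf + 1));
            assert(refill_rb16(0, buf + 1) == refill_two_loads(0, buf + 1));
        }
    }
}
```

Calling check_refill_equivalence() from main() should pass on either
endianness.  Whether the single-load form is actually faster on a given
target is a separate question that only a benchmark can answer.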

-- 
Måns Rullgård
mans at mansr.com


