[FFmpeg-devel] [PATCH] VP3 DC-only IDCT

Michael Niedermayer michaelni
Fri Apr 16 21:42:30 CEST 2010


On Fri, Apr 16, 2010 at 08:20:44AM -0400, David Conrad wrote:
> On Mar 13, 2010, at 2:18 PM, Michael Niedermayer wrote:
> 
> > On Sat, Mar 13, 2010 at 01:36:20AM -0500, David Conrad wrote:
> >> Hi,
> >> 
> >> This gives 2-4% faster overall decode for normal files.
> >> 
> >> Some thoughts:
> >> I can't think of any shortcuts that could make the IDCT faster with 128-bit simd that don't rely on knowing the last non-zero coefficient.
> >> 
> >> Knowing that before calling the idct, you could do a slightly faster IDCT that assumes the right and bottom of the block are all 0. This seems to be significantly faster only for mmx; for sse2 it's nearly a wash between the added check vs. the time saved.
> >> 
> >> For an average video, around a third of all idcts are DC-only, a third more could be done with that shortcut (i.e. last_nnz is under 10), and the rest require a full IDCT.
> >> 
> >> libtheora only does the 10 element shortcut, not DC-only. It also only has an mmx IDCT.
> >> 
> >> I also haven't really looked at whether a DC-only IDCT is beneficial for mpeg codecs, thus the vp3-specific dsputil function.
> >> 
> > 
> > [...]
> >> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
> >> index 87b64de..606e361 100644
> >> --- a/libavcodec/vp3dsp.c
> >> +++ b/libavcodec/vp3dsp.c
> >> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
> >>     idct(dest, line_size, block, 2);
> >> }
> >> 
> >> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){
> >> +    const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> >> +    int i, dc = block[0];
> > 
> >> +    dc = (46341*dc)>>16;
> >> +    dc = (46341*dc)>>16;
> > 
> > me searches for a bag to vomit into ...
> > do they do all x>>1 in theora that way or just selected ones?
> 
> Every multiplication in the IDCT is immediately followed by cutting the least significant 16 bits.

creepy
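
For illustration, a minimal sketch of what that truncation does on the DC
path. The helper names are made up here and are not from the patch. 46341 is
roughly 65536/sqrt(2), so two such steps (presumably one per 1-D pass) come
close to halving dc, but since the low 16 bits are dropped after each
multiplication the result is not always dc>>1:

```c
#include <assert.h>
#include <stdint.h>

/* 46341 is approximately 65536 / sqrt(2), i.e. cos(pi/4) scaled by 2^16. */
#define C4_FIXED 46341

/* One multiplication step: multiply, then drop the low 16 bits.
 * Assumes >> on a negative int is an arithmetic shift, which is
 * implementation-defined in ISO C but holds on common compilers. */
static int mul_c4(int x)
{
    return (C4_FIXED * x) >> 16;
}

/* The DC path in the quoted patch applies the step twice. */
static int dc_scale_twice(int dc)
{
    return mul_c4(mul_c4(dc));
}

/* Count inputs in [lo, hi] where the two-step result differs from dc >> 1. */
static int count_mismatches(int lo, int hi)
{
    int n = 0;
    for (int dc = lo; dc <= hi; dc++)
        if (dc_scale_twice(dc) != (dc >> 1))
            n++;
    return n;
}

/* Hypothetical self-check: dc = 2 already shows the difference. */
static void self_check_truncation(void)
{
    assert(dc_scale_twice(2) == 0);   /* 92682>>16 = 1, then 46341>>16 = 0 */
    assert((2 >> 1) == 1);
    assert(count_mismatches(0, 255) > 0);
}
```

So a plain shift would not be bit-exact with the two truncating
multiplications, which is presumably why the patch keeps them.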


> 
> > [...]
> >> diff --git a/libavcodec/x86/vp3dsp_mmx.c b/libavcodec/x86/vp3dsp_mmx.c
> >> index fead8e8..e39d0a1 100644
> >> --- a/libavcodec/x86/vp3dsp_mmx.c
> >> +++ b/libavcodec/x86/vp3dsp_mmx.c
> >> @@ -395,3 +395,65 @@ void ff_vp3_idct_add_mmx(uint8_t *dest, int line_size, DCTELEM *block)
> >>     ff_vp3_idct_mmx(block);
> >>     add_pixels_clamped_mmx(block, dest, line_size);
> >> }
> >> +
> > 
> >> +void ff_vp3_idct_dc_add_mmx2(uint8_t *dest, int linesize, DCTELEM *block)
> >> +{
> >> +    int dc = block[0];
> >> +    dc = (46341*dc)>>16;
> > 
> >> +    dc = (46341*dc)>>16;
> >> +    dc = (dc + 8) >> 4;
> > 
> > you can merge these 2
> 
> Done


[...]
> @@ -1468,10 +1468,13 @@ static void render_slice(Vp3DecodeContext *s, int slice)
>                              stride,
>                              block);
>                      } else {
> +                        if (vp3_dequant(s, s->all_fragments + i, plane, 1, block))
>                          s->dsp.idct_add(
>                              output_plane + first_pixel,
>                              stride,
>                              block);
> +                        else

nitpick: {}


> +                            s->dsp.vp3_idct_dc_add(output_plane + first_pixel, stride, block);
>                      }
>                  } else {
>  
> diff --git a/libavcodec/vp3dsp.c b/libavcodec/vp3dsp.c
> index 87b64de..606e361 100644
> --- a/libavcodec/vp3dsp.c
> +++ b/libavcodec/vp3dsp.c

> @@ -223,6 +223,25 @@ void ff_vp3_idct_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*
>      idct(dest, line_size, block, 2);
>  }
>  
> +void ff_vp3_idct_dc_add_c(uint8_t *dest/*align 8*/, int line_size, DCTELEM *block/*align 16*/){

const block


> +    const uint8_t *cm = ff_cropTbl + MAX_NEG_CROP;
> +    int i, dc = block[0];
> +    dc = (46341*dc)>>16;

> +    dc = (46341*dc)>>16;
> +    dc = (dc + 8) >> 4;

mergeable
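
A sketch of one way the second multiplication and the rounding shift could be
folded together, assuming 32-bit int, arithmetic right shift, and a dc that
fits in 16 bits. The function names and the plain clamp (instead of the
ff_cropTbl lookup) are placeholders for illustration, not the code in the
patch:

```c
#include <assert.h>
#include <stdint.h>

/* Rounding as written in the quoted hunk: three separate steps. */
static int dc_round_separate(int dc)
{
    dc = (46341 * dc) >> 16;
    dc = (46341 * dc) >> 16;
    dc = (dc + 8) >> 4;
    return dc;
}

/* Merged form: the +8 is moved in front of the >>16 by scaling it to
 * 8<<16, and the two shifts are combined into a single >>20. */
static int dc_round_merged(int dc)
{
    dc = (46341 * dc) >> 16;
    return (46341 * dc + (8 << 16)) >> 20;
}

/* Plain clamp to 0..255, standing in for the crop table lookup. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add the already rounded dc to every pixel of an 8x8 block. */
static void dc_add_8x8(uint8_t *dest, int line_size, int dc)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            dest[j] = clip_uint8(dest[j] + dc);
        dest += line_size;
    }
}

/* Check that both forms agree over the whole 16-bit input range. */
static void self_check_merge(void)
{
    for (int dc = -32768; dc <= 32767; dc++)
        assert(dc_round_separate(dc) == dc_round_merged(dc));
}
```

The two forms agree because nested floor divisions by powers of two collapse
into one, so no precision is gained or lost, only one shift and one add are
saved.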

rest ok as far as i am concerned but it's maintained by others ...


[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus