[FFmpeg-devel] [PATCH] MMX implementation of VC-1 inverse transforms

Michael Niedermayer michaelni
Sat Jan 19 04:16:47 CET 2008


On Wed, Jan 16, 2008 at 10:16:30PM +0100, Christophe GISQUET wrote:
> Hi,
> 
> considering the amount of rework, mostly because of my oversight of the
> overflows, I'll split the next patches as follows:
> - first, i4x4
> - then i8x8
> - then i4x8 / i8x4
> 
> This mail is therefore only for things related to i4x4. I'll start a new
> thread if requested to.
> 
> Michael Niedermayer wrote:
> > you do not need temporary storage
> > the butterflies can be implemented like:
> > b+=a
> > a+=a
> > a-=b
> 
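For reference, a minimal C sketch of the three-step butterfly quoted above. The function name butterfly_inplace is made up for illustration; in an MMX version each statement would presumably map to one packed-word instruction (paddw, paddw, psubw).

```c
/* In-place butterfly without a temporary.
 * On return *b holds (a + b) and *a holds (a - b),
 * where a and b are the values on entry. */
static void butterfly_inplace(int *a, int *b)
{
    *b += *a;   /* b = a + b                */
    *a += *a;   /* a = 2a                   */
    *a -= *b;   /* a = 2a - (a + b) = a - b */
}
```

Note that the difference comes out as a - b, not b - a, so the caller has to pick the operand order accordingly.
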
> 'Trick' used as far as I could get (which doesn't mean it's far...) in
> the current patch.
> 
> >> +static void vc1_inv_trans_8x8_mmx(DCTELEM block[64])
> >> +{
> >> +    transpose8x8_mmx(block);
> > 
> > all initial permutations (here a transpose) MUST be merged into the scantable;
> > all other codecs do this too! vc1 won't become an exception
> 
> Pending a decision on how to signal that the zz scantable must be
> transposed at loading, I've left the useless transpose in there. It'll
> just be a matter of not calling the macro and propagating the new
> registers used.
> 
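To make the scantable point concrete, here is a rough sketch of folding an 8x8 transpose into a scan table once at init time, so the IDCT no longer has to transpose the block. It assumes the block is stored row-major (index = 8*row + col); transpose_scantable_8x8 is a hypothetical helper name, not an existing libavcodec function, and how the decoder would signal the need for it is left open here.

```c
#include <stdint.h>

/* Build a scan table whose entries point at the transposed position
 * of each coefficient in a row-major 8x8 block. */
static void transpose_scantable_8x8(uint8_t dst[64], const uint8_t src[64])
{
    for (int i = 0; i < 64; i++) {
        int row = src[i] >> 3;   /* original row    */
        int col = src[i] & 7;    /* original column */
        dst[i] = (uint8_t)((col << 3) | row);   /* swapped */
    }
}
```

Coefficients stored through the resulting table land directly in transposed order.
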
> >> +#define IDCT4_1D(R0, R1, R2, R3, TMP1, TMP2, TMP3, SHIFT)      \
> [...]
> > same as above the multiply can be done before the butterfly and
> > thus 1 bias add can be avoided
> 
> The solution I came up with to avoid overflow problems ((8*A+B)>>3 = A +
> (B>>3)) doesn't seem to allow me such trick.
> 
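The overflow workaround rests on (8*A+B)>>3 being equal to A + (B>>3). A small sketch to check that identity, assuming >> on a negative int is an arithmetic shift (C leaves this implementation-defined, but it matches psraw). All function names are illustrative only.

```c
/* Direct form: materialises 8*a + b, which is what overflows 16 bits. */
static int shift_direct(int a, int b) { return (8 * a + b) >> 3; }

/* Split form: the multiple of 8 is pulled out before the shift. */
static int shift_split(int a, int b)  { return a + (b >> 3); }

/* Count the (a, b) pairs in [lo, hi] x [lo, hi] where the two disagree. */
static int count_shift_mismatches(int lo, int hi)
{
    int bad = 0;
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (shift_direct(a, b) != shift_split(a, b))
                bad++;
    return bad;
}
```

The identity holds because 8*a is an exact multiple of 8, so flooring the sum only ever affects the b term.
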
> This solution has its share of problems:
> - forces me to perform the butterflies twice
> - waste of mm7, but don't know where to use it
> - not very readable...
> 
> I hope I haven't missed too many obvious optimizations this time...
> Currently it clocks at 1339 dezicycles (vs 2100 for the improved C
> version), so it's 20% slower than my previous, overflowing version.
> Maybe an improved version of the latter could be kept for flags2 fast...
> 
> Best regards,
> Christophe GISQUET

> Index: libavcodec/i386/vc1dsp_mmx.c
> ===================================================================
> --- libavcodec/i386/vc1dsp_mmx.c	(revision 11527)
> +++ libavcodec/i386/vc1dsp_mmx.c	(working copy)
> @@ -467,6 +467,95 @@
>  DECLARE_FUNCTION(3, 2)
>  DECLARE_FUNCTION(3, 3)
>  
> +/* in/out: B=(A+R)/2+B  TMP=(A+R)/2-B  A=2*A */
> +#define MERGE(A, B, TMP)                                     \
> +     "movq       "#A", "#TMP" \n\t"                          \
> +     "paddw      "#A", "#A" \n\t"                 /* 2A */   \
> +     "paddw      %%mm7, "#TMP" \n\t"             /* A+R */   \
> +     "psraw      $1, "#TMP" \n\t"           /* (A+R)>>1 */   \
> +     SUMSUB_BA(B, TMP)
> +
> +/* (17(s0+s2)+22s1+10s3+5)>>3 = 2(s0+s2)+2s1+s3+(s0+s2+6s1+2s3+R)>>3
> + *                            = ... + (3s1+s3+(s0+s2+R)>>1)>>2

I think the following is safe:

        t1 = src[0] + src[2];
        t2 = src[0] - src[2];
        t1= 8*t1 + (t1>>1);
        t2= 8*t2 + (t2>>1);

        t3 = 11 * src[1] + 5 * src[3];
        t4 = 11 * src[3] - 5 * src[1];

        dst[0] = (t1 + t3 + 2) >> 2;
        dst[1] = (t2 - t4 + 2) >> 2;
        dst[2] = (t2 + t4 + 2) >> 2;
        dst[3] = (t1 - t3 + 2) >> 2;

the spec requires dst>>4 to be within -512 .. 510
so the triplet sums are limited to -32768 .. 32703
hence the value prior to the >>2 fits in 16bit
the values before the >>1 are also limited to +-8192 per spec so they
are fine as well
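
A quick way to sanity-check the halved form above is to compare it against the straightforward one with coefficients 17, 22, 10 and rounding constant 4. That reference is my assumption of what the plain C first pass looks like; the sketch also assumes arithmetic right shift on negative values, and it uses int throughout, so it checks the rounding only, not the 16-bit range argument.

```c
/* Assumed reference first pass: full coefficients, +4, >>3. */
static void idct4_ref(int dst[4], const int src[4])
{
    int t1 = 17 * (src[0] + src[2]) + 4;
    int t2 = 17 * (src[0] - src[2]) + 4;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];
    dst[0] = (t1 + t3) >> 3;
    dst[1] = (t2 - t4) >> 3;
    dst[2] = (t2 + t4) >> 3;
    dst[3] = (t1 - t3) >> 3;
}

/* Halved form from this mail: 8.5*t via 8*t + (t>>1), +2, >>2. */
static void idct4_half(int dst[4], const int src[4])
{
    int t1 = src[0] + src[2];
    int t2 = src[0] - src[2];
    t1 = 8 * t1 + (t1 >> 1);
    t2 = 8 * t2 + (t2 >> 1);
    int t3 = 11 * src[1] + 5 * src[3];
    int t4 = 11 * src[3] - 5 * src[1];
    dst[0] = (t1 + t3 + 2) >> 2;
    dst[1] = (t2 - t4 + 2) >> 2;
    dst[2] = (t2 + t4 + 2) >> 2;
    dst[3] = (t1 - t3 + 2) >> 2;
}

/* Count inputs in [lo, hi]^4 where any of the four outputs differ. */
static int count_idct4_mismatches(int lo, int hi)
{
    int bad = 0, s[4], r[4], h[4];
    for (s[0] = lo; s[0] <= hi; s[0]++)
    for (s[1] = lo; s[1] <= hi; s[1]++)
    for (s[2] = lo; s[2] <= hi; s[2]++)
    for (s[3] = lo; s[3] <= hi; s[3]++) {
        idct4_ref(r, s);
        idct4_half(h, s);
        for (int i = 0; i < 4; i++)
            if (r[i] != h[i]) { bad++; break; }
    }
    return bad;
}
```

The bit lost by t>>1 when t is odd should never change the result: 17*t is then one more than twice the halved term, and an odd numerator cannot land on a multiple of 8, so both forms floor to the same value.
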

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato