[FFmpeg-devel] [RFC] [PATCH] Indicate better when transposing zigzag scantables is needed in some codecs

Michael Niedermayer michaelni
Sat Jan 19 02:26:25 CET 2008


On Mon, Jan 14, 2008 at 10:09:51PM +0100, Christophe GISQUET wrote:
> Forking this particular topic, as it involves more than vc1.
> 
> > > +          "m"(mm_rnd1), "m"(mm_rnd2), "m"(mm_shift)
> > > > +    );
> > > > +}
> > > > +
> > > > +static void vc1_inv_trans_8x8_mmx(DCTELEM block[64])
> > > > +{
> > > > +    transpose8x8_mmx(block);
> >
> > all initial permutations (here a transpose) MUST be merged into the
> > scantable
> > all other codecs do this too! vc1 won't become an exception
> 
> I understand that, yet not all codecs actually do it the same way.
> 
> ffmpeg has 2 methods for this:
> - (mpegvideo) ff_init_table, idct_permutation_type and ScanTable stuff;
> hardcoded to use 64 values
> - (h264) explicit check for transform method, and if not c version,
> loads zz scantable in a transposed way
> 
> Here's a snippet of the code for the second one:
>     if(s->dsp.h264_idct_add == ff_h264_idct_add_c){ //FIXME little ugly
>         memcpy(h->zigzag_scan, zigzag_scan, 16*sizeof(uint8_t));
>         memcpy(h-> field_scan,  field_scan, 16*sizeof(uint8_t));
>     }else{
>         for(i=0; i<16; i++){
> #define T(x) (x>>2) | ((x<<2) & 0xF)
>             h->zigzag_scan[i] = T(zigzag_scan[i]);
>             h-> field_scan[i] = T( field_scan[i]);
> #undef T
>         }
>     }
> 
> For such a codec, it seems the mechanisms available with ScanTable are not
> used.
> 
> Either h264 way (2nd method) gets improved or the 1st method does.
> Either way, it must allow specific optimizations, and address several
> transforms at a time. An example would be that 4x4 transforms may not
> need zz transpose while 8x8 needs it. h264 has 2 transforms, vc1 4.

until someone writes a SIMD h264/vc1 idct which is faster with a permutation
different from the transpose, I'd assume that it's fine to transpose all or
none.
alternatively someone could try to write a C idct which is faster with
transposed input. I wouldn't be surprised if with 64bit HW it might be
possible to write an idct based on the same idea as the SIMD idcts in
plain C, that is, working with 4 16bit values at a time in a 64bit int
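To make the "4 16bit values in a 64bit int" idea concrete, here is a minimal sketch of the lane-wise add and subtract such a plain C idct would be built from. It is not FFmpeg code; the names pack4x16, lane, add4x16 and sub4x16 are hypothetical, and a real idct would also need multiplies, shifts and rounding on top of this.

```c
#include <assert.h>
#include <stdint.h>

/* Mask selecting the top bit of each of the four 16-bit lanes. */
#define LANE_MSB 0x8000800080008000ULL

/* Pack four 16-bit values into one 64-bit word, lane 0 in the low bits. */
static uint64_t pack4x16(const int16_t v[4])
{
    uint64_t w = 0;
    for (int i = 0; i < 4; i++)
        w |= (uint64_t)(uint16_t)v[i] << (16 * i);
    return w;
}

/* Extract lane i as a signed 16-bit value. */
static int16_t lane(uint64_t w, int i)
{
    assert(i >= 0 && i < 4);
    uint16_t u = (uint16_t)(w >> (16 * i));
    return u < 0x8000 ? (int16_t)u : (int16_t)((int)u - 0x10000);
}

/* Lane-wise a + b (mod 2^16): add the low 15 bits of each lane so no
 * carry can cross a lane boundary, then fix up the top bits with XOR. */
static uint64_t add4x16(uint64_t a, uint64_t b)
{
    return ((a & ~LANE_MSB) + (b & ~LANE_MSB)) ^ ((a ^ b) & LANE_MSB);
}

/* Lane-wise a - b (mod 2^16): force the top bit of each lane of a so no
 * borrow can cross a lane boundary, then fix up the top bits with XOR. */
static uint64_t sub4x16(uint64_t a, uint64_t b)
{
    return ((a | LANE_MSB) - (b & ~LANE_MSB)) ^ ((a ^ ~b) & LANE_MSB);
}
```

With rows packed this way, one add4x16/sub4x16 pair performs a butterfly on four columns at once, which is why such code would want the same input layout as the SIMD idcts.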

so I do not see any real need to change anything. The ugly solution we
have is simple and does exactly what is needed, that is: if the idct is not
the C one, transpose the scantable
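As an illustration of why transposing the scantable is enough, the sketch below (not FFmpeg code) builds a 4x4 scan table either plain or transposed, using the same index expression as the T() macro quoted above, and checks that filling a block through the transposed table yields the transposed block. The table is the usual 4x4 zigzag order; build_scan, scatter and is_transpose are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Transpose a 4x4 raster index: 4*row + col  ->  4*col + row.
 * Same expression as the quoted T() macro, fully parenthesized. */
#define TRANSPOSE4x4(x) (((x) >> 2) | (((x) << 2) & 0xF))

/* Usual 4x4 zigzag order as raster indices (illustration only). */
static const uint8_t zigzag4x4[16] = {
    0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15
};

/* Copy the scan table, transposing every entry if requested. */
static void build_scan(uint8_t dst[16], const uint8_t src[16], int transposed)
{
    for (int i = 0; i < 16; i++)
        dst[i] = transposed ? (uint8_t)TRANSPOSE4x4(src[i]) : src[i];
}

/* Place coefficients, given in scan order, into a 4x4 block. */
static void scatter(int16_t block[16], const int16_t coeffs[16],
                    const uint8_t scan[16])
{
    for (int i = 0; i < 16; i++) {
        assert(scan[i] < 16);
        block[scan[i]] = coeffs[i];
    }
}

/* Return 1 if b is the matrix transpose of a, else 0. */
static int is_transpose(const int16_t a[16], const int16_t b[16])
{
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            if (a[4 * r + c] != b[4 * c + r])
                return 0;
    return 1;
}
```

The transpose then costs nothing per block: it is paid once when the table is set up, and the idct never sees an untransposed block.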

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you really think that XML is the answer, then you definitely misunderstood
the question -- Attila Kinali