[FFmpeg-devel] [PATCH] H.264: x264 SSE2 iDCT functions

Michael Niedermayer michaelni
Sat Jan 3 00:12:13 CET 2009


On Fri, Jan 02, 2009 at 04:36:11PM -0500, Jason Garrett-Glaser wrote:
> On Fri, Jan 2, 2009 at 4:14 PM, Guillaume POIRIER <poirierg at gmail.com> wrote:
> > Hello,
> >
> > On Fri, Jan 2, 2009 at 9:37 PM, Jason Garrett-Glaser
> > <darkshikari at gmail.com> wrote:
> >
> >> Patch attached.
> >
> >> +#if defined(CONFIG_GPL) && defined(HAVE_YASM)
> >> +static void ff_h264_idct_dc_add8_mmx2(uint8_t *dst, int16_t *block, int stride)
> >> +{
> >>
> >
> > This is just to avoid having unreferenced code, right? I assume you're
> > not going to license ff_h264_idct_dc_add8_mmx2 under GPL...
> 
> Yes, of course.  I can't license it under GPL, it's basically copied
> from the original idct_dc.
> 
> Also, Michael, why isn't idct_add8 (the chroma 8-4x4idct function)
> used at all?  Did you forget to insert it when you added
> add16/add16_intra into h264.c?

I benchmarked it, and it was slower; that's why it's not in svn.
I don't know why it was slower.

A related hunk I had lying around is below (it doesn't apply anymore, though):

@@ -2549,26 +2587,25 @@
 
         if((simple || !ENABLE_GRAY || !(s->flags&CODEC_FLAG_GRAY)) && (h->cbp&0x30)){
             uint8_t *dest[2] = {dest_cb, dest_cr};
-            if(transform_bypass){
-                idct_add = idct_dc_add = s->dsp.add_pixels4;
-            }else{
-                idct_add = s->dsp.h264_idct_add;
-                idct_dc_add = s->dsp.h264_idct_dc_add;
+            if(!transform_bypass){
                 chroma_dc_dequant_idct_c(h->mb + 16*16, h->chroma_qp[0], h->dequant4_coeff[IS_INTRA(mb_type) ? 1:4][h->chroma_qp[0]][0]);
                 chroma_dc_dequant_idct_c(h->mb + 16*16+4*16, h->chroma_qp[1], h->dequant4_coeff[IS_INTRA(mb_type) ? 2:5][h->chroma_qp[1]][0]);
             }
             if(is_h264){
-                if(transform_bypass && IS_INTRA(mb_type) && h->sps.profile_idc==244 && (h->chroma_pred_mode==VERT_PRED8x8 || h->chroma_pred_mode==HOR_PRED8x8)){
+                if(transform_bypass){
+                    if(IS_INTRA(mb_type) && h->sps.profile_idc==244 && (h->chroma_pred_mode==VERT_PRED8x8 || h->chroma_pred_mode==HOR_PRED8x8)){
                     h->hpc.pred8x8_add[h->chroma_pred_mode](dest[0], block_offset + 16, h->mb + 16*16, uvlinesize);
                     h->hpc.pred8x8_add[h->chroma_pred_mode](dest[1], block_offset + 20, h->mb + 20*16, uvlinesize);
                 }else{
+                        idct_add = s->dsp.add_pixels4;
                     for(i=16; i<16+8; i++){
-                        if(h->non_zero_count_cache[ scan8[i] ])
+                        if(h->non_zero_count_cache[ scan8[i] ] || h->mb[i*16])
                             idct_add   (dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
-                        else if(h->mb[i*16])
-                            idct_dc_add(dest[(i&4)>>2] + block_offset[i], h->mb + i*16, uvlinesize);
                     }
                 }
+                }else{
+                    s->dsp.h264_idct_add8(dest, block_offset, h->mb, uvlinesize, h->non_zero_count_cache);
+                }
             }else{
                 for(i=16; i<16+8; i++){
                     if(h->non_zero_count_cache[ scan8[i] ] || h->mb[i*16]){

[...]
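
For readers following the hunk: the non-bypass branch replaces the per-block
choice between idct_add and idct_dc_add with a single h264_idct_add8 call.
Below is a minimal sketch of what such a batched helper might look like,
assuming it performs the same per-block dispatch as the removed lines. All
sketch_* names and the two call counters are hypothetical stand-ins, not the
real dsp functions, and the nnz array is indexed directly by block number
here, whereas the hunk goes through scan8[].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters standing in for the real transforms, so the
 * dispatch can be observed without implementing the iDCT itself. */
static int full_idct_calls, dc_idct_calls;

static void sketch_idct_add(uint8_t *dst, int16_t *block, int stride)
{
    (void)dst; (void)block; (void)stride;
    full_idct_calls++;      /* would run the full 4x4 iDCT and add to dst */
}

static void sketch_idct_dc_add(uint8_t *dst, int16_t *block, int stride)
{
    (void)dst; (void)block; (void)stride;
    dc_idct_calls++;        /* would add only the DC term to dst */
}

/* Batched chroma helper (assumed behaviour): blocks 16..19 go to
 * dest[0] (Cb), blocks 20..23 go to dest[1] (Cr). */
static void sketch_idct_add8(uint8_t **dest, const int *block_offset,
                             int16_t *block, int stride, const uint8_t *nnz)
{
    assert(dest && block_offset && block && nnz);
    for (int i = 16; i < 16 + 8; i++) {
        uint8_t *dst = dest[(i & 4) >> 2] + block_offset[i];
        if (nnz[i])                 /* coded coefficients: full transform */
            sketch_idct_add(dst, block + i * 16, stride);
        else if (block[i * 16])     /* only a DC coefficient: cheap path */
            sketch_idct_dc_add(dst, block + i * 16, stride);
        /* otherwise the block is empty and nothing is added */
    }
}
```

Whether a batched call like this beats the open-coded loop depends on the
build and the CPU, so it only shows the control flow and says nothing about
speed.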
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Complexity theory is the science of finding the exact solution to an
approximation. Benchmarking OTOH is finding an approximation of the exact