[FFmpeg-devel] [PATCH] SSE2 Xvid idct

Alexander Strange astrange
Sun Apr 6 18:41:03 CEST 2008


On Apr 6, 2008, at 8:58 AM, Michael Niedermayer wrote:
> On Sun, Apr 06, 2008 at 12:19:58AM -0400, Alexander Strange wrote:
>> This adds skal's sse2 idct and uses it as the xvid idct when available.
>>
>> I merged two shuffles into the permutation and changed the zero-skipping
>> some - it's fastest in MMX and not really worth doing for the first three
>> rows. Their right halves are still usually all zero, but adding the branch
>> to check for it is a net loss. The best thing for speed would be switching
>> IDCTs by counting the last nonzero coefficient position, but that's
>> something for later.
>>
>> xvididctheader - makes a new header so I don't add any more extern
>> declarations in .c files.
>> sse2-permute - the new permutation; it might not have a specific enough
>> name, but it should work as well for simpleidct as this if I can get back
>> to that.
>> sse2-xvid-idct.diff + idct_sse2_xvid.c - the IDCT
>
> Can you also post dct-test -i 0/1/2 output please!

All the same as xvidmmx (also checked by comparing actual clips):

0

   -98  -125  -197  -115  -216  -104  -140  -105
   108   117   137   123   107    94   105   109
   199   168   110   113   135    74    85   110
  -156   -94   -87  -102   -91   -94   -73   -97
  -204   -91  -110   -98   -98   -95   -81  -109
   114   114    84   104    77   120   117    94
   150   125   110   102   115   119    83    99
  -128   -91   -78   -87   -83   -81   -95  -126
IDCT XVID-MMX2: err_inf=1 err2=0.00919531 syserr=0.01080000 maxout=260 blockSumErr=5
IDCT XVID-MMX2: 6672.4 kdct/s

   -98  -125  -197  -115  -216  -104  -140  -105
   108   117   137   123   107    94   105   109
   199   168   110   113   135    74    85   110
  -156   -94   -87  -102   -91   -94   -73   -97
  -204   -91  -110   -98   -98   -95   -81  -109
   114   114    84   104    77   120   117    94
   150   125   110   102   115   119    83    99
  -128   -91   -78   -87   -83   -81   -95  -126
IDCT XVID-SSE2: err_inf=1 err2=0.00919531 syserr=0.01080000 maxout=260 blockSumErr=5
IDCT XVID-SSE2: 7549.3 kdct/s

-
1

   318   380   372   344   346   388   356   349
   116   152   159   136   127   150   129   137
   147   147   144   153   144   149   154   138
  -143  -149  -124  -118  -143  -155  -150  -148
  -139  -137  -119  -135  -119  -163  -138  -134
   211   189   265   233   187   209   206   215
   231   208   212   267   193   221   230   213
    15    16    63    54    35    51    46    61
IDCT XVID-MMX2: err_inf=1 err2=0.01372969 syserr=0.01940000 maxout=241 blockSumErr=48
IDCT XVID-MMX2: 6665.2 kdct/s

   318   380   372   344   346   388   356   349
   116   152   159   136   127   150   129   137
   147   147   144   153   144   149   154   138
  -143  -149  -124  -118  -143  -155  -150  -148
  -139  -137  -119  -135  -119  -163  -138  -134
   211   189   265   233   187   209   206   215
   231   208   212   267   193   221   230   213
    15    16    63    54    35    51    46    61
IDCT XVID-SSE2: err_inf=1 err2=0.01372969 syserr=0.01940000 maxout=241 blockSumErr=48
IDCT XVID-SSE2: 9357.7 kdct/s


-
2

     0  2474     0     0     0     0     0  2474
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0 -2478     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
  2474     0     0     0     0     0     0     0
IDCT XVID-MMX2: err_inf=1 err2=0.00773437 syserr=0.12390000 maxout=256 blockSumErr=3
IDCT XVID-MMX2: 6683.2 kdct/s

     0  2474     0     0     0     0     0  2474
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0 -2478     0     0     0     0
     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0
  2474     0     0     0     0     0     0     0
IDCT XVID-SSE2: err_inf=1 err2=0.00773437 syserr=0.12390000 maxout=256 blockSumErr=3
IDCT XVID-SSE2: 9362.3 kdct/s




