[Ffmpeg-cvslog] CVS: ffmpeg/libavcodec dsputil.c, 1.121, 1.122 dsputil.h, 1.114, 1.115 h264.c, 1.127, 1.128

Loren Merritt lorenm
Wed May 18 12:13:18 CEST 2005


On Wed, 18 May 2005, Guillaume POIRIER wrote:

> On 5/18/05, Loren Merritt CVS <lorenm at mplayerhq.hu> wrote:
>> Update of /cvsroot/ffmpeg/ffmpeg/libavcodec
>> In directory mail:/var2/tmp/cvs-serv8431
>>
>> Modified Files:
>>         dsputil.c dsputil.h h264.c
>> Log Message:
>> H.264 deblocking optimizations (mmx for chroma_bS4 case, convert existing cases to 8-bit math)
>
> Out of curiosity, do you know the speed-up brought by that optimization?
>
> If you want, I can try to benchmark it with Apple's HD trailers.

Benchmarked on an AMD Barton 2500, using DVD-res videos of quant 20-30:

total decode speed: +4%

2287 dezicycles in loop_vl_16bit, 1048386 runs, 190 skips
1644 dezicycles in loop_vl_8bit, 1048318 runs, 258 skips

3403 dezicycles in loop_hl_16bit, 1048309 runs, 267 skips
3036 dezicycles in loop_hl_8bit, 1048287 runs, 289 skips

 867 dezicycles in loop_vc_16bit, 1048389 runs, 187 skips
 538 dezicycles in loop_vc_8bit, 1047827 runs, 749 skips

1374 dezicycles in loop_hc_16bit, 1048419 runs, 157 skips
1176 dezicycles in loop_hc_8bit, 1048408 runs, 168 skips

 707 dezicycles in loop_vci_16bit, 1048394 runs, 182 skips
 428 dezicycles in loop_vci_8bit, 1048281 runs, 295 skips

1543 dezicycles in loop_hci_16bit, 1048151 runs, 425 skips
1069 dezicycles in loop_hci_8bit, 1047907 runs, 669 skips
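
For readers who want per-function ratios, the figures above can be reduced
with a few lines of Python. This is only a sketch: the dictionary and helper
names are made up for illustration, and the numbers are copied verbatim from
the timer output (a dezicycle being, as far as I understand FFmpeg's timer
macros, a tenth of a CPU cycle).

```python
# (16-bit, 8-bit) dezicycles per call, copied from the timer output above.
# The names "timings", "speedup" and "saved_percent" are illustrative only.
timings = {
    "loop_vl": (2287, 1644),
    "loop_hl": (3403, 3036),
    "loop_vc": (867, 538),
    "loop_hc": (1374, 1176),
    "loop_vci": (707, 428),
    "loop_hci": (1543, 1069),
}


def speedup(old, new):
    """How many times faster the new version is than the old one."""
    return old / new


def saved_percent(old, new):
    """Percentage of the old per-call cost that the new version removes."""
    return 100.0 * (old - new) / old


for name, (old, new) in timings.items():
    print(f"{name:9s} {speedup(old, new):.2f}x, "
          f"{saved_percent(old, new):.0f}% fewer dezicycles per call")
```

These are per-call ratios only; they say nothing about how often each
function runs, so they cannot be averaged into the total decode figure.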

Note: this is not to suggest that these functions are called equally often.
Luma is only slightly more used than chroma, but chroma_intra is much rarer
(by ~10x or more, depending on the video).
Luma_intra is still not MMXed, because it is the most complicated
case and is also as rare as chroma_intra.

You may also note that the time needed to transpose vertical edges now 
exceeds the time needed to filter them. There are more potential 
optimizations there.
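
To illustrate why a transpose shows up at all, here is a toy Python sketch.
It assumes the transposes exist so that one row-oriented routine can serve
both edge directions. The filter is a made-up averaging stand-in, not the
H.264 deblocking filter, and all function names are hypothetical.

```python
def smooth_rows(block):
    """Toy filter across the horizontal edge between rows 3 and 4.

    Stand-in only: this is NOT the H.264 deblocking filter.
    """
    out = [row[:] for row in block]
    for x in range(len(block[0])):
        p0, q0 = block[3][x], block[4][x]
        avg = (p0 + q0 + 1) >> 1
        out[3][x] = (p0 + avg + 1) >> 1
        out[4][x] = (q0 + avg + 1) >> 1
    return out


def transpose(block):
    """Swap rows and columns of a square block."""
    return [list(col) for col in zip(*block)]


def smooth_cols_direct(block):
    """Same toy filter applied directly across the vertical edge
    between columns 3 and 4 (reference implementation)."""
    out = [row[:] for row in block]
    for row_in, row_out in zip(block, out):
        p0, q0 = row_in[3], row_in[4]
        avg = (p0 + q0 + 1) >> 1
        row_out[3] = (p0 + avg + 1) >> 1
        row_out[4] = (q0 + avg + 1) >> 1
    return out


def smooth_cols_via_transpose(block):
    """Vertical edge handled as transpose, row filter, transpose back."""
    return transpose(smooth_rows(transpose(block)))


# Arbitrary 8x8 test pattern with values in 0..255.
block = [[(7 * y + 13 * x) % 256 for x in range(8)] for y in range(8)]
assert smooth_cols_via_transpose(block) == smooth_cols_direct(block)
```

Both paths give the same pixels; the transposed one simply pays for two
extra passes over the block, which is the overhead the remark above refers to.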

--Loren Merritt




