[Ffmpeg-devel] VP3/Theora Perfection

Måns Rullgård mru
Thu May 19 12:41:22 CEST 2005


Mike Melanson <mike at multimedia.cx> writes:

> Hi,
> 	I have replaced unpack_token() with a series of lookup tables
> 	in vp3.c. Now vp3data.h has more lines than vp3.c. Again,
> 	please test as I do not have great testing facilities right
> 	now. However, I did run a series of tests that validated a
> 	bunch of decoded tokens against the old function.
>
> 	Numbers for the speed freaks:
>
> [original]
> 1223 dezicycles in unpack_token, 32757 runs, 11 skips
> 1202 dezicycles in unpack_token, 65512 runs, 24 skips
> [new]
> 845 dezicycles in unpack_token, 32735 runs, 33 skips
> 841 dezicycles in unpack_token, 65466 runs, 70 skips
>
> 	What should I optimize next?

Perhaps some profiling data can give some hints:

samples  %        image name               symbol name
79906    20.6758  libc-2.3.4.so            (no symbols)
64232    16.6201  libavcodec-0.4.9-pre1.so apply_loop_filter
62827    16.2566  libavcodec-0.4.9-pre1.so unpack_vlcs
58066    15.0247  libavcodec-0.4.9-pre1.so render_fragments
26620     6.8880  libavcodec-0.4.9-pre1.so put_pixels8_mmx
21309     5.5137  libavcodec-0.4.9-pre1.so ff_vp3_idct_sse2
18442     4.7719  libavcodec-0.4.9-pre1.so reverse_dc_prediction
8021      2.0754  libavcodec-0.4.9-pre1.so unpack_superblocks
6187      1.6009  libavcodec-0.4.9-pre1.so __udivdi3
5489      1.4203  libavcodec-0.4.9-pre1.so unpack_vectors
5093      1.3178  libavcodec-0.4.9-pre1.so unpack_modes
4986      1.2901  libavcodec-0.4.9-pre1.so put_no_rnd_pixels8_l2_c
4801      1.2423  libavcodec-0.4.9-pre1.so vp3_decode_frame
4523      1.1703  libavcodec-0.4.9-pre1.so ff_vp3_idct_add_sse2
2342      0.6060  libavcodec-0.4.9-pre1.so put_no_rnd_pixels8_y2_mmx2
2094      0.5418  libavcodec-0.4.9-pre1.so put_no_rnd_pixels8_x2_mmx2

Any idea what is being called in libc?  I guess it's memcpy and/or
memset.
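
For reference, the kind of change quoted above (replacing a decoding
function with lookup tables) can be sketched roughly as follows. This is
only an illustration under assumptions: the token layout, the table
contents and the names TokenInfo, BitReader, read_bits and
unpack_token_lut are hypothetical and are not taken from vp3.c or
vp3data.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token layout, for illustration only (not the VP3 tables):
 * each token maps to a base value plus a number of extra bits to read. */
typedef struct {
    uint8_t base;       /* value before the extra bits are added */
    uint8_t extra_bits; /* how many extra bits follow in the stream */
} TokenInfo;

static const TokenInfo token_table[4] = {
    { 1, 0 },   /* token 0: value 1        */
    { 2, 0 },   /* token 1: value 2        */
    { 3, 1 },   /* token 2: values 3..4    */
    { 5, 2 },   /* token 3: values 5..8    */
};

/* Minimal MSB-first bit reader; performs no bounds checking. */
typedef struct {
    const uint8_t *buf;
    size_t bit_pos;
} BitReader;

static unsigned read_bits(BitReader *br, unsigned n)
{
    unsigned v = 0;
    while (n--) {
        unsigned byte = br->buf[br->bit_pos >> 3];
        unsigned bit  = (byte >> (7 - (br->bit_pos & 7))) & 1;
        v = (v << 1) | bit;
        br->bit_pos++;
    }
    return v;
}

/* One table lookup replaces a per-token switch or if/else chain.
 * The caller must pass a token below 4 (the table size). */
static unsigned unpack_token_lut(BitReader *br, unsigned token)
{
    const TokenInfo *t = &token_table[token];
    return t->base + read_bits(br, t->extra_bits);
}
```

The trade-off is the one mentioned in the quoted mail: the data header
grows while the decoding function shrinks to a lookup plus a bit read.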

-- 
Måns Rullgård
mru at inprovide.com




