[FFmpeg-devel] MMX accelerated DSP functions for VC1/WMV3 decoders

Christophe GISQUET christophe.gisquet
Sat Jun 30 14:37:53 CEST 2007


Hello,

The attached patch provides some MMX functions (pshufw from MMX2 would
only be marginally faster) for the VC1/WMV3 decoders. They could also be
used in the encoder, but I didn't bother with this, as there are probably
people better placed than me to fit this into the build system.

Tests and benchmarks were performed on
http://samples.mplayerhq.hu/V-codecs/WMV9/highdef/Robotica_720.wmv

I have tested decoding accuracy with a cmp (as I neither know how nor
plan to add this to the regression tests), and used the following
command to measure speed and to profile:
./ffmpeg -benchmark -i Robotica_720.wmv -an -f rawvideo -y /dev/null
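
The cmp check amounts to a byte-for-byte comparison of the rawvideo
output written by the patched and unpatched builds. A minimal C sketch of
that comparison, for anyone without cmp at hand; first_mismatch is a
made-up helper name for illustration and is not part of FFmpeg:

```c
#include <stdio.h>

/* Hypothetical helper, not part of FFmpeg: returns the offset of the first
 * differing byte between two streams, or -1 if they are identical.
 * If one stream is shorter, the offset of its end is returned. */
long long first_mismatch(FILE *a, FILE *b)
{
    long long offset = 0;

    for (;;) {
        int ca = fgetc(a);
        int cb = fgetc(b);

        if (ca != cb)
            return offset;   /* differing byte, or one stream ended early */
        if (ca == EOF)
            return -1;       /* both streams ended together: identical */
        offset++;
    }
}
```

A return value of -1 on the two decoded files means the MMX code is
bit-exact with the C code on that sample.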

And now for the raw figures...
without patch, utime: 7.44 7.35 7.16 7.37 7.27
with:                 5.32 5.37 5.33 5.31 5.41
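
To put one number on those runs, the five utime values can be averaged
and divided. A small C sketch of that arithmetic; the function and array
names are only illustrative:

```c
#include <stddef.h>

/* utime figures copied from the runs above, in seconds. */
static const double utime_without[] = { 7.44, 7.35, 7.16, 7.37, 7.27 };
static const double utime_with[]    = { 5.32, 5.37, 5.33, 5.31, 5.41 };

/* Arithmetic mean of n samples (0.0 for an empty set). */
double mean(const double *v, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return n ? sum / (double)n : 0.0;
}

/* Ratio of the mean unpatched time to the mean patched time. */
double speedup(void)
{
    return mean(utime_without, 5) / mean(utime_with, 5);
}
```

The means come out at about 7.32 s without the patch and 5.35 s with it,
a ratio of roughly 1.37, or about 27% less user time.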

And the profiling (oprofile results)...
without patch:
    samples  %        symbol name
    129666   40.5939  vc1_mspel_mc
    45812    14.3422  vc1_inv_trans_8x8_c
    26404     8.2662  vc1_decode_p_blocks
    21967     6.8771  put_no_rnd_h264_chroma_mc8_c
    21336     6.6796  vc1_decode_ac_coeff
    8582      2.6867  vc1_decode_intra_block
    8273      2.5900  vc1_decode_p_block
    8157      2.5537  clear_blocks_mmx
    6896      2.1589  put_h264_chroma_mc8_mmx
    6748      2.1126  vc1_inv_trans_8x4_c
    6254      1.9579  vc1_inv_trans_4x8_c

with:
    samples  %        symbol name
    6095     17.8169  vc1_inv_trans_8x8_c
    3769     11.0176  vc1_decode_p_blocks
    3565     10.4212  put_no_rnd_h264_chroma_mc8_c
    3380      9.8804  vc1_decode_ac_coeff
    1365      3.9902  vc1_inv_trans_8x4_c
    1348      3.9405  vc1_decode_p_block
    1260      3.6832  clear_blocks_mmx
    1146      3.3500  put_h264_chroma_mc8_mmx
    1046      3.0577  vc1_inv_trans_4x8_c
    938       2.7420  ff_emulated_edge_mc
    849       2.4818  ff_put_vc1_mspel_mc22_mmx
    791       2.3123  vc1_mc_1mv
    774       2.2626  vc1_decode_intra_block
    746       2.1807  ff_put_vc1_mspel_mc00_mmx
    698       2.0404  ff_put_vc1_mspel_mc20_mmx
    576       1.6838  ff_put_vc1_mspel_mc21_mmx
    500       1.4616  ff_put_vc1_mspel_mc23_mmx
    481       1.4061  ff_put_vc1_mspel_mc12_mmx
    476       1.3914  ff_put_vc1_mspel_mc32_mmx
    339       0.9910  ff_put_vc1_mspel_mc31_mmx
    334       0.9764  ff_put_vc1_mspel_mc11_mmx
    334       0.9764  ff_put_vc1_mspel_mc13_mmx
    333       0.9734  ff_put_vc1_mspel_mc02_mmx
    318       0.9296  ff_init_block_index
    305       0.8916  ff_put_vc1_mspel_mc10_mmx
    304       0.8887  add_pixels_clamped_mmx
    274       0.8010  ff_put_vc1_mspel_mc33_mmx
    267       0.7805  ff_put_vc1_mspel_mc30_mmx
    233       0.6811  vc1_decode_i_blocks
    180       0.5262  ff_put_vc1_mspel_mc01_mmx
    165       0.4823  ff_put_vc1_mspel_mc03_mmx
    154       0.4502  vc1_inv_trans_4x4_c
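
The share taken by the sixteen ff_put_vc1_mspel_mc* entries can be
totalled straight from the table above. A small C sketch, with the
percentages copied from the table in the order they are listed; the names
are only illustrative:

```c
#include <stddef.h>

/* Percentages of the ff_put_vc1_mspel_mc*_mmx rows, in table order:
 * mc22 mc00 mc20 mc21 mc23 mc12 mc32 mc31 mc11 mc13 mc02 mc10 mc33 mc30
 * mc01 mc03. */
static const double mspel_percent[] = {
    2.4818, 2.1807, 2.0404, 1.6838, 1.4616, 1.4061, 1.3914, 0.9910,
    0.9764, 0.9764, 0.9734, 0.8916, 0.8010, 0.7805, 0.5262, 0.4823,
};

/* Sum of the per-function shares, in percent. */
double mspel_total_percent(void)
{
    double sum = 0.0;
    size_t n = sizeof(mspel_percent) / sizeof(mspel_percent[0]);

    for (size_t i = 0; i < n; i++)
        sum += mspel_percent[i];
    return sum;
}
```

The sum is about 20.04%, against 40.59% for vc1_mspel_mc in the profile
without the patch.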

The new total for the ff_put_vc1_mspel_mc* functions is now just above
20%, down from 40.6% for vc1_mspel_mc. There is some suboptimal stuff
left of course, like filter 0 being just a source/destination
modification, put_pixels8_mmx being duplicated, or some useless register
loads, but fixing these would increase code complexity beyond what I'm
willing to put in.
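
For readers who do not have the interpolation in mind, here is a scalar C
sketch of a 1-D VC-1 quarter-pel tap filter, showing why mode 0 is no
more than a copy. It is not the code from the patch; the function name is
made up, and the coefficients are written from memory of the C reference,
so check them against vc1dsp.c before relying on them:

```c
#include <stdint.h>

/* Illustrative sketch only, not the patch. src points at the integer
 * sample just before the fractional position, step is the distance
 * between taps (1 for horizontal, the line stride for vertical), mode is
 * the quarter-pel shift (0..3) and r is the rounding control.
 * Assumes >> on a negative int is an arithmetic shift. */
int vc1_filter_sketch(const uint8_t *src, int step, int mode, int r)
{
    int v;

    switch (mode) {
    case 1:  /* 1/4 shift */
        v = (-4 * src[-step] + 53 * src[0] + 18 * src[step]
             - 3 * src[2 * step] + 32 - r) >> 6;
        break;
    case 2:  /* 1/2 shift */
        v = (-src[-step] + 9 * src[0] + 9 * src[step]
             - src[2 * step] + 8 - r) >> 4;
        break;
    case 3:  /* 3/4 shift */
        v = (-3 * src[-step] + 18 * src[0] + 53 * src[step]
             - 4 * src[2 * step] + 32 - r) >> 6;
        break;
    default: /* mode 0: no shift, the sample is passed through unchanged */
        return src[0];
    }
    /* Clamp to the 8-bit sample range. */
    return v < 0 ? 0 : v > 255 ? 255 : v;
}
```

With mode 0 in one direction, the whole pass in that direction reduces to
reading the source as is, which is the redundant work mentioned above.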

vc1_inv_trans_8x8_c would be the next candidate, but the code looks
bothersome. On the other hand, put_no_rnd_h264_chroma_mc8_c would
benefit other codecs. I do have an mmx1/2 implementation for it, but I'm
holding it back until this patch gets into svn, if it ever does.

Best regards,
Christophe GISQUET
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vc1dsp_mmx.diff
Type: text/x-patch
Size: 14848 bytes
Desc: not available
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070630/40e0816f/attachment.bin>


