[FFmpeg-devel] [PATCH] avutil/mathematics: speed up av_gcd by using Stein's binary GCD algorithm

Michael Niedermayer michael at niedermayer.cc
Sun Oct 11 00:50:06 CEST 2015


On Sat, Oct 10, 2015 at 11:32:06PM +0200, Henrik Gramner wrote:
> On Sat, Oct 10, 2015 at 11:06 PM, Ganesh Ajjanagadde
> <gajjanagadde at gmail.com> wrote:
> > This uses Stein's binary GCD algorithm:
> > https://en.wikipedia.org/wiki/Binary_GCD_algorithm
> > to get a reported 1.7-2.1x speedup over Euclidean GCD on standard architectures.
> > Have not benchmarked, so I can't comment
> 
> Before you submit a patch that's supposed to make something faster,
> you should benchmark it to verify that it is in fact faster. Do this
> with inputs of various sizes on both 32- and 64-bit architectures and
> both with and without compilers that support __builtin_ctzll(v).

a quick benchmark with
--- a/tests/fate-run.sh
+++ b/tests/fate-run.sh
@@ -296,7 +296,7 @@ if test $err != 0 && test $gen != "no" ; then
 fi

 if test $err = 0; then
-    rm -f $outfile $errfile $cmpfile $cleanfiles
+    rm -f $outfile $cmpfile $cleanfiles

and START/STOP_TIMER and running make fate -j12 -k
shows that the new code appears significantly faster (x86-64 sandybridge + gcc)

old:
aac-al18_44.err:   1748 decicycles in gcd,    4096 runs,      0 skips
aac-am00_88.err:   2496 decicycles in gcd,    4096 runs,      0 skips
aac-am00_88.err:   2498 decicycles in gcd,    8192 runs,      0 skips
aac-er_eld1001np_44_ep0.err:   2433 decicycles in gcd,    1023 runs,      1 skips
aac-er_eld1001np_44_ep0.err:   2467 decicycles in gcd,    2047 runs,      1 skips
aac-er_eld1001np_44_ep0.err:   2352 decicycles in gcd,    4095 runs,      1 skips
aac-er_eld2000np_48_ep0.err:   1737 decicycles in gcd,    4096 runs,      0 skips
aac-er_eld2100np_48_ep0.err:   1052 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-al18_44.err:   1764 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-er_eld1001np_44_ep0.err:   2484 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-er_eld2000np_48_ep0.err:   1749 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-adx.err:   1411 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-adx.err:   1392 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-adx.err:   1403 decicycles in gcd,   16384 runs,      0 skips
acodec-adpcm-adx-trellis.err:   1414 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-adx-trellis.err:   1416 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-adx-trellis.err:   1417 decicycles in gcd,   16384 runs,      0 skips
acodec-adpcm-ima_qt.err:   1399 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-ima_qt.err:   1406 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-ima_qt-trellis.err:   1487 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-ima_qt-trellis.err:   1489 decicycles in gcd,    8192 runs,      0 skips
adpcm_ms-stereo.err:   2751 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-0.err:    684 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-1.err:    684 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-2.err:    696 decicycles in gcd,    4096 runs,      0 skips
alac-24-lpc-orders.err:    680 decicycles in gcd,    4096 runs,      0 skips
amrwb-8k85.err:   1052 decicycles in gcd,    1023 runs,      1 skips
dca-core.err:   1800 decicycles in gcd,     511 runs,      1 skips
dca-core.err:   1850 decicycles in gcd,    1023 runs,      1 skips
dca-xll.err:   1439 decicycles in gcd,    4096 runs,      0 skips
filter-metadata-scenedetect.err:    794 decicycles in gcd,    4096 runs,      0 skips
filter-pixdesc-gbrp14le.err:    660 decicycles in gcd,     127 runs,      1 skips
filter-pixfmts-hflip.err:    761 decicycles in gcd,      63 runs,      1 skips
filter-pixfmts-hflip.err:    731 decicycles in gcd,     127 runs,      1 skips
flac-24-comp-8.err:    665 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-canl3_sony_c.err:    824 decicycles in gcd,    1023 runs,      1 skips
h264-conformance-canl3_sony_c.err:    772 decicycles in gcd,    2047 runs,      1 skips
h264-conformance-frext-hpcafl_bcrm_c.err:    767 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcaflnl_bcrm_c.err:    724 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcamapalq_bcrm_b.err:    812 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcvfl_bcrm_a.err:    761 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcvflnl_bcrm_a.err:    764 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-ls_sva_d.err:    815 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-ls_sva_d.err:    820 decicycles in gcd,    8192 runs,      0 skips
h264-conformance-ls_sva_d.err:    808 decicycles in gcd,   16383 runs,      1 skips
lagarith-yuy2.err:    789 decicycles in gcd,      15 runs,      1 skips
lagarith-yuy2.err:    726 decicycles in gcd,      31 runs,      1 skips
lossless-meridianaudio.err:   1268 decicycles in gcd,    4096 runs,      0 skips
lossless-meridianaudio.err:   1279 decicycles in gcd,    8189 runs,      3 skips
lossless-meridianaudio.err:   1305 decicycles in gcd,   16381 runs,      3 skips
lossless-truehd-5.1-downmix-2.0.err:   1036 decicycles in gcd,    4096 runs,      0 skips
lossless-truehd-5.1.err:   1016 decicycles in gcd,    4096 runs,      0 skips
opus-testvector01.err:   1246 decicycles in gcd,    4096 runs,      0 skips
opus-testvector05.err:   1025 decicycles in gcd,    4096 runs,      0 skips
opus-testvector07.err:   1002 decicycles in gcd,    4096 runs,      0 skips
opus-testvector07.err:   1008 decicycles in gcd,    8191 runs,      1 skips
opus-testvector10.err:   1241 decicycles in gcd,    2047 runs,      1 skips
ra-288.err:    656 decicycles in gcd,    4096 runs,      0 skips
sipr-16k.err:    681 decicycles in gcd,    4096 runs,      0 skips
swr-resample-fltp-44100-2626.err:    704 decicycles in gcd,     127 runs,      1 skips
twinvq.err:   2410 decicycles in gcd,    4096 runs,      0 skips
vqf-demux.err:   2441 decicycles in gcd,     511 runs,      1 skips
vqf-demux.err:   2506 decicycles in gcd,    1023 runs,      1 skips
vqf-demux.err:   2478 decicycles in gcd,    2047 runs,      1 skips
vqf-demux.err:   2405 decicycles in gcd,    4095 runs,      1 skips
vsynth2-r210.err:    697 decicycles in gcd,     127 runs,      1 skips
vsynth2-r210.err:    698 decicycles in gcd,     255 runs,      1 skips
vsynth_lena-ffv1-v0.err:    723 decicycles in gcd,     127 runs,      1 skips
vsynth_lena-mpeg2-ilace.err:   1655 decicycles in gcd,    2047 runs,      1 skips
wavpack-cuesheet.err:   1199 decicycles in gcd,    4096 runs,      0 skips
wavpack-cuesheet.err:   1173 decicycles in gcd,    8192 runs,      0 skips

new:

aac-al18_44.err:    742 decicycles in gcd,    4096 runs,      0 skips
aac-am00_88.err:    743 decicycles in gcd,    4096 runs,      0 skips
aac-am00_88.err:    742 decicycles in gcd,    8192 runs,      0 skips
aac-er_eld1001np_44_ep0.err:    757 decicycles in gcd,    4096 runs,      0 skips
aac-er_eld2000np_48_ep0.err:    768 decicycles in gcd,    4096 runs,      0 skips
aac-er_eld2100np_48_ep0.err:    598 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-al18_44.err:    745 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-er_eld1001np_44_ep0.err:    881 decicycles in gcd,    4096 runs,      0 skips
aac-fixed-er_eld2000np_48_ep0.err:    813 decicycles in gcd,    4096 runs,      0 skips
ac3-fixed-2.0.err:    462 decicycles in gcd,      31 runs,      1 skips
ac3-fixed-2.0.err:    548 decicycles in gcd,      63 runs,      1 skips
ac3-fixed-2.0.err:    586 decicycles in gcd,     127 runs,      1 skips
ac3-fixed-2.0.err:    590 decicycles in gcd,     255 runs,      1 skips
acodec-adpcm-adx.err:    736 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-adx.err:    733 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-adx.err:    739 decicycles in gcd,   16384 runs,      0 skips
acodec-adpcm-adx-trellis.err:    878 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-adx-trellis.err:    877 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-adx-trellis.err:    877 decicycles in gcd,   16384 runs,      0 skips
acodec-adpcm-ima_qt.err:    724 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-ima_qt.err:    724 decicycles in gcd,    8192 runs,      0 skips
acodec-adpcm-ima_qt-trellis.err:    702 decicycles in gcd,    4096 runs,      0 skips
acodec-adpcm-ima_qt-trellis.err:    724 decicycles in gcd,    8192 runs,      0 skips
adpcm_ms-stereo.err:    736 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-0.err:    406 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-1.err:    415 decicycles in gcd,    4096 runs,      0 skips
alac-24-level-2.err:    418 decicycles in gcd,    4096 runs,      0 skips
alac-24-lpc-orders.err:    434 decicycles in gcd,    4096 runs,      0 skips
dca-xll.err:    953 decicycles in gcd,    4096 runs,      0 skips
delphine-cin-video.err:    786 decicycles in gcd,     511 runs,      1 skips
filter-metadata-scenedetect.err:    655 decicycles in gcd,    4096 runs,      0 skips
filter-pixfmts-il.err:    584 decicycles in gcd,     127 runs,      1 skips
filter-pixfmts-null.err:    866 decicycles in gcd,       3 runs,      1 skips
filter-pixfmts-null.err:    548 decicycles in gcd,       7 runs,      1 skips
filter-pixfmts-null.err:    671 decicycles in gcd,      15 runs,      1 skips
filter-pixfmts-null.err:    548 decicycles in gcd,      31 runs,      1 skips
filter-pixfmts-null.err:    504 decicycles in gcd,      63 runs,      1 skips
filter-pixfmts-null.err:    497 decicycles in gcd,     127 runs,      1 skips
filter-pixfmts-scale.err:    537 decicycles in gcd,      63 runs,      1 skips
filter-pixfmts-scale.err:    555 decicycles in gcd,     127 runs,      1 skips
flac-16-chmode-left_side.err:    899 decicycles in gcd,     510 runs,      2 skips
flac-24-comp-8.err:    366 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcafl_bcrm_c.err:    583 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcaflnl_bcrm_c.err:    691 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcamapalq_bcrm_b.err:    735 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcvfl_bcrm_a.err:    699 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-frext-hpcvflnl_bcrm_a.err:    679 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-ls_sva_d.err:    590 decicycles in gcd,    4096 runs,      0 skips
h264-conformance-ls_sva_d.err:    589 decicycles in gcd,    8192 runs,      0 skips
h264-conformance-ls_sva_d.err:    585 decicycles in gcd,   16383 runs,      1 skips
interplay-mve-16bit.err:    695 decicycles in gcd,     511 runs,      1 skips
lavf-ffm.err:    918 decicycles in gcd,    2047 runs,      1 skips
lossless-meridianaudio.err:    668 decicycles in gcd,    4096 runs,      0 skips
lossless-meridianaudio.err:    635 decicycles in gcd,    8192 runs,      0 skips
lossless-meridianaudio.err:    670 decicycles in gcd,   16384 runs,      0 skips
lossless-truehd-5.1-downmix-2.0.err:    511 decicycles in gcd,    4096 runs,      0 skips
lossless-truehd-5.1.err:    632 decicycles in gcd,    4096 runs,      0 skips
opus-testvector01.err:    849 decicycles in gcd,    4096 runs,      0 skips
opus-testvector05.err:    579 decicycles in gcd,    4096 runs,      0 skips
opus-testvector07.err:    669 decicycles in gcd,    4096 runs,      0 skips
opus-testvector07.err:    661 decicycles in gcd,    8192 runs,      0 skips
ra-288.err:    422 decicycles in gcd,    4096 runs,      0 skips
sipr-16k.err:    436 decicycles in gcd,    4096 runs,      0 skips
twinvq.err:    775 decicycles in gcd,    4096 runs,      0 skips
vqf-demux.err:    708 decicycles in gcd,    4096 runs,      0 skips



-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

He who knows, does not speak. He who speaks, does not know. -- Lao Tsu
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://ffmpeg.org/pipermail/ffmpeg-devel/attachments/20151011/a62b6aa4/attachment.sig>


More information about the ffmpeg-devel mailing list