[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

James Darnley jdarnley at obe.tv
Thu Jul 19 17:52:46 EEST 2018


I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file encoded
with the relevant transform.  The summary is below.

Haar
C:    119fps
SSE2: 204fps
AVX:  206fps
AVX2: 221fps

5_3
C:     94fps
SSE2: 118fps
AVX2: 121fps

9_7
C:     84fps
SSE2: 111fps
AVX2: 115fps

Is the AVX version worth it for Haar?  Are the AVX2 versions worth it for the
latter two?  I added those later, which is why they are separate patches.  I
will squash them before pushing if I keep them.
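
To put numbers on that question, the relative gains can be worked out from the
figures above.  The following is only a rough sketch: the fps values are copied
from the summary, while the names FPS, speedup and report are made up for this
illustration and are not part of the patch set.

```python
# Decode speeds in fps, copied from the benchmark summary in this mail.
FPS = {
    "Haar": {"C": 119, "SSE2": 204, "AVX": 206, "AVX2": 221},
    "5_3": {"C": 94, "SSE2": 118, "AVX2": 121},
    "9_7": {"C": 84, "SSE2": 111, "AVX2": 115},
}


def speedup(results, variant, baseline):
    """Return how many times faster `variant` is than `baseline`."""
    return results[variant] / results[baseline]


def report(fps=FPS):
    """Build one line per SIMD variant with its gain over C and over SSE2."""
    lines = []
    for transform, results in fps.items():
        for variant in results:
            if variant == "C":
                continue  # C is the baseline, nothing to compare
            over_c = speedup(results, variant, "C")
            over_sse2 = speedup(results, variant, "SSE2")
            lines.append(
                f"{transform:5s} {variant:5s} "
                f"{over_c:.2f}x over C, {over_sse2:.2f}x over SSE2"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

On these figures AVX is about 1% faster than SSE2 for Haar and AVX2 about 8%
faster.  For 5_3 and 9_7, AVX2 is roughly 2.5% and 3.6% faster than SSE2
respectively.  These are single-run numbers, so the small differences may be
within measurement noise.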

James Darnley (6):
  diracdec: add 10-bit Haar SIMD functions
  diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
  diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass
    function
  diracdec: avx2 legall
  diracdec: avx2 dd97
  diracdec: increase rodata alignment for avx2

 libavcodec/dirac_dwt.c                |   7 +-
 libavcodec/dirac_dwt.h                |   1 +
 libavcodec/x86/Makefile               |   6 +-
 libavcodec/x86/dirac_dwt_10bit.asm    | 209 +++++++++++++++++++++++++
 libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
 5 files changed, 430 insertions(+), 3 deletions(-)
 create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
 create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c

-- 
2.17.1


