[FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

Rostislav Pehlivanov atomnuker at gmail.com
Thu Jul 19 18:23:26 EEST 2018


On 19 July 2018 at 15:52, James Darnley <jdarnley at obe.tv> wrote:

> I tested the speed gains by using ffmpeg to decode a 720p yuv422p10 file
> encoded with the relevant transform.  The summary is below.
>
> Haar
> C:    119fps
> SSE2: 204fps
> AVX:  206fps
> AVX2: 221fps
>
> 5_3
> C:     94fps
> SSE2: 118fps
> AVX2: 121fps
>
> 9_7
> C:     84fps
> SSE2: 111fps
> AVX2: 115fps
>
> Is the AVX worth it in Haar?  Is the AVX2 worth it in the latter two?  I
> added those later which is why they are separate patches.  I will squash
> them before pushing if I keep them.
>
> James Darnley (6):
>   diracdec: add 10-bit Haar SIMD functions
>   diracdec: add 10-bit Legall 5,3 (5_3) SIMD functions
>   diracdec: add 10-bit Deslauriers-Dubuc 9,7 (9_7) vertical high-pass
>     function
>   diracdec: avx2 legall
>   diracdec: avx2 dd97
>   diracdec: increase rodata alignment for avx2
>
>  libavcodec/dirac_dwt.c                |   7 +-
>  libavcodec/dirac_dwt.h                |   1 +
>  libavcodec/x86/Makefile               |   6 +-
>  libavcodec/x86/dirac_dwt_10bit.asm    | 209 +++++++++++++++++++++++++
>  libavcodec/x86/dirac_dwt_init_10bit.c | 210 ++++++++++++++++++++++++++
>  5 files changed, 430 insertions(+), 3 deletions(-)
>  create mode 100644 libavcodec/x86/dirac_dwt_10bit.asm
>  create mode 100644 libavcodec/x86/dirac_dwt_init_10bit.c
>
> --
> 2.17.1

Could you provide standard timings for the transform as a whole, measured
with START_TIMER/STOP_TIMER, rather than overall decoding speed?
Coefficient sizes, and therefore Golomb unpacking speed, change with the
transform, so there could potentially be a bottleneck in decoding before
the inverse transform.
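
To illustrate the kind of measurement meant here, below is a minimal,
self-contained sketch that times only the transform call and leaves
everything around it untimed. It is not FFmpeg code: inside the tree one
would wrap the call with START_TIMER and STOP_TIMER("some id") from
libavutil/timer.h instead. RegionTimer, region_start, region_stop,
toy_inverse_haar and time_transform_only are hypothetical names made up
for this example, and the toy lifting step is only a stand-in for the real
Dirac inverse transform.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical accumulator: total CPU time spent in one code region and
 * how many times that region ran. */
typedef struct RegionTimer {
    clock_t  total;  /* accumulated clock() ticks inside the region */
    clock_t  start;  /* tick count when the region was last entered */
    unsigned runs;   /* number of completed start/stop pairs */
} RegionTimer;

static void region_start(RegionTimer *t)
{
    t->start = clock();
}

static void region_stop(RegionTimer *t)
{
    t->total += clock() - t->start;
    t->runs++;
}

/* Stand-in workload (an assumption, not the Dirac code): undo a simple
 * integer Haar lifting step, rebuilding 2*n samples from n low-pass and
 * n high-pass coefficients. */
static void toy_inverse_haar(int32_t *out, const int32_t *low,
                             const int32_t *high, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t even = low[i] - (high[i] >> 1);
        out[2 * i]     = even;
        out[2 * i + 1] = high[i] + even;
    }
}

/* Run the transform `iterations` times and return the mean time per call
 * in seconds. Only the transform sits between start and stop. */
static double time_transform_only(RegionTimer *t, int32_t *out,
                                  const int32_t *low, const int32_t *high,
                                  int n, int iterations)
{
    for (int it = 0; it < iterations; it++) {
        /* Work placed here (for example entropy decoding) is not timed. */
        region_start(t);
        toy_inverse_haar(out, low, high, n);
        region_stop(t);
    }
    return t->runs ? (double)t->total / CLOCKS_PER_SEC / t->runs : 0.0;
}
```

Calling time_transform_only() once per implementation (C, SSE2, AVX2) on
the same coefficient buffers gives per-transform figures that do not
depend on how fast the coefficients were unpacked. clock() is coarse, so
a large iteration count is needed for stable numbers.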

