[FFmpeg-devel] [PATCH] swscale/ppc: VSX-optimize hscale_fast

Lauri Kasanen cand at gmx.com
Tue Apr 30 14:38:28 EEST 2019


On Wed, 24 Apr 2019 14:02:16 +0300
Lauri Kasanen <cand at gmx.com> wrote:

> ./ffmpeg -f lavfi -i yuvtestsrc=duration=1:size=1200x1440 -sws_flags fast_bilinear \
>         -s 2400x720 -f rawvideo -vframes 5 -pix_fmt abgr -nostats test.raw
>
> 4.27x speedup for hyscale_fast:
>   24796 UNITS in hyscale_fast,    4096 runs,      0 skips
>    5797 UNITS in hyscale_fast,    4096 runs,      0 skips
>
> 4.48x speedup for hcscale_fast:
>   19911 UNITS in hcscale_fast,    4095 runs,      1 skips
>    4437 UNITS in hcscale_fast,    4096 runs,      0 skips
>
> Signed-off-by: Lauri Kasanen <cand at gmx.com>
> ---
>  libswscale/ppc/swscale_vsx.c | 196 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 196 insertions(+)
>
> Like the x86 version, this only handles output widths equal to or larger than
> the input width. Shrinking would require a gather load, which PPC does not
> have and which is slow even on x86 AVX. I tried emulating a gather load
> manually, and the vector function was 20% slower than C.
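
The width restriction can be illustrated with a small scalar sketch. This is a hypothetical illustration, not the patch's VSX code and not libswscale's exact C function: `hscale_fast_sketch` and `source_span` are made-up names. It assumes a 16.16 fixed-point source position and a 7-bit blend weight, similar to swscale's scalar fast-bilinear path. Each output pixel reads `src[xx]` and `src[xx + 1]`; `source_span` counts how many consecutive source bytes a block of `lanes` outputs touches. When the step `x_inc` is at most `1 << 16` (output at least as wide as input), 16 outputs touch at most 17 consecutive bytes, so contiguous loads plus permutes can cover them; when shrinking, the span grows with the ratio, which is where a gather would be needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar fast-bilinear horizontal scaler (sketch only).
 * xpos is a 16.16 fixed-point source position; output is 15-bit
 * (sample << 7), blended with a 7-bit weight. */
static void hscale_fast_sketch(int16_t *dst, int dst_w,
                               const uint8_t *src, int src_w, int x_inc)
{
    unsigned int xpos = 0;
    for (int i = 0; i < dst_w; i++) {
        unsigned int xx     = xpos >> 16;            /* integer source index */
        unsigned int xalpha = (xpos & 0xFFFF) >> 9;  /* 7-bit blend weight */
        /* Clamp the right neighbour instead of relying on padded input. */
        unsigned int xx1 = (xx + 1 < (unsigned int)src_w) ? xx + 1 : xx;
        dst[i] = (int16_t)((src[xx] << 7) + (src[xx1] - src[xx]) * (int)xalpha);
        xpos += (unsigned int)x_inc;
    }
}

/* Number of consecutive source bytes read by `lanes` consecutive outputs
 * starting at output index `first` (inclusive range plus the right
 * neighbour read by the last output). */
static int source_span(int first, int lanes, int x_inc)
{
    assert(lanes > 0);
    unsigned int x0 = ((unsigned int)first * (unsigned int)x_inc) >> 16;
    unsigned int x1 = ((unsigned int)(first + lanes - 1) * (unsigned int)x_inc) >> 16;
    return (int)(x1 - x0) + 2;
}
```

For example, with 16 lanes, a 1:1 step (`1 << 16`) reads 17 source bytes, while a 2:1 shrink (`2 << 16`) already reads 32, and the span keeps growing as the shrink ratio increases.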

Applying.

- Lauri

