[FFmpeg-devel] [PATCH] sws/aarch64: add ff_hscale_8_to_15_neon

Clément Bœsch u at pkh.me
Thu Mar 24 14:40:49 CET 2016


On Thu, Mar 24, 2016 at 09:35:01AM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Mar 24, 2016 8:28 AM, "Clément Bœsch" <u at pkh.me> wrote:
> >
> > From: Clément Bœsch <clement at stupeflix.com>
> >
> > ./ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
> >
> >     before: t:0.489726 avg:0.489883 max:0.491852 min:0.489482
> >     after:  t:0.256515 avg:0.256458 max:0.256999 min:0.253755
> > ---
> > Changes:
> > - FIX: not using the v8-v15 registers
> > - writing directly from the SIMD register (thx Martin)
> > - misc reordering
> >
> > I'm looking at the vscale part now.
> > ---
> >  libswscale/aarch64/Makefile   |  6 +++--
> >  libswscale/aarch64/hscale.S   | 59 +++++++++++++++++++++++++++++++++++++++++++
> >  libswscale/aarch64/swscale.c  | 37 +++++++++++++++++++++++++++
> >  libswscale/swscale.c          |  2 ++
> >  libswscale/swscale_internal.h |  1 +
> >  libswscale/utils.c            |  4 ++-
> >  6 files changed, 106 insertions(+), 3 deletions(-)
> >  create mode 100644 libswscale/aarch64/hscale.S
> >  create mode 100644 libswscale/aarch64/swscale.c
> Do you intend to create special versions for specific filter widths? For
> example, x86 has special versions for filter_width=4 and 8, which helped
> speed up the default filters (bicubic) a little more.
> 
> This version looks OK already for the default case.
> 

I don't need those cases right now (my use case involves filter sizes of 11
and 26), so I have no plans for them so far. I'm actually looking at
yuv2planeX_8 to get more impact on that specific use case.
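
To make the generic versus width-specialized distinction concrete, here is a
minimal C sketch of an 8-bit to 15-bit horizontal scale. It is modeled on the
scalar C path and is not the NEON code from the patch. The function names and
the parameter layout are assumptions made for illustration, and the
filter_size=4 variant only shows what such a specialization could look like.

```c
#include <stdint.h>

/* Hypothetical reference: generic filter size.
 * Coefficients are assumed to be 14-bit fixed point (sum to 1 << 14),
 * so an 8-bit input maps to a 15-bit output after the >> 7. */
static void hscale_8_to_15_ref(int16_t *dst, int dst_w, const uint8_t *src,
                               const int16_t *filter,
                               const int32_t *filter_pos, int filter_size)
{
    for (int i = 0; i < dst_w; i++) {
        const uint8_t *s = src + filter_pos[i];       /* first input tap */
        const int16_t *f = filter + i * filter_size;  /* taps for output i */
        int val = 0;

        for (int j = 0; j < filter_size; j++)
            val += s[j] * f[j];

        val >>= 7;
        dst[i] = val > 32767 ? 32767 : val;           /* clamp to 15 bits */
    }
}

/* Hypothetical specialization for filter_size == 4: the inner loop is
 * fully unrolled, which is the kind of case a SIMD version can exploit. */
static void hscale_8_to_15_w4(int16_t *dst, int dst_w, const uint8_t *src,
                              const int16_t *filter,
                              const int32_t *filter_pos)
{
    for (int i = 0; i < dst_w; i++) {
        const uint8_t *s = src + filter_pos[i];
        const int16_t *f = filter + 4 * i;
        int val = s[0] * f[0] + s[1] * f[1] + s[2] * f[2] + s[3] * f[3];

        val >>= 7;
        dst[i] = val > 32767 ? 32767 : val;
    }
}
```

Both functions should produce identical output when filter_size is 4. The
specialized one just removes the inner loop overhead, which matters less for
larger sizes such as 11 or 26.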

-- 
Clément B.