[FFmpeg-devel] DSP function ARM NEON patches for hevc

Michael Niedermayer michael at niedermayer.cc
Tue Feb 17 11:44:41 CET 2015


On Tue, Feb 17, 2015 at 07:33:04AM +0000, Tomperi Seppo wrote:
> 
> > On 16 Feb 2015, at 19:54, Michael Niedermayer <michael at niedermayer.cc> wrote:
> > 
> > On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
> >> More NEON optimizations for testing. fate-hevc passes on Tegra K1, but these haven't been tested for NEON clobbering.
> >> 
> >> -Seppo
> >> 
> >> ________________________________________
> >> From: Tomperi Seppo
> >> Sent: Monday, February 16, 2015 1:30 PM
> >> To: Michael Niedermayer
> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> Subject: RE: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >> 
> >> Hi Michael,
> >> 
> >> Here is a total shot-in-the-dark attempt at fixing the NEON register clobbering in deblocking. Could you test it with qemu and check whether it works?
> >> 
> >> 
> >> -Seppo
> >> 
> >> ________________________________________
> >> From: Michael Niedermayer [michael at niedermayer.cc]
> >> Sent: Monday, February 16, 2015 3:28 AM
> >> To: Tomperi Seppo
> >> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
> >> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >> 
> >> Hi
> >> 
> >> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
> >>> Hi!
> >>> 
> >>> The reason is chroma deblocking, which uses q4 without pushing it to the stack. :/
> >>> Unfortunately I am in Geneva this week and don't have an ARM Linux board with me, so it is not easy to test.
> >>> 
> >>> Mickael Raulet: maybe guys at INSA could run tests this week if I make a fix? Could you ask?
> >> 
> >> If they can't, then I can probably test it too, provided it is a patch which
> >> applies cleanly to ffmpeg and testing fate-hevc with
> >> --enable-neon-clobber-test under qemu is what is needed.
> >> I could test on an ARM board too if needed.
> >> 
> >> 
> >>> 
> >>> I also have SAO, qpel and epel NEON patches for the latest FFmpeg. They pass fate-hevc on Jetson TK1, but they should still be tested on iOS and checked for register clobbering.
> >>> 
> >>> -Seppo
> >>> 
> >>> 
> >>> ________________________________________
> >>> From: Michael Niedermayer [michaelni at gmx.at]
> >>> Sent: Friday, February 13, 2015 5:38 PM
> >>> To: FFmpeg development discussions and patches
> >>> Cc: Tomperi Seppo; Mickaël Raulet
> >>> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
> >>> 
> >>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
> >>>> Michael,
> >>>> 
> >>>> Please find some commits that can be cherry picked from
> >>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
> >>>> 
> >>> 
> >>>> Optimized deblocking filter (8bits only)
> >>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
> >>> 
> >>> This breaks the NEON clobber test, see:
> >>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
> >>> 
> >>> [...]
> >>> --
> >>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >>> 
> >>> The worst form of inequality is to try to make unequal things equal.
> >>> -- Aristotle
> >>> 
> >> 
> >> --
> >> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> >> 
> >> Opposition brings concord. Out of discord comes the fairest harmony.
> >> -- Heraclitus
> > 
> >> Makefile            |    3 
> >> hevcdsp_init_neon.c |  159 ++++++++
> >> hevcdsp_qpel_neon.S |  999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c  0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
> >> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
> >> From: Seppo Tomperi <seppo.tomperi at vtt.fi>
> >> Date: Wed, 11 Feb 2015 10:20:26 +0000
> >> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
> >> 
> >> ---
> >> libavcodec/arm/Makefile            |   3 +-
> >> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
> >> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 1160 insertions(+), 1 deletion(-)
> >> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
> > 
> > 
> > This seems to fail to build:
> > 
> >        libavformat/utils.o
> > CC      libavcodec/arm/hevcdsp_init_neon.o
> > AS      libavcodec/arm/hevcdsp_qpel_neon.o
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
> > ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
> > make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
> > make: *** Waiting for unfinished jobs....
> > 
> > 
> 
> These macros compiled for me with the Jetson TK1 toolchain and with the latest GAS preprocessor, so I thought they were finally OK.
> But it looks like passing register lists to macros is not handled well by all preprocessors.

I am using plain "arm-linux-gnueabi-gcc-4.5 (Ubuntu/Linaro 4.5.3-12ubuntu2) 4.5.3"
here, with no preprocessor.


> 
> These are quite simple functions copying varying width blocks of pixels using NEON. I could either write out the macros (lots of almost identical functions) or leave the optimisation out totally for now. Or do you have any other ideas?

The following seems to fix it, though I do not know why these two
lines failed while the others do not seem to fail.
Adding commas to all of the invocations works as well.

diff --git a/libavcodec/arm/hevcdsp_qpel_neon.S b/libavcodec/arm/hevcdsp_qpel_neon.S
index 14116a6..7b0df2e 100644
--- a/libavcodec/arm/hevcdsp_qpel_neon.S
+++ b/libavcodec/arm/hevcdsp_qpel_neon.S
@@ -989,9 +989,9 @@ function ff_hevc_put_qpel_uw_pixels_w\width\()_neon_8, export=1
 endfunc
 .endm

-put_qpel_uw_pixels    4 d0[0] d0[1] d1[0] d1[1]
+put_qpel_uw_pixels    4 d0[0], d0[1], d1[0], d1[1]
 put_qpel_uw_pixels    8 d0 d1 d2 d3
-put_qpel_uw_pixels_m 12 d0 d1[0] d2 d3[0]
+put_qpel_uw_pixels_m 12 d0, d1[0], d2, d3[0]
 put_qpel_uw_pixels   16 q0 q1 q2 q3
 put_qpel_uw_pixels   24 d0-d2 d3-d5 d16-d18 d19-d21
 put_qpel_uw_pixels   32 q0-q1 q2-q3 q8-q9 q10-q11
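
For illustration, here is a minimal sketch of the comma-separated form. The
macro name copy_two_rows and its parameters are hypothetical and are not taken
from the patch; the roles of r0-r3 simply mirror the instructions in the error
output above. With explicit commas, each lane reference is passed as its own
argument. The error output shows the space-separated list arriving as a single
argument (d0[0]d0[1]d1[0]d1[1]) with the remaining arguments empty.

```asm
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text

@ Hypothetical macro for illustration only, not the one from the patch.
@ Each argument is used as a single-entry NEON register list.
.macro  copy_two_rows first, second
        vld1.32 {\first},  [r2], r3     @ load one 32-bit lane, advance src by r3
        vld1.32 {\second}, [r2], r3
        vst1.32 {\first},  [r0], r1     @ store one 32-bit lane, advance dst by r1
        vst1.32 {\second}, [r0], r1
.endm

        @ Commas make the argument boundaries explicit.
        copy_two_rows d0[0], d0[1]
```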

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Answering whether a program halts or runs forever is
On a Turing machine, in general impossible (Turing's halting problem).
On any real computer, always possible as a real computer has a finite number
of states N, and will either halt in less than N cycles or never halt.