[FFmpeg-devel] DSP function ARM NEON patches for hevc

Tomperi Seppo Seppo.Tomperi at vtt.fi
Tue Feb 17 08:33:04 CET 2015


> On 16 Feb 2015, at 19:54, Michael Niedermayer <michael at niedermayer.cc> wrote:
> 
> On Mon, Feb 16, 2015 at 12:47:36PM +0000, Tomperi Seppo wrote:
>> More NEON optimizations for testing. fate-hevc passes on Tegra K1, but these haven't been tested for NEON clobbering.
>> 
>> -Seppo
>> 
>> ________________________________________
>> From: Tomperi Seppo
>> Sent: Monday, February 16, 2015 1:30 PM
>> To: Michael Niedermayer
>> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
>> Subject: RE: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>> 
>> Hi Michael,
>> 
>> Here is a total shot-in-the-dark attempt at fixing the NEON register clobbering in deblocking. Could you test it with qemu and check whether it works?
>> 
>> 
>> -Seppo
>> 
>> ________________________________________
>> From: Michael Niedermayer [michael at niedermayer.cc]
>> Sent: Monday, February 16, 2015 3:28 AM
>> To: Tomperi Seppo
>> Cc: Michael Niedermayer; FFmpeg development discussions and patches; Mickaël Raulet
>> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>> 
>> Hi
>> 
>> On Sun, Feb 15, 2015 at 08:31:32PM +0000, Tomperi Seppo wrote:
>>> Hi!
>>> 
>>> The reason is chroma deblocking, which uses q4 without pushing it to the stack. :/
>>> Unfortunately I am in Geneva this week and don't have an ARM Linux board with me, so it is not easy to test.
>>> 
>>> Mickael Raulet: maybe the guys at INSA could run the tests this week if I make a fix? Could you ask?
>> 
>> If they can't, then I can probably test it too, if it's a patch which
>> applies cleanly to ffmpeg and testing fate-hevc with
>> --enable-neon-clobber-test under qemu is what is needed.
>> I could test on an ARM board too if needed.
>> 
>> 
>>> 
>>> I also have SAO, qpel and epel NEON patches for the latest FFmpeg. They pass fate-hevc on Jetson TK1, but should still be tested on iOS and checked for clobbering.
>>> 
>>> -Seppo
>>> 
>>> 
>>> ________________________________________
>>> From: Michael Niedermayer [michaelni at gmx.at]
>>> Sent: Friday, February 13, 2015 5:38 PM
>>> To: FFmpeg development discussions and patches
>>> Cc: Tomperi Seppo; Mickaël Raulet
>>> Subject: Re: [FFmpeg-devel]  DSP function ARM NEON patches for hevc
>>> 
>>> On Thu, Feb 05, 2015 at 02:22:28PM +0100, Mickaël Raulet wrote:
>>>> Michael,
>>>> 
>>>> Please find some commits that can be cherry picked from
>>>> https://github.com/OpenHEVC/FFmpeg/commits/ffmpeg_patch
>>>> 
>>> 
>>>> Optimized deblocking filter (8bits only)
>>>> 1b9ee47d2f43b0a029a9468233626102eb1473b8
>>> 
>>> this breaks the neon clobber test see:
>>> fate.ffmpeg.org/report.cgi?time=20150211030204&slot=armv7l-panda-gcc4.6-cortexa8-clobber
>>> 
>>> [...]
>>> --
>>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>>> 
>>> The worst form of inequality is to try to make unequal things equal.
>>> -- Aristotle
>>> 
>> 
>> --
>> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>> 
>> Opposition brings concord. Out of discord comes the fairest harmony.
>> -- Heraclitus
> 
>> Makefile            |    3 
>> hevcdsp_init_neon.c |  159 ++++++++
>> hevcdsp_qpel_neon.S |  999 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 1160 insertions(+), 1 deletion(-)
>> 9fb0b3c33edf085845b7a0fba3ca77d1ba55dd6c  0001-hevcdsp-ARM-NEON-optimized-qpel-functions.patch
>> From ce06cb2bea4b051995608b11651b185e7a825a4c Mon Sep 17 00:00:00 2001
>> From: Seppo Tomperi <seppo.tomperi at vtt.fi>
>> Date: Wed, 11 Feb 2015 10:20:26 +0000
>> Subject: [PATCH] hevcdsp: ARM NEON optimized qpel functions
>> 
>> ---
>> libavcodec/arm/Makefile            |   3 +-
>> libavcodec/arm/hevcdsp_init_neon.c | 159 ++++++
>> libavcodec/arm/hevcdsp_qpel_neon.S | 999 +++++++++++++++++++++++++++++++++++++
>> 3 files changed, 1160 insertions(+), 1 deletion(-)
>> create mode 100644 libavcodec/arm/hevcdsp_qpel_neon.S
> 
> 
> seems to fail building:
> 
>        libavformat/utils.o
> CC      libavcodec/arm/hevcdsp_init_neon.o
> AS      libavcodec/arm/hevcdsp_qpel_neon.o
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S: Assembler messages:
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vld1.32 {d0[0]d0[1]d1[0]d1[1]},[r2],r3'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2],r3'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: expected } -- `vst1.32 {d0[0]d0[1]d1[0]d1[1]},[r0],r1'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:992: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0],r1'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vld1.32 {d1[0]d2},[r2]'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vld1.32 {},[r2]'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: expected } -- `vst1.32 {d1[0]d2},[r0]'
> ffmpeg/libavcodec/arm/hevcdsp_qpel_neon.S:994: Error: Neon double or quad precision register expected -- `vst1.32 {},[r0]'
> make: *** [libavcodec/arm/hevcdsp_qpel_neon.o] Error 1
> make: *** Waiting for unfinished jobs....
> 
> 

These macros assembled for me with the Jetson TK1 toolchain and with the latest GAS preprocessor, so I thought they were finally OK.
But it looks like passing register lists to macros is not handled well by every assembler or preprocessor.

These are quite simple functions that copy pixel blocks of varying width using NEON. I could either expand the macros by hand (giving lots of almost identical functions) or leave this optimisation out entirely for now. Or do you have any other ideas?
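
One further option, given only as a rough sketch: pass each register as its own macro argument and rebuild the brace-enclosed list inside the macro body, so that no comma-separated list has to survive the macro call. The macro name copy_row16, its parameter names and the 16-byte row width are hypothetical and not taken from the patch. The pointer and stride registers (r2/r3 for the source, r0/r1 for the destination) follow the assembler messages above. This has not been tested on the toolchain that failed.

```asm
        .syntax unified
        .arch   armv7-a
        .fpu    neon
        .text

@ Hypothetical helper, not from the patch: copy one 16-byte row
@ from [r2] (stride r3) to [r0] (stride r1).
@ The two D registers arrive as separate macro arguments, so the
@ caller never passes a "{...}" list containing commas.
.macro  copy_row16 ra, rb
        vld1.8  {\ra, \rb}, [r2], r3    @ load 16 bytes, advance src by stride
        vst1.8  {\ra, \rb}, [r0], r1    @ store 16 bytes, advance dst by stride
.endm

        @ Example expansion: uses d0 and d1 as the temporary registers.
        copy_row16 d0, d1
```

Whether this form is accepted by every assembler and preprocessor in use would still need to be checked, so it is a starting point for discussion and not a confirmed fix.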
 
-Seppo Tomperi



> [...]
> 
> -- 
> Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
> 
> The real ebay dictionary, page 1
> "Used only once"    - "Some unspecified defect prevented a second use"
> "In good condition" - "Can be repaird by experienced expert"
> "As is" - "You wouldnt want it even if you were payed for it, if you knew ..."


