[FFmpeg-devel] [PATCH 2/2] swscale/aarch64: Add rgb24 to yuv implementation
Zhao Zhili
quinkblack at foxmail.com
Mon Jun 3 16:11:15 EEST 2024
> On Jun 3, 2024, at 16:07, Martin Storsjö <martin at martin.st> wrote:
>
> On Mon, 3 Jun 2024, Zhao Zhili wrote:
>
>> diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
>> new file mode 100644
>> index 0000000000..0a46475723
>> --- /dev/null
>> +++ b/libswscale/aarch64/input.S
>> @@ -0,0 +1,229 @@
>> +/*
>> + * Copyright (c) 2024 Zhao Zhili <quinkblack at foxmail.com>
>> + *
>> + * This file is part of FFmpeg.
>> + *
>> + * FFmpeg is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation; either
>> + * version 2.1 of the License, or (at your option) any later version.
>> + *
>> + * FFmpeg is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with FFmpeg; if not, write to the Free Software
>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> + */
>> +
>> +#include "libavutil/aarch64/asm.S"
>> +
>> +.macro rgb24_to_yuv_load_rgb, src
>> + ld3.16b { v16, v17, v18 }, [\src]
>> + ushll.8h v19, v16, #0 // v19: r
>> + ushll.8h v20, v17, #0 // v20: g
>> + ushll.8h v21, v18, #0 // v21: b
>> + ushll2.8h v22, v16, #0 // v22: r
>> + ushll2.8h v23, v17, #0 // v23: g
>> + ushll2.8h v24, v18, #0 // v24: b
>
> Don't use this nonstandard, Apple-specific aarch64 syntax. It was used by Apple tools early on, when the standardized aarch64 syntax wasn't quite settled yet, and Apple's tools still accept it. (Apparently it is also still the preferred form for disassembly on Apple platforms.)
>
> With this syntax, the assembly is rejected by GNU binutils and MSVC.
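>
> For illustration, a sketch of how the load macro quoted above might look in the standard syntax, with the arrangement specifier attached to each register instead of the mnemonic. It assumes the same register allocation as the patch; the closing .endm is added only to make the fragment complete, and uxtl/uxtl2 would be equivalent aliases for these shifts by zero.
>
> ```
> .macro rgb24_to_yuv_load_rgb, src
>         // De-interleave 16 packed RGB pixels into three byte vectors
>         ld3             { v16.16b, v17.16b, v18.16b }, [\src]
>         // Widen the low 8 bytes of each channel to 16 bits
>         ushll           v19.8h, v16.8b, #0      // v19: r
>         ushll           v20.8h, v17.8b, #0      // v20: g
>         ushll           v21.8h, v18.8b, #0      // v21: b
>         // Widen the high 8 bytes of each channel to 16 bits
>         ushll2          v22.8h, v16.16b, #0     // v22: r
>         ushll2          v23.8h, v17.16b, #0     // v23: g
>         ushll2          v24.8h, v18.16b, #0     // v24: b
> .endm
> ```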
>
>> +function ff_rgb24ToY_neon, export=1
>> + cmp w4, #0 // check width > 0
>> + b.le 4f
>> +
>> + ldp w10, w11, [x5], #8 // w10: ry, w11: gy
>> + dup v0.8H, w10
>> + dup v1.8H, w11
>> + ldr w12, [x5] // w12: by
>> + dup v2.8H, w12
>
> Don't use uppercase .8H for the vector arrangement specifiers; we prefer to stick to all lowercase here - see 184103b3105f02f1189fa0047af4269e027dfbd6. The same goes for a number of other places in this patch.
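>
> As a sketch, the coefficient setup quoted above would read as follows with lowercase specifiers, assuming nothing else in those lines changes:
>
> ```
>         ldp             w10, w11, [x5], #8      // w10: ry, w11: gy
>         dup             v0.8h, w10              // broadcast ry to all 8 lanes
>         dup             v1.8h, w11              // broadcast gy to all 8 lanes
>         ldr             w12, [x5]               // w12: by
>         dup             v2.8h, w12              // broadcast by to all 8 lanes
> ```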
>
>> + add w9, w9, #1 // i++
>> + add x3, x3, #6 // src += 6
>> +3:
>> + cmp w9, w5
>> + b.lt 2b
>> +4:
>
> Incorrect indentation for the cmp/b.lt instructions here.
>
>
> I have set up a bunch of GitHub actions for testing aarch64 assembly - see https://github.com/mstorsjo/ffmpeg/commits/gha-aarch64. If you have a GitHub account, grab a copy of this branch into your repo, add your own commits on top, and push to your fork (and if necessary, enable running the actions); you should then get wide test coverage for your patches.
>
> See https://github.com/mstorsjo/FFmpeg/actions/runs/9346228714 for one example run of these actions with your patches.
Wow, this is very helpful. Here is the action result for the updated patch:
https://github.com/quink-black/FFmpeg/actions/runs/9350348848
https://ffmpeg.org/pipermail/ffmpeg-devel/2024-June/328786.html
The test still fails on x86, but succeeds on all arm64 platforms and
loongarch. I tried calling rgb24ToY_c and ff_rgb24ToY_avx
directly and comparing the results, and they don't match. I’m confused.
>
> // Martin