[FFmpeg-devel] [PATCH 1/4] lavc/aarch64: new optimization for 8-bit hevc_epel_uni_v
Martin Storsjö
martin at martin.st
Sun Sep 17 00:46:07 EEST 2023
On Thu, 14 Sep 2023, Logan.Lyu wrote:
> Hi Martin,
>
> You can try the attached patchset. If that doesn't work, My code branch
> address is https://github.com/myais2023/FFmpeg/tree/hevc-aarch64
Thanks for the patches. Functionally, they seem to work, and the issues i
saw in the code are relatively minor. Unfortunately, some of the issues
are issues that we've been through in many earlier patches, so I would
hope that you would pay attention to them in the future before posting
more patches.
In patch 1, you've got a bunch of sxtw instructions for src/dst stride
parameters that have the type ptrdiff_t - that shouldn't be necessary?
In patch 2, you're moving the macros calc_epelh, calc_epelh2,
load_epel_filterh - can you split out the move into a separate commit?
(This isn't strictly necessary but would make things even clearer.)
In patch 2, you're storing below the stack, then decrementing it
afterwards - e.g. like this:
> + stp x0, x30, [sp, #-16]
> + stp x1, x2, [sp, #-32]
> + stp x3, x4, [sp, #-48]
> + stp x5, x6, [sp, #-64]!
Please change that so that you're first predecrementing the whole area,
then storing the other elements above that stack pointer, e.g. like this:
stp x0, x30, [sp, #-64]!
stp x1, x2, [sp, #16]
stp x3, x4, [sp, #32]
etc.
The same issue also appears in variouos places within functions like this:
> + stp x0, x1, [sp, #-16]
> + stp x4, x6, [sp, #-32]
> + stp xzr, x30, [sp, #-48]!
Please fix all of these cases - you can search through your patches for
anything related to storing on the stack. Also, storing xzr here seems
superfluous - if you've got an odd number of registers to store, just make
one instruction str instead of stp (but keep the stack aligned).
Then in patch 4, you've got yet another pattern for doing these stores,
where you have superfluous consecutive stack decrements like this:
> + stp x6, x30, [sp, #-16]!
> + mov x7, #16
> + stp x0, x1, [sp, #-16]!
> + stp x2, x3, [sp, #-16]!
> + stp x4, x5, [sp, #-16]!
Please just do one stack decrement covering all the stack space you need.
I believe these issues have been raised in earlier reviews as well.
// Martin
More information about the ffmpeg-devel
mailing list