[FFmpeg-devel] [PATCH] lavu/x86/lls: add fma3 optimizations for update_lls
jamrial at gmail.com
Thu Jan 14 17:16:52 CET 2016
On 1/14/2016 11:12 AM, Ganesh Ajjanagadde wrote:
> On Thu, Jan 14, 2016 at 5:02 AM, Henrik Gramner <henrik at gramner.com> wrote:
>> Use the x86inc syntax for FMA instructions (basically FMA4 syntax that
>> gets assembled as FMA3) since normal FMA3 opcodes are horrible to
>> read, nobody ever remembers the ordering of operands.
> 1. It is very easy to remember: take fmadd231pd x, y, z for instance.
> This means 2*3 + 1, so x = y*z+x. How the macro is more readable is
> beyond me; especially with some side cases that are undocumented, see
fmaddps dst, src1, src2, src3 is always going to be easier to read for anyone
without having to think about what number belongs to what operation and what
operand. And it will output either FMA4 or FMA3 depending on the value passed
> 2. If anything, the macro is harder, since it is not Intel supported,
Of course it won't be there, it's not defined by them. Non-destructive four
operand fma is defined by AMD.
> I can't look it up at
Neither are any of the dozens of other compat macros in x86utils. And many of
them are also undocumented within x86utils. This point is absurd.
> 3. The macro does not seem to take care of the mov's (if any), still
> requiring explicit thought on the part of the programmer.
Yes, and? It's not an emulation macro like the uppercase ones that become
several instructions. It translates a single FMA4-like instruction into
either an FMA4 or FMA3 one.
    fmaddps     xmm0, xmm0, xmm1, xmm2

    vfmaddps    xmm0, xmm0, xmm1, xmm2   if FMA4
    vfmadd132ps xmm0, xmm2, xmm1         if FMA3
If you try to use it with four different operands, it will work with FMA4
but not FMA3, since as I said it's not trying to emulate anything.
> 4. The macro lacks documentation. In particular, it is not a thorough
> fma4 emulation in the spirit of
> Or put in other words, IMO not good.
No, it's good and what's done in every other asm file precisely for being
more flexible and readable. Especially since it allows one to write both
FMA4 and FMA3 functions without duplicating code.