[FFmpeg-devel] [RFC] clobbers for XMM registers

Måns Rullgård mans
Thu Sep 30 19:42:48 CEST 2010


Ramiro Polla <ramiro.polla at gmail.com> writes:

> 2010/9/30 Måns Rullgård <mans at mansr.com>:
>> Alexander Strange <astrange at ithinksw.com> writes:
>>> On Thursday, September 30, 2010, Måns Rullgård <mans at mansr.com> wrote:
>>>> "Ronald S. Bultje" <rsbultje at gmail.com> writes:
>>>>> 2010/9/28 Måns Rullgård <mans at mansr.com>:
>>>>>> Michael Niedermayer <michaelni at gmx.at> writes:
>>>>>>> On Tue, Sep 28, 2010 at 09:36:40AM -0400, Ronald S. Bultje wrote:
>>>>>>>> On Tue, Sep 28, 2010 at 8:34 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
>>>>>>>> > you want to execute code from vp3dsp_sse2.c on a pre SSE cpu?
>>>>>>>>
>>>>>>>> All _sse2 files are templates files that are included in dsputil_mmx.c
>>>>>>>> or similar.
>>>>>>>
>>>>>>> we could add the flags to dsputil_mmx then
>>>>>>
>>>>>> That would allow the compiler to use SSE instructions in functions
>>>>>> that should be MMX only.
>>>>>
>>>>> I'm gonna start kicking this subject until it's solved. Come on guys,
>>>>> keep this moving. Why don't we make it (the clobbering) a macro and
>>>>> only enable this on x86-64. Don't forget all xmm registers are
>>>>> caller-save on x86-32 and x86-64 has no issues with marking clobbers
>>>>
>>>> The issue is not fundamentally about caller vs callee saved
>>>> registers.  It is about telling the compiler which registers are
>>>> clobbered, so that it can save and restore them if necessary.
>>>>
>>>> The missing clobber lists caused the FFT to fail with suncc, despite
>>>> all the used registers being caller-saved.  Apparently the compiler
>>>> was using them for something outside the asm block.
>>>>
>>>>> (and even if it did, -msse is fine, there is no single x86-64 CPU that
>>>>> does not support SSE). We could consider making it as simple as :::
>>>>> CLOBBER_IF_X86_64("%xmm6", "%xmm7",) "%eax" which evaluates to the
>>>>> string in it (including commas) on x86-64 and nothing on x86-32 (and
>>>>> omit the comma if that's the only thing in the clobberlist).
>>>>
>>>> We obviously need a conditional of some kind, but it should be tested
>>>> in configure and applied whenever the compiler recognises xmm registers.
>>>> It is, however, not quite as straightforward as you make it out.
>>>> Stray commas are not allowed, nor is an empty list.
>>>>
>>>> One possible solution is to have the macro always include "cc".  Most
>>>> of the asm blocks do clobber the condition flags, and for any that do
>>>> not, it is unlikely to make any difference.  It also seems that
>>>> including the stack pointer in the clobber list is ignored, although
>>>> relying on this seems dubious at best.
>>>
>>> asm blocks always clobber cc whether or not you put it in the list, so
>>> the "cc" clobber is a no-op.
>>
>> In that case always adding it is certainly harmless, and allows a
>> single macro to be used.
>
> What about
> #if HAVE_XMM_CLOBBERS
> #    define XMM_CLOBBERS(a, ...) __VA_ARGS__
> #else
> #    define XMM_CLOBBERS(a, ...) a
> #endif
>
> to be used as in lavc/x86/fft_sse.c:
>         :"+r"(j), "+r"(k)
>         :"r"(output+n4), "r"(output+n4*3),
>          "m"(*m1m1m1m1)
>         XMM_CLOBBERS(, : "%xmm0", "%xmm1", "%xmm7")
>     );

That falls over if any other clobbers are needed.  Here's my idea:

#if HAVE_XMM_CLOBBERS
#   define XMM_CLOBBERS(...) "cc", __VA_ARGS__
#else
#   define XMM_CLOBBERS(...) "cc"
#endif
[...]
__asm__ ("..." ::: XMM_CLOBBERS("%xmm0", "%xmm1"), "other");

This macro can be called anywhere in a clobber list, and it requires
no obscure syntax at call site.  A comment about "cc" being a no-op
might be in order, of course.
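
As a rough illustration of how the macro would sit next to other
clobbers, here is a minimal sketch.  The helper name add_via_sse and
the use of __SSE2__ as a stand-in for a configure-time
HAVE_XMM_CLOBBERS test are assumptions made for the example only; they
are not part of the proposal itself.

```c
/* Stand-in for the configure check (assumption of this sketch): treat
 * xmm clobbers as usable whenever a GNU-style compiler targets SSE2. */
#ifndef HAVE_XMM_CLOBBERS
#   if defined(__GNUC__) && defined(__SSE2__)
#       define HAVE_XMM_CLOBBERS 1
#   else
#       define HAVE_XMM_CLOBBERS 0
#   endif
#endif

/* "cc" is always emitted so the expansion is never empty and never
 * leaves a stray comma in the clobber list. */
#if HAVE_XMM_CLOBBERS
#   define XMM_CLOBBERS(...) "cc", __VA_ARGS__
#else
#   define XMM_CLOBBERS(...) "cc"
#endif

/* Hypothetical helper: adds two ints through xmm0/xmm1 when xmm
 * clobbers are available, and falls back to plain C otherwise. */
static int add_via_sse(int a, int b)
{
#if HAVE_XMM_CLOBBERS
    int r;
    __asm__ volatile (
        "movd  %1, %%xmm0     \n\t"  /* xmm0 = a */
        "movd  %2, %%xmm1     \n\t"  /* xmm1 = b */
        "paddd %%xmm1, %%xmm0 \n\t"  /* xmm0 += xmm1 (32-bit lanes) */
        "movd  %%xmm0, %0     \n\t"  /* r = low 32 bits of xmm0 */
        : "=r"(r)
        : "r"(a), "r"(b)
        /* The macro is followed by an ordinary clobber. */
        : XMM_CLOBBERS("%xmm0", "%xmm1"), "memory"
    );
    return r;
#else
    return a + b;
#endif
}
```

With HAVE_XMM_CLOBBERS set, the clobber list above expands to
"cc", "%xmm0", "%xmm1", "memory"; without it, the same call site would
expand to "cc", "memory", so the list stays well-formed in both cases.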

-- 
Måns Rullgård
mans at mansr.com


