[Ffmpeg-devel] Re: [PATCH] Machine endian bytestream functions

Ramiro Polla ramiro
Sun Mar 11 04:55:18 CET 2007


Hello,

Michael Niedermayer escreveu:
> Hi
>
> On Sat, Mar 10, 2007 at 05:15:44PM -0300, Ramiro Polla wrote:
>   
>> Hello,
>>
>> Reimar D?ffinger escreveu:
>>     
>>> Hello,
>>> On Sat, Mar 10, 2007 at 11:06:41PM -0300, ramiro at lisha.ufsc.br wrote:
>>>  
>>>       
>>>> Attached patch makes the AV_{R,W}{L,B}xx macros have a machine endian for
>>>> the simple 16 and 32 bit types. Those macros are then #ifdef'd for the
>>>> correct endianess. 24 bit remains the same, as it would be more complex.
>>>>    
>>>>         
>>> They completely ignore alignment issues...
>>>
>>>  
>>>       
>> You're right.
>>
>> Attached patch makes use of machine endianess where unaligned data 
>> accesses are possible, and faster than what gcc is currently doing.
>>
>> I have only tested this on a p4, but the following program should detect 
>> this on any architecture. Compile bytes.c and main.c with the same 
>> options FFmpeg gives to libavcodec files, link them, and test the speed 
>> both for patched and unpatched FFmpeg. bytes.c should be changed to 'be' 
>> on big-endian architectures.
>>
>> Regression tests pass.
>>
>> Ramiro Polla
>>     
>
>   
>> Index: configure
>> ===================================================================
>> --- configure	(revis?o 8316)
>> +++ configure	(c?pia de trabalho)
>> @@ -602,6 +602,7 @@
>>      dlopen
>>      fast_64bit
>>      fast_cmov
>> +    fast_unaligned
>>      freetype2
>>      imlib2
>>      inet_aton
>> @@ -737,6 +738,7 @@
>>  mmx="default"
>>  cmov="no"
>>  fast_cmov="no"
>> +fast_unaligned="no"
>>  armv5te="default"
>>  armv6="default"
>>  iwmmxt="default"
>> @@ -951,9 +953,11 @@
>>  case "$arch" in
>>    i386|i486|i586|i686|i86pc|BePC)
>>      arch="x86_32"
>> +    enable fast_unaligned
>>    ;;
>>    x86_64|amd64)
>>      arch="x86_32"
>> +    enable fast_unaligned
>>      canon_arch="`$cc -dumpmachine | sed -e 's,\([^-]*\)-.*,\1,'`"
>>      if [ x"$canon_arch" = x"x86_64" -o x"$canon_arch" = x"amd64" ]; then
>>        if [ -z "`echo $CFLAGS | grep -- -m32`"  ]; then
>>     
>
> maybe configure should rather have a generic test which checks which version
> is faster? (it would be much easier to maintain instead of keeping track what
> is faster for which cpu ...)
>
>
>   

Sorry, but I failed to find a simple way for this in configure. Three 
issues came up:
1. Unaligned data accesses will crash on some processors, and I don't 
think it's a good idea to have configure throw exceptions. (e.g. it 
would open the "Send report" dialog on Windows).
2. Checking the speed for an x ammount of time would slow down configure.
3. Different cpu loads during the configure script would cause 
unreliable results.

So, for the moment, I'm sending the patch with the same check. (It's 
just like the fast_64bit or fast_cmov check).

>> Index: libavutil/intreadwrite.h
>> ===================================================================
>> --- libavutil/intreadwrite.h	(revis?o 8316)
>> +++ libavutil/intreadwrite.h	(c?pia de trabalho)
>> @@ -47,8 +47,8 @@
>>  #define AV_RB8(x)  (((uint8_t*)(x))[0])
>>  #define AV_WB8(p, d)  { ((uint8_t*)(p))[0] = (d); }
>>  
>> -#define AV_RB16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
>> -#define AV_WB16(p, d) { \
>> +#define AV_RBE16(x) ((((uint8_t*)(x))[0] << 8) | ((uint8_t*)(x))[1])
>> +#define AV_WBE16(p, d) { \
>>     
>
> cosmetic, rejected
>
>   

Cosmetics now in separate patch. (altough I've always had problems with 
sending patches not made with "svn diff". I hope these ones are fine)

> [...]
>   
>>  
>> +#ifdef HAVE_FAST_UNALIGNED
>> +# ifdef WORDS_BIGENDIAN
>> +#  define AV_RL16(x)    AV_RLE16(x)
>> +#  define AV_RL32(x)    AV_RLE32(x)
>> +#  define AV_RB16(x)    LD16(x)
>> +#  define AV_RB32(x)    LD32(x)
>> +#  define AV_WL16(p, d) AV_WLE16(p, d)
>> +#  define AV_WL32(p, d) AV_WLE32(p, d)
>> +#  define AV_WB16(p, d) ST16(p, d)
>> +#  define AV_WB32(p, d) ST32(p, d)
>> +# else
>> +#  define AV_RL16(x)    LD16(x)
>> +#  define AV_RL32(x)    LD32(x)
>> +#  define AV_RB16(x)    AV_RBE16(x)
>> +#  define AV_RB32(x)    AV_RBE32(x)
>> +#  define AV_WL16(p, d) ST16(p, d)
>> +#  define AV_WL32(p, d) ST32(p, d)
>> +#  define AV_WB16(p, d) AV_WBE16(p, d)
>> +#  define AV_WB32(p, d) AV_WBE32(p, d)
>> +# endif
>> +#else
>> +# define AV_RL16(x)    AV_RLE16(x)
>> +# define AV_RL32(x)    AV_RLE32(x)
>> +# define AV_RB16(x)    AV_RBE16(x)
>> +# define AV_RB32(x)    AV_RBE32(x)
>> +# define AV_WL16(p, d) AV_WLE16(p, d)
>> +# define AV_WL32(p, d) AV_WLE32(p, d)
>> +# define AV_WB16(p, d) AV_WBE16(p, d)
>> +# define AV_WB32(p, d) AV_WBE32(p, d)
>> +#endif
>>     
>
> #ifdef HAVE_FAST_UNALIGNED
> #   ifdef WORDS_BIGENDIAN
>         define AV_RB16(x)    LD16(x)
>         define AV_RB32(x)    LD32(x)
>         define AV_WB16(p, d) ST16(p, d)
>         define AV_WB32(p, d) ST32(p, d)
> #   else
>         define AV_RL16(x)    LD16(x)
>         define AV_RL32(x)    LD32(x)
>         define AV_WL16(p, d) ST16(p, d)
>         define AV_WL32(p, d) ST32(p, d)
> #   endif
> #endif
>
> also it might be worth to check bswap(LD32(x)) style too ...
>
>   
Done as suggested (with bswap_xx).

cosmetics.diff reorders definitions from endianess to bit-depth.
functional.diff makes special cases for fast_unaligned.

Regression tests pass. Final ffmpeg.exe is a few kbs smaller. Altough 
not benchmarked with real-world applications, that simple test I made 
showed from 20% to 40% speedup on the macros alone.

Ramiro Polla
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: cosmetics.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070311/55d705e9/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: functional.diff
URL: <http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20070311/55d705e9/attachment.asc>



More information about the ffmpeg-devel mailing list