[FFmpeg-devel] [RFC] support encrypted asf

Måns Rullgård mans
Mon Oct 8 09:30:35 CEST 2007


Reimar Döffinger <Reimar.Doeffinger at stud.uni-karlsruhe.de> writes:

> Hello,
> since it was such an interesting problem I could not stop myself from
> optimizing the inverse() function.
> It probably is easier to understand without algebra knowledge, but a
> bit more obfuscated for those with.
> On x86 the speed difference is only about 10%, multiplication is just
> too fast on those ;-).
> Here it is (tell me if you'd prefer an updated complete patch instead):
>
> static uint32_t inverse(uint32_t v) {
>     uint32_t factor = 1;
>     uint32_t product = v;
>     uint32_t mask = 2;
>     do {
>       v <<= 1;
>       factor |= product & mask;
>       product += v & -(product & mask);
>       // should be mask <<= 1; but then gcc misses the
>       // optimization opportunity to use the Z-flag for the while test
>       mask += mask;
>     } while (mask);
>     return factor;
> }

For the record, gcc compiling for ARM turns mask += mask into a left
shift by one, just as if it had been written mask <<= 1.

Curiously enough, if I replace that line with mask <<= 1, gcc
generates much worse code for ARM (and x86, as you've seen).  Instead
of simply branching depending on the value of mask, it uses another
register to count iterations of the loop, breaking when it reaches 31.
This uses 10 instructions (11 on x86) instead of the 8 (9) used by the
first version.
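
To reproduce the comparison, the two variants can be put side by side
as in the sketch below. The names inverse_add, inverse_shift and
check_variants are made up for illustration and are not part of the
patch. Both functions compute the same value; only the way mask is
doubled differs. Instruction counts depend on the compiler version and
flags, so the numbers above should not be expected to match exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Variant from the quoted mail: mask is doubled by addition. */
static uint32_t inverse_add(uint32_t v) {
    uint32_t factor = 1;
    uint32_t product = v;
    uint32_t mask = 2;
    do {
        v <<= 1;
        factor |= product & mask;
        product += v & -(product & mask);
        mask += mask;
    } while (mask);
    return factor;
}

/* Same loop, but with the doubling written as a shift. */
static uint32_t inverse_shift(uint32_t v) {
    uint32_t factor = 1;
    uint32_t product = v;
    uint32_t mask = 2;
    do {
        v <<= 1;
        factor |= product & mask;
        product += v & -(product & mask);
        mask <<= 1;
    } while (mask);
    return factor;
}

/* Hypothetical helper: for odd v, both variants must agree and the
 * result times v must be 1 modulo 2^32. */
static void check_variants(uint32_t v) {
    uint32_t a = inverse_add(v);
    uint32_t b = inverse_shift(v);
    assert(a == b);
    assert((uint32_t)(a * v) == 1);
}
```

Compiling a file containing these with something like gcc -O2 -S and
reading the two loops in the assembly output should show whether a
given gcc version still treats them differently.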

On the strict portability side, it should be noted that your code only
works on hardware using two's complement for negative numbers.  In
other words, it won't work on some Cray machines.
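
One way to sidestep the question entirely is to drop the unary minus.
After the shift, the low bits of v up to and including the mask bit
are zero, so v & -(product & mask) is either v or 0 depending on
whether the mask bit is set in product. A minimal sketch that selects
the addend explicitly follows; inverse_select and check_select are
hypothetical names, and whether gcc generates equally tight code for
this form has not been checked here.

```c
#include <assert.h>
#include <stdint.h>

/* Variant without unary minus: the conditional add is written as a
 * plain if, so no assumption about negative numbers is involved. */
static uint32_t inverse_select(uint32_t v) {
    uint32_t factor = 1;
    uint32_t product = v;
    uint32_t mask = 2;
    do {
        v <<= 1;
        if (product & mask) {
            factor |= mask;   /* record this bit of the inverse */
            product += v;     /* clears the mask bit in product */
        }
        mask += mask;
    } while (mask);
    return factor;
}

/* Hypothetical helper: for odd v the result times v is 1 mod 2^32. */
static void check_select(uint32_t v) {
    assert((uint32_t)(inverse_select(v) * v) == 1);
}
```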

-- 
Måns Rullgård
mans at mansr.com



