[FFmpeg-devel] [FFmpeg-devel-irc] IRC log for 2010-02-19

Jason Garrett-Glaser darkshikari
Mon Feb 22 00:06:43 CET 2010


On Sun, Feb 21, 2010 at 6:57 AM, Michael Niedermayer <michaelni at gmx.at> wrote:
> On Sat, Feb 20, 2010 at 12:00:54AM +0000, irc at mansr.com wrote:
>> [00:00:16] <mru> if speed matters you should use asm
>> [00:00:27] <Dark_Shikari> we only have inline asm on x86 atm
>> [00:00:47] <mru> sometimes inline asm is the best solution
>> [00:01:17] <Dark_Shikari> hmm. michael's idea seems to hurt in x264.
>> [00:02:02] <Dark_Shikari> in fact, michael's idea would hurt in ffmpeg if ffmpeg had inline SIMD like x264 did
>> [00:02:05] <Dark_Shikari> for that code
>> [00:02:11] <Dark_Shikari> we use simd for the following
>> [00:02:12] <Dark_Shikari> MIN(((x+28)*2184)>>16,2) = (x>2) + (x>32)
>> [00:02:20] <Dark_Shikari> on two values at once
>> [00:02:28] <Dark_Shikari> left side is simd, right side is C
>
> any volunteers who would send a patch?

Note that I've since changed that code locally, partly inspired by
your patch ;)

Here's the current asm, which calculates (x>2)+(x>32) for two values
at once.  I don't think it's much better than C anymore; the main
advantage before was that it saved 2 abs() calls, but your idea
eliminates the need for that.

    static const uint64_t pb_2    = 0x0202020202020202ULL;
    static const uint64_t pb_32   = 0x2020202020202020ULL;
    int amvd;
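    asm(
        "movd         %1, %%mm0 \n" /* mm0 = left MVD pair (bytes) */
        "movd         %2, %%mm1 \n" /* mm1 = top MVD pair (bytes) */
        "paddb     %%mm1, %%mm0 \n" /* per-byte sum: left + top */
        "pxor      %%mm2, %%mm2 \n" /* mm2 = 0 (accumulator) */
        "movq      %%mm0, %%mm1 \n" /* copy sums for the second compare */
        "pcmpgtb      %3, %%mm0 \n" /* 0xFF in each byte where sum > 2 */
        "pcmpgtb      %4, %%mm1 \n" /* 0xFF in each byte where sum > 32 */
        "psubb     %%mm0, %%mm2 \n" /* subtracting 0xFF (-1) adds 1 */
        "psubb     %%mm1, %%mm2 \n" /* likewise for the > 32 mask */
        "movd      %%mm2, %0    \n" /* low two bytes hold the results */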
        :"=r"(amvd)
        :"m"(M16( mvdleft )),"m"(M16( mvdtop )),
         "m"(pb_2),"m"(pb_32)
    );
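
For comparison, here is a rough scalar C model of what the asm above
computes, plus the multiply form quoted from the IRC log.  This is only
a sketch: the function names are made up for illustration, and it
assumes each byte sum stays below 128 (pcmpgtb is a signed compare).

```c
#include <stdint.h>

/* Compare-based form: returns 0, 1 or 2. */
static int ctx_inc_compare(int x)
{
    return (x > 2) + (x > 32);
}

/* Multiply-based form from the IRC log, with MIN written as a ternary. */
static int ctx_inc_multiply(int x)
{
    int t = ((x + 28) * 2184) >> 16;
    return t < 2 ? t : 2;
}

/* Scalar model of the MMX block (hypothetical helper, not x264 code).
 * left[] and top[] each hold two byte-sized values; the two results are
 * packed into the low two bytes, byte 0 lowest, as movd would give on
 * little-endian x86. */
static uint16_t amvd_pair(const uint8_t left[2], const uint8_t top[2])
{
    int s0 = left[0] + top[0];
    int s1 = left[1] + top[1];
    return (uint16_t)(ctx_inc_compare(s0) | (ctx_inc_compare(s1) << 8));
}
```

The two forms agree for every x from 0 to 255, so the model can use
either.  Unlike this model, the asm leaves whatever sat next to the
16-bit inputs in the upper bytes of amvd, so only the low two bytes of
its output are meaningful.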

Note how the input is bytes (!).  Here's the trick: MVD values only
have to be 0 to 33; any larger value tells us nothing.  Maybe there's
some scaling that goes on with MBAFF, but even then it's only 0-65.
As a result, you can store MVD values as uint8_ts, saving enormous
amounts of memory and cache and making fill_rectangle faster.
Obviously this requires a little bit of extra clipping, but my
benchmarks in x264 show that it's worth it there.
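
A minimal sketch of the clipping idea, assuming the non-MBAFF bound of
33 mentioned above.  The names store_mvd, ctx_from_sum and MVD_CLIP are
made up for illustration and are not the actual x264 or ffmpeg code.

```c
#include <stdint.h>
#include <stdlib.h>

#define MVD_CLIP 33 /* assumed bound; MBAFF may need a larger one */

/* Store |mvd| saturated to MVD_CLIP so that it fits in a uint8_t. */
static uint8_t store_mvd(int mvd)
{
    int a = abs(mvd);
    return (uint8_t)(a < MVD_CLIP ? a : MVD_CLIP);
}

/* Result derived from the sum of the left and top values. */
static int ctx_from_sum(int sum)
{
    return (sum > 2) + (sum > 32);
}
```

Clipping loses nothing here: if either neighbour is 33 or more, the
clipped sum is still above 32, and if both are below 33 they are stored
unchanged.  So ctx_from_sum() gives the same answer for clipped and
unclipped inputs.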

This change is probably vastly more useful than the above asm (which I
make available under LGPL in case anyone cares, but it's probably
near-useless once the other changes are done).

Dark Shikari


