[FFmpeg-devel] [PATCH] MMX VP3 Loop Filter
Sat Oct 11 12:03:07 CEST 2008
On Sat, Oct 11, 2008 at 04:53:24AM -0400, David Conrad wrote:
> On Oct 8, 2008, at 1:59 AM, David Conrad wrote:
>> On Oct 7, 2008, at 5:43 AM, Jason Garrett-Glaser wrote:
>>>> Here's an 8-bit version. However, checking for the C fallback negates
>>>> the small speedup on my Penryn compared to the 16-bit version.
>>> Most of the code is still 16-bit. Are you sure this can't be done
>>> x264-style with emulation of extra bits and 8-bit math (reference for
>>> an example of how to do this: common/x86/deblock-a.asm in x264 tree)?
>>> This would eliminate the need for all unpacks, all packs, and all
>>> multiplication, and probably increase speed dramatically. I strongly
>>> suspect that it can be done, as the deblocking formulas seem very
>>> similar to those used in H.264.
>> It seems like you're right; the only difference between DEBLOCK_P0_Q0 and
>> VP3 is a *3 vs. a *4 in H.264.
>> I don't quite fully understand x264's implementation, so it'll take
>> another bit to adapt it.
> And here's an entirely 8-bit implementation. ~3 cycles faster than the last
> patch I posted.
> I'm not sure of the best way to avoid the duplication of ff_pb_1/3/7
> constants; there aren't enough registers to pass the address of all of the
> constants I need.
> + "movd "#flim", %%mm5 \n\t" \
> + "punpcklbw %%mm5, %%mm5 \n\t" \
you could pass the thing from mm5 at the end of the bounding_values array,
this also would make filter_limit unneeded, avoid the *0x02020202 and the
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes
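
For readers following the "*3 vs. a *4" remark and the *0x02020202 multiply
discussed above, here is a minimal scalar sketch in C. It is not the code from
the patch. The helper names (vp3_delta, h264_delta, vp3_bound,
vp3_filter_pixel, broadcast_2flim) are hypothetical, and vp3_bound stands in
for the bounding_values lookup table under the assumption that the table is
the identity below the filter limit, ramps back to zero between flim and
2*flim, and is zero beyond that. The pixel naming p1 p0 | q0 q1 (two pixels on
each side of the edge) is borrowed from H.264 convention.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Raw VP3 filter value before bounding: the (q0 - p0) term is weighted by 3.
 * Assumes >> on a negative int is an arithmetic shift, which the C standard
 * leaves implementation-defined but common compilers provide. */
static int vp3_delta(int p1, int p0, int q0, int q1)
{
    return ((p1 - q1) + 3 * (q0 - p0) + 4) >> 3;
}

/* H.264 normal-filter delta before clipping to +/-tc: same shape, weight 4. */
static int h264_delta(int p1, int p0, int q0, int q1)
{
    return ((p1 - q1) + 4 * (q0 - p0) + 4) >> 3;
}

/* Hypothetical stand-in for the bounding_values[] lookup (assumed shape):
 * |v| <  flim        -> v
 * |v| in [flim, 2*flim) -> sign(v) * (2*flim - |v|)
 * otherwise          -> 0 */
static int vp3_bound(int v, int flim)
{
    int a = v < 0 ? -v : v;
    int r;

    if (a < flim)
        r = a;
    else if (a < 2 * flim)
        r = 2 * flim - a;
    else
        r = 0;
    return v < 0 ? -r : r;
}

/* Filter one position across an edge; px points at q0, so px[-2] is p1. */
static void vp3_filter_pixel(uint8_t *px, int flim)
{
    int d = vp3_bound(vp3_delta(px[-2], px[-1], px[0], px[1]), flim);

    px[-1] = clip_uint8(px[-1] + d);
    px[0]  = clip_uint8(px[0]  - d);
}

/* Replicate 2*flim into all four bytes of a 32-bit word with one multiply.
 * Only valid while 2*flim fits in a byte (flim <= 127). */
static uint32_t broadcast_2flim(uint8_t flim)
{
    return (uint32_t)flim * 0x02020202u;
}
```

With p1 = p0 = 10 and q0 = q1 = 20, vp3_delta gives 3 where h264_delta gives 4,
which is the whole difference the thread refers to. Storing the already
broadcast limit next to the bounding table, as suggested in the reply, would
make a run-time multiply like broadcast_2flim unnecessary.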