[Ffmpeg-devel] [PATCH] Snow mmx+sse2 asm optimizations
Thu Mar 16 17:37:19 CET 2006
Robert Edele wrote:
> On Thu, 2006-03-16 at 10:45 +0100, Michael Niedermayer wrote:
>> On Tue, Mar 14, 2006 at 01:44:03PM +0200, Ivan Kalvachev wrote:
>>> 2006/3/14, Robert Edele <yartrebo at earthlink.net>:
>>>> On Mon, 2006-03-13 at 02:52 +0100, Michael Niedermayer wrote:
>>>>> ok, first patch looks mostly ok, i am not particularly happy about the
>>>>> inclusion of snow.h in dsputil.h but i don't really care
>>>>> as dsputil.h was never supposed to be a public header, so whoever
>>>>> came up with that idea can fix the snow.h inclusion (installing snow.h
>>>>> along with avcodec.h is not ok)
>>>> snow.h is included to get access to the DWTELEM #define. Would you have
>>>> any ideas on a better way of doing this?
>>> Maybe right after DCTELEM in dsputil.h ?
>> yes, seems like the simplest solution ...
> Oded, you have my permission to commit it. If you want to fix the
> snow.h/DWTELEM issue, please post back to the ml before committing,
> because Michael wasn't too happy with the last fix. Thanks.
> I'll post the next tranche of the patch once it's committed, which shall
> be switching the obmc arrays to use 8 bits instead of 6 (but keeping 6-
> bit precision) along with a few bugfixes by pengvado to allow this to
> happen. The purpose of the patch is that it simplifies every asm
> implementation that I've seen so far (and the C) by 1 or 2 instructions
> per innermost loop.
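
To make the quoted 8-bit obmc point concrete, here is a minimal sketch in
plain C. It assumes the saving comes from dropping a rescaling shift once
the weights are stored pre-multiplied by 4; the names (blend_6bit,
blend_8bit, OBMC_BITS_OLD, OBMC_BITS_NEW) are hypothetical and this is not
the actual snow code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scales, for illustration only. */
#define OBMC_BITS_OLD 6   /* weights of overlapping blocks sum to 64   */
#define OBMC_BITS_NEW 8   /* same weights stored pre-multiplied by 4   */

/* Blend two predictions with a 6-bit weight, then rescale the result
 * to an 8-bit weight scale: one extra shift per sample. */
static unsigned blend_6bit(unsigned w6, unsigned p0, unsigned p1)
{
    assert(w6 <= (1u << OBMC_BITS_OLD));
    unsigned v = w6 * p0 + ((1u << OBMC_BITS_OLD) - w6) * p1;
    return v << (OBMC_BITS_NEW - OBMC_BITS_OLD);
}

/* Same blend with the weight already stored at 8-bit scale: the shift
 * disappears, but the weight still only carries 6 bits of precision. */
static unsigned blend_8bit(unsigned w8, unsigned p0, unsigned p1)
{
    assert(w8 <= (1u << OBMC_BITS_NEW));
    return w8 * p0 + ((1u << OBMC_BITS_NEW) - w8) * p1;
}
```

For any 6-bit weight w, blend_6bit(w, a, b) and blend_8bit(4 * w, a, b)
give the same value, so only the table changes, not the output.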
If we could get the blocks aligned a bit, the inner loop could probably
be reduced even more (currently for altivec it is 2x load to handle
misalignment and a single op for doing the actual work).
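
A rough sketch of the 2x load cost, emulated in portable C rather than
written with real AltiVec intrinsics. On AltiVec the usual idiom is two
vec_ld plus a vec_perm with a mask from vec_lvsl; the helper names below
(load_aligned16, load_unaligned16) are hypothetical and only model that
behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Emulates an aligned-only 16-byte vector load: the low 4 address bits
 * are ignored, as with AltiVec's vec_ld. */
static void load_aligned16(uint8_t out[16], const uint8_t *p)
{
    assert(p != NULL);
    const uint8_t *base = (const uint8_t *)((uintptr_t)p & ~(uintptr_t)15);
    memcpy(out, base, 16);
}

/* Misaligned case: two aligned loads, then select the 16 wanted bytes
 * from the 32 that were fetched. */
static void load_unaligned16(uint8_t out[16], const uint8_t *p)
{
    uint8_t lo[16], hi[16];
    unsigned shift = (unsigned)((uintptr_t)p & 15);

    load_aligned16(lo, p);        /* block containing the first byte */
    load_aligned16(hi, p + 15);   /* block containing the last byte  */
    for (unsigned i = 0; i < 16; i++)
        out[i] = (i + shift < 16) ? lo[i + shift] : hi[i + shift - 16];
}
```

With 16-byte aligned blocks, load_aligned16 alone would be enough, which
is where the saving in the inner loop would come from.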