[FFmpeg-soc] [PATCH] 4/4 Split sws_getContext_altivec_alloc_context from sws_getContext

Luca Barbato lu_zero at gentoo.org
Sun Jun 15 12:26:32 CEST 2008


Michael Niedermayer wrote:
> On Sun, Jun 15, 2008 at 01:55:57AM +0200, Luca Barbato wrote:
>> Keiji Costantini wrote:
>>> Luca Barbato ha scritto:
>>>> Michael Niedermayer wrote:
>>>>> On Wed, Jun 11, 2008 at 02:36:08AM +0200, Keiji Costantini wrote:
>>>>>> -                p[j] = c->vLumFilter[i];
>>>>>> -                p[j] = c->vChrFilter[i];
>>>>> Whichever way this is done and wherever, it should be done at the
>>>>> same place where lum/chrMmxFilter is initialized.
>>>>> And of course both altivec & mmx should use the same array for the same data.
>>>>>
>>>>> But looking again it seems these arrays are practically unused and the
>>>>> code using them looks like it shouldn't use them in the first place.
>>>>>
>>>>> So, correct cleanup seems to be to remove vCCoeffsBank and vYCoeffsBank.
>>>> The *Banks are just a copy from aligned memory to another, so just using 
>>>> the vLumFilter and vChrFilter directly won't cause problems.
>>>>
>>>> lu
>>>>
>>> extract from code:
>>>
>>>      for (i=0;i<c->vLumFilterSize*c->dstH;i++) {
>>>          int j;
>>>          short *p = (short *)&c->vYCoeffsBank[i];
>>>          for (j=0;j<8;j++)
>>>              p[j] = c->vLumFilter[i];
>>>      }
>>>
>>> I see *Banks are *filters copied 8 times each...
>> I'm an idiot =P
> 
> At least I now know why I didn't understand your earlier reply :)

Happens when I try to read code right after waking up or just before going to sleep ^^;

>> Well they could go away adding 2 vec_splats, but I'm pretty sure it 
>> would slow things down. I'd consider this later -_-
> 
> I wouldn't be so sure that the splats are slower than the cache thrashing the
> array causes.
> Also if done properly (like in the mmx code) then there are rather few splats.

I've only just woken up, so I'll probably write something stupid again, but:

if I just use the original vector I'd have:

(dumb way)
- one full unaligned load (2 loads, 1 table lookup, 1 permute)
- a splat

or
(smarter way)
- one simple load
- address mask to find which element I care about
- a splat
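
To make the "smarter way" concrete, here is a rough scalar model in plain C,
not real AltiVec code. The names vec8s, load_aligned_block, lane_of and
splat_lane are made up for illustration. It assumes the coefficients are shorts
stored in 16-byte aligned memory, and it models the aligned load as one that
ignores the low four address bits. In real AltiVec code vec_splat takes a
constant lane index, so a run-time lane would need a permute instead; that
detail is left out here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for a 128-bit register holding eight signed shorts. */
typedef struct { int16_t s[8]; } vec8s;

/* Model of a simple aligned load: the low four address bits are dropped,
 * so the whole 16-byte block containing p is fetched.
 * Assumes the underlying array is 16-byte aligned. */
static vec8s load_aligned_block(const int16_t *p)
{
    vec8s v;
    uintptr_t base = (uintptr_t)p & ~(uintptr_t)15;
    memcpy(v.s, (const void *)base, sizeof v.s);
    return v;
}

/* The address mask: byte offset inside the block, converted to a
 * lane number by dividing by sizeof(short). */
static unsigned lane_of(const int16_t *p)
{
    return (unsigned)(((uintptr_t)p & 15) >> 1);
}

/* Model of a splat: copy one lane into all eight lanes. */
static vec8s splat_lane(vec8s v, unsigned lane)
{
    vec8s r;
    assert(lane < 8);
    for (int j = 0; j < 8; j++)
        r.s[j] = v.s[lane];
    return r;
}

/* One load, one mask, one splat. */
static vec8s load_and_splat(const int16_t *coeff)
{
    return splat_lane(load_aligned_block(coeff), lane_of(coeff));
}
```

Calling load_and_splat(&filter[i]) on an aligned filter array should give a
vector with filter[i] in every lane, which is what one bank entry holds today.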

Right now I have a simple load plus what is more or less equivalent to the
address mask (one more &15), so you are right: I should be able to kill
those vectors without losing much.
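
For comparison, a small plain C sketch of the two options, again a scalar model
and not the swscale code itself. fill_bank mirrors the loop Keiji quoted;
splat_coeff, bank_bytes, filter_bytes and the vec8s type are hypothetical
helpers, and n stands in for vLumFilterSize*dstH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar stand-in for a 128-bit register holding eight signed shorts. */
typedef struct { int16_t s[8]; } vec8s;

/* Current approach: precompute one replicated vector per coefficient,
 * as the quoted loop does for vYCoeffsBank. */
static void fill_bank(vec8s *bank, const int16_t *filter, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (int j = 0; j < 8; j++)
            bank[i].s[j] = filter[i];
}

/* Alternative: replicate the coefficient on demand (the "splat"). */
static vec8s splat_coeff(const int16_t *filter, size_t n, size_t i)
{
    vec8s v;
    assert(i < n);
    for (int j = 0; j < 8; j++)
        v.s[j] = filter[i];
    return v;
}

/* Memory touched by each approach for n coefficients. */
static size_t bank_bytes(size_t n)   { return n * sizeof(vec8s); }
static size_t filter_bytes(size_t n) { return n * sizeof(int16_t); }
```

Both give the same vector for every i, but the bank is eight times the size of
the filter it was built from, which is where the cache cost would come from.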

lu - am I insane?

-- 

Luca Barbato
Gentoo Council Member
Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero



