[Ffmpeg-devel] [PATCH] put_mpeg4_qpel16_h_lowpass altivec implementation

Luca Barbato lu_zero
Mon Nov 20 02:11:56 CET 2006


Brian Foley wrote:
> Hi there,
> 
> please find attached a first cut at some Altivec acceleration for the
> mpeg4_qpel functions. To get things started, I've translated
> put_mpeg4_qpel16_h_lowpass from the C version, as it was the most CPU
> intensive function that showed up when playing some 720p Xvid.

Great =)

> 
> It should be a safe enough patch to apply, as I've tested it fairly
> carefully with a large set of random inputs, focussing on things that
> could cause overflow/rounding errors. As far as I can tell, it gives
> exactly the same outputs as the C version in every case.

Perfect

> 
> Other obvious candidates to Altivec-ify are put_mpeg4_qpel16_v_lowpass,
> all the avg_mpeg4 equivalents, and the mpeg4_qpel8 variants. I'll try
> to get around to doing some of those soon if someone doesn't beat me to
> it :)

I like your plan but:

- please attach patches without compressing them, so it is easier to
comment on them from email.

- create a separate file for everything (a name could be mpeg4_altivec.c
or qpel_altivec.c) and give it an init function like the others.

- try to stay within 79 columns

- benchmark whether calling the C versions instead of duplicating them
is faster.

- if you can produce a constant using a combination of at most 4 ops
(like vec_splat_{u,s}{8,16,32}() and vec_{add,sr,sl,...}), check whether
that results in better performance (it should). [gcc-4 may do that for
you when it is simple, like for AVV(16, 16, 16, 16, 16, 16, 16, 16);]

- put_mpeg4_qpel16_v_lowpass_altivec ?
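
To illustrate the constant point above, here is a minimal sketch (not
taken from the patch). vec_splat_u16() only accepts a 5-bit signed
literal (-16..15), so 16 cannot be splatted directly and has to be
built from smaller values. The function names const16_by_add and
const16_by_shift are made up for this example, and the scalar #else
branch is only there so the snippet also compiles on machines without
AltiVec. Whether either form beats loading the constant from memory
still has to be benchmarked.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Hypothetical helper: all eight 16-bit lanes = 16, built as 8 + 8. */
void const16_by_add(uint16_t out[8])
{
    int i;
#ifdef __ALTIVEC__
    union { vector unsigned short v; uint16_t s[8]; } u;
    vector unsigned short v8 = vec_splat_u16(8); /* literal fits 5 bits */
    u.v = vec_add(v8, v8);                       /* 8 + 8 = 16 per lane */
    for (i = 0; i < 8; i++)
        out[i] = u.s[i];
#else
    /* Scalar model of the same arithmetic (assumption: no AltiVec). */
    for (i = 0; i < 8; i++)
        out[i] = (uint16_t)(8 + 8);
#endif
}

/* Hypothetical helper: same constant, built as 1 << 4. */
void const16_by_shift(uint16_t out[8])
{
    int i;
#ifdef __ALTIVEC__
    union { vector unsigned short v; uint16_t s[8]; } u;
    u.v = vec_sl(vec_splat_u16(1), vec_splat_u16(4)); /* 1 << 4 = 16 */
    for (i = 0; i < 8; i++)
        out[i] = u.s[i];
#else
    /* Scalar model of the same arithmetic (assumption: no AltiVec). */
    for (i = 0; i < 8; i++)
        out[i] = (uint16_t)(1 << 4);
#endif
}
```

In a real kernel the vector would stay in a register instead of being
copied out to an array; the copy is here only to make the result easy
to check.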

that's all I could see; it builds with gcc-4.1.1 on linux/ppc

tomorrow I'll try with some samples

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero




