[FFmpeg-devel] [PATCH] Merge some computations in C code for VC-1 inverse transforms
Fri Jan 18 02:08:34 CET 2008
On Thu, Jan 17, 2008 at 08:52:53PM +0100, Christophe GISQUET wrote:
> > t3= 10*(src[ 8] + src);
> > t4= 32*src - t3;
> > t3+= 12*src[ 8];
> > is faster?
> > it's 3 add, 2 mul, 1 shift vs. 2 add, 4 mul
> Should have been at first glance, but this seems to cost 10-30
> dezicycles more per loop
> Again, maybe it could be explained by checking the generated asm code, but
> another CPU might see another result with the same code...
tests from something like ARM would be interesting, there the reduction
of multiplies should make a difference, x86 will have mmx code anyway ...
also you should set --arch / --cpu for configure correctly. I don't think it
will make any difference (and I am not proposing that you redo any tests)
but it's better to do future testing with the compiler being aware of the
> Index: libavcodec/i386/vc1dsp_mmx.c
> --- libavcodec/i386/vc1dsp_mmx.c (révision 11527)
> +++ libavcodec/i386/vc1dsp_mmx.c (copie de travail)
> @@ -467,6 +467,95 @@
> DECLARE_FUNCTION(3, 2)
> DECLARE_FUNCTION(3, 3)
> +/* in/out: B=(A+R)/2+B TMP=(A+R)/2-B A=2*A */
> +#define MERGE(A, B, TMP) \
> + "movq "#A", "#TMP" \n\t" \
and that patch does what in this thread about optimizations of the C code?
Michael
GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
When the tyrant has disposed of foreign enemies by conquest or treaty, and
there is nothing more to fear from them, then he is always stirring up
some war or other, in order that the people may require a leader. -- Plato