[Ffmpeg-devel] Using Intel's fDCT
Sat Nov 19 22:00:53 CET 2005
g. <the_ether at lycos.co.uk> writes:
> I've been trying to use Intel's fDCT from their IPP libs to see if it is
> faster than the SSE2 one in ffmpeg. I tried simply replacing the line from
> RENAMEl(ff_fdct) (block); //cant be anything else ...
> with Intel's function
> ippiDCT8x8Fwd_16s_C1I( block );
> Everything runs okay (and noticeably faster), but the resulting MPEG-2
> video is a mess.
> The Intel routine simply does an fDCT on an 8x8 block and writes the
> results in the same place as the original data. There is no
> initialisation required.
> What is going on in ff_fdct_sse2() other than a pure fDCT, and do you
> have any tips on how I could integrate Intel's routine?
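When comparing two fDCT routines, a slow floating-point reference is useful for checking which positions the coefficients land in. The sketch below is neither Intel's nor FFmpeg's code; it is a direct evaluation of the textbook 8x8 DCT-II with the usual JPEG/MPEG normalisation, working in place on a row-major block of 16-bit values like the calls above. The function name `ref_fdct8x8` is hypothetical, and its scaling convention may differ from what IPP or libavcodec produce, so compare layouts (which positions are non-zero) before comparing magnitudes.

```c
#include <stdint.h>

/* cos(k*pi/16) for k = 0..8, hard-coded so the sketch needs no libm. */
static const double cos_pi16[9] = {
    1.0,
    0.98078528040323044913,
    0.92387953251128675613,
    0.83146961230254523708,
    0.70710678118654752440,
    0.55557023301960222474,
    0.38268343236508977173,
    0.19509032201612826785,
    0.0
};

/* cos(k*pi/16) for any k >= 0, using the symmetries of cosine. */
static double cos16(int k)
{
    k %= 32;                           /* period 2*pi */
    if (k > 16)
        k = 32 - k;                    /* cos(2*pi - a) = cos(a) */
    if (k > 8)
        return -cos_pi16[16 - k];      /* cos(pi - a) = -cos(a) */
    return cos_pi16[k];
}

/* Hypothetical reference fDCT: textbook 8x8 DCT-II,
 *   F(u,v) = 1/4 C(u) C(v) sum_{x,y} f(x,y) cos((2x+1)u pi/16) cos((2y+1)v pi/16)
 * with C(0) = 1/sqrt(2) and C(k) = 1 otherwise.
 * Input is row-major, block[y*8 + x]; the result overwrites the input in
 * natural row-major order, block[v*8 + u], rounded to nearest. */
void ref_fdct8x8(int16_t *block)
{
    double in[64];
    for (int i = 0; i < 64; i++)
        in[i] = block[i];

    for (int v = 0; v < 8; v++) {
        for (int u = 0; u < 8; u++) {
            double sum = 0.0;
            for (int y = 0; y < 8; y++)
                for (int x = 0; x < 8; x++)
                    sum += in[y * 8 + x] * cos16((2 * x + 1) * u)
                                         * cos16((2 * y + 1) * v);
            double cu = (u == 0) ? cos_pi16[4] : 1.0;   /* 1/sqrt(2) */
            double cv = (v == 0) ? cos_pi16[4] : 1.0;
            double f = 0.25 * cu * cv * sum;
            block[v * 8 + u] = (int16_t)(f >= 0.0 ? f + 0.5 : f - 0.5);
        }
    }
}
```

A block that varies only from left to right along each row should give non-zero coefficients only in the first row of the output (indices 0 to 7). If the routine under test puts them in the first column instead, its output is transposed relative to this reference.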
IIRC, the output from the MMX/SSE DCT functions is permuted because of
some design quirk of the CPU. There's a flag somewhere indicating
this. Make sure it is set correctly.
mru at inprovide.com
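If the encoder does expect coefficients in a permuted order, as the reply suggests, a natural-order routine such as the IPP one would need its output reordered before it is handed on. The sketch below assumes a 64-entry table in which the coefficient at natural index `i` is stored at `perm[i]`; the transpose table is only an illustrative example, and `build_transpose_perm`, `permute_block`, `natural_fdct_fn` and `fdct_then_permute` are hypothetical names. In libavcodec of this period the flag the reply refers to is, if memory serves, the `idct_permutation_type` field of `DSPContext`, alongside an `idct_permutation[]` table; check the tree you are building against before relying on that.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example table: a plain transpose, so the coefficient at
 * natural index i = row*8 + col is stored at col*8 + row. The table a real
 * encoder needs depends on which IDCT it is paired with. */
void build_transpose_perm(uint8_t perm[64])
{
    for (int i = 0; i < 64; i++)
        perm[i] = (uint8_t)(((i & 7) << 3) | (i >> 3));
}

/* Reorder a natural-order block in place: permuted[perm[i]] = natural[i].
 * A scratch copy is used because a general permutation cannot be applied
 * with a single forward pass over the same array. */
void permute_block(int16_t block[64], const uint8_t perm[64])
{
    int16_t tmp[64];
    memcpy(tmp, block, sizeof(tmp));
    for (int i = 0; i < 64; i++)
        block[perm[i]] = tmp[i];
}

/* Any in-place, natural-order 8x8 fDCT (e.g. a thin adapter around the
 * IPP call; the adapter itself is an assumption, not shown here). */
typedef void (*natural_fdct_fn)(int16_t *block);

/* Hypothetical drop-in: transform first, then move the coefficients into
 * the order the rest of the encoder is assumed to expect. */
void fdct_then_permute(int16_t block[64], natural_fdct_fn fdct,
                       const uint8_t perm[64])
{
    fdct(block);
    permute_block(block, perm);
}
```

Since the `ff_fdct` hook quoted above takes only the block, a real replacement would more likely use a fixed table than pass one in. A quick sanity check is to fill a block with 0..63, apply the transpose table, and confirm that the value 1 ends up at index 8.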