[FFmpeg-devel] [PATCH] SSE optimization for DCA decoder

Michael Niedermayer michaelni
Mon Sep 1 15:12:18 CEST 2008

On Mon, Sep 01, 2008 at 01:36:16PM +0600, Alexander E. Patrakov wrote:
> Michael Niedermayer wrote:
> > Nice, but as you probably already know, my high-level optimizations
> > broke your patch.
> > 
> > If you want to update it, also look at ff_mpa_synth_filter() which
> > performs the same windowing operation but with a quite different
> > implementation. I do not know which way is more efficient in SIMD;
> > actually, I don't know which is better for C either ...
> IMHO, it is still too early to do this, because of missed
> high-level "optimizations" (quoted because no further speed gain on
> the "window" operation seems possible). As I said earlier, the funky
> indexing seems to mean either two transforms at once, or maybe simply a
> longer transform than written. 

As I've said, the funky indexing is due to the use of half_imdct instead
of the full one. There is no magic here at all, and there are not 2 transforms.
Whether you consider the full-period imdct a longer transform than the
half-period one is of course a question of viewpoint.

> In support of this view, here is the
> rewritten (according to
> http://ccrma.stanford.edu/~jos/sasp/Pseudo_QMF_Cosine_Modulation_Filter.html,
> thanks to Benjamin Larsson for the important keywords!) inverse subband
> transform (for the encoder), that still uses naive form of the DCT:
> Note especially these lines:
> for (k = 0; k < 32; k++)
>   accum[k] = accum[k] - accum[64 + k] - accum[63 - k] + accum[127 - k];

This looks wrong, as if the less significant terms had been dropped, though
of course I could be wrong. If my guess is correct, you should see a difference
in about the 4th digit after the decimal point, assuming random input, and
there probably won't be much audibly wrong when just listening to it.
(My guess is based on the orthogonality of the transform, which I suspect but
have not verified, and on the assumption that I didn't miss some tricks in
your code.)
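
One way to run the check suggested above is to feed the same random input to two implementations and look at the largest absolute difference. The sketch below is only an illustration: max_abs_difference, transform_fn and the toy filter with its tap values are all made up here, and the toy filter stands in for the real subband transform, which is not reproduced. The "truncated" variant drops three small tail taps so that the difference shows up around the 4th decimal digit.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical common signature for the two implementations under test. */
typedef void (*transform_fn)(const double *in, double *out, int n);

/* Toy stand-in: three large taps plus a small tail (invented values). */
static const double toy_taps[6] = { 0.6, 0.3, 0.1, 2e-4, -1e-4, 5e-5 };

/* Circular FIR filter using the first ntaps entries of toy_taps. */
static void toy_filter(const double *in, double *out, int n, int ntaps)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < ntaps; j++)
            sum += toy_taps[j] * in[(i + j) % n];
        out[i] = sum;
    }
}

static void toy_exact(const double *in, double *out, int n)
{
    toy_filter(in, out, n, 6);  /* all taps */
}

static void toy_truncated(const double *in, double *out, int n)
{
    toy_filter(in, out, n, 3);  /* small tail dropped */
}

/* Feed identical random input in [-1, 1] to both transforms and return
 * the largest absolute difference between their outputs. */
static double max_abs_difference(transform_fn a, transform_fn b,
                                 int n, unsigned seed)
{
    double *in = malloc(n * sizeof *in);
    double *ya = malloc(n * sizeof *ya);
    double *yb = malloc(n * sizeof *yb);
    double worst = 0.0;

    assert(in && ya && yb);
    srand(seed);
    for (int i = 0; i < n; i++)
        in[i] = 2.0 * rand() / RAND_MAX - 1.0;
    a(in, ya, n);
    b(in, yb, n);
    for (int i = 0; i < n; i++) {
        double d = fabs(ya[i] - yb[i]);
        if (d > worst)
            worst = d;
    }
    free(in);
    free(ya);
    free(yb);
    return worst;
}
```

With the toy taps above, max_abs_difference(toy_exact, toy_truncated, 256, 1) cannot exceed 3.5e-4, the sum of the dropped tap magnitudes. For the real code one would wrap the two transforms being compared in the same signature.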

It of course would be trivial to similarly drop the window tails in the

Also, as far as I can tell, your encoder is just applying the windowed mdct
as the inverse of the windowed imdct of the decoder. It just has some sign
flips removed from the window and done explicitly in the code.


> BTW, is it an absolute requirement that the decoder uses the raw official
> table for prCoeff[]? Maybe, for clarity, it should first derive the
> prototype lowpass filter from it, and then use this filter according to the
> definition of a pseudo-QMF cosine modulation filter? Attached are the plots
> of the original data table and the lowpass filter kernel extracted from it,
> for the case of "perfect-reconstruction FIR". I think you can immediately
> get the meaning of the "lowpass" plot, 

The lowpass plot looks like the absolute values of the window.

> but "official data" is simply a
> strange plot with no obvious meaning.

I suspect such a funky shape is needed for TDAC to work with longer windows.

Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

I am the wisest man alive, for I know one thing, and that is that I know
nothing. -- Socrates