[FFmpeg-devel] [RFC] VC-1 inverse transforms and vectorization
Thu Jan 10 21:40:28 CET 2008
code would be better as a reference, but I still prefer to request some
comments first, to avoid wasting too much time going deep into dead
ends. Everything I write below may or may not be possible, and it may
even be impossible to verify some of these propositions. But this is an
RFC after all.
My only prior experience is vectorizing H.264's iDCT using SSE2, which
is not that much of a challenge.
Those transforms use 1d filters, working on either 4 or 8 values, either
horizontally or vertically. The horizontal and vertical versions differ
in the bias and shift values used, as well as in possible saturated packing.
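
To make the bias/shift difference concrete, here is a minimal C sketch of the 4-point 1d filter. The coefficients (17, 22, 10) and the bias/shift pairs (4 and 3 for the first pass, 64 and 7 for the second) are quoted from memory of the C reference code and should be checked against it; the name inv_filter4 is made up for illustration.

```c
#include <stdint.h>

/* 4-point 1d inverse filter (sketch). Only bias and shift change
 * between the two passes; coefficients are assumed, see above. */
static void inv_filter4(int16_t dst[4], const int16_t src[4],
                        int bias, int shift)
{
    int t1 = 17 * (src[0] + src[2]) + bias;
    int t2 = 17 * (src[0] - src[2]) + bias;
    int t3 = 22 * src[1] + 10 * src[3];
    int t4 = 22 * src[3] - 10 * src[1];

    /* >> on negative values is assumed to be an arithmetic shift */
    dst[0] = (int16_t)((t1 + t3) >> shift);
    dst[1] = (int16_t)((t2 - t4) >> shift);
    dst[2] = (int16_t)((t2 + t4) >> shift);
    dst[3] = (int16_t)((t1 - t3) >> shift);
}
```

Calling it with (4, 3) and then with (64, 7) gives the two passes; any saturated packing would come on top of this and is not shown.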
One macro per dct size is possible, with a transposition putting the data
in the proper layout for the 1d dct. It may be possible to turn the 2
macros into functions instead. I think only the 4x4 dct could keep its
data in registers throughout the function.
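
As a scalar model of that structure (1d filter on rows, transpose, 1d filter again, transpose back), something like the following could serve as a starting point for the 4x4 case. FILTER4, transpose4 and inv_trans_4x4_sketch are hypothetical names, and the constants are again assumed from memory of the C code rather than verified.

```c
#include <stdint.h>

/* One macro for the 1d filter on a row of 4, done in place.
 * BIAS/SHIFT select the pass; coefficients are assumed. */
#define FILTER4(row, BIAS, SHIFT) do {                      \
        int t1 = 17 * ((row)[0] + (row)[2]) + (BIAS);       \
        int t2 = 17 * ((row)[0] - (row)[2]) + (BIAS);       \
        int t3 = 22 * (row)[1] + 10 * (row)[3];             \
        int t4 = 22 * (row)[3] - 10 * (row)[1];             \
        (row)[0] = (int16_t)((t1 + t3) >> (SHIFT));         \
        (row)[1] = (int16_t)((t2 - t4) >> (SHIFT));         \
        (row)[2] = (int16_t)((t2 + t4) >> (SHIFT));         \
        (row)[3] = (int16_t)((t1 - t3) >> (SHIFT));         \
    } while (0)

/* In-place transpose of a 4x4 block stored row by row. */
static void transpose4(int16_t b[16])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int16_t t    = b[4 * i + j];
            b[4 * i + j] = b[4 * j + i];
            b[4 * j + i] = t;
        }
}

/* filter rows, transpose, filter rows (former columns), transpose back */
static void inv_trans_4x4_sketch(int16_t b[16])
{
    for (int i = 0; i < 4; i++)
        FILTER4(b + 4 * i, 4, 3);
    transpose4(b);
    for (int i = 0; i < 4; i++)
        FILTER4(b + 4 * i, 64, 7);
    transpose4(b);
}
```

In a SIMD version the 16 values would stay in registers between these four steps, which is the point of the 4x4 remark above.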
For both macros, the filtering/convolutions required cannot be
implemented with only shifts and additions/subtractions. As a
consequence, the coefficients will mostly need to be in memory rather
than in MMX registers. This is almost obvious and not worrisome.
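
A small sketch of what "coefficients in memory" could look like, using SSE2 intrinsics rather than MMX (my assumption, since that is what I know) and with a scalar fallback when SSE2 is not available. The table coef_22_10 and the function mul_add_22_10 are hypothetical; the 22/10 pair is the assumed odd-part coefficient pair of the 4-point filter.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Coefficient pairs kept in a constant table instead of a register. */
static const int16_t coef_22_10[8] = { 22, 10, 22, 10, 22, 10, 22, 10 };

/* out[i] = 22 * a[i] + 10 * b[i] for i = 0..3, with 32-bit results */
static void mul_add_22_10(int32_t out[4],
                          const int16_t a[4], const int16_t b[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadl_epi64((const __m128i *)a);
    __m128i vb = _mm_loadl_epi64((const __m128i *)b);
    /* interleave to a0,b0,a1,b1,... so pmaddwd pairs them up */
    __m128i ab = _mm_unpacklo_epi16(va, vb);
    __m128i c  = _mm_loadu_si128((const __m128i *)coef_22_10);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(ab, c));
#else
    for (int i = 0; i < 4; i++)
        out[i] = coef_22_10[0] * a[i] + coef_22_10[1] * b[i];
#endif
}
```

The multiply-add takes its coefficients straight from the table, so no register has to be reserved for them across the whole function.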
Of course, the 1d dct on 8 samples is the more complex one:
- an 8x8 transposition is needed
- some temporary storage is needed to free some registers for the butterflies
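
For reference, a scalar sketch of the 8-point 1d filter shows where the register pressure comes from: the four even-part sums have to be kept alive while the four odd-part sums are computed. The coefficients (12, 16, 15, 9, 6, 4) are quoted from memory and should be checked against the C code; inv_filter8 is a made-up name, and any extra rounding the second pass may apply is left out.

```c
#include <stdint.h>

/* 8-point 1d inverse filter (sketch, coefficients assumed). */
static void inv_filter8(int16_t dst[8], const int16_t src[8],
                        int bias, int shift)
{
    /* even part: 4 values that must survive the odd part */
    int e0 = 12 * (src[0] + src[4]) + bias;
    int e1 = 12 * (src[0] - src[4]) + bias;
    int e2 = 16 * src[2] +  6 * src[6];
    int e3 =  6 * src[2] - 16 * src[6];
    int a0 = e0 + e2, a1 = e1 + e3, a2 = e1 - e3, a3 = e0 - e2;

    /* odd part: 4 more values, each a 4-tap sum of products */
    int o0 = 16 * src[1] + 15 * src[3] +  9 * src[5] +  4 * src[7];
    int o1 = 15 * src[1] -  4 * src[3] - 16 * src[5] -  9 * src[7];
    int o2 =  9 * src[1] - 16 * src[3] +  4 * src[5] + 15 * src[7];
    int o3 =  4 * src[1] -  9 * src[3] + 15 * src[5] - 16 * src[7];

    /* final butterfly; >> on negatives assumed arithmetic */
    dst[0] = (int16_t)((a0 + o0) >> shift);
    dst[1] = (int16_t)((a1 + o1) >> shift);
    dst[2] = (int16_t)((a2 + o2) >> shift);
    dst[3] = (int16_t)((a3 + o3) >> shift);
    dst[4] = (int16_t)((a3 - o3) >> shift);
    dst[5] = (int16_t)((a2 - o2) >> shift);
    dst[6] = (int16_t)((a1 - o1) >> shift);
    dst[7] = (int16_t)((a0 - o0) >> shift);
}
```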
The 8x8 transpose function can be implemented with four 4x4 transposes
and a temporary buffer for the data swap. I think this transpose is used
for H.264, and it is probably implemented in transpose4x4.
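
In plain C the decomposition looks roughly like this: transpose each of the four 4x4 quadrants in place, then swap the two off-diagonal quadrants through a 4x4 temporary. All function names here are hypothetical stand-ins, not the existing transpose4x4.

```c
#include <stdint.h>
#include <string.h>

/* In-place transpose of one 4x4 quadrant inside a wider block. */
static void transpose4x4_inplace(int16_t *b, int stride)
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            int16_t t         = b[i * stride + j];
            b[i * stride + j] = b[j * stride + i];
            b[j * stride + i] = t;
        }
}

/* Swap two 4x4 quadrants through a temporary 4x4 buffer. */
static void swap4x4(int16_t *a, int16_t *b, int stride)
{
    int16_t tmp[16];
    for (int i = 0; i < 4; i++)
        memcpy(tmp + 4 * i, a + i * stride, 4 * sizeof(int16_t));
    for (int i = 0; i < 4; i++)
        memcpy(a + i * stride, b + i * stride, 4 * sizeof(int16_t));
    for (int i = 0; i < 4; i++)
        memcpy(b + i * stride, tmp + 4 * i, 4 * sizeof(int16_t));
}

/* 8x8 transpose built from four 4x4 transposes and one swap. */
static void transpose8x8_sketch(int16_t m[64])
{
    transpose4x4_inplace(m,      8);  /* top left     */
    transpose4x4_inplace(m + 4,  8);  /* top right    */
    transpose4x4_inplace(m + 32, 8);  /* bottom left  */
    transpose4x4_inplace(m + 36, 8);  /* bottom right */
    swap4x4(m + 4, m + 32, 8);        /* exchange off-diagonal quadrants */
}
```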
The temporary buffer could be a 4x4 block, used both for the 8x8
transposition and as scratch storage during DCT8. But maybe it is
possible to pipeline transpose/1d filter/transpose.
I have absolutely no schedule in mind for this implementation; I don't
even know whether I'll ever complete it.