[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct
jdarnley at obe.tv
Mon Jun 5 14:23:39 EEST 2017
To answer the couple of questions that were asked over the weekend:
Rostislav, about the performance: I can see how to force a particular
IDCT implementation for real-world decoding (the -idct option), but the
MPEG2 HD sample I've been working with mostly uses the "idct add"
function, which doesn't exist for the functions in simple_idct10.asm. So,
as the next best thing, these are the results from the dct testing
utility over several runs.
> SIMPLE-C: 9124.8 ± 7.52
> SIMPLE-MMX: 11281.9 ± 32.67
> SIMPLE-SSE2: 15453.3 ± 78.86 (the adaptation in the first 3 patches)
> SIMPLE8-SSE2: 15684.2 ± 7.52 (from simple_idct10.asm)
> SIMPLE8-AVX: 15398.4 ± 6.36 (simple_idct10.asm again)
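
For a quick comparison, the means above can be expressed relative to the C
version. This is only a sketch: it assumes the figures are throughput
(higher is better), it ignores the ± spread, and the helper name
relative_to is made up for illustration.

```python
# Mean figures copied from the results quoted above.
# Assumption: these are throughput numbers, so a larger value is faster.
results = {
    "SIMPLE-C": 9124.8,
    "SIMPLE-MMX": 11281.9,
    "SIMPLE-SSE2": 15453.3,
    "SIMPLE8-SSE2": 15684.2,
    "SIMPLE8-AVX": 15398.4,
}


def relative_to(baseline, table):
    """Return each entry of `table` divided by the baseline entry."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


ratios = relative_to("SIMPLE-C", results)

# Print one line per implementation, e.g. "SIMPLE-C      1.00x".
for name, ratio in ratios.items():
    print(f"{name:<13} {ratio:.2f}x")
```

The three SSE2/AVX variants land within about 2% of each other, which is
small next to their gap to the MMX version.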
I will try to get some real-world results, eventually.
Ronald, yes. I was thinking that the first 3 could be ignored if I can
get the latter patches to work correctly (pass fate, that is).
I forgot to mention in my cover letter that although the dct test
passes, fate does not. As I mentioned on IRC, changing them causes
errors elsewhere in fate. I am currently looking into this problem and
I'm sure I will speak to you or others about it.